subject:"Make COPY format extendable\: Extract COPY TO format implementations"

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-04-10 Thread Sutou Kouhei

Hi Andres,

Could you take a look at this? I think that you don't want
to touch the current text/csv/binary implementations. The
v17 patch approach doesn't touch the current text/csv/binary
implementations. What do you think about this approach?


Thanks,
-- 
kou

In <20240320.232732.488684985873786799@clear-code.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Wed, 20 Mar 2024 23:27:32 +0900 (JST),
  Sutou Kouhei  wrote:

> Hi,
> 
> Could someone review the v17 patch to proceed this?
> 
> The v17 patch:
> https://www.postgresql.org/message-id/flat/20240305.171808.667980402249336456.kou%40clear-code.com#d2ee079b75ebcf00c410300ecc4a357a
> 
> Some profiles by Michael:
> https://www.postgresql.org/message-id/flat/ZelfYatRdVZN3FbE%40paquier.xyz#eccfd1a0131af93c48026d691cc247f4
> 
> Thanks,
> -- 
> kou
> 
> In <20240308.092254.359611633589181574....@clear-code.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Fri, 08 Mar 2024 09:22:54 +0900 (JST),
>   Sutou Kouhei  wrote:
> 
>> Hi,
>> 
>> In 
>>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
>> on Thu, 7 Mar 2024 15:32:01 +0900,
>>   Michael Paquier  wrote:
>> 
>>> While on it, here are some profiles based on HEAD and v17 with the
>>> previous tests (COPY TO /dev/null, COPY FROM data sent to the void).
>>> 
>> ...
>>> 
>>> So, in short, and that's not really a surprise, there is no effect
>>> once we use the dispatching with the routines only when a format would
>>> want to plug-in with the APIs, but a custom format would still have a
>>> penalty of a few percents for both if bottlenecked on CPU.
>> 
>> Thanks for sharing these profiles!
>> I agree with you.
>> 
>> This shows that the v17 approach doesn't affect the current
>> text/csv/binary implementations. (The v17 approach just adds
>> 2 new structs, Copy{From,To}Rountine, without changing the
>> current text/csv/binary implementations.)
>> 
>> Can we push the v17 patch and proceed following
>> implementations? Could someone (especially a PostgreSQL
>> committer) take a look at this for double-check?
>> 
>> 
>> Thanks,
>> -- 
>> kou
>> 
>> 
> 
>

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-03-20 Thread Sutou Kouhei

Hi,

Could someone review the v17 patch to proceed this?

The v17 patch:
https://www.postgresql.org/message-id/flat/20240305.171808.667980402249336456.kou%40clear-code.com#d2ee079b75ebcf00c410300ecc4a357a

Some profiles by Michael:
https://www.postgresql.org/message-id/flat/ZelfYatRdVZN3FbE%40paquier.xyz#eccfd1a0131af93c48026d691cc247f4

Thanks,
-- 
kou

In <20240308.092254.359611633589181574@clear-code.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Fri, 08 Mar 2024 09:22:54 +0900 (JST),
  Sutou Kouhei  wrote:

> Hi,
> 
> In 
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Thu, 7 Mar 2024 15:32:01 +0900,
>   Michael Paquier  wrote:
> 
>> While on it, here are some profiles based on HEAD and v17 with the
>> previous tests (COPY TO /dev/null, COPY FROM data sent to the void).
>> 
> ...
>> 
>> So, in short, and that's not really a surprise, there is no effect
>> once we use the dispatching with the routines only when a format would
>> want to plug-in with the APIs, but a custom format would still have a
>> penalty of a few percents for both if bottlenecked on CPU.
> 
> Thanks for sharing these profiles!
> I agree with you.
> 
> This shows that the v17 approach doesn't affect the current
> text/csv/binary implementations. (The v17 approach just adds
> 2 new structs, Copy{From,To}Rountine, without changing the
> current text/csv/binary implementations.)
> 
> Can we push the v17 patch and proceed following
> implementations? Could someone (especially a PostgreSQL
> committer) take a look at this for double-check?
> 
> 
> Thanks,
> -- 
> kou
> 
>

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-03-15 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Wed, 13 Mar 2024 16:00:46 +0800,
  jian he  wrote:

>> More use cases will help us which callbacks are needed. We
>> will be able to collect more use cases by providing basic
>> callbacks.

> Let's say the customized format is "csv1", but it is just analogous to
> the csv format.
> people should be able to create an extension, with serval C functions,
> then they can do `copy (select 1 ) to stdout (format 'csv1');`
> but the output will be exact same as `copy (select 1 ) to stdout
> (format 'csv');`

Thanks for sharing one use case but I think that we need
real-world use cases to consider our APIs.

For example, JSON support that is currently discussing in
another thread is a real-world use case. My Apache Arrow
support is also another real-world use case.


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-03-13 Thread jian he

On Mon, Mar 11, 2024 at 8:56 AM Sutou Kouhei  wrote:
>
> Hi,
>
> In 
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Mon, 11 Mar 2024 08:00:00 +0800,
>   jian he  wrote:
>
> > Hi, here are my cents:
> > Currently in v17, we have 3 extra functions within DoCopyTo
> > CopyToStart, one time, start, doing some preliminary work.
> > CopyToOneRow, doing the repetitive work, called many times, row by row.
> > CopyToEnd, one time doing the closing work.
> >
> > seems to need a function pointer for processing the format and other 
> > options.
> > or maybe the reason is we need a one time function call before doing 
> > DoCopyTo,
> > like one time initialization.
>
> I know that JSON format wants it but can we defer it? We can
> add more options later. I want to proceed this improvement
> step by step.
>
> More use cases will help us which callbacks are needed. We
> will be able to collect more use cases by providing basic
> callbacks.

I guess one of the ultimate goals would be that COPY can export data
to a customized format.
Let's say the customized format is "csv1", but it is just analogous to
the csv format.
people should be able to create an extension, with serval C functions,
then they can do `copy (select 1 ) to stdout (format 'csv1');`
but the output will be exact same as `copy (select 1 ) to stdout
(format 'csv');`

In such a scenario, we require a function akin to ProcessCopyOptions
to handle situations
where CopyFormatOptions->csv_mode is true, while the format is "csv1".

but CopyToStart is already within the DoCopyTo function, so you do
need an extra function pointer?
I do agree with the incremental improvement method.

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-03-10 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Mon, 11 Mar 2024 08:00:00 +0800,
  jian he  wrote:

> Hi, here are my cents:
> Currently in v17, we have 3 extra functions within DoCopyTo
> CopyToStart, one time, start, doing some preliminary work.
> CopyToOneRow, doing the repetitive work, called many times, row by row.
> CopyToEnd, one time doing the closing work.
> 
> seems to need a function pointer for processing the format and other options.
> or maybe the reason is we need a one time function call before doing DoCopyTo,
> like one time initialization.

I know that JSON format wants it but can we defer it? We can
add more options later. I want to proceed this improvement
step by step.

More use cases will help us which callbacks are needed. We
will be able to collect more use cases by providing basic
callbacks.

> generally in v17, the code pattern looks like this.
> if (cstate->opts.binary)
> {
> /* handle binary format */
> }
> else if (cstate->routine)
> {
> /* custom code, make the copy format extensible */
> }
> else
> {
> /* handle non-binary, (csv or text) format */
> }
> maybe we need another bool flag like `bool buildin_format`.
> if the copy format is {csv|text|binary}  then buildin_format is true else 
> false.
> 
> so the code pattern would be:
> if (cstate->opts.binary)
> {
> /* handle binary format */
> }
> else if (cstate->routine && !buildin_format)
> {
> /* custom code, make the copy format extensible */
> }
> else
> {
> /* handle non-binary, (csv or text) format */
> }
> 
> otherwise the {CopyToRoutine| CopyFromRoutine} needs a function pointer
> to distinguish native copy format and extensible supported format,
> like I mentioned above?

Hmm. I may miss something but I think that we don't need the
bool flag. Because we don't set cstate->routine for native
copy formats. So we can distinguish native copy format and
extensible supported format by checking only
cstate->routine.


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-03-10 Thread jian he

On Fri, Mar 8, 2024 at 8:23 AM Sutou Kouhei  wrote:
>
>
> This shows that the v17 approach doesn't affect the current
> text/csv/binary implementations. (The v17 approach just adds
> 2 new structs, Copy{From,To}Rountine, without changing the
> current text/csv/binary implementations.)
>
> Can we push the v17 patch and proceed following
> implementations? Could someone (especially a PostgreSQL
> committer) take a look at this for double-check?
>

Hi, here are my cents:
Currently in v17, we have 3 extra functions within DoCopyTo
CopyToStart, one time, start, doing some preliminary work.
CopyToOneRow, doing the repetitive work, called many times, row by row.
CopyToEnd, one time doing the closing work.

seems to need a function pointer for processing the format and other options.
or maybe the reason is we need a one time function call before doing DoCopyTo,
like one time initialization.

We can placed the function pointer after:
`
cstate = BeginCopyTo(pstate, rel, query, relid,
stmt->filename, stmt->is_program,
NULL, stmt->attlist, stmt->options);
`

generally in v17, the code pattern looks like this.
if (cstate->opts.binary)
{
/* handle binary format */
}
else if (cstate->routine)
{
/* custom code, make the copy format extensible */
}
else
{
/* handle non-binary, (csv or text) format */
}
maybe we need another bool flag like `bool buildin_format`.
if the copy format is {csv|text|binary}  then buildin_format is true else false.

so the code pattern would be:
if (cstate->opts.binary)
{
/* handle binary format */
}
else if (cstate->routine && !buildin_format)
{
/* custom code, make the copy format extensible */
}
else
{
/* handle non-binary, (csv or text) format */
}

otherwise the {CopyToRoutine| CopyFromRoutine} needs a function pointer
to distinguish native copy format and extensible supported format,
like I mentioned above?

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-03-07 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Thu, 7 Mar 2024 15:32:01 +0900,
  Michael Paquier  wrote:

> While on it, here are some profiles based on HEAD and v17 with the
> previous tests (COPY TO /dev/null, COPY FROM data sent to the void).
> 
...
> 
> So, in short, and that's not really a surprise, there is no effect
> once we use the dispatching with the routines only when a format would
> want to plug-in with the APIs, but a custom format would still have a
> penalty of a few percents for both if bottlenecked on CPU.

Thanks for sharing these profiles!
I agree with you.

This shows that the v17 approach doesn't affect the current
text/csv/binary implementations. (The v17 approach just adds
2 new structs, Copy{From,To}Rountine, without changing the
current text/csv/binary implementations.)

Can we push the v17 patch and proceed following
implementations? Could someone (especially a PostgreSQL
committer) take a look at this for double-check?

Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-03-06 Thread Michael Paquier

On Wed, Mar 06, 2024 at 03:34:04PM +0900, Michael Paquier wrote:
> I am not sure that my schedule is on track to allow that for this
> release, unfortunately, especially with all the other items to review
> and discuss to make this thread feature-complete.  There should be
> a bit more than four weeks until the feature freeze (date not set in
> stone, should be around the 8th of April AoE), but I have less than
> the half due to personal issues.  Perhaps if somebody jumps on this
> thread, that will be possible..

While on it, here are some profiles based on HEAD and v17 with the
previous tests (COPY TO /dev/null, COPY FROM data sent to the void).

COPY FROM, text format with 30 attributes and HEAD:
-   66.53%16.33%  postgres  postgres[.] NextCopyFrom
- 50.20% NextCopyFrom
   - 30.83% NextCopyFromRawFields
  + 16.09% CopyReadLine
13.72% CopyReadAttributesText
   + 19.11% InputFunctionCallSafe
+ 16.33% _start
COPY FROM, text format with 30 attributes and v17:
-   66.60%16.10%  postgres  postgres[.] NextCopyFrom
- 50.50% NextCopyFrom
   - 30.44% NextCopyFromRawFields
  + 15.71% CopyReadLine
13.73% CopyReadAttributesText
   + 19.81% InputFunctionCallSafe
+ 16.10% _start

COPY TO, text format with 30 attributes and HEAD:
-   79.55%15.54%  postgres  postgres[.] CopyOneRowTo
- 64.01% CopyOneRowTo
   + 30.01% OutputFunctionCall
   + 11.71% appendBinaryStringInfo
 9.36% CopyAttributeOutText
   + 3.03% CopySendEndOfRow
 1.65% int4out
 1.01% 0x83e46be4
 0.93% 0x83e46be8
 0.93% memcpy@plt
 0.87% pgstat_progress_update_param
 0.78% enlargeStringInfo
 0.67% 0x83e46bb4
 0.66% 0x83e46bcc
 0.57% MemoryContextReset
+ 15.54% _start
COPY TO, text format with 30 attributes and v17:
-   79.35%16.08%  postgres  postgres[.] CopyOneRowTo
- 62.27% CopyOneRowTo
   + 28.92% OutputFunctionCall
   + 10.88% appendBinaryStringInfo
 9.54% CopyAttributeOutText
   + 3.03% CopySendEndOfRow
 1.60% int4out
 0.97% pgstat_progress_update_param
 0.95% 0x8c46cbe8
 0.89% memcpy@plt
 0.87% 0x8c46cbe4
 0.79% enlargeStringInfo
 0.64% 0x8c46cbcc
 0.61% 0x8c46cbb4
 0.58% MemoryContextReset
+ 16.08% _start

So, in short, and that's not really a surprise, there is no effect
once we use the dispatching with the routines only when a format would
want to plug-in with the APIs, but a custom format would still have a
penalty of a few percents for both if bottlenecked on CPU.
--
Michael


signature.asc
Description: PGP signature

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-03-05 Thread Michael Paquier

On Tue, Mar 05, 2024 at 05:18:08PM +0900, Sutou Kouhei wrote:
> I'll send the following patches after this patch is
> merged.

I am not sure that my schedule is on track to allow that for this
release, unfortunately, especially with all the other items to review
and discuss to make this thread feature-complete.  There should be
a bit more than four weeks until the feature freeze (date not set in
stone, should be around the 8th of April AoE), but I have less than
the half due to personal issues.  Perhaps if somebody jumps on this
thread, that will be possible..

> They are based on the v6 patch[1]:
> 
> 1. Add copy_handler
>* This also adds a pg_proc lookup for custom FORMAT
>* This also adds a test for copy_handler
> 2. Export CopyToStateData
>* We need it to implement custom copy TO handler
> 3. Add needed APIs to implement custom copy TO handler
>* Add CopyToStateData::opaque
>* Export CopySendEndOfRow()
> 4. Export CopyFromStateData
>* We need it to implement custom copy FROM handler
> 5. Add needed APIs to implement custom copy FROM handler
>* Add CopyFromStateData::opaque
>* Export CopyReadBinaryData()

Hmm.  Sounds like a good plan for a split.
--
Michael

signature.asc
Description: PGP signature

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-03-05 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Tue, 5 Mar 2024 15:16:33 +0900,
  Michael Paquier  wrote:

> CopyOneRowTo() could do something like that to avoid the extra
> indentation:
> if (cstate->routine)
> {
> cstate->routine->CopyToOneRow(cstate, slot);
> MemoryContextSwitchTo(oldcontext);
> return;
> }

OK. The v17 patch uses this style. Others are same as the
v16.

> I didn't know this trick.  That's indeed nice..  I may use that for
> other stuff to make patches more presentable to the eyes.  And that's
> available as well with `git diff`.

:-)

> If we basically agree about this part, how would the rest work out
> with this set of APIs and the possibility to plug in a custom value
> for FORMAT to do a pg_proc lookup, including an example of how these
> APIs can be used?

I'll send the following patches after this patch is
merged. They are based on the v6 patch[1]:

1. Add copy_handler
   * This also adds a pg_proc lookup for custom FORMAT
   * This also adds a test for copy_handler
2. Export CopyToStateData
   * We need it to implement custom copy TO handler
3. Add needed APIs to implement custom copy TO handler
   * Add CopyToStateData::opaque
   * Export CopySendEndOfRow()
4. Export CopyFromStateData
   * We need it to implement custom copy FROM handler
5. Add needed APIs to implement custom copy FROM handler
   * Add CopyFromStateData::opaque
   * Export CopyReadBinaryData()

[1] 
https://www.postgresql.org/message-id/flat/20240124.144936.67229716500876806.kou%40clear-code.com#f1ad092fc5e81fe38d3c376559efd52c


Thanks,
-- 
kou
>From a78b8ee88575e2c2873afc3acf3c8c4e535becf0 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei 
Date: Mon, 4 Mar 2024 13:52:34 +0900
Subject: [PATCH v17] Add CopyFromRoutine/CopyToRountine

They are for implementing custom COPY FROM/TO format. But this is not
enough to implement custom COPY FROM/TO format yet. We'll export some
APIs to receive/send data and add "format" option to COPY FROM/TO
later.

Existing text/csv/binary format implementations don't use
CopyFromRoutine/CopyToRoutine for now. We have a patch for it but we
defer it. Because there are some mysterious profile results in spite
of we get faster runtimes. See [1] for details.

[1] https://www.postgresql.org/message-id/ZdbtQJ-p5H1_EDwE%40paquier.xyz

Note that this doesn't change existing text/csv/binary format
implementations.
---
 src/backend/commands/copyfrom.c  |  24 +-
 src/backend/commands/copyfromparse.c |   5 ++
 src/backend/commands/copyto.c|  25 +-
 src/include/commands/copyapi.h   | 100 +++
 src/include/commands/copyfrom_internal.h |   4 +
 src/tools/pgindent/typedefs.list |   2 +
 6 files changed, 155 insertions(+), 5 deletions(-)
 create mode 100644 src/include/commands/copyapi.h

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index c3bc897028..9bf2f6497e 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1623,12 +1623,22 @@ BeginCopyFrom(ParseState *pstate,
 
 		/* Fetch the input function and typioparam info */
 		if (cstate->opts.binary)
+		{
 			getTypeBinaryInputInfo(att->atttypid,
    _func_oid, [attnum - 1]);
+			fmgr_info(in_func_oid, _functions[attnum - 1]);
+		}
+		else if (cstate->routine)
+			cstate->routine->CopyFromInFunc(cstate, att->atttypid,
+			_functions[attnum - 1],
+			[attnum - 1]);
+
 		else
+		{
 			getTypeInputInfo(att->atttypid,
 			 _func_oid, [attnum - 1]);
-		fmgr_info(in_func_oid, _functions[attnum - 1]);
+			fmgr_info(in_func_oid, _functions[attnum - 1]);
+		}
 
 		/* Get default info if available */
 		defexprs[attnum - 1] = NULL;
@@ -1768,10 +1778,13 @@ BeginCopyFrom(ParseState *pstate,
 		/* Read and verify binary header */
 		ReceiveCopyBinaryHeader(cstate);
 	}
-
-	/* create workspace for CopyReadAttributes results */
-	if (!cstate->opts.binary)
+	else if (cstate->routine)
 	{
+		cstate->routine->CopyFromStart(cstate, tupDesc);
+	}
+	else
+	{
+		/* create workspace for CopyReadAttributes results */
 		AttrNumber	attr_count = list_length(cstate->attnumlist);
 
 		cstate->max_fields = attr_count;
@@ -1789,6 +1802,9 @@ BeginCopyFrom(ParseState *pstate,
 void
 EndCopyFrom(CopyFromState cstate)
 {
+	if (cstate->routine)
+		cstate->routine->CopyFromEnd(cstate);
+
 	/* No COPY FROM related resources except memory. */
 	if (cstate->is_program)
 	{
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 7cacd0b752..8b15080585 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -978,6 +978,11 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 
 		Assert(fieldno == attr_count);
 	}
+	else if (cstate->routine)
+	{
+		if (!cstate->routin

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-03-04 Thread Michael Paquier

On Mon, Mar 04, 2024 at 02:11:08PM +0900, Sutou Kouhei wrote:
> If this is a blocker of making COPY format extendable, can
> we defer moving the existing text/csv/binary format
> implementations to Copy{From,To}Routine for now as Michael
> suggested to proceed making COPY format extendable? (Can we
> add Copy{From,To}Routine without changing the existing
> text/csv/binary format implementations?)

Yeah, I assume that it would be the way to go so as we don't do any
dispatching in default cases.  A different approach that could be done
is to hide some of the parts of binary and text/csv in inline static
functions that are equivalent to the routine callbacks.  That's
similar to the previous versions of the patch set, but if we come back
to the argument that there is a risk of blocking optimizations of more
of the local areas of the per-row processing in NextCopyFrom() and
CopyOneRowTo(), what you have sounds like a good balance.

CopyOneRowTo() could do something like that to avoid the extra
indentation:
if (cstate->routine)
{
cstate->routine->CopyToOneRow(cstate, slot);
MemoryContextSwitchTo(oldcontext);
return;
}

NextCopyFrom() does not need to be concerned by that.

> I attach a patch for it.

> There is a large hunk for CopyOneRowTo() that is caused by
> indent change. I also attach "...-w.patch" that uses "git
> -w" to remove space only changes. "...-w.patch" is only for
> review. We should use .patch without -w for push.

I didn't know this trick.  That's indeed nice..  I may use that for
other stuff to make patches more presentable to the eyes.  And that's
available as well with `git diff`.

If we basically agree about this part, how would the rest work out
with this set of APIs and the possibility to plug in a custom value
for FORMAT to do a pg_proc lookup, including an example of how these
APIs can be used?
--
Michael

signature.asc
Description: PGP signature

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-03-03 Thread Sutou Kouhei

Hi,

In <20240301.154443.618034282613922707@clear-code.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Fri, 01 Mar 2024 15:44:43 +0900 (JST),
  Sutou Kouhei  wrote:

>> I guess so.  It does not make much of a difference, though.  The thing
>> is that the dispatch caused by the custom callbacks called for each
>> row is noticeable in any profiles I'm taking (not that much in the
>> worst-case scenarios, still a few percents), meaning that this impacts
>> the performance for all the in-core formats (text, csv, binary) as
>> long as we refactor text/csv/binary to use the routines of copyapi.h.
>> I don't really see a way forward, except if we don't dispatch the
>> in-core formats to not impact the default cases.  That makes the code
>> a bit less elegant, but equally efficient for the existing formats.
> 
> It's an option based on your profile result but your
> execution result also shows that v15 is faster than HEAD [1]:
> 
>> I am getting faster runtimes with v15 (6232ms in average)
>> vs HEAD (6550ms) at 5M rows with COPY TO
> 
> [1] 
> https://www.postgresql.org/message-id/flat/ZdbtQJ-p5H1_EDwE%40paquier.xyz#6439e6ad574f2d47cd7220e9bfed3889
> 
> I think that faster runtime is beneficial than mysterious
> profile for users. So I think that we can merge v15 to
> master.

If this is a blocker of making COPY format extendable, can
we defer moving the existing text/csv/binary format
implementations to Copy{From,To}Routine for now as Michael
suggested to proceed making COPY format extendable? (Can we
add Copy{From,To}Routine without changing the existing
text/csv/binary format implementations?)

I attach a patch for it.

There is a large hunk for CopyOneRowTo() that is caused by
indent change. I also attach "...-w.patch" that uses "git
-w" to remove space only changes. "...-w.patch" is only for
review. We should use .patch without -w for push.


Thanks,
-- 
kou
>From 6a5bfc8e104f0a339b421028e9fec69a4d092671 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei 
Date: Mon, 4 Mar 2024 13:52:34 +0900
Subject: [PATCH v16] Add CopyFromRoutine/CopyToRountine

They are for implementing custom COPY FROM/TO format. But this is not
enough to implement custom COPY FROM/TO format yet. We'll export some
APIs to receive/send data and add "format" option to COPY FROM/TO
later.

Existing text/csv/binary format implementations don't use
CopyFromRoutine/CopyToRoutine for now. We have a patch for it but we
defer it. Because there are some mysterious profile results in spite
of we get faster runtimes. See [1] for details.

[1] https://www.postgresql.org/message-id/ZdbtQJ-p5H1_EDwE%40paquier.xyz

Note that this doesn't change existing text/csv/binary format
implementations. There are many diffs for CopyOneRowTo() but they're
caused by indentation. They don't change implementations.
---
 src/backend/commands/copyfrom.c  |  24 +-
 src/backend/commands/copyfromparse.c |   5 ++
 src/backend/commands/copyto.c| 103 ++-
 src/include/commands/copyapi.h   | 100 ++
 src/include/commands/copyfrom_internal.h |   4 +
 src/tools/pgindent/typedefs.list |   2 +
 6 files changed, 193 insertions(+), 45 deletions(-)
 create mode 100644 src/include/commands/copyapi.h

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index c3bc897028..9bf2f6497e 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1623,12 +1623,22 @@ BeginCopyFrom(ParseState *pstate,
 
 		/* Fetch the input function and typioparam info */
 		if (cstate->opts.binary)
+		{
 			getTypeBinaryInputInfo(att->atttypid,
    _func_oid, [attnum - 1]);
+			fmgr_info(in_func_oid, _functions[attnum - 1]);
+		}
+		else if (cstate->routine)
+			cstate->routine->CopyFromInFunc(cstate, att->atttypid,
+			_functions[attnum - 1],
+			[attnum - 1]);
+
 		else
+		{
 			getTypeInputInfo(att->atttypid,
 			 _func_oid, [attnum - 1]);
-		fmgr_info(in_func_oid, _functions[attnum - 1]);
+			fmgr_info(in_func_oid, _functions[attnum - 1]);
+		}
 
 		/* Get default info if available */
 		defexprs[attnum - 1] = NULL;
@@ -1768,10 +1778,13 @@ BeginCopyFrom(ParseState *pstate,
 		/* Read and verify binary header */
 		ReceiveCopyBinaryHeader(cstate);
 	}
-
-	/* create workspace for CopyReadAttributes results */
-	if (!cstate->opts.binary)
+	else if (cstate->routine)
 	{
+		cstate->routine->CopyFromStart(cstate, tupDesc);
+	}
+	else
+	{
+		/* create workspace for CopyReadAttributes results */
 		AttrNumber	attr_count = list_length(cstate->attnumlist);
 
 		cstate->max_fields = attr_count;
@@ -1789,6 +1802,9 @@ BeginCopyFrom(ParseState *pstate,
 void
 EndCopyFrom(CopyFromState cstate)
 {
+	if (cstate->routine)

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-29 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Fri, 1 Mar 2024 14:31:38 +0900,
  Michael Paquier  wrote:

> I guess so.  It does not make much of a difference, though.  The thing
> is that the dispatch caused by the custom callbacks called for each
> row is noticeable in any profiles I'm taking (not that much in the
> worst-case scenarios, still a few percents), meaning that this impacts
> the performance for all the in-core formats (text, csv, binary) as
> long as we refactor text/csv/binary to use the routines of copyapi.h.
> I don't really see a way forward, except if we don't dispatch the
> in-core formats to not impact the default cases.  That makes the code
> a bit less elegant, but equally efficient for the existing formats.

It's an option based on your profile result but your
execution result also shows that v15 is faster than HEAD [1]:

> I am getting faster runtimes with v15 (6232ms in average)
> vs HEAD (6550ms) at 5M rows with COPY TO

[1] 
https://www.postgresql.org/message-id/flat/ZdbtQJ-p5H1_EDwE%40paquier.xyz#6439e6ad574f2d47cd7220e9bfed3889

I think that faster runtime is beneficial than mysterious
profile for users. So I think that we can merge v15 to
master.


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-29 Thread Sutou Kouhei

Hi,

In <20240222.183948.518018047578925034@clear-code.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Thu, 22 Feb 2024 18:39:48 +0900 (JST),
  Sutou Kouhei  wrote:

> How about adding "is_csv" to CopyReadline() and
> CopyReadLineText() too?

I tried this on my environment. This is a change for COPY
FROM not COPY TO but this decreases COPY TO
performance with [1]... Hmm...

master:   697.693 msec (the best case)
v15:  576.374 msec (the best case)
v15+this: 593.559 msec (the best case)

[1] COPY (SELECT 
1::int2,2::int2,3::int2,4::int2,5::int2,6::int2,7::int2,8::int2,9::int2,10::int2,11::int2,12::int2,13::int2,14::int2,15::int2,16::int2,17::int2,18::int2,19::int2,20::int2,
 generate_series(1, 100::int4)) TO '/dev/null' \watch c=15

So I think that v15 is good.


perf result of master:

# Children  Self  Command   Shared Object  Symbol   

#       .  
.
#
31.39%14.54%  postgres  postgres   [.] CopyOneRowTo
|--17.00%--CopyOneRowTo
|  |--10.61%--FunctionCall1Coll
|  |   --8.40%--int2out
|  | |--2.58%--pg_ltoa
|  | |   --0.68%--pg_ultoa_n
|  | |--1.11%--pg_ultoa_n
|  | |--0.83%--AllocSetAlloc
|  | 
|--0.69%--__memcpy_avx_unaligned_erms (inlined)
|  | |--0.58%--FunctionCall1Coll
|  |  --0.55%--memcpy@plt
|  |--3.25%--appendBinaryStringInfo
|  |   --0.56%--pg_ultoa_n
|   --0.69%--CopyAttributeOutText

perf result of v15:

# Children  Self  Command   Shared Object  Symbol   

#       .  
.
#
25.60%10.47%  postgres  postgres   [.] CopyToTextOneRow
|--15.39%--CopyToTextOneRow
|  |--10.44%--FunctionCall1Coll
|  |  |--7.25%--int2out
|  |  |  |--2.60%--pg_ltoa
|  |  |  |   --0.71%--pg_ultoa_n
|  |  |  |--0.90%--FunctionCall1Coll
|  |  |  |--0.84%--pg_ultoa_n
|  |  |   --0.66%--AllocSetAlloc
|  |  |--0.79%--ExecProjectSet
|  |   --0.68%--int4out
|  |--2.50%--appendBinaryStringInfo
|   --0.53%--CopyAttributeOutText


The profiles on Michael's environment [2] showed that
CopyOneRow() % was increased by v15. But it
(CopyToTextOneRow() % not CopyOneRow() %) wasn't increased
by v15. It's decreased instead.

[2] 
https://www.postgresql.org/message-id/flat/ZdbtQJ-p5H1_EDwE%40paquier.xyz#6439e6ad574f2d47cd7220e9bfed3889

So I think that v15 doesn't have performance regression but
my environment isn't suitable for benchmark...


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-29 Thread Michael Paquier

On Thu, Feb 22, 2024 at 06:39:48PM +0900, Sutou Kouhei wrote:
> If so, adding the change independently on HEAD makes
> sense. But I don't know why that improves
> performance... Inlining?

I guess so.  It does not make much of a difference, though.  The thing
is that the dispatch caused by the custom callbacks called for each
row is noticeable in any profiles I'm taking (not that much in the
worst-case scenarios, still a few percents), meaning that this impacts
the performance for all the in-core formats (text, csv, binary) as
long as we refactor text/csv/binary to use the routines of copyapi.h.
I don't really see a way forward, except if we don't dispatch the
in-core formats to not impact the default cases.  That makes the code
a bit less elegant, but equally efficient for the existing formats.
--
Michael


signature.asc
Description: PGP signature

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-22 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Thu, 22 Feb 2024 15:44:16 +0900,
  Michael Paquier  wrote:

> I was comparing what you have here, and what's been attached by Andres
> at [1] and the top of the changes on my development branch at [2]
> (v3-0008, mostly).  And, it strikes me that there is no need to do any
> major changes in any of the callbacks proposed up to v13 and v14 in
> this thread, as all the changes proposed want to plug in more data
> into each StateData for COPY FROM and COPY TO, the best part being
> that v3-0008 can just reuse the proposed callbacks as-is.  v1-0001
> from Sutou-san would need one slight tweak in the per-row callback,
> still that's minor.

I think so too. But I thought that some minor conflicts will
be happen with this and the v15. So I worked on this before
the v15.

We agreed that this optimization doesn't block v15: [1]
So we can work on the v15 without this optimization for now.

[1] 
https://www.postgresql.org/message-id/flat/20240219195351.5vy7cdl3wxia66kg%40awork3.anarazel.de#20f9677e074fb0f8c5bb3994ef059a15

> I have been spending more time on the patch to introduce the COPY
> APIs, leading me to the v15 attached, where I have replaced the
> previous attribute callbacks for the output representation and the
> reads with hardcoded routines that should be optimized by compilers,
> and I have done more profiling with -O2.

Thanks! I wanted to work on it but I didn't have enough time
for it in a few days...

I've reviewed the v15.


> @@ -751,8 +751,9 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int 
> nbytes)
>   *
>   * NOTE: force_not_null option are not applied to the returned fields.
>   */
> -bool
> -NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
> +static bool

"inline" is missing here.

> +NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields,
> +   bool is_csv)
>  {
>   int fldct;


How about adding "is_csv" to CopyReadline() and
CopyReadLineText() too?


diff --git a/src/backend/commands/copyfromparse.c 
b/src/backend/commands/copyfromparse.c
index 25b8d4bc52..79fabecc69 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -150,8 +150,8 @@ static const char BinarySignature[11] = 
"PGCOPY\n\377\r\n\0";
 
 
 /* non-export function prototypes */
-static bool CopyReadLine(CopyFromState cstate);
-static bool CopyReadLineText(CopyFromState cstate);
+static inline bool CopyReadLine(CopyFromState cstate, bool is_csv);
+static inline bool CopyReadLineText(CopyFromState cstate, bool is_csv);
 static inline int CopyReadAttributesText(CopyFromState cstate);
 static inline int CopyReadAttributesCSV(CopyFromState cstate);
 static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
@@ -770,7 +770,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, 
int *nfields,
tupDesc = RelationGetDescr(cstate->rel);
 
cstate->cur_lineno++;
-   done = CopyReadLine(cstate);
+   done = CopyReadLine(cstate, is_csv);
 
if (cstate->opts.header_line == COPY_HEADER_MATCH)
{
@@ -823,7 +823,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, 
int *nfields,
cstate->cur_lineno++;
 
/* Actually read the line into memory here */
-   done = CopyReadLine(cstate);
+   done = CopyReadLine(cstate, is_csv);
 
/*
 * EOF at start of line means we're done.  If we see EOF after some
@@ -1133,8 +1133,8 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
  * by newline.  The terminating newline or EOF marker is not included
  * in the final value of line_buf.
  */
-static bool
-CopyReadLine(CopyFromState cstate)
+static inline bool
+CopyReadLine(CopyFromState cstate, bool is_csv)
 {
boolresult;
 
@@ -1142,7 +1142,7 @@ CopyReadLine(CopyFromState cstate)
cstate->line_buf_valid = false;
 
/* Parse data and transfer into line_buf */
-   result = CopyReadLineText(cstate);
+   result = CopyReadLineText(cstate, is_csv);
 
if (result)
{
@@ -1209,8 +1209,8 @@ CopyReadLine(CopyFromState cstate)
 /*
  * CopyReadLineText - inner loop of CopyReadLine for text mode
  */
-static bool
-CopyReadLineText(CopyFromState cstate)
+static inline bool
+CopyReadLineText(CopyFromState cstate, bool is_csv)
 {
char   *copy_input_buf;
int input_buf_ptr;
@@ -1226,7 +1226,7 @@ CopyReadLineText(CopyFromState cstate)
charquotec = '\0';
charescapec = '\0';
 
-   if (cstate->opts.csv_mode)
+   if (is_csv)
{
quotec = cstate->opts.quote[0];

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-21 Thread Michael Paquier

On Thu, Feb 15, 2024 at 03:34:21PM +0900, Sutou Kouhei wrote:
> It seems that it improves performance a bit but my
> environment isn't suitable for benchmark. So they may not
> be valid numbers.

I was comparing what you have here, and what's been attached by Andres
at [1] and the top of the changes on my development branch at [2]
(v3-0008, mostly).  And, it strikes me that there is no need to do any
major changes in any of the callbacks proposed up to v13 and v14 in
this thread, as all the changes proposed want to plug in more data
into each StateData for COPY FROM and COPY TO, the best part being
that v3-0008 can just reuse the proposed callbacks as-is.  v1-0001
from Sutou-san would need one slight tweak in the per-row callback,
still that's minor.

I have been spending more time on the patch to introduce the COPY
APIs, leading me to the v15 attached, where I have replaced the
previous attribute callbacks for the output representation and the
reads with hardcoded routines that should be optimized by compilers,
and I have done more profiling with -O2.  I'm aware of the disparities
in the per-row and start callbacks for the text/csv cases as well as
the default expressions, but these are really format-dependent with
their own assumptions so splitting them is something that makes
limited sense to me.  I've also looks at externalizing some of the
error handling, though the result was not that beautiful, so what I
got here is what makes the callbacks leaner and easier to work with.

First, some results for COPY FROM using the previous tests (30 int
attributes, running on scissors, data sent to blackhole_am, etc.) in
NextCopyFrom() which becomes the hot-spot:
* Using v15:
  Children  Self  Command   Shared Object   Symbol
-   66.42% 0.71%  postgres  postgres[.] NextCopyFrom
- 65.70% NextCopyFrom
   - 65.49% CopyFromTextLikeOneRow
  + 19.29% InputFunctionCallSafe
  + 15.81% CopyReadLine
13.89% CopyReadAttributesText
+ 0.71% _start
* Using HEAD (today's 011d60c4352c):
  Children  Self  Command   Shared Object   Symbol
-   67.09%16.64%  postgres  postgres[.] NextCopyFrom
- 50.45% NextCopyFrom
   - 30.89% NextCopyFromRawFields
  + 16.26% CopyReadLine
13.59% CopyReadAttributesText
   + 19.24% InputFunctionCallSafe
+ 16.64% _start

In this case, I have been able to limit the effects of the per-row
callback by making NextCopyFromRawFields() local to copyfromparse.c
while applying some inlining to it.  This brings me to a different
point, why don't we do this change independently on HEAD?  It's not 
really complicated to make NextCopyFromRawFields show high in the
profiles.  I was looking at external projects, and noticed that
there's nothing calling NextCopyFromRawFields() directly.

Second, some profiles with COPY TO (30 int integers, running on
scissors) where data is sent /dev/null:
* Using v15:
  Children  Self  Command   Shared Object   Symbol
-   85.61% 0.34%  postgres  postgres[.] CopyOneRowTo
- 85.26% CopyOneRowTo
   - 75.86% CopyToTextOneRow
  + 36.49% OutputFunctionCall
  + 10.53% appendBinaryStringInfo
9.66% CopyAttributeOutText
1.34% int4out
0.92% 0xa9803be8
0.79% enlargeStringInfo
0.77% memcpy@plt
0.69% 0xa9803be4
   + 3.12% CopySendEndOfRow
 2.81% CopySendChar
 0.95% pgstat_progress_update_param
 0.95% appendBinaryStringInfo
 0.55% MemoryContextReset
* Using HEAD (today's 011d60c4352c):
  Children  Self  Command   Shared Object   Symbol
-   80.35%14.23%  postgres  postgres[.] CopyOneRowTo
- 66.12% CopyOneRowTo
   + 35.40% OutputFunctionCall
   + 11.00% appendBinaryStringInfo
 8.38% CopyAttributeOutText
   + 2.98% CopySendEndOfRow
 1.52% int4out
 0.88% pgstat_progress_update_param
 0.87% 0x8ab32be8
 0.74% memcpy@plt
 0.68% enlargeStringInfo
 0.61% 0x8ab32be4
 0.51% MemoryContextReset
+ 14.23% _start

The increase in CopyOneRowTo from 80% to 85% worries me but I am not
quite sure how to optimize that with the current structure of the
code, so the dispatch caused by per-row callback is noticeable in
what's my worst test case.  I am not quite sure how to avoid that,
TBH.  A result that has been puzzling me is that I am getting faster
runtimes with v15 (6232ms in average) vs HEAD (6550ms) at 5M rows with
COPY TO for what led to these profiles (for tests without perf
attached to the backends).

Any thoughts overall?

[1]: 
https://www.postgresql.org/message-id/20240218015955.rmw5mcmobt5hbene%40awork3.anarazel.de
[2]: https://www.postgresql.org/message-id/zcwotr1n0gelf...@paquier.xyz
--
Michael
From 9787247320f9d11756a20e3f94e84ec9bb10092e Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 22 Feb 2024

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-15 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Thu, 15 Feb 2024 17:09:20 +0800,
  jian he  wrote:

> My environment is slow (around 10x) but consistent.
> I see around 2-3 percent increase consistently.
> (with patch 7369.068 ms, without patch 7574.802 ms)

Thanks for sharing your numbers! It will help us to
determine whether these changes improve performance or not.

> the patchset looks good in my eyes, i can understand it.
> however I cannot apply it cleanly against the HEAD.

Hmm, I used 9bc1eee988c31e66a27e007d41020664df490214 as the
base version. But both patches based on the same
revision. So we may not be able to apply both patches at
once cleanly.

> +/*
> + * Prepare callinfo for InputFunctionCallSafeWithInfo to reuse one callinfo
> + * instead of initializing it for each call. This is for performance.
> + */
> +FunctionCallInfoBaseData *
> +PrepareInputFunctionCallInfo(void)
> +{
> + FunctionCallInfoBaseData *fcinfo;
> +
> + fcinfo = (FunctionCallInfoBaseData *) palloc(SizeForFunctionCallInfo(3));
> 
> just wondering, I saw other similar places using palloc0,
> do we need to use palloc0?

I think that we don't need to use palloc0() here because the
following InitFunctionCallInfoData() call initializes all
members explicitly.

Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-15 Thread jian he

On Thu, Feb 15, 2024 at 2:34 PM Sutou Kouhei  wrote:
>
>
> Thanks for the info. Let's use InputFunctionCallSafeWithInfo().
> See that attached patch:
> v2-0001-Reuse-fcinfo-used-in-COPY-FROM.patch
>
> I also attach a patch for COPY TO:
> v1-0001-Reuse-fcinfo-used-in-COPY-TO.patch
>
> I measured the COPY TO patch on my environment with:
> COPY (SELECT 
> 1::int2,2::int2,3::int2,4::int2,5::int2,6::int2,7::int2,8::int2,9::int2,10::int2,11::int2,12::int2,13::int2,14::int2,15::int2,16::int2,17::int2,18::int2,19::int2,20::int2,
>  generate_series(1, 100::int4)) TO '/dev/null' \watch c=5
>
> master:
> 740.066ms
> 734.884ms
> 738.579ms
> 734.170ms
> 727.953ms
>
> patched:
> 730.714ms
> 741.483ms
> 714.149ms
> 715.436ms
> 713.578ms
>
> It seems that it improves performance a bit but my
> environment isn't suitable for benchmark. So they may not
> be valid numbers.

My environment is slow (around 10x) but consistent.
I see around 2-3 percent increase consistently.
(with patch 7369.068 ms, without patch 7574.802 ms)

the patchset looks good in my eyes, i can understand it.
however I cannot apply it cleanly against the HEAD.

+/*
+ * Prepare callinfo for InputFunctionCallSafeWithInfo to reuse one callinfo
+ * instead of initializing it for each call. This is for performance.
+ */
+FunctionCallInfoBaseData *
+PrepareInputFunctionCallInfo(void)
+{
+ FunctionCallInfoBaseData *fcinfo;
+
+ fcinfo = (FunctionCallInfoBaseData *) palloc(SizeForFunctionCallInfo(3));

just wondering, I saw other similar places using palloc0,
do we need to use palloc0?

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-14 Thread Sutou Kouhei

Hi,

In <20240213.173340.1518143507526518973@clear-code.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Tue, 13 Feb 2024 17:33:40 +0900 (JST),
  Sutou Kouhei  wrote:

> I'll reply other comments later...

I've read other comments and my answers for them are same as
Michael's one.


I'll prepare the v15 patch with static inline functions and
fixed arguments after the fcinfo cache patches are merged. I
think that the v15 patch will be conflicted with fcinfo
cache patches.


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-14 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Wed, 14 Feb 2024 15:52:36 +0900,
  Michael Paquier  wrote:

>> How about InputFunctionCallSafeWithInfo(),
>> InputFunctionCallSafeInfo() or
>> InputFunctionCallInfoCallSafe()?
> 
> WithInfo() would not be a new thing.  There are a couple of APIs named
> like this when manipulating catalogs, so that sounds kind of a good
> choice from here.

Thanks for the info. Let's use InputFunctionCallSafeWithInfo().
See that attached patch:
v2-0001-Reuse-fcinfo-used-in-COPY-FROM.patch

I also attach a patch for COPY TO:
v1-0001-Reuse-fcinfo-used-in-COPY-TO.patch

I measured the COPY TO patch on my environment with:
COPY (SELECT 
1::int2,2::int2,3::int2,4::int2,5::int2,6::int2,7::int2,8::int2,9::int2,10::int2,11::int2,12::int2,13::int2,14::int2,15::int2,16::int2,17::int2,18::int2,19::int2,20::int2,
 generate_series(1, 100::int4)) TO '/dev/null' \watch c=5

master:
740.066ms
734.884ms
738.579ms
734.170ms
727.953ms

patched:
730.714ms
741.483ms
714.149ms
715.436ms
713.578ms

It seems that it improves performance a bit but my
environment isn't suitable for benchmark. So they may not
be valid numbers.


Thanks,
-- 
kou
>From b677732f46f735a5601b889f79671e91be41 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei 
Date: Thu, 15 Feb 2024 15:01:08 +0900
Subject: [PATCH v2] Reuse fcinfo used in COPY FROM

Each NextCopyFrom() calls input functions or binary-input
functions. We can reuse fcinfo for them instead of creating a local
fcinfo for each call. This will improve performance.
---
 src/backend/commands/copyfrom.c  |   4 +
 src/backend/commands/copyfromparse.c |  20 ++--
 src/backend/utils/fmgr/fmgr.c| 126 +++
 src/include/commands/copyfrom_internal.h |   1 +
 src/include/fmgr.h   |  12 +++
 5 files changed, 154 insertions(+), 9 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 1fe70b9133..ed375c012e 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1691,6 +1691,10 @@ BeginCopyFrom(ParseState *pstate,
 	/* We keep those variables in cstate. */
 	cstate->in_functions = in_functions;
 	cstate->typioparams = typioparams;
+	if (cstate->opts.binary)
+		cstate->fcinfo = PrepareReceiveFunctionCallInfo();
+	else
+		cstate->fcinfo = PrepareInputFunctionCallInfo();
 	cstate->defmap = defmap;
 	cstate->defexprs = defexprs;
 	cstate->volatile_defexprs = volatile_defexprs;
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 7cacd0b752..7907e16ea8 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -861,6 +861,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 num_defaults = cstate->num_defaults;
 	FmgrInfo   *in_functions = cstate->in_functions;
 	Oid		   *typioparams = cstate->typioparams;
+	FunctionCallInfoBaseData *fcinfo = cstate->fcinfo;
 	int			i;
 	int		   *defmap = cstate->defmap;
 	ExprState **defexprs = cstate->defexprs;
@@ -961,12 +962,13 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 			 * If ON_ERROR is specified with IGNORE, skip rows with soft
 			 * errors
 			 */
-			else if (!InputFunctionCallSafe(_functions[m],
-			string,
-			typioparams[m],
-			att->atttypmod,
-			(Node *) cstate->escontext,
-			[m]))
+			else if (!InputFunctionCallSafeWithInfo(fcinfo,
+	_functions[m],
+	string,
+	typioparams[m],
+	att->atttypmod,
+	(Node *) cstate->escontext,
+	[m]))
 			{
 cstate->num_errors++;
 return true;
@@ -1966,7 +1968,7 @@ CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
 	if (fld_size == -1)
 	{
 		*isnull = true;
-		return ReceiveFunctionCall(flinfo, NULL, typioparam, typmod);
+		return ReceiveFunctionCallWithInfo(cstate->fcinfo, flinfo, NULL, typioparam, typmod);
 	}
 	if (fld_size < 0)
 		ereport(ERROR,
@@ -1987,8 +1989,8 @@ CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
 	cstate->attribute_buf.data[fld_size] = '\0';
 
 	/* Call the column type's binary input converter */
-	result = ReceiveFunctionCall(flinfo, >attribute_buf,
- typioparam, typmod);
+	result = ReceiveFunctionCallWithInfo(cstate->fcinfo, flinfo, >attribute_buf,
+		 typioparam, typmod);
 
 	/* Trouble if it didn't eat the whole buffer */
 	if (cstate->attribute_buf.cursor != cstate->attribute_buf.len)
diff --git a/src/backend/utils/fmgr/fmgr.c b/src/backend/utils/fmgr/fmgr.c
index e48a86be54..14c3ed2bdb 100644
--- a/src/backend/utils/fmgr/fmgr.c
+++ b/src/backend/utils/fmgr/fmgr.c
@@ -1672,6 +1672,73 @@ DirectInputFunctionCallSafe(PGFunction func, char *str,
 	return true;
 }
 
+/*
+ * Prepare callinfo f

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-13 Thread Michael Paquier

On Wed, Feb 14, 2024 at 02:08:51PM +0900, Sutou Kouhei wrote:
> I understand the feeling. SQL uses "prepared" for "prepared
> statement". There are similar function names such as
> InputFunctionCall()/InputFunctionCallSafe()/DirectInputFunctionCallSafe(). 
> They
> execute (call) an input function but they use "call" not
> "execute" for it... So "Execute...Call..." may be
> redundant...
> 
> How about InputFunctionCallSafeWithInfo(),
> InputFunctionCallSafeInfo() or
> InputFunctionCallInfoCallSafe()?

WithInfo() would not be a new thing.  There are a couple of APIs named
like this when manipulating catalogs, so that sounds kind of a good
choice from here.
--
Michael


signature.asc
Description: PGP signature

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-13 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Wed, 14 Feb 2024 12:28:38 +0900,
  Michael Paquier  wrote:

>> How about the attached patch approach? If it's a desired
>> approach, I can also write a separated patch for COPY TO.
> 
> Hmm, I have not studied that much, but my first impression was that we
> would not require any new facility in fmgr.c, but perhaps you're right
> and it's more elegant to pass a InitFunctionCallInfoData this way.

I'm not familiar with the fmgr.c related code base but it
seems that we abstract {,binary-}input function call by
fmgr.c. So I think that it's better that we follow the
design. (If there is a person who knows the fmgr.c related
code base, please help us.)

> PrepareInputFunctionCallInfo() looks OK as a name, but I'm less a fan
> of PreparedInputFunctionCallSafe() and its "Prepared" part.  How about
> something like ExecuteInputFunctionCallSafe()?

I understand the feeling. SQL uses "prepared" for "prepared
statement". There are similar function names such as
InputFunctionCall()/InputFunctionCallSafe()/DirectInputFunctionCallSafe(). They
execute (call) an input function but they use "call" not
"execute" for it... So "Execute...Call..." may be
redundant...

How about InputFunctionCallSafeWithInfo(),
InputFunctionCallSafeInfo() or
InputFunctionCallInfoCallSafe()?

> I may be able to look more at that next week, and I would surely check
> the impact of that with a simple COPY query throttled by CPU (more
> rows and more attributes the better).

Thanks!

>Note that I've reverted 06bd311bce24 for
> the moment, as this is just getting in the way of the main patch, and
> that was non-optimal once there is a per-row callback.

Thanks for sharing the information. I'll rebase on master
when I create the v15 patch.

>> diff --git a/src/backend/commands/copyfrom.c 
>> b/src/backend/commands/copyfrom.c
>> index 41f6bc43e4..a43c853e99 100644
>> --- a/src/backend/commands/copyfrom.c
>> +++ b/src/backend/commands/copyfrom.c
>> @@ -1691,6 +1691,10 @@ BeginCopyFrom(ParseState *pstate,
>>  /* We keep those variables in cstate. */
>>  cstate->in_functions = in_functions;
>>  cstate->typioparams = typioparams;
>> +if (cstate->opts.binary)
>> +cstate->fcinfo = PrepareInputFunctionCallInfo();
>> +else
>> +cstate->fcinfo = PrepareReceiveFunctionCallInfo();
> 
> Perhaps we'd better avoid more callbacks like that, for now.

I'll not use a callback for this. I'll not change this part
after we introduce Copy{To,From}Routine. cstate->fcinfo
isn't used some custom COPY format handlers such as Apache
Arrow handler like cstate->in_functions and
cstate->typioparams. But they will be always allocated. It's
a bit wasteful for those handlers but we may not care about
it. So we can always use "if (state->opts.binary)" condition
here.

BTW... This part was wrong... Sorry... It should be:

if (cstate->opts.binary)
cstate->fcinfo = PrepareReceiveFunctionCallInfo();
else
cstate->fcinfo = PrepareInputFunctionCallInfo();

Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-13 Thread Michael Paquier

On Tue, Feb 13, 2024 at 05:33:40PM +0900, Sutou Kouhei wrote:
> Hi,
> 
> In <20240209192705.5qdilvviq3py2...@awork3.anarazel.de>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Fri, 9 Feb 2024 11:27:05 -0800,
>   Andres Freund  wrote:
> 
>>> +static void
>>> +CopyFromTextInFunc(CopyFromState cstate, Oid atttypid,
>>> +  FmgrInfo *finfo, Oid *typioparam)
>>> +{
>>> +   Oid func_oid;
>>> +
>>> +   getTypeInputInfo(atttypid, _oid, typioparam);
>>> +   fmgr_info(func_oid, finfo);
>>> +}
>> 
>> FWIW, we should really change the copy code to initialize 
>> FunctionCallInfoData
>> instead of re-initializing that on every call, realy makes a difference
>> performance wise.
> 
> How about the attached patch approach? If it's a desired
> approach, I can also write a separated patch for COPY TO.

Hmm, I have not studied that much, but my first impression was that we
would not require any new facility in fmgr.c, but perhaps you're right
and it's more elegant to pass a InitFunctionCallInfoData this way.

PrepareInputFunctionCallInfo() looks OK as a name, but I'm less a fan
of PreparedInputFunctionCallSafe() and its "Prepared" part.  How about
something like ExecuteInputFunctionCallSafe()?

I may be able to look more at that next week, and I would surely check
the impact of that with a simple COPY query throttled by CPU (more
rows and more attributes the better).

>>> +   cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
>>> +   /* Set read attribute callback */
>>> +   if (cstate->opts.csv_mode)
>>> +   cstate->copy_read_attributes = CopyReadAttributesCSV;
>>> +   else
>>> +   cstate->copy_read_attributes = CopyReadAttributesText;
>>> +}
>> 
>> Isn't this precisely repeating the mistake of 2889fd23be56?
> 
> What do you think about the approach in my previous mail's
> attachments?
> https://www.postgresql.org/message-id/flat/20240209.163205.704848659612151781.kou%40clear-code.com#dbb1f8d7f2f0e8fe3c7e37a757fcfc54
>
> If it's a desired approach, I can prepare a v15 patch set
> based on the v14 patch set and the approach.

Yes, this one looks like it's using the right angle: we don't rely
anymore in cstate to decide which CopyReadAttributes to use, the
routines do that instead.  Note that I've reverted 06bd311bce24 for
the moment, as this is just getting in the way of the main patch, and
that was non-optimal once there is a per-row callback.

> diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
> index 41f6bc43e4..a43c853e99 100644
> --- a/src/backend/commands/copyfrom.c
> +++ b/src/backend/commands/copyfrom.c
> @@ -1691,6 +1691,10 @@ BeginCopyFrom(ParseState *pstate,
>   /* We keep those variables in cstate. */
>   cstate->in_functions = in_functions;
>   cstate->typioparams = typioparams;
> + if (cstate->opts.binary)
> + cstate->fcinfo = PrepareInputFunctionCallInfo();
> + else
> + cstate->fcinfo = PrepareReceiveFunctionCallInfo();

Perhaps we'd better avoid more callbacks like that, for now.
--
Michael


signature.asc
Description: PGP signature

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-13 Thread Sutou Kouhei

Hi,

In <20240209192705.5qdilvviq3py2...@awork3.anarazel.de>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Fri, 9 Feb 2024 11:27:05 -0800,
  Andres Freund  wrote:

>> +static void
>> +CopyFromTextInFunc(CopyFromState cstate, Oid atttypid,
>> +   FmgrInfo *finfo, Oid *typioparam)
>> +{
>> +Oid func_oid;
>> +
>> +getTypeInputInfo(atttypid, _oid, typioparam);
>> +fmgr_info(func_oid, finfo);
>> +}
> 
> FWIW, we should really change the copy code to initialize FunctionCallInfoData
> instead of re-initializing that on every call, realy makes a difference
> performance wise.

How about the attached patch approach? If it's a desired
approach, I can also write a separated patch for COPY TO.

>> +cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
>> +/* Set read attribute callback */
>> +if (cstate->opts.csv_mode)
>> +cstate->copy_read_attributes = CopyReadAttributesCSV;
>> +else
>> +cstate->copy_read_attributes = CopyReadAttributesText;
>> +}
> 
> Isn't this precisely repeating the mistake of 2889fd23be56?

What do you think about the approach in my previous mail's
attachments?
https://www.postgresql.org/message-id/flat/20240209.163205.704848659612151781.kou%40clear-code.com#dbb1f8d7f2f0e8fe3c7e37a757fcfc54

If it's a desired approach, I can prepare a v15 patch set
based on the v14 patch set and the approach.


I'll reply other comments later...


Thanks,
-- 
kou
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 41f6bc43e4..a43c853e99 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1691,6 +1691,10 @@ BeginCopyFrom(ParseState *pstate,
 	/* We keep those variables in cstate. */
 	cstate->in_functions = in_functions;
 	cstate->typioparams = typioparams;
+	if (cstate->opts.binary)
+		cstate->fcinfo = PrepareInputFunctionCallInfo();
+	else
+		cstate->fcinfo = PrepareReceiveFunctionCallInfo();
 	cstate->defmap = defmap;
 	cstate->defexprs = defexprs;
 	cstate->volatile_defexprs = volatile_defexprs;
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 906756362e..e372e5efb8 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -853,6 +853,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 num_defaults = cstate->num_defaults;
 	FmgrInfo   *in_functions = cstate->in_functions;
 	Oid		   *typioparams = cstate->typioparams;
+	FunctionCallInfoBaseData *fcinfo = cstate->fcinfo;
 	int			i;
 	int		   *defmap = cstate->defmap;
 	ExprState **defexprs = cstate->defexprs;
@@ -953,12 +954,13 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 			 * If ON_ERROR is specified with IGNORE, skip rows with soft
 			 * errors
 			 */
-			else if (!InputFunctionCallSafe(_functions[m],
-			string,
-			typioparams[m],
-			att->atttypmod,
-			(Node *) cstate->escontext,
-			[m]))
+			else if (!PreparedInputFunctionCallSafe(fcinfo,
+	_functions[m],
+	string,
+	typioparams[m],
+	att->atttypmod,
+	(Node *) cstate->escontext,
+	[m]))
 			{
 cstate->num_errors++;
 return true;
@@ -1958,7 +1960,7 @@ CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
 	if (fld_size == -1)
 	{
 		*isnull = true;
-		return ReceiveFunctionCall(flinfo, NULL, typioparam, typmod);
+		return PreparedReceiveFunctionCall(cstate->fcinfo, flinfo, NULL, typioparam, typmod);
 	}
 	if (fld_size < 0)
 		ereport(ERROR,
@@ -1979,8 +1981,8 @@ CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
 	cstate->attribute_buf.data[fld_size] = '\0';
 
 	/* Call the column type's binary input converter */
-	result = ReceiveFunctionCall(flinfo, >attribute_buf,
- typioparam, typmod);
+	result = PreparedReceiveFunctionCall(cstate->fcinfo, flinfo, >attribute_buf,
+		 typioparam, typmod);
 
 	/* Trouble if it didn't eat the whole buffer */
 	if (cstate->attribute_buf.cursor != cstate->attribute_buf.len)
diff --git a/src/backend/utils/fmgr/fmgr.c b/src/backend/utils/fmgr/fmgr.c
index e48a86be54..b0b5310219 100644
--- a/src/backend/utils/fmgr/fmgr.c
+++ b/src/backend/utils/fmgr/fmgr.c
@@ -1672,6 +1672,73 @@ DirectInputFunctionCallSafe(PGFunction func, char *str,
 	return true;
 }
 
+/*
+ * Prepare callinfo for PreparedInputFunctionCallSafe to reuse one callinfo
+ * instead of initializing it for each call. This is for performance.
+ */
+FunctionCallInfoBaseData *
+PrepareInputFunctionCallInfo(void)
+{
+	FunctionCallInfoBaseData *fcinfo;
+
+	fcinfo = (FunctionCallInfoBaseData *) palloc(SizeForFunct

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-09 Thread Michael Paquier

On Fri, Feb 09, 2024 at 11:27:05AM -0800, Andres Freund wrote:
> On 2024-02-09 13:21:34 +0900, Michael Paquier wrote:
>> +static void
>> +CopyFromTextInFunc(CopyFromState cstate, Oid atttypid,
>> +   FmgrInfo *finfo, Oid *typioparam)
>> +{
>> +Oid func_oid;
>> +
>> +getTypeInputInfo(atttypid, _oid, typioparam);
>> +fmgr_info(func_oid, finfo);
>> +}
> 
> FWIW, we should really change the copy code to initialize FunctionCallInfoData
> instead of re-initializing that on every call, realy makes a difference
> performance wise.

You mean to initialize once its memory and let the internal routines
call InitFunctionCallInfoData for each attribute.  Sounds like a good
idea, doing that for HEAD before the main patch.  More impact with
more attributes.

>> +/*
>> + * CopyFromTextStart
>> + *
>> + * Start of COPY FROM for text/CSV format.
>> + */
>> +static void
>> +CopyFromTextStart(CopyFromState cstate, TupleDesc tupDesc)
>> +{
>> +AttrNumber  attr_count;
>> +
>> +/*
>> + * If encoding conversion is needed, we need another buffer to hold the
>> + * converted input data.  Otherwise, we can just point input_buf to the
>> + * same buffer as raw_buf.
>> + */
>> +if (cstate->need_transcoding)
>> +{
>> +cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1);
>> +cstate->input_buf_index = cstate->input_buf_len = 0;
>> +}
>> +else
>> +cstate->input_buf = cstate->raw_buf;
>> +cstate->input_reached_eof = false;
>> +
>> +initStringInfo(>line_buf);
> 
> Seems kinda odd that we have a supposedly extensible API that then stores all
> this stuff in the non-extensible CopyFromState.

That relates to the introduction of the the opaque pointer mentioned 
upthread to point to a per-format structure, where we'd store data
specific to each format.

>> +/* create workspace for CopyReadAttributes results */
>> +attr_count = list_length(cstate->attnumlist);
>> +cstate->max_fields = attr_count;
> 
> Why is this here? This seems like generic code, not text format specific.

We don't care about that for binary.

>> +/*
>> + * CopyFromTextOneRow
>> + *
>> + * Copy one row to a set of `values` and `nulls` for the text and CSV
>> + * formats.
>> + */
> 
> I'm very doubtful it's a good idea to combine text and CSV here. They have
> basically no shared parsing code, so what's the point in sending them through
> one input routine?

The code shared between text and csv involves a path called once per
attribute.  TBH, I am not sure how much of the NULL handling should be
put outside the per-row routine as these options are embedded in the
core options.  So I don't have a better idea on this one than what's
proposed here if we cannot dispatch the routine calls once per
attribute.

>> +/* read raw fields in the next line */
>> +if (!NextCopyFromRawFields(cstate, _strings, ))
>> +return false;
>> +
>> +/* check for overflowing fields */
>> +if (attr_count > 0 && fldct > attr_count)
>> +ereport(ERROR,
>> +(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
>> + errmsg("extra data after last expected 
>> column")));
> 
> It bothers me that we look to be ending up with different error handling
> across the various output formats, particularly if they're ending up in
> extensions. That'll make it harder to evolve this code in the future.

But different formats may have different requirements, including the
number of attributes detected vs expected.  That was not really
nothing me.

>> +if (cstate->opts.csv_mode)
>> +{
> 
> More unfortunate intermingling of multiple formats in a single
> routine.

Similar answer as a few paragraphs above.  Sutou-san was suggesting to
use an internal routine with fixed arguments instead, which would be
enough at the end with some inline instructions?

>> +
>> +if (cstate->defaults[m])
>> +{
>> +/*
>> + * The caller must supply econtext and have switched 
>> into the
>> + * per-tuple memory context in it.
>> + */
>> +Assert(econtext != NULL);
>> +Assert(CurrentMemoryContext == 
>> econtext->ecxt_per_tuple_memory);
>> +
>> +values[m] = ExecEvalExpr(defexprs[m], econtext, 
>> [m]);
>> +}
> 
> I don't think it's good that we end up with this code in different copy
> implementations.

Yeah, still we don't care about that for binary.
--
Michael


signature.asc
Description: PGP signature

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-09 Thread Andres Freund

Hi,

On 2024-02-09 13:21:34 +0900, Michael Paquier wrote:
> +static void
> +CopyFromTextInFunc(CopyFromState cstate, Oid atttypid,
> +FmgrInfo *finfo, Oid *typioparam)
> +{
> + Oid func_oid;
> +
> + getTypeInputInfo(atttypid, _oid, typioparam);
> + fmgr_info(func_oid, finfo);
> +}

FWIW, we should really change the copy code to initialize FunctionCallInfoData
instead of re-initializing that on every call, realy makes a difference
performance wise.


> +/*
> + * CopyFromTextStart
> + *
> + * Start of COPY FROM for text/CSV format.
> + */
> +static void
> +CopyFromTextStart(CopyFromState cstate, TupleDesc tupDesc)
> +{
> + AttrNumber  attr_count;
> +
> + /*
> +  * If encoding conversion is needed, we need another buffer to hold the
> +  * converted input data.  Otherwise, we can just point input_buf to the
> +  * same buffer as raw_buf.
> +  */
> + if (cstate->need_transcoding)
> + {
> + cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1);
> + cstate->input_buf_index = cstate->input_buf_len = 0;
> + }
> + else
> + cstate->input_buf = cstate->raw_buf;
> + cstate->input_reached_eof = false;
> +
> + initStringInfo(>line_buf);

Seems kinda odd that we have a supposedly extensible API that then stores all
this stuff in the non-extensible CopyFromState.


> + /* create workspace for CopyReadAttributes results */
> + attr_count = list_length(cstate->attnumlist);
> + cstate->max_fields = attr_count;

Why is this here? This seems like generic code, not text format specific.


> + cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
> + /* Set read attribute callback */
> + if (cstate->opts.csv_mode)
> + cstate->copy_read_attributes = CopyReadAttributesCSV;
> + else
> + cstate->copy_read_attributes = CopyReadAttributesText;
> +}

Isn't this precisely repeating the mistake of 2889fd23be56?

And, why is this done here? Shouldn't this decision have been made prior to
even calling CopyFromTextStart()?

> +/*
> + * CopyFromTextOneRow
> + *
> + * Copy one row to a set of `values` and `nulls` for the text and CSV
> + * formats.
> + */

I'm very doubtful it's a good idea to combine text and CSV here. They have
basically no shared parsing code, so what's the point in sending them through
one input routine?


> +bool
> +CopyFromTextOneRow(CopyFromState cstate,
> +ExprContext *econtext,
> +Datum *values,
> +bool *nulls)
> +{
> + TupleDesc   tupDesc;
> + AttrNumber  attr_count;
> + FmgrInfo   *in_functions = cstate->in_functions;
> + Oid*typioparams = cstate->typioparams;
> + ExprState **defexprs = cstate->defexprs;
> + char  **field_strings;
> + ListCell   *cur;
> + int fldct;
> + int fieldno;
> + char   *string;
> +
> + tupDesc = RelationGetDescr(cstate->rel);
> + attr_count = list_length(cstate->attnumlist);
> +
> + /* read raw fields in the next line */
> + if (!NextCopyFromRawFields(cstate, _strings, ))
> + return false;
> +
> + /* check for overflowing fields */
> + if (attr_count > 0 && fldct > attr_count)
> + ereport(ERROR,
> + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
> +  errmsg("extra data after last expected 
> column")));

It bothers me that we look to be ending up with different error handling
across the various output formats, particularly if they're ending up in
extensions. That'll make it harder to evolve this code in the future.


> + fieldno = 0;
> +
> + /* Loop to read the user attributes on the line. */
> + foreach(cur, cstate->attnumlist)
> + {
> + int attnum = lfirst_int(cur);
> + int m = attnum - 1;
> + Form_pg_attribute att = TupleDescAttr(tupDesc, m);
> +
> + if (fieldno >= fldct)
> + ereport(ERROR,
> + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
> +  errmsg("missing data for column 
> \"%s\"",
> + 
> NameStr(att->attname;
> + string = field_strings[fieldno++];
> +
> + if (cstate->convert_select_flags &&
> + !cstate->convert_select_flags[m])
> + {
> + /* ignore input field, leaving column as NULL */
> + continue;
> + }
> +
> + cstate->cur_attname = NameStr(att->attname);
> + cstate->cur_attval = string;
> +
> + if (cstate->opts.csv_mode)
> + {

More unfortunate intermingling of

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-09 Thread Michael Paquier

On Fri, Feb 09, 2024 at 04:32:05PM +0900, Sutou Kouhei wrote:
> In 
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Fri, 9 Feb 2024 13:21:34 +0900,
>   Michael Paquier  wrote:
>> A next step I think we could take is to split the binary-only and the
>> text/csv-only data in each cstate into their own structure to make the
>> structure, with an opaque pointer that custom formats could use, but a
>> lot of fields are shared as well.
> 
> It'll make COPY code base cleaner but it may decrease
> performance.

Perhaps, but I'm not sure, TBH.  But perhaps others can comment on
this point.  This surely needs to be studied closely.

> My suggestion:
> 1. Introduce Copy{To,From}Routine
>(We can do it based on the v14 patch.)
> 2. Add an opaque pointer to Copy{To,From}Routine
>(This must not have performance impact.)
> 3.a. Split format specific data to the opaque space
> 3.b. Add support for registering custom format handler by
>  creating a function
> 4. ...

4. is going to need 3.  At this point 3.b sounds like the main thing
to tackle first if we want to get something usable for the end-user
into this release, at least.  Still 2 is important for pluggability
as we pass the cstates across all the routines and custom formats want
to save their own data, so this split sounds OK.  I am not sure how
much of 3.a we really need to do for the in-core formats.

> I think that we should not use this approach for
> performance. We need to use "static inline" and constant
> argument instead something like the attached
> remove-copy-read-attributes.diff.

FWIW, using inlining did not show any performance change here.
Perhaps that's only because this is called in the COPY FROM path once
per row (even for the case of using 1 attribute with blackhole_am).
--
Michael


signature.asc
Description: PGP signature

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-08 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Fri, 9 Feb 2024 13:21:34 +0900,
  Michael Paquier  wrote:

>> - Revisit what we have here, looking at more profiles to see how HEAD
>> an v13 compare.  It looks like we are on a good path, but let's tackle
>> things one step at a time.
> 
> And attached is a v14 that's rebased on HEAD.

Thanks!

> A next step I think we could take is to split the binary-only and the
> text/csv-only data in each cstate into their own structure to make the
> structure, with an opaque pointer that custom formats could use, but a
> lot of fields are shared as well.

It'll make COPY code base cleaner but it may decrease
performance. How about just adding an opaque pointer to each
cstate as the next step and then try the split?

My suggestion:
1. Introduce Copy{To,From}Routine
   (We can do it based on the v14 patch.)
2. Add an opaque pointer to Copy{To,From}Routine
   (This must not have performance impact.)
3.a. Split format specific data to the opaque space
3.b. Add support for registering custom format handler by
 creating a function
4. ...

>This patch is already complicated
> enough IMO, so I'm OK to leave it out for the moment, and focus on
> making this infra pluggable as a next step.

I agree with you.

> Are there any comments about this v14?  Sutou-san?

Here are my comments:


+   /* Set read attribute callback */
+   if (cstate->opts.csv_mode)
+   cstate->copy_read_attributes = CopyReadAttributesCSV;
+   else
+   cstate->copy_read_attributes = CopyReadAttributesText;

I think that we should not use this approach for
performance. We need to use "static inline" and constant
argument instead something like the attached
remove-copy-read-attributes.diff.

We have similar codes for
CopyReadLine()/CopyReadLineText(). The attached
remove-copy-read-attributes-and-optimize-copy-read-line.diff
also applies the same optimization to
CopyReadLine()/CopyReadLineText().

I hope that this improved performance of COPY FROM.

+/*
+ * Routines assigned to each format.
++

Garbage "+"

+ * CSV and text share the same implementation, at the exception of the
+ * copy_read_attributes callback.
+ */


+/*
+ * CopyToTextOneRow
+ *
+ * Process one row for text/CSV format.
+ */
+static void
+CopyToTextOneRow(CopyToState cstate,
+TupleTableSlot *slot)
+{
...
+   if (cstate->opts.csv_mode)
+   CopyAttributeOutCSV(cstate, string,
+   
cstate->opts.force_quote_flags[attnum - 1]);
+   else
+   CopyAttributeOutText(cstate, string);
...

How about use "static inline" and constant argument approach
here too?

static inline void
CopyToTextBasedOneRow(CopyToState cstate,
  TupleTableSlot *slot,
  bool csv_mode)
{
...
if (cstate->opts.csv_mode)
CopyAttributeOutCSV(cstate, string,

cstate->opts.force_quote_flags[attnum - 1]);
else
CopyAttributeOutText(cstate, string);
...
}

static void
CopyToTextOneRow(CopyToState cstate,
 TupleTableSlot *slot,
 bool csv_mode)
{
CopyToTextBasedOneRow(cstate, slot, false);
}

static void
CopyToCSVOneRow(CopyToState cstate,
TupleTableSlot *slot,
bool csv_mode)
{
CopyToTextBasedOneRow(cstate, slot, true);
}

static const CopyToRoutine CopyCSVRoutineText = {
...
.CopyToOneRow = CopyToCSVOneRow,
...
};


Thanks,
-- 
kou
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index a90b7189b5..6e244fb443 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -158,12 +158,6 @@ CopyFromTextStart(CopyFromState cstate, TupleDesc tupDesc)
 	attr_count = list_length(cstate->attnumlist);
 	cstate->max_fields = attr_count;
 	cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
-
-	/* Set read attribute callback */
-	if (cstate->opts.csv_mode)
-		cstate->copy_read_attributes = CopyReadAttributesCSV;
-	else
-		cstate->copy_read_attributes = CopyReadAttributesText;
 }
 
 /*
@@ -221,9 +215,8 @@ CopyFromBinaryEnd(CopyFromState cstate)
 
 /*
  * Routines assigned to each format.
-+
  * CSV and text share the same implementation, at the exception of the
- * copy_read_attributes callback.
+ * CopyFromOneRow callback.
  */
 static const CopyFromRoutine CopyFromRoutineTe

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-08 Thread Michael Paquier

On Fri, Feb 09, 2024 at 01:19:50PM +0900, Sutou Kouhei wrote:
> Are you already working on this? Do you want me to write the
> next patch based on the current master?

No need for a new patch, thanks.  I've spent some time today doing a
rebase and measuring the whole, without seeing a degradation with what
should be the worst cases for COPY TO and FROM:
https://www.postgresql.org/message-id/ZcWoTr1N0GELFA9E%40paquier.xyz
--
Michael


signature.asc
Description: PGP signature

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-08 Thread Michael Paquier

On Wed, Feb 07, 2024 at 01:33:18PM +0900, Michael Paquier wrote:
> Hmm.  That explains why I was not seeing any differences with this
> callback then.  It seems to me that the order of actions to take is
> clear, like:
> - Revert 2889fd23be56 to keep a clean state of the tree, now done with
> 1aa8324b81fa.
> - Dive into the strlen() issue, as it really looks like this can
> create more simplifications for the patch discussed on this thread
> with COPY TO.

This has been done this morning with b619852086ed.

> - Revisit what we have here, looking at more profiles to see how HEAD
> an v13 compare.  It looks like we are on a good path, but let's tackle
> things one step at a time.

And attached is a v14 that's rebased on HEAD.  While on it, I've
looked at more profiles and did more runtime checks.

Some runtimes, in (ms), average of 15 runs, 30 int attributes on 5M
rows as mentioned above:
COPY FROM  text   binary
HEAD   6066   7110
v146087   7105
COPY TOtext   binary
HEAD   6591   10161
v146508   10189

And here are some profiles, where I'm not seeing an impact at
row-level with the addition of the callbacks:
COPY FROM, text, master:
-   66.59%16.10%  postgres  postgres[.] NextCopyFrom

   ▒- 50.50% NextCopyFrom
   - 30.75% NextCopyFromRawFields
  + 15.93% CopyReadLine
13.73% CopyReadAttributesText
   - 19.43% InputFunctionCallSafe
  + 13.49% int4in
0.77% pg_strtoint32_safe
+ 16.10% _start
COPY FROM, text, v14:
-   66.42% 0.74%  postgres  postgres[.] NextCopyFrom
- 65.67% NextCopyFrom
   - 65.51% CopyFromTextOneRow
  - 30.25% NextCopyFromRawFields
 + 16.14% CopyReadLine
   13.40% CopyReadAttributesText
  - 18.96% InputFunctionCallSafe
 + 13.15% int4in
   0.70% pg_strtoint32_safe
+ 0.74% _start

COPY TO, binary, master
-   90.32% 7.14%  postgres  postgres[.] CopyOneRowTo
- 83.18% CopyOneRowTo
   + 60.30% SendFunctionCall
   + 10.99% appendBinaryStringInfo
   + 3.67% MemoryContextReset
   + 2.89% CopySendEndOfRow
 0.89% memcpy@plt
 0.66% 0xa052db5c
 0.62% enlargeStringInfo
 0.56% pgstat_progress_update_param
+ 7.14% _start
COPY TO, binary, v14
-   90.96% 0.21%  postgres  postgres[.] CopyOneRowTo
- 90.75% CopyOneRowTo
   - 81.86% CopyToBinaryOneRow
  + 59.17% SendFunctionCall
  + 10.56% appendBinaryStringInfo
1.10% enlargeStringInfo
0.59% int4send
0.57% memcpy@plt
   + 3.68% MemoryContextReset
   + 2.83% CopySendEndOfRow
 1.13% appendBinaryStringInfo
 0.58% SendFunctionCall
 0.58% pgstat_progress_update_param

Are there any comments about this v14?  Sutou-san?

A next step I think we could take is to split the binary-only and the
text/csv-only data in each cstate into their own structure to make the
structure, with an opaque pointer that custom formats could use, but a
lot of fields are shared as well.  This patch is already complicated
enough IMO, so I'm OK to leave it out for the moment, and focus on
making this infra pluggable as a next step.
--
Michael
From 6aa7f8cb65e792def8dda631fed19404db0a88af Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 6 Feb 2024 08:40:17 +0900
Subject: [PATCH v14] Extract COPY FROM/TO format implementations

This doesn't change the current behavior.  This just introduces a set of
copy routines called CopyFromRoutine and CopyToRoutine, which just has
function pointers of format implementation like TupleTableSlotOps, and
use it for existing "text", "csv" and "binary" format implementations.
There are plans to extend that more in the future with custom formats.

This improves performance when a relation has many attributes as this
eliminates format-based if branches.  The more the attributes, the
better the performance gain.  Blah add some numbers.
---
 src/include/commands/copyapi.h   | 100 ++
 src/include/commands/copyfrom_internal.h |  10 +
 src/backend/commands/copyfrom.c  | 209 ---
 src/backend/commands/copyfromparse.c | 362 ++-
 src/backend/commands/copyto.c| 434 ---
 src/tools/pgindent/typedefs.list |   2 +
 6 files changed, 770 insertions(+), 347 deletions(-)
 create mode 100644 src/include/commands/copyapi.h

diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
new file mode 100644
index 00..87e0629c2f
--- /dev/null
+++ b/src/include/commands/copyapi.h
@@ -0,0 +1,100 @@
+/*-
+ *
+ * copyapi.h
+ *	  API for COPY TO/FROM handlers
+ *
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-08 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Wed, 7 Feb 2024 13:33:18 +0900,
  Michael Paquier  wrote:

> Hmm.  That explains why I was not seeing any differences with this
> callback then.  It seems to me that the order of actions to take is
> clear, like:
> - Revert 2889fd23be56 to keep a clean state of the tree, now done with
> 1aa8324b81fa.

Done.

> - Dive into the strlen() issue, as it really looks like this can
> create more simplifications for the patch discussed on this thread
> with COPY TO.

Done: b619852086ed2b5df76631f5678f60d3bebd3745

> - Revisit what we have here, looking at more profiles to see how HEAD
> an v13 compare.  It looks like we are on a good path, but let's tackle
> things one step at a time.

Are you already working on this? Do you want me to write the
next patch based on the current master?


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-08 Thread Michael Paquier

On Thu, Feb 01, 2024 at 10:57:58AM +0900, Michael Paquier wrote:
> CREATE EXTENSION blackhole_am;

One thing I have forgotten here is to provide a copy of this AM for
future references, so here you go with a blackhole_am.tar.gz attached.
--
Michael


blackhole_am.tar.gz
Description: application/gzip


signature.asc
Description: PGP signature

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-06 Thread Michael Paquier

On Tue, Feb 06, 2024 at 03:33:36PM -0800, Andres Freund wrote:
> Well, you can't just do that, because there's only one caller, namely
> CopyToTextOneRow(). What I am trying to suggest is something like the
> attached, just a quick hacky POC. Namely to split out CSV support from
> CopyToTextOneRow() by introducing CopyToCSVOneRow(), and to avoid code
> duplication by moving the code into a new CopyToTextLikeOneRow().

Ah, OK.  Got it now.

> I named it CopyToTextLike* here, because it seems confusing that some Text*
> are used for both CSV and text and others are actually just for text. But if
> were to go for that, we should go further.

This can always be argued later.

> To test the performnce effects I chose to remove the pointless encoding
> "check" we're discussing in the other thread, as it makes it harder to see the
> time differences due to the per-attribute code.  I did three runs of pgbench
> -t of [1] and chose the fastest result for each.
> 
> With turbo mode and power saving disabled:
>   Avg Time
> HEAD   995.349
> Remove Encoding Check  870.793
> v13-0001   869.678
> Remove out callback839.508

Hmm.  That explains why I was not seeing any differences with this
callback then.  It seems to me that the order of actions to take is
clear, like:
- Revert 2889fd23be56 to keep a clean state of the tree, now done with
1aa8324b81fa.
- Dive into the strlen() issue, as it really looks like this can
create more simplifications for the patch discussed on this thread
with COPY TO.
- Revisit what we have here, looking at more profiles to see how HEAD
an v13 compare.  It looks like we are on a good path, but let's tackle
things one step at a time.
--
Michael


signature.asc
Description: PGP signature

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-06 Thread Andres Freund

Hi,

On 2024-02-06 15:11:05 +0900, Michael Paquier wrote:
> On Mon, Feb 05, 2024 at 09:46:42PM -0800, Andres Freund wrote:
> > No - what I mean is that it doesn't make sense to have copy_attribute_out(),
> > as e.g. CopyToTextOneRow() already knows that it's dealing with text, so it
> > can directly call the right function. That does require splitting a bit more
> > between csv and text output, but I think that can be done without much
> > duplication.
> 
> I am not sure to understand here.  In what is that different from
> reverting 2889fd23be56 then mark CopyAttributeOutCSV and
> CopyAttributeOutText as static inline?

Well, you can't just do that, because there's only one caller, namely
CopyToTextOneRow(). What I am trying to suggest is something like the
attached, just a quick hacky POC. Namely to split out CSV support from
CopyToTextOneRow() by introducing CopyToCSVOneRow(), and to avoid code
duplication by moving the code into a new CopyToTextLikeOneRow().

I named it CopyToTextLike* here, because it seems confusing that some Text*
are used for both CSV and text and others are actually just for text. But if
were to go for that, we should go further.


To test the performnce effects I chose to remove the pointless encoding
"check" we're discussing in the other thread, as it makes it harder to see the
time differences due to the per-attribute code.  I did three runs of pgbench
-t of [1] and chose the fastest result for each.


With turbo mode and power saving disabled:

  Avg Time
HEAD   995.349
Remove Encoding Check  870.793
v13-0001   869.678
Remove out callback839.508

Greetings,

Andres Freund

[1] COPY (SELECT 
1::int2,2::int2,3::int2,4::int2,5::int2,6::int2,7::int2,8::int2,9::int2,10::int2,11::int2,12::int2,13::int2,14::int2,15::int2,16::int2,17::int2,18::int2,19::int2,20::int2,
 generate_series(1, 100::int4)) TO '/dev/null';
>From 28be7510a62984eb09b41aabe8c345a07b2b0e97 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Tue, 6 Feb 2024 14:57:20 -0800
Subject: [PATCH v13a 1/3] WIP: COPY TO: remove unnecessary and ineffective
 encoding "checks"

Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch:
---
 src/backend/commands/copyto.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 52b42f8a522..ea317b5b3d5 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -637,9 +637,10 @@ BeginCopyTo(ParseState *pstate,
 	 * are the same, we must apply pg_any_to_server() to validate data in
 	 * multibyte encodings.
 	 */
-	cstate->need_transcoding =
-		(cstate->file_encoding != GetDatabaseEncoding() ||
-		 pg_database_encoding_max_length() > 1);
+	cstate->need_transcoding = (
+		cstate->file_encoding != GetDatabaseEncoding()
+		 /* || pg_database_encoding_max_length() > 1 */
+		);
 	/* See Multibyte encoding comment above */
 	cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
 
-- 
2.38.0

>From 176442831183baa7648fee87ab4aa016d9c4d8e4 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 6 Feb 2024 08:40:17 +0900
Subject: [PATCH v13a 2/3] Extract COPY FROM/TO format implementations

This doesn't change the current behavior.  This just introduces a set of
copy routines called CopyFromRoutine and CopyToRoutine, which just has
function pointers of format implementation like TupleTableSlotOps, and
use it for existing "text", "csv" and "binary" format implementations.
There are plans to extend that more in the future with custom formats.

This improves performance when a relation has many attributes as this
eliminates format-based if branches.  The more the attributes, the
better the performance gain.  Blah add some numbers.
---
 src/include/commands/copyapi.h   | 100 ++
 src/include/commands/copyfrom_internal.h |  10 +
 src/backend/commands/copyfrom.c  | 209 ---
 src/backend/commands/copyfromparse.c | 362 ++-
 src/backend/commands/copyto.c| 438 +++
 src/tools/pgindent/typedefs.list |   2 +
 6 files changed, 770 insertions(+), 351 deletions(-)
 create mode 100644 src/include/commands/copyapi.h

diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
new file mode 100644
index 000..87e0629c2f7
--- /dev/null
+++ b/src/include/commands/copyapi.h
@@ -0,0 +1,100 @@
+/*-
+ *
+ * copyapi.h
+ *	  API for COPY TO/FROM handlers
+ *
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/commands/copyapi.h
+ *
+ *-
+ */
+#ifndef COPYAPI_H
+#define COPYAPI_H
+
+#include "executor/tuptable.h"
+#include

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-05 Thread Michael Paquier

On Mon, Feb 05, 2024 at 09:46:42PM -0800, Andres Freund wrote:
> No - what I mean is that it doesn't make sense to have copy_attribute_out(),
> as e.g. CopyToTextOneRow() already knows that it's dealing with text, so it
> can directly call the right function. That does require splitting a bit more
> between csv and text output, but I think that can be done without much
> duplication.

I am not sure to understand here.  In what is that different from
reverting 2889fd23be56 then mark CopyAttributeOutCSV and
CopyAttributeOutText as static inline?  Or you mean to merge
CopyAttributeOutText and CopyAttributeOutCSV together into a single
inlined function, reducing a bit code readability?  Both routines have
their own roadmap for encoding_embeds_ascii with quoting and escaping,
so keeping them separated looks kinda cleaner here.
--
Michael


signature.asc
Description: PGP signature

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-05 Thread Andres Freund

Hi,

On 2024-02-06 11:41:06 +0900, Michael Paquier wrote:
> On Mon, Feb 05, 2024 at 05:41:25PM -0800, Andres Freund wrote:
> > On 2024-02-06 10:01:36 +0900, Michael Paquier wrote:
> >> If you have concerns about that, I'm OK to revert, I'm not wedded to
> >> this level of control.  Note that I've actually seen *better*
> >> runtimes.
> > 
> > I'm somewhat worried that handling the different formats at that level will
> > make it harder to improve copy performance - it's quite attrociously slow
> > right now. The more we reduce the per-row/field overhead, the more the
> > dispatch overhead will matter.
> 
> Yep.  That's the hard part when it comes to design these callbacks.
> We don't want something too high level because this leads to more code
> duplication churns when someone wants to plug in its own routine set,
> and we don't want to be at a too low level because of the indirect
> calls as you said.  I'd like to think that the current CopyFromOneRow
> offers a good balance here, avoiding the "if" branch with the binary
> and non-binary paths.

One way to address code duplication is to use static inline helper functions
that do a lot of the work in a generic fashion, but where the compiler can
optimize the branches away, because it can do constant folding.


> >> If yes, then I'd assume that this shuts down the whole thread or that it
> >> needs a completely different approach, because we will multiply indirect
> >> function calls that can control how data is generated for each row, which 
> >> is
> >> the original case that Sutou-san wanted to tackle.
> > 
> > I think it could be rescued fairly easily - remove the dispatch via
> > ->copy_attribute_out().  To avoid duplicating code you could use a static
> > inline function that's used with constant arguments by both csv and text 
> > mode.
> 
> Hmm.  So you basically mean to tweak the beginning of
> CopyToTextOneRow() and CopyToTextStart() so as copy_attribute_out is
> saved in a local variable outside of cstate and we'd save the "if"
> checked for each attribute.  If I got that right, it would mean
> something like the v13-0002 attached, on top of the v13-0001 of
> upthread.  Is that what you meant?

No - what I mean is that it doesn't make sense to have copy_attribute_out(),
as e.g. CopyToTextOneRow() already knows that it's dealing with text, so it
can directly call the right function. That does require splitting a bit more
between csv and text output, but I think that can be done without much
duplication.

Greetings,

Andres Freund

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-05 Thread Michael Paquier

On Mon, Feb 05, 2024 at 05:41:25PM -0800, Andres Freund wrote:
> On 2024-02-06 10:01:36 +0900, Michael Paquier wrote:
>> If you have concerns about that, I'm OK to revert, I'm not wedded to
>> this level of control.  Note that I've actually seen *better*
>> runtimes.
> 
> I'm somewhat worried that handling the different formats at that level will
> make it harder to improve copy performance - it's quite attrociously slow
> right now. The more we reduce the per-row/field overhead, the more the
> dispatch overhead will matter.

Yep.  That's the hard part when it comes to design these callbacks.
We don't want something too high level because this leads to more code
duplication churns when someone wants to plug in its own routine set,
and we don't want to be at a too low level because of the indirect
calls as you said.  I'd like to think that the current CopyFromOneRow
offers a good balance here, avoiding the "if" branch with the binary
and non-binary paths.

>> Hmm.  Do you have concerns about v13 posted on [2] then?
> 
> As is I'm indeed not a fan. It imo doesn't make sense to have an indirect
> dispatch for *both* ->copy_attribute_out *and* ->CopyToOneRow. After all, when
> in ->CopyToOneRow for text, we could know that we need to call
> CopyAttributeOutText etc.

Right.

>> If yes, then I'd assume that this shuts down the whole thread or that it
>> needs a completely different approach, because we will multiply indirect
>> function calls that can control how data is generated for each row, which is
>> the original case that Sutou-san wanted to tackle.
> 
> I think it could be rescued fairly easily - remove the dispatch via
> ->copy_attribute_out().  To avoid duplicating code you could use a static
> inline function that's used with constant arguments by both csv and text mode.

Hmm.  So you basically mean to tweak the beginning of
CopyToTextOneRow() and CopyToTextStart() so as copy_attribute_out is
saved in a local variable outside of cstate and we'd save the "if"
checked for each attribute.  If I got that right, it would mean
something like the v13-0002 attached, on top of the v13-0001 of
upthread.  Is that what you meant?

> I think it might also be worth ensuring that future patches can move branches
> like
>   if (cstate->encoding_embeds_ascii)
>   if (cstate->need_transcoding)
> into the choice of per-row callback.

Yeah, I'm still not sure how much we should split CopyToStateData in
the initial patch set.  I'd like to think that the best result would
be to have in the state data an opaque (void *) that points to a
structure that can be set for each format, so as there is a clean
split between which variable gets set and used where (same remark
applies to COPY FROM with its raw_fields, raw_fields, for example).
--
Michael
From 4f95908173a1fb69ec228eaf44922862afca48bd Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 6 Feb 2024 08:40:17 +0900
Subject: [PATCH v13 1/2] Extract COPY FROM/TO format implementations

This doesn't change the current behavior.  This just introduces a set of
copy routines called CopyFromRoutine and CopyToRoutine, which just has
function pointers of format implementation like TupleTableSlotOps, and
use it for existing "text", "csv" and "binary" format implementations.
There are plans to extend that more in the future with custom formats.

This improves performance when a relation has many attributes as this
eliminates format-based if branches.  The more the attributes, the
better the performance gain.  Blah add some numbers.
---
 src/include/commands/copyapi.h   | 100 ++
 src/include/commands/copyfrom_internal.h |  10 +
 src/backend/commands/copyfrom.c  | 209 ---
 src/backend/commands/copyfromparse.c | 362 ++-
 src/backend/commands/copyto.c| 438 +++
 src/tools/pgindent/typedefs.list |   2 +
 6 files changed, 770 insertions(+), 351 deletions(-)
 create mode 100644 src/include/commands/copyapi.h

diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
new file mode 100644
index 00..87e0629c2f
--- /dev/null
+++ b/src/include/commands/copyapi.h
@@ -0,0 +1,100 @@
+/*-
+ *
+ * copyapi.h
+ *	  API for COPY TO/FROM handlers
+ *
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/commands/copyapi.h
+ *
+ *-
+ */
+#ifndef COPYAPI_H
+#define COPYAPI_H
+
+#include "executor/tuptable.h"
+#include "nodes/execnodes.h"
+
+/* These are private in commands/copy[from|to].c */
+typedef struct CopyFromStateData *CopyFromState;
+typedef struct CopyToStateData *CopyToState;
+
+/*
+ * API structure for a COPY FROM format implementation.  Note this must be
+ * allocated in a server-lifetime manner, typically as a

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-05 Thread Andres Freund

Hi,

On 2024-02-06 10:01:36 +0900, Michael Paquier wrote:
> On Mon, Feb 05, 2024 at 10:21:18AM -0800, Andres Freund wrote:
> > Have you benchmarked the performance effects of 2889fd23be5 ? I'd not at all
> > be surprised if it lead to a measurable performance regression.
>
> Yes, I was looking at runtimes and some profiles around CopyOneRowTo()
> to see the effects that this has yesterday.  The principal point of
> contention is CopyOneRowTo() where the callback is called once per
> attribute, so more attributes stress it more.

Right.


> If you have concerns about that, I'm OK to revert, I'm not wedded to
> this level of control.  Note that I've actually seen *better*
> runtimes.

I'm somewhat worried that handling the different formats at that level will
make it harder to improve copy performance - it's quite attrociously slow
right now. The more we reduce the per-row/field overhead, the more the
dispatch overhead will matter.



> [1]: https://www.postgresql.org/message-id/zbr6piwuvhdtf...@paquier.xyz
>
> > I think callbacks for individual attributes is the wrong approach - the
> > dispatch needs to happen at a higher level, otherwise there are too many
> > indirect function calls.
>
> Hmm.  Do you have concerns about v13 posted on [2] then?

As is I'm indeed not a fan. It imo doesn't make sense to have an indirect
dispatch for *both* ->copy_attribute_out *and* ->CopyToOneRow. After all, when
in ->CopyToOneRow for text, we could know that we need to call
CopyAttributeOutText etc.


> If yes, then I'd assume that this shuts down the whole thread or that it
> needs a completely different approach, because we will multiply indirect
> function calls that can control how data is generated for each row, which is
> the original case that Sutou-san wanted to tackle.

I think it could be rescued fairly easily - remove the dispatch via
->copy_attribute_out().  To avoid duplicating code you could use a static
inline function that's used with constant arguments by both csv and text mode.

I think it might also be worth ensuring that future patches can move branches
like
if (cstate->encoding_embeds_ascii)
if (cstate->need_transcoding)
into the choice of per-row callback.


> The End, Start and In/OutFunc callbacks are called only once per query, so
> these don't matter AFAIU.

Right.

Greetings,

Andres Freund

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-05 Thread Michael Paquier

On Mon, Feb 05, 2024 at 10:21:18AM -0800, Andres Freund wrote:
> Have you benchmarked the performance effects of 2889fd23be5 ? I'd not at all
> be surprised if it lead to a measurable performance regression.

Yes, I was looking at runtimes and some profiles around CopyOneRowTo()
to see the effects that this has yesterday.  The principal point of
contention is CopyOneRowTo() where the callback is called once per
attribute, so more attributes stress it more.  The method I've used is
described in [1], where I've used up to 50 int attributes (fixed value
size to limit appendBinaryStringInfo) with 5 million rows, with
shared_buffers large enough that all the data fits in it, while
prewarming the whole.  Postgres runs on a tmpfs, and COPY TO is
redirected to /dev/null.

For reference, I still have some reports lying around (-g attached to
the backend process running the COPY TO queries with text format), so
here you go:
* At 95fb5b49024a:
-   83.04%11.46%  postgres  postgres[.] CopyOneRowTo
- 71.58% CopyOneRowTo
   - 30.37% OutputFunctionCall
  + 27.77% int4out
   + 13.18% CopyAttributeOutText
   + 10.19% appendBinaryStringInfo
 3.76% 0xa7096234
 2.78% 0xa7096214
   + 2.49% CopySendEndOfRow
 1.21% int4out
 0.83% memcpy@plt
 0.76% 0xa7094ba8
 0.75% 0xa7094ba4
 0.69% pgstat_progress_update_param
 0.57% enlargeStringInfo
 0.52% 0xa7096204
 0.52% 0xa7094b8c
+ 11.46% _start
* At 2889fd23be56:
-   83.53%14.24%  postgres  postgres[.] CopyOneRowTo
- 69.29% CopyOneRowTo
   - 29.89% OutputFunctionCall
  + 27.43% int4out
   - 12.89% CopyAttributeOutText
pg_server_to_any
   + 9.31% appendBinaryStringInfo
 3.68% 0xa6940234
   + 2.74% CopySendEndOfRow
 2.43% 0xa6940214
 1.36% int4out
 0.74% 0xa693eba8
 0.73% pgstat_progress_update_param
 0.65% memcpy@plt
 0.53% MemoryContextReset
+ 14.24% _start

If you have concerns about that, I'm OK to revert, I'm not wedded to
this level of control.  Note that I've actually seen *better*
runtimes.

[1]: https://www.postgresql.org/message-id/zbr6piwuvhdtf...@paquier.xyz

> I think callbacks for individual attributes is the wrong approach - the
> dispatch needs to happen at a higher level, otherwise there are too many
> indirect function calls.

Hmm.  Do you have concerns about v13 posted on [2] then?  If yes, then
I'd assume that this shuts down the whole thread or that it needs a
completely different approach, because we will multiply indirect
function calls that can control how data is generated for each row,
which is the original case that Sutou-san wanted to tackle.  There
could be many indirect calls with custom callbacks that control how
things should be processed at row-level, and COPY likes doing work
with loads of data.  The End, Start and In/OutFunc callbacks are
called only once per query, so these don't matter AFAIU.

[2]: https://www.postgresql.org/message-id/zcfz59njjqnjw...@paquier.xyz
--
Michael

signature.asc
Description: PGP signature

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-05 Thread Michael Paquier

On Mon, Feb 05, 2024 at 06:05:15PM +0900, Sutou Kouhei wrote:
> In 
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Mon, 5 Feb 2024 16:14:08 +0900,
>   Michael Paquier  wrote:
>> 2) I have backpedaled on the postpare callback, which did not bring
>> much in clarity IMO while being a CSV-only callback.  Note that we
>> have in copyfromparse.c more paths that are only for CSV but the past
>> versions of the patch never cared about that.  This makes the text and
>> CSV implementations much closer to each other, as a result.
> 
> Ah, sorry. I forgot to eliminate cstate->opts.csv_mode in
> CopyReadLineText(). The postpare callback is for
> optimization. If it doesn't improve performance, we don't
> need to introduce it.

No worries.

> We may want to try eliminating cstate->opts.csv_mode in
> CopyReadLineText() for performance. But we don't need to
> do this in introducing CopyFromRoutine. We can defer it.
> 
> So I don't object removing the postpare callback.

Rather related, but there has been a comment from Andres about this
kind of splits a few hours ago, so perhaps this is for the best:
https://www.postgresql.org/message-id/20240205182118.h5rkbnjgujwzuxip%40awork3.anarazel.de

I'll reply to this one in a bit.

>>  Let me know if you have
>> comments about all that.
> 
> Here are some comments for the patch:

Thanks.  My head was spinning after reading the diffs more than 20
times :)

> fmgr_info ->
> finfo
> optinally ->
> optionally
> CopyFromRoutine->OneRow ->
> CopyFromRoutine->CopyFromOneRow
> CopyFromTextStart ->
> CopyFromBinaryStart
> CopyFromTextEnd ->
> CopyFromBinaryEnd

Fixed all these.

> How about passing CopyFromState cstate too like other
> callbacks for consistency?

Yes, I was wondering a bit if this can be useful for the custom
formats.

> + /*
> +  * Copy one row to a set of `values` and `nulls` of size tupDesc->natts.
> +  *
> +  * 'econtext' is used to evaluate default expression for each column 
> that
> +  * is either not read from the file or is using the DEFAULT option of 
> COPY
> 
> or is ->
> or

"or is" is correct here IMO.

> Wow! I didn't know that we need to update typedefs.list when
> I add a "typedef struct".

That's for the automated indentation.  This is a habit I have when it
comes to work on shaping up patches to avoid weird diffs with pgindent
and new structure names.  It's OK to forget about it :)

Attaching a v13 for now.
--
Michael
From 50ebc2d6b1b5f7bcb2248dd3fa4cef3f14f7adaa Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 6 Feb 2024 08:40:17 +0900
Subject: [PATCH v13] Extract COPY FROM/TO format implementations

This doesn't change the current behavior.  This just introduces a set of
copy routines called CopyFromRoutine and CopyToRoutine, which just has
function pointers of format implementation like TupleTableSlotOps, and
use it for existing "text", "csv" and "binary" format implementations.
There are plans to extend that more in the future with custom formats.

This improves performance when a relation has many attributes as this
eliminates format-based if branches.  The more the attributes, the
better the performance gain.  Blah add some numbers.
---
 src/include/commands/copyapi.h   | 100 ++
 src/include/commands/copyfrom_internal.h |  10 +
 src/backend/commands/copyfrom.c  | 209 ---
 src/backend/commands/copyfromparse.c | 362 ++-
 src/backend/commands/copyto.c| 438 +++
 src/tools/pgindent/typedefs.list |   2 +
 6 files changed, 770 insertions(+), 351 deletions(-)
 create mode 100644 src/include/commands/copyapi.h

diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
new file mode 100644
index 00..87e0629c2f
--- /dev/null
+++ b/src/include/commands/copyapi.h
@@ -0,0 +1,100 @@
+/*-
+ *
+ * copyapi.h
+ *	  API for COPY TO/FROM handlers
+ *
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/commands/copyapi.h
+ *
+ *-
+ */
+#ifndef COPYAPI_H
+#define COPYAPI_H
+
+#include "executor/tuptable.h"
+#include "nodes/execnodes.h"
+
+/* These are private in commands/copy[from|to].c */
+typedef struct CopyFromStateData *CopyFromState;
+typedef struct CopyToStateData *CopyToState;
+
+/*
+ * API structure for a COPY FROM format implementation.  Note this must be
+ * allocated in a server-lifetime manner, typi

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-05 Thread Andres Freund

Hi,

Have you benchmarked the performance effects of 2889fd23be5 ? I'd not at all
be surprised if it lead to a measurable performance regression.

I think callbacks for individual attributes is the wrong approach - the
dispatch needs to happen at a higher level, otherwise there are too many
indirect function calls.

Greetings,

Andres Freund

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-05 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Mon, 5 Feb 2024 16:14:08 +0900,
  Michael Paquier  wrote:

> So, I've looked at all that today, and finished by applying two
> patches as of 2889fd23be56 and 95fb5b49024a to get some of the
> weirdness with the workhorse routines out of the way.

Thanks!

> As this is called within the OneRow routine, I can live with that.  If
> there is an opposition to that, we could just attach it within the
> routines.

I don't object the approach.

> I am attaching a v12 which is close to what I want it to be, with
> much more documentation and comments.  There are two things that I've
> changed compared to the previous versions though:
> 1) I have added a callback to set up the input and output functions
> rather than attach that in the Start callback.

I'm OK with this. I just don't use them in Apache Arrow COPY
FORMAT extension.

> - No need for plugins to think about how to allocate this data.  v11
> and other versions were doing things the wrong way by allocating this
> stuff in the wrong memory context as we switch to the COPY context
> when we are in the Start routines.

Oh, sorry. I missed it when I moved them.

> 2) I have backpedaled on the postpare callback, which did not bring
> much in clarity IMO while being a CSV-only callback.  Note that we
> have in copyfromparse.c more paths that are only for CSV but the past
> versions of the patch never cared about that.  This makes the text and
> CSV implementations much closer to each other, as a result.

Ah, sorry. I forgot to eliminate cstate->opts.csv_mode in
CopyReadLineText(). The postpare callback is for
optimization. If it doesn't improve performance, we don't
need to introduce it.

We may want to try eliminating cstate->opts.csv_mode in
CopyReadLineText() for performance. But we don't need to
do this in introducing CopyFromRoutine. We can defer it.

So I don't object removing the postpare callback.

>  Let me know if you have
> comments about all that.

Here are some comments for the patch:

+   /*
+* Called when COPY FROM is started to set up the input functions
+* associated to the relation's attributes writing to.  `fmgr_info` can 
be

fmgr_info ->
finfo

+* optionally filled to provide the catalog information of the input
+* function.  `typioparam` can be optinally filled to define the OID of

optinally ->
optionally

+* the type to pass to the input function.  `atttypid` is the OID of 
data
+* type used by the relation's attribute.
+*/
+   void(*CopyFromInFunc) (Oid atttypid, FmgrInfo *finfo,
+  Oid 
*typioparam);

How about passing CopyFromState cstate too like other
callbacks for consistency?

+   /*
+* Copy one row to a set of `values` and `nulls` of size tupDesc->natts.
+*
+* 'econtext' is used to evaluate default expression for each column 
that
+* is either not read from the file or is using the DEFAULT option of 
COPY

or is ->
or

(I'm not sure...)

+* FROM.  It is NULL if no default values are used.
+*
+* Returns false if there are no more tuples to copy.
+*/
+   bool(*CopyFromOneRow) (CopyFromState cstate, ExprContext 
*econtext,
+  Datum 
*values, bool *nulls);

+typedef struct CopyToRoutine
+{
+   /*
+* Called when COPY TO is started to set up the output functions
+* associated to the relation's attributes reading from.  `fmgr_info` 
can

fmgr_info ->
finfo

+* be optionally filled. `atttypid` is the OID of data type used by the
+* relation's attribute.
+*/
+   void(*CopyToOutFunc) (Oid atttypid, FmgrInfo *finfo);

How about passing CopyToState cstate too like other
callbacks for consistency?


@@ -200,4 +204,10 @@ extern void ReceiveCopyBinaryHeader(CopyFromState cstate);
 extern int CopyReadAttributesCSV(CopyFromState cstate);
 extern int CopyReadAttributesText(CopyFromState cstate);
 
+/* Callbacks for CopyFromRoutine->OneRow */

CopyFromRoutine->OneRow ->
CopyFromRoutine->CopyFromOneRow

+extern bool CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext,
+  Datum *values, bool 
*nulls);
+extern bool CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext,
+Datum *values, 
bool *nulls);
+
 #endif /* COPYFROM_INTERNAL_H 
*/

+/*
+ * CopyFromTextStart

CopyFromTextStart ->
CopyFromBinaryStart

+ *
+ * Start of COPY FROM for binary format.
+ */
+stati

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-04 Thread Michael Paquier

On Fri, Feb 02, 2024 at 05:46:18PM +0900, Sutou Kouhei wrote:
> Hi,
> 
> In 
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Fri, 2 Feb 2024 17:04:28 +0900,
>   Michael Paquier  wrote:
> 
> > One idea I was considering is whether we should use a special value in
> > the "format" DefElem, say "custom:$my_custom_format" where it would be
> > possible to bypass the formay check when processing options and find
> > the routines after processing all the options.  I'm not wedded to
> > that, but attaching the routines to the state data is IMO the correct
> > thing, because this has nothing to do with CopyFormatOptions.
> 
> Thanks for sharing your idea.
> Let's discuss how to support custom options after we
> complete the current performance changes.
> 
> >> I'm OK with the approach. But how about adding the extra
> >> callbacks to Copy{From,To}StateData not
> >> Copy{From,To}Routines like CopyToStateData::data_dest_cb and
> >> CopyFromStateData::data_source_cb? They are only needed for
> >> "text" and "csv". So we don't need to add them to
> >> Copy{From,To}Routines to keep required callback minimum.
> > 
> > And set them in cstate while we are in the Start routine, right?
> 
> I imagined that it's done around the following part:
> 
> @@ -1418,6 +1579,9 @@ BeginCopyFrom(ParseState *pstate,
> /* Extract options from the statement node tree */
> ProcessCopyOptions(pstate, >opts, true /* is_from */ , 
> options);
>  
> +   /* Set format routine */
> +   cstate->routine = CopyFromGetRoutine(cstate->opts);
> +
> /* Process the target relation */
> cstate->rel = rel;
>  
> 
> Example1:
> 
> /* Set format routine */
> cstate->routine = CopyFromGetRoutine(cstate->opts);
> if (!cstate->opts.binary)
> if (cstate->opts.csv_mode)
> cstate->copy_read_attributes = CopyReadAttributesCSV;
> else
> cstate->copy_read_attributes = CopyReadAttributesText;
> 
> Example2:
> 
> static void
> CopyFromSetRoutine(CopyFromState cstate)
> {
> if (cstate->opts.csv_mode)
> {
> cstate->routine = 
> cstate->copy_read_attributes = CopyReadAttributesCSV;
> }
> else if (cstate.binary)
> cstate->routine = 
> else
> {
> cstate->routine = 
> cstate->copy_read_attributes = CopyReadAttributesText;
> }
> }
> 
> BeginCopyFrom()
> {
> /* Set format routine */
> CopyFromSetRoutine(cstate);
> }
> 
> 
> But I don't object your original approach. If we have the
> extra callbacks in Copy{From,To}Routines, I just don't use
> them for my custom format extension.
> 
> >> What is the better next action for us? Do you want to
> >> complete the WIP v11 patch set by yourself (and commit it)?
> >> Or should I take over it?
> > 
> > I was planning to work on that, but wanted to be sure how you felt
> > about the problem with text and csv first.
> 
> OK.
> My opinion is the above. I have an idea how to implement it
> but it's not a strong idea. You can choose whichever you like.

So, I've looked at all that today, and finished by applying two
patches as of 2889fd23be56 and 95fb5b49024a to get some of the
weirdness with the workhorse routines out of the way.  Both have added
callbacks assigned in their respective cstate data for text and csv.
As this is called within the OneRow routine, I can live with that.  If
there is an opposition to that, we could just attach it within the
routines.  The CopyAttributeOut routines had a strange argument
layout, actually, the flag for the quotes is required as a header uses
no quotes, but there was little point in the "single arg" case, so
I've removed it.

I am attaching a v12 which is close to what I want it to be, with
much more documentation and comments.  There are two things that I've
changed compared to the previous versions though:
1) I have added a callback to set up the input and output functions
rather than attach that in the Start callback.  These routines are now
called once per argument, where we know that the argument is valid.
The callbacks are in charge of filling the FmgrInfos.  There are some
good reasons behind that:
- No need for plugins to think about how to allocate this data.  v11
and other versions were doing things the wrong way by allocating this
stuff in the wrong memory context as we switch to the COPY context
when we are in the Start routines.
- This avoids attisdropped problems, and we have a long history of
bugs regarding that.  I'm ready to bet that custom f

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-02 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Fri, 2 Feb 2024 17:04:28 +0900,
  Michael Paquier  wrote:

> One idea I was considering is whether we should use a special value in
> the "format" DefElem, say "custom:$my_custom_format" where it would be
> possible to bypass the formay check when processing options and find
> the routines after processing all the options.  I'm not wedded to
> that, but attaching the routines to the state data is IMO the correct
> thing, because this has nothing to do with CopyFormatOptions.

Thanks for sharing your idea.
Let's discuss how to support custom options after we
complete the current performance changes.

>> I'm OK with the approach. But how about adding the extra
>> callbacks to Copy{From,To}StateData not
>> Copy{From,To}Routines like CopyToStateData::data_dest_cb and
>> CopyFromStateData::data_source_cb? They are only needed for
>> "text" and "csv". So we don't need to add them to
>> Copy{From,To}Routines to keep required callback minimum.
> 
> And set them in cstate while we are in the Start routine, right?

I imagined that it's done around the following part:

@@ -1418,6 +1579,9 @@ BeginCopyFrom(ParseState *pstate,
/* Extract options from the statement node tree */
ProcessCopyOptions(pstate, >opts, true /* is_from */ , options);
 
+   /* Set format routine */
+   cstate->routine = CopyFromGetRoutine(cstate->opts);
+
/* Process the target relation */
cstate->rel = rel;
 

Example1:

/* Set format routine */
cstate->routine = CopyFromGetRoutine(cstate->opts);
if (!cstate->opts.binary)
if (cstate->opts.csv_mode)
cstate->copy_read_attributes = CopyReadAttributesCSV;
else
cstate->copy_read_attributes = CopyReadAttributesText;

Example2:

static void
CopyFromSetRoutine(CopyFromState cstate)
{
if (cstate->opts.csv_mode)
{
cstate->routine = 
cstate->copy_read_attributes = CopyReadAttributesCSV;
}
else if (cstate.binary)
cstate->routine = 
else
{
cstate->routine = 
cstate->copy_read_attributes = CopyReadAttributesText;
}
}

BeginCopyFrom()
{
/* Set format routine */
CopyFromSetRoutine(cstate);
}


But I don't object your original approach. If we have the
extra callbacks in Copy{From,To}Routines, I just don't use
them for my custom format extension.

>> What is the better next action for us? Do you want to
>> complete the WIP v11 patch set by yourself (and commit it)?
>> Or should I take over it?
> 
> I was planning to work on that, but wanted to be sure how you felt
> about the problem with text and csv first.

OK.
My opinion is the above. I have an idea how to implement it
but it's not a strong idea. You can choose whichever you like.


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-02 Thread Michael Paquier

On Fri, Feb 02, 2024 at 04:33:19PM +0900, Sutou Kouhei wrote:
> Hi,
> 
> In 
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Fri, 2 Feb 2024 15:21:31 +0900,
>   Michael Paquier  wrote:
> 
> > I have done a review of v10, see v11 attached which is still WIP, with
> > the patches for COPY TO and COPY FROM merged together.  Note that I'm
> > thinking to merge them into a single commit.
> 
> OK. I don't have a strong opinion for commit unit.
> 
> > @@ -74,11 +75,11 @@ typedef struct CopyFormatOptions
> >  boolconvert_selectively;/* do selective binary conversion? 
> > */
> >  CopyOnErrorChoice on_error; /* what to do when error happened */
> >  List   *convert_select; /* list of column names (can be NIL) */
> > +constCopyToRoutine *to_routine;/* callback routines for 
> > COPY TO */
> >  } CopyFormatOptions;
> > 
> > Adding the routines to the structure for the format options is in my
> > opinion incorrect.  The elements of this structure are first processed
> > in the option deparsing path, and then we need to use the options to
> > guess which routines we need.
> 
> This was discussed with Sawada-san a bit before. [1][2]
> 
> [1] 
> https://www.postgresql.org/message-id/flat/CAD21AoBmNiWwrspuedgAPgbAqsn7e7NoZYF6gNnYBf%2BgXEk9Mg%40mail.gmail.com#bfd19262d261c67058fdb8d64e6a723c
> [2] 
> https://www.postgresql.org/message-id/flat/20240130.144531.1257430878438173740.kou%40clear-code.com#fc55392d77f400fc74e42686fe7e348a
> 
> I kept the routines in CopyFormatOptions for custom option
> processing. But I should have not cared about it in this
> patch set because this patch set doesn't include custom
> option processing.

One idea I was considering is whether we should use a special value in
the "format" DefElem, say "custom:$my_custom_format" where it would be
possible to bypass the formay check when processing options and find
the routines after processing all the options.  I'm not wedded to
that, but attaching the routines to the state data is IMO the correct
thing, because this has nothing to do with CopyFormatOptions.

> So I'm OK that we move the routines to
> Copy{From,To}StateData.

Okay.

>> copyapi.h needs more documentation, like what is expected for
>> extension developers when using these, what are the arguments, etc.  I
>> have added what I had in mind for now.
> 
> Thanks! I'm not good at writing documentation in English...

No worries.

> I'm OK with the approach. But how about adding the extra
> callbacks to Copy{From,To}StateData not
> Copy{From,To}Routines like CopyToStateData::data_dest_cb and
> CopyFromStateData::data_source_cb? They are only needed for
> "text" and "csv". So we don't need to add them to
> Copy{From,To}Routines to keep required callback minimum.

And set them in cstate while we are in the Start routine, right?  Hmm.
Why not..  That would get rid of the multiples layers v11 has, which
is my pain point, and we have many fields in cstate that are already
used on a per-format basis.

> What is the better next action for us? Do you want to
> complete the WIP v11 patch set by yourself (and commit it)?
> Or should I take over it?

I was planning to work on that, but wanted to be sure how you felt
about the problem with text and csv first.
--
Michael


signature.asc
Description: PGP signature

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-01 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Fri, 2 Feb 2024 15:27:15 +0800,
  Junwang Zhao  wrote:

> I agree CopyToRoutine should be placed into CopyToStateData, but
> why set it after ProcessCopyOptions, the implementation of
> CopyToGetRoutine doesn't make sense if we want to support custom
> format in the future.
> 
> Seems the refactor of v11 only considered performance but not
> *extendable copy format*.

Right.
We focus on performance for now. And then we will focus on
extendability. [1]

[1] 
https://www.postgresql.org/message-id/flat/20240130.171511.2014195814665030502.kou%40clear-code.com#757a48c273f140081656ec8eb69f502b

> If V7 and V10 have no performance reduction, then I think V6 is also
> good with performance, since most of the time goes to CopyToOneRow
> and CopyFromOneRow.

Don't worry. I'll re-submit changes in the v6 patch set
again after the current patch set that focuses on
performance is merged.

> I just think we should take the *extendable* into consideration at
> the beginning.

Introducing Copy{To,From}Routine is also valuable for
extendability. We can improve extendability later. Let's
focus on only performance for now to introduce
Copy{To,From}Routine.

Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-01 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Fri, 2 Feb 2024 15:21:31 +0900,
  Michael Paquier  wrote:

> I have done a review of v10, see v11 attached which is still WIP, with
> the patches for COPY TO and COPY FROM merged together.  Note that I'm
> thinking to merge them into a single commit.

OK. I don't have a strong opinion for commit unit.

> @@ -74,11 +75,11 @@ typedef struct CopyFormatOptions
>  boolconvert_selectively;/* do selective binary conversion? */
>  CopyOnErrorChoice on_error; /* what to do when error happened */
>  List   *convert_select; /* list of column names (can be NIL) */
> +constCopyToRoutine *to_routine;/* callback routines for COPY 
> TO */
>  } CopyFormatOptions;
> 
> Adding the routines to the structure for the format options is in my
> opinion incorrect.  The elements of this structure are first processed
> in the option deparsing path, and then we need to use the options to
> guess which routines we need.

This was discussed with Sawada-san a bit before. [1][2]

[1] 
https://www.postgresql.org/message-id/flat/CAD21AoBmNiWwrspuedgAPgbAqsn7e7NoZYF6gNnYBf%2BgXEk9Mg%40mail.gmail.com#bfd19262d261c67058fdb8d64e6a723c
[2] 
https://www.postgresql.org/message-id/flat/20240130.144531.1257430878438173740.kou%40clear-code.com#fc55392d77f400fc74e42686fe7e348a

I kept the routines in CopyFormatOptions for custom option
processing. But I should have not cared about it in this
patch set because this patch set doesn't include custom
option processing.

So I'm OK that we move the routines to
Copy{From,To}StateData.

> This also led to a strange separation with
> ProcessCopyOptionFormatFrom() and ProcessCopyOptionFormatTo() to fit
> the hole in-between.

They also for custom option processing. We don't need to
care about them in this patch set too.

> copyapi.h needs more documentation, like what is expected for
> extension developers when using these, what are the arguments, etc.  I
> have added what I had in mind for now.

Thanks! I'm not good at writing documentation in English...

> +typedef char *(*PostpareColumnValue) (CopyFromState cstate, char *string, 
> int m);
> 
> CopyReadAttributes and PostpareColumnValue are also callbacks specific
> to text and csv, except that they are used within the per-row
> callbacks.  The same can be said about CopyAttributeOutHeaderFunction.
> It seems to me that it would be less confusing to store pointers to
> them in the routine structures, where the final picture involves not
> having multiple layers of APIs like CopyToCSVStart,
> CopyAttributeOutTextValue, etc.  These *have* to be documented
> properly in copyapi.h, and this is much easier now that cstate stores
> the routine pointers.  That would also make simpler function stacks.
> Note that I have not changed that in the v11 attached.
> 
> This business with the extra callbacks required for csv and text is my
> main point of contention, but I'd be OK once the model of the APIs is
> more linear, with everything in Copy{From,To}State.  The changes would
> be rather simple, and I'd be OK to put my hands on it.  Just,
> Sutou-san, would you agree with my last point about these extra
> callbacks?

I'm OK with the approach. But how about adding the extra
callbacks to Copy{From,To}StateData not
Copy{From,To}Routines like CopyToStateData::data_dest_cb and
CopyFromStateData::data_source_cb? They are only needed for
"text" and "csv". So we don't need to add them to
Copy{From,To}Routines to keep required callback minimum.

What is the better next action for us? Do you want to
complete the WIP v11 patch set by yourself (and commit it)?
Or should I take over it?


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-01 Thread Junwang Zhao

On Fri, Feb 2, 2024 at 2:21 PM Michael Paquier  wrote:
>
> On Fri, Feb 02, 2024 at 09:40:56AM +0900, Sutou Kouhei wrote:
> > Thanks. It'll help us.
>
> I have done a review of v10, see v11 attached which is still WIP, with
> the patches for COPY TO and COPY FROM merged together.  Note that I'm
> thinking to merge them into a single commit.
>
> @@ -74,11 +75,11 @@ typedef struct CopyFormatOptions
>  boolconvert_selectively;/* do selective binary conversion? */
>  CopyOnErrorChoice on_error; /* what to do when error happened */
>  List   *convert_select; /* list of column names (can be NIL) */
> +constCopyToRoutine *to_routine;/* callback routines for COPY 
> TO */
>  } CopyFormatOptions;
>
> Adding the routines to the structure for the format options is in my
> opinion incorrect.  The elements of this structure are first processed
> in the option deparsing path, and then we need to use the options to
> guess which routines we need.  A more natural location is cstate
> itself, so as the pointer to the routines is isolated within copyto.c

I agree CopyToRoutine should be placed into CopyToStateData, but
why set it after ProcessCopyOptions, the implementation of
CopyToGetRoutine doesn't make sense if we want to support custom
format in the future.

Seems the refactor of v11 only considered performance but not
*extendable copy format*.

> and copyfrom_internal.h.  My point is: the routines are an
> implementation detail that the centralized copy.c has no need to know
> about.  This also led to a strange separation with
> ProcessCopyOptionFormatFrom() and ProcessCopyOptionFormatTo() to fit
> the hole in-between.
>
> The separation between cstate and the format-related fields could be
> much better, though I am not sure if it is worth doing as it
> introduces more duplication.  For example, max_fields and raw_fields
> are specific to text and csv, while binary does not care much.
> Perhaps this is just useful to be for custom formats.

I think those can be placed in format specific fields by utilizing the opaque
space, but yeah, this will introduce duplication.

>
> copyapi.h needs more documentation, like what is expected for
> extension developers when using these, what are the arguments, etc.  I
> have added what I had in mind for now.
>
> +typedef char *(*PostpareColumnValue) (CopyFromState cstate, char *string, 
> int m);
>
> CopyReadAttributes and PostpareColumnValue are also callbacks specific
> to text and csv, except that they are used within the per-row
> callbacks.  The same can be said about CopyAttributeOutHeaderFunction.
> It seems to me that it would be less confusing to store pointers to
> them in the routine structures, where the final picture involves not
> having multiple layers of APIs like CopyToCSVStart,
> CopyAttributeOutTextValue, etc.  These *have* to be documented
> properly in copyapi.h, and this is much easier now that cstate stores
> the routine pointers.  That would also make simpler function stacks.
> Note that I have not changed that in the v11 attached.
>
> This business with the extra callbacks required for csv and text is my
> main point of contention, but I'd be OK once the model of the APIs is
> more linear, with everything in Copy{From,To}State.  The changes would
> be rather simple, and I'd be OK to put my hands on it.  Just,
> Sutou-san, would you agree with my last point about these extra
> callbacks?
> --
> Michael

If V7 and V10 have no performance reduction, then I think V6 is also
good with performance, since most of the time goes to CopyToOneRow
and CopyFromOneRow.

I just think we should take the *extendable* into consideration at
the beginning.

-- 
Regards
Junwang Zhao

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-01 Thread Michael Paquier

On Fri, Feb 02, 2024 at 09:40:56AM +0900, Sutou Kouhei wrote:
> Thanks. It'll help us.

I have done a review of v10, see v11 attached which is still WIP, with
the patches for COPY TO and COPY FROM merged together.  Note that I'm
thinking to merge them into a single commit.

@@ -74,11 +75,11 @@ typedef struct CopyFormatOptions
 boolconvert_selectively;/* do selective binary conversion? */
 CopyOnErrorChoice on_error; /* what to do when error happened */
 List   *convert_select; /* list of column names (can be NIL) */
+constCopyToRoutine *to_routine;/* callback routines for COPY 
TO */
 } CopyFormatOptions;

Adding the routines to the structure for the format options is in my
opinion incorrect.  The elements of this structure are first processed
in the option deparsing path, and then we need to use the options to
guess which routines we need.  A more natural location is cstate
itself, so as the pointer to the routines is isolated within copyto.c
and copyfrom_internal.h.  My point is: the routines are an
implementation detail that the centralized copy.c has no need to know
about.  This also led to a strange separation with
ProcessCopyOptionFormatFrom() and ProcessCopyOptionFormatTo() to fit
the hole in-between.

The separation between cstate and the format-related fields could be
much better, though I am not sure if it is worth doing as it
introduces more duplication.  For example, max_fields and raw_fields
are specific to text and csv, while binary does not care much.
Perhaps this is just useful to be for custom formats.

copyapi.h needs more documentation, like what is expected for
extension developers when using these, what are the arguments, etc.  I
have added what I had in mind for now.

+typedef char *(*PostpareColumnValue) (CopyFromState cstate, char *string, int 
m);

CopyReadAttributes and PostpareColumnValue are also callbacks specific
to text and csv, except that they are used within the per-row
callbacks.  The same can be said about CopyAttributeOutHeaderFunction.
It seems to me that it would be less confusing to store pointers to
them in the routine structures, where the final picture involves not
having multiple layers of APIs like CopyToCSVStart,
CopyAttributeOutTextValue, etc.  These *have* to be documented
properly in copyapi.h, and this is much easier now that cstate stores
the routine pointers.  That would also make simpler function stacks.
Note that I have not changed that in the v11 attached.

This business with the extra callbacks required for csv and text is my
main point of contention, but I'd be OK once the model of the APIs is
more linear, with everything in Copy{From,To}State.  The changes would
be rather simple, and I'd be OK to put my hands on it.  Just,
Sutou-san, would you agree with my last point about these extra
callbacks?
--
Michael
From 7aca58d0f7adfd146d287059b1d11b47acdfa758 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei 
Date: Wed, 31 Jan 2024 13:37:02 +0900
Subject: [PATCH v11] Extract COPY FROM/TO format implementations

This doesn't change the current behavior.  This just introduces a set of
copy routines called CopyFromRoutine and CopyToRoutine, which just has
function pointers of format implementation like TupleTableSlotOps, and
use it for existing "text", "csv" and "binary" format implementations.
There are plans to extend that more in the future with custom formats.

This improves performance when a relation has many attributes as this
eliminates format-based if branches.  The more the attributes, the
better the performance gain.  Blah add some numbers.
---
 src/include/commands/copyapi.h   |  82 
 src/include/commands/copyfrom_internal.h |   8 +
 src/backend/commands/copyfrom.c  | 219 +++---
 src/backend/commands/copyfromparse.c | 439 
 src/backend/commands/copyto.c| 496 ---
 src/tools/pgindent/typedefs.list |   2 +
 6 files changed, 870 insertions(+), 376 deletions(-)
 create mode 100644 src/include/commands/copyapi.h

diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
new file mode 100644
index 00..3c0a6ba7a1
--- /dev/null
+++ b/src/include/commands/copyapi.h
@@ -0,0 +1,82 @@
+/*-
+ *
+ * copyapi.h
+ *	  API for COPY TO/FROM handlers
+ *
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/commands/copyapi.h
+ *
+ *-
+ */
+#ifndef COPYAPI_H
+#define COPYAPI_H
+
+#include "executor/tuptable.h"
+#include "nodes/execnodes.h"
+
+/* These are private in commands/copy[from|to].c */
+typedef struct CopyFromStateData *CopyFromState;
+typedef struct CopyToStateData *CopyToState;
+
+/*
+ * API structure for a COPY FROM format implementation.   Note this

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-01 Thread Michael Paquier

On Fri, Feb 02, 2024 at 06:51:02AM +0900, Michael Paquier wrote:
> I am going to try to plug in some rusage() calls in the backend for
> the COPY paths.  I hope that gives more precision about the backend
> activity.  I'll post that with more numbers.

And here they are with log_statement_stats enabled to get rusage() fot
these queries:
 test |  user_s  | system_s | elapsed_s 
--+--+--+---
 head_to_bin_1col | 1.639761 | 0.007998 |  1.647762
 v7_to_bin_1col   | 1.645499 | 0.004003 |  1.649498
 v10_to_bin_1col  | 1.639466 | 0.004008 |  1.643488

 head_to_bin_10col| 7.486369 | 0.056007 |  7.542485
 v7_to_bin_10col  | 7.314341 | 0.039990 |  7.354743
 v10_to_bin_10col | 7.329355 | 0.052007 |  7.381408

 head_to_text_1col| 1.581140 | 0.012000 |  1.593166
 v7_to_text_1col  | 1.615441 | 0.003992 |  1.619446
 v10_to_text_1col | 1.613443 | 0.00 |  1.613454

 head_to_text_10col   | 5.897014 | 0.011990 |  5.909063
 v7_to_text_10col | 5.722872 | 0.016014 |  5.738979
 v10_to_text_10col| 5.762286 | 0.011993 |  5.774265

 head_from_bin_1col   | 1.524038 | 0.02 |  1.544046
 v7_from_bin_1col | 1.551367 | 0.016015 |  1.567408
 v10_from_bin_1col| 1.560087 | 0.016001 |  1.576115

 head_from_bin_10col  | 5.238444 | 0.139993 |  5.378595
 v7_from_bin_10col| 5.170503 | 0.076021 |  5.246588
 v10_from_bin_10col   | 5.106496 | 0.112020 |  5.218565

 head_from_text_1col  | 1.664124 | 0.003998 |  1.668172
 v7_from_text_1col| 1.720616 | 0.007990 |  1.728617
 v10_from_text_1col   | 1.683950 | 0.007990 |  1.692098

 head_from_text_10col | 4.859651 | 0.015996 |  4.875747
 v7_from_text_10col   | 4.775975 | 0.032000 |  4.808051
 v10_from_text_10col  | 4.737512 | 0.028012 |  4.765522
(24 rows)

I'm looking at this table, and what I can see is still a lot of
variance in the tests with tables involving 1 attribute.  However, a
second thing stands out to me here: there is a speedup with the
10-attribute case for all both COPY FROM and COPY TO, and both
formats.  The data posted at [1] is showing me the same trend.  In
short, let's move on with this split refactoring with the per-row
callbacks.  That clearly shows benefits.

[1] https://www.postgresql.org/message-id/zbr6piwuvhdtf...@paquier.xyz
--
Michael


signature.asc
Description: PGP signature

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-01 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Fri, 2 Feb 2024 06:51:02 +0900,
  Michael Paquier  wrote:

> On Fri, Feb 02, 2024 at 12:19:51AM +0900, Sutou Kouhei wrote:
>> Here are some numbers on my local machine (Note that my
>> local machine isn't suitable for benchmark as I said
>> before. Each number is median of "\watch 15" results):
>>>
>> I'll measure again on my local machine later. I'll stop
>> other processes such as Web browser, editor and so on as
>> much as possible when I do.
> 
> Thanks for compiling some numbers.  This is showing a lot of variance.
> Expecially, these two lines in table 2 are showing surprising results
> for v7:
>   direction format  n_columns master v7v10
>fromcsv  1917.973   1695.401871.991
>from binary  1841.104   1422.012820.786

Here are more numbers:

1:
 direction format  n_columns master v7v10
to   text  1   1053.844978.998956.575
tocsv  1   1091.316   1020.584   1098.314
to binary  1   1034.685969.224980.458
to   text 10   4216.264   3886.515   4111.417
tocsv 10   4649.228   4530.882   4682.988
to binary 10   4219.2284189.99   4211.942
  from   text  1851.697896.968890.458
  fromcsv  1890.229936.231 887.15
  from binary  1784.407 817.07938.736
  from   text 10   2549.056   2233.899   2630.892
  fromcsv 10   2809.441   2868.411   2895.196
  from binary 10   2985.674   3027.522 3397.5

2:
 direction format  n_columns master v7v10
to   text  1   1013.764   1011.968940.855
tocsv  1   1060.431   1065.4681040.68
to binary  1   1013.652   1009.956965.675
to   text 10   4411.484   4031.571   3896.836
tocsv 10   4739.6254715.81   4631.002
to binary 10   4374.077   4357.942   4227.215
  from   text  1955.078922.346866.222
  fromcsv  1   1040.717986.524905.657
  from binary  1849.316864.859833.152
  from   text 10   2703.209   2361.651   2533.992
  fromcsv 102990.35   3059.167   2930.632
  from binary 10   3008.375   3368.714   3055.723

3:
 direction format  n_columns master v7v10
to   text  1   1084.756   1003.822994.409
tocsv  1 1092.4   1062.536   1079.027
to binary  1   1046.774994.168993.633
to   text 104363.51   3978.205   4124.359
tocsv 10   4866.762   4616.001   4715.052
to binary 10   4382.412   4363.269   4213.456
  from   text  1852.976907.315860.749
  fromcsv  1925.187962.632897.833
  from binary  1824.997897.046828.231
  from   text 102591.07   2358.541   2540.431
  fromcsv 10   2907.033   3018.486   2915.997
  from binary 10   3069.0273209.21   3119.128

Other processes are stopped while I measure them. But I'm
not sure these numbers are more reliable than before...

> I am going to try to plug in some rusage() calls in the backend for
> the COPY paths.  I hope that gives more precision about the backend
> activity.  I'll post that with more numbers.

Thanks. It'll help us.


-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-01 Thread Michael Paquier

On Fri, Feb 02, 2024 at 12:19:51AM +0900, Sutou Kouhei wrote:
> Here are some numbers on my local machine (Note that my
> local machine isn't suitable for benchmark as I said
> before. Each number is median of "\watch 15" results):
>>
> I'll measure again on my local machine later. I'll stop
> other processes such as Web browser, editor and so on as
> much as possible when I do.

Thanks for compiling some numbers.  This is showing a lot of variance.
Expecially, these two lines in table 2 are showing surprising results
for v7:
  direction format  n_columns master v7v10
   fromcsv  1917.973   1695.401871.991
   from binary  1841.104   1422.012820.786

I am going to try to plug in some rusage() calls in the backend for
the COPY paths.  I hope that gives more precision about the backend
activity.  I'll post that with more numbers.
--
Michael

signature.asc
Description: PGP signature

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-01 Thread Sutou Kouhei

Hi,

Thanks for preparing benchmark.

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Thu, 1 Feb 2024 12:49:59 +0900,
  Michael Paquier  wrote:

> On Thu, Feb 01, 2024 at 10:57:58AM +0900, Michael Paquier wrote:
>> And here are the results I get for text and binary (ms, average of 15
>> queries after discarding the three highest and three lowest values):
>>   test   | master |  v7  | v10  
>> -++--+--
>>  from_bin_1col   | 1575   | 1546 | 1575
>>  from_bin_10col  | 5364   | 5208 | 5230
>>  from_text_1col  | 1690   | 1715 | 1684
>>  from_text_10col | 4875   | 4793 | 4757
>>  to_bin_1col | 1717   | 1730 | 1731
>>  to_bin_10col| 7728   | 7707 | 7513
>>  to_text_1col| 1710   | 1730 | 1698
>>  to_text_10col   | 5998   | 5960 | 5987
> 
> Here are some numbers from a second local machine:
>   test   | master |  v7  | v10  
> -++--+--
>  from_bin_1col   | 508| 467  | 461
>  from_bin_10col  | 2192   | 2083 | 2098
>  from_text_1col  | 510| 499  | 517
>  from_text_10col | 1970   | 1678 | 1654
>  to_bin_1col | 575| 577  | 573
>  to_bin_10col| 2680   | 2678 | 2722
>  to_text_1col| 516| 506  | 527
>  to_text_10col   | 2250   | 2245 | 2235
> 
> This is confirming a speedup with COPY FROM for both text and binary,
> with more impact with a larger number of attributes.  That's harder to
> conclude about COPY TO in both cases, but at least I'm not seeing any
> regression even with some variance caused by what looks like noise.
> We need more numbers from more people.  Sutou-san or Sawada-san, or
> any volunteers?

Here are some numbers on my local machine (Note that my
local machine isn't suitable for benchmark as I said
before. Each number is median of "\watch 15" results):

1:
 direction format  n_columns master v7v10
to   text  1   1077.254   1016.953   1028.434
tocsv  11079.88   1055.5451053.95
to binary  1   1051.2471033.931003.44
to   text 10   4373.168   3980.4423955.94
tocsv 10   4753.842 4719.2   4677.643
to binary 10   4598.374   4431.238   4285.757
  from   text  1875.729916.526869.283
  fromcsv  1909.355   1001.277918.655
  from binary  1872.943907.778859.433
  from   text 10   2594.429   2345.292   2587.603
  fromcsv 10   2968.972   3039.544   2964.468
  from binary 103072.01   3109.267   3093.983

2:
 direction format  n_columns master v7v10
to   text  1   1061.908988.768978.291
tocsv  1   1095.109   1037.015   1041.613
to binary  1   1076.992   1000.212983.318
to   text 10   4336.517   3901.833   3841.789
tocsv 10   4679.411   4640.975   4570.774
to binary 104465.04   4508.063   4261.749
  from   text  1866.689 917.54830.417
  fromcsv  1917.973   1695.401871.991
  from binary  1841.104   1422.012820.786
  from   text 10   2523.607   3147.738   2517.505
  fromcsv 10   2917.018   3042.685   2950.338
  from binary 10   2998.051   3128.542   3018.954

3:
 direction format  n_columns master v7v10
to   text  1   1021.168   1031.183962.945
tocsv  1   1076.549   1069.661   1060.258
to binary  1   1024.611   1022.143975.768
to   text 104327.24   3936.703   4049.893
tocsv 10   4620.436   4531.676   4685.672
to binary 10   4457.165   4390.992   4301.463
  from   text  1887.532907.365888.892
  fromcsv  1945.1671012.29895.921
  from binary  1 853.06854.652849.661
  from   text 10   2660.509   2304.256   2527.071
  fromcsv 10   2913.644   2968.204   2935.081
  from binary 10   3020.812   3081.162   3090.803

I'll measure again on my local machine later. I'll stop
other processes such as Web browser, editor and so on as
much as possible when I do.


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-31 Thread Junwang Zhao

On Thu, Feb 1, 2024 at 11:56 AM Michael Paquier  wrote:
>
> On Thu, Feb 01, 2024 at 11:43:07AM +0800, Junwang Zhao wrote:
> > The first 6 rounds are like 10% better than the later 9 rounds, is this 
> > normal?
>
> Even with HEAD?  Perhaps you have some OS cache eviction in play here?
> FWIW, I'm not seeing any of that with longer runs after 7~ tries in a
> loop of 15.

Yeah, with HEAD. I'm on ubuntu 22.04, I did not change any gucs, maybe I should
set a higher shared_buffers? But I dought that's related ;(


> --
> Michael



-- 
Regards
Junwang Zhao

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-31 Thread Michael Paquier

On Thu, Feb 01, 2024 at 11:43:07AM +0800, Junwang Zhao wrote:
> The first 6 rounds are like 10% better than the later 9 rounds, is this 
> normal?

Even with HEAD?  Perhaps you have some OS cache eviction in play here?
FWIW, I'm not seeing any of that with longer runs after 7~ tries in a
loop of 15.
--
Michael


signature.asc
Description: PGP signature

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-31 Thread Michael Paquier

On Thu, Feb 01, 2024 at 10:57:58AM +0900, Michael Paquier wrote:
> And here are the results I get for text and binary (ms, average of 15
> queries after discarding the three highest and three lowest values):
>   test   | master |  v7  | v10  
> -++--+--
>  from_bin_1col   | 1575   | 1546 | 1575
>  from_bin_10col  | 5364   | 5208 | 5230
>  from_text_1col  | 1690   | 1715 | 1684
>  from_text_10col | 4875   | 4793 | 4757
>  to_bin_1col | 1717   | 1730 | 1731
>  to_bin_10col| 7728   | 7707 | 7513
>  to_text_1col| 1710   | 1730 | 1698
>  to_text_10col   | 5998   | 5960 | 5987

Here are some numbers from a second local machine:
  test   | master |  v7  | v10  
-++--+--
 from_bin_1col   | 508| 467  | 461
 from_bin_10col  | 2192   | 2083 | 2098
 from_text_1col  | 510| 499  | 517
 from_text_10col | 1970   | 1678 | 1654
 to_bin_1col | 575| 577  | 573
 to_bin_10col| 2680   | 2678 | 2722
 to_text_1col| 516| 506  | 527
 to_text_10col   | 2250   | 2245 | 2235

This is confirming a speedup with COPY FROM for both text and binary,
with more impact with a larger number of attributes.  That's harder to
conclude about COPY TO in both cases, but at least I'm not seeing any
regression even with some variance caused by what looks like noise.
We need more numbers from more people.  Sutou-san or Sawada-san, or
any volunteers?
--
Michael


signature.asc
Description: PGP signature

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-31 Thread Junwang Zhao

Hi Michael,

On Thu, Feb 1, 2024 at 9:58 AM Michael Paquier  wrote:
>
> On Wed, Jan 31, 2024 at 02:39:54PM +0900, Michael Paquier wrote:
> > Thanks, I'm looking into that now.
>
> I have much to say about the patch, but for now I have begun running
> some performance tests using the patches, because this thread won't
> get far until we are sure that the callbacks do not impact performance
> in some kind of worst-case scenario.  First, here is what I used to
> setup a set of tables used for COPY FROM and COPY TO (requires [1] to
> feed COPY FROM's data to the void, and note that default values is to
> have a strict control on the size of the StringInfos used in the copy
> paths):
> CREATE EXTENSION blackhole_am;
> CREATE OR REPLACE FUNCTION create_table_cols(tabname text, num_cols int)
> RETURNS VOID AS
> $func$
> DECLARE
>   query text;
> BEGIN
>   query := 'CREATE UNLOGGED TABLE ' || tabname || ' (';
>   FOR i IN 1..num_cols LOOP
> query := query || 'a_' || i::text || ' int default 1';
> IF i != num_cols THEN
>   query := query || ', ';
> END IF;
>   END LOOP;
>   query := query || ')';
>   EXECUTE format(query);
> END
> $func$ LANGUAGE plpgsql;
> -- Tables used for COPY TO
> SELECT create_table_cols ('to_tab_1', 1);
> SELECT create_table_cols ('to_tab_10', 10);
> INSERT INTO to_tab_1 SELECT FROM generate_series(1, 1000);
> INSERT INTO to_tab_10 SELECT FROM generate_series(1, 1000);
> -- Data for COPY FROM
> COPY to_tab_1 TO '/tmp/to_tab_1.bin' WITH (format binary);
> COPY to_tab_10 TO '/tmp/to_tab_10.bin' WITH (format binary);
> COPY to_tab_1 TO '/tmp/to_tab_1.txt' WITH (format text);
> COPY to_tab_10 TO '/tmp/to_tab_10.txt' WITH (format text);
> -- Tables used for COPY FROM
> SELECT create_table_cols ('from_tab_1', 1);
> SELECT create_table_cols ('from_tab_10', 10);
> ALTER TABLE from_tab_1 SET ACCESS METHOD blackhole_am;
> ALTER TABLE from_tab_10 SET ACCESS METHOD blackhole_am;
>
> Then I have run a set of tests using HEAD, v7 and v10 with queries
> like that (adapt them depending on the format and table):
> COPY to_tab_1 TO '/dev/null' WITH (FORMAT text) \watch count=5
> SET client_min_messages TO error; -- for blackhole_am
> COPY from_tab_1 FROM '/tmp/to_tab_1.txt' with (FORMAT 'text') \watch count=5
> COPY from_tab_1 FROM '/tmp/to_tab_1.bin' with (FORMAT 'binary') \watch count=5
>
> All the patches have been compiled with -O2, without assertions, etc.
> Postgres is run in tmpfs mode, on scissors, without fsync.  Unlogged
> tables help a bit in focusing on the execution paths as we don't care
> about WAL, of course.  I have also included v7 in the test of tests,
> as this version uses more simple per-row callbacks.
>
> And here are the results I get for text and binary (ms, average of 15
> queries after discarding the three highest and three lowest values):
>   test   | master |  v7  | v10
> -++--+--
>  from_bin_1col   | 1575   | 1546 | 1575
>  from_bin_10col  | 5364   | 5208 | 5230
>  from_text_1col  | 1690   | 1715 | 1684
>  from_text_10col | 4875   | 4793 | 4757
>  to_bin_1col | 1717   | 1730 | 1731
>  to_bin_10col| 7728   | 7707 | 7513
>  to_text_1col| 1710   | 1730 | 1698
>  to_text_10col   | 5998   | 5960 | 5987
>
> I am getting an interesting trend here in terms of a speedup between
> HEAD and the patches with a table that has 10 attributes filled with
> integers, especially for binary and text with COPY FROM.  COPY TO
> binary also gets nice numbers, while text looks rather stable.  Hmm.
>
> These were on my buildfarm animal, but we need to be more confident
> about all this.  Could more people run these tests?  I am going to do
> a second session on a local machine I have at hand and see what
> happens.  Will publish the numbers here, the method will be the same.
>
> [1]: https://github.com/michaelpq/pg_plugins/tree/main/blackhole_am
> --
> Michael

I'm running the benchmark, but I got some strong numbers:

postgres=# \timing
Timing is on.
postgres=# COPY to_tab_10 TO '/dev/null' WITH (FORMAT binary) \watch count=15
COPY 1000
Time: 3168.497 ms (00:03.168)
COPY 1000
Time: 3255.464 ms (00:03.255)
COPY 1000
Time: 3270.625 ms (00:03.271)
COPY 1000
Time: 3285.112 ms (00:03.285)
COPY 1000
Time: 3322.304 ms (00:03.322)
COPY 1000
Time: 3341.328 ms (00:03.341)
COPY 1000
Time: 3621.564 ms (00:03.622)
COPY 1000
Time: 3700.911 ms (00:03.701)
COPY 1000
Time: 3717.992 ms (00:03.718)
COPY 1000
Time: 3708.350 ms (00:03.708)
COPY 1000
Time: 3704.367 ms (00:03.704)
COPY 1000
Time: 3724.281 ms (00:03.724)
COPY 1000
Time: 3703.335 ms (00:03.703)
COPY 1000
Time: 3728.629 ms (00:03.729)
COPY 1000
Time: 3758.135 ms (00:03.758)

The first 6 rounds are like 10% better than the later 9 rounds, is this normal?

-- 
Regards
Junwang Zhao

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-31 Thread Michael Paquier

On Wed, Jan 31, 2024 at 02:39:54PM +0900, Michael Paquier wrote:
> Thanks, I'm looking into that now.

I have much to say about the patch, but for now I have begun running
some performance tests using the patches, because this thread won't
get far until we are sure that the callbacks do not impact performance
in some kind of worst-case scenario.  First, here is what I used to
setup a set of tables used for COPY FROM and COPY TO (requires [1] to
feed COPY FROM's data to the void, and note that default values is to
have a strict control on the size of the StringInfos used in the copy
paths):
CREATE EXTENSION blackhole_am;
CREATE OR REPLACE FUNCTION create_table_cols(tabname text, num_cols int)
RETURNS VOID AS
$func$
DECLARE
  query text;
BEGIN
  query := 'CREATE UNLOGGED TABLE ' || tabname || ' (';
  FOR i IN 1..num_cols LOOP
query := query || 'a_' || i::text || ' int default 1';
IF i != num_cols THEN
  query := query || ', ';
END IF;
  END LOOP;
  query := query || ')';
  EXECUTE format(query);
END
$func$ LANGUAGE plpgsql;
-- Tables used for COPY TO
SELECT create_table_cols ('to_tab_1', 1);
SELECT create_table_cols ('to_tab_10', 10);
INSERT INTO to_tab_1 SELECT FROM generate_series(1, 1000);
INSERT INTO to_tab_10 SELECT FROM generate_series(1, 1000);
-- Data for COPY FROM
COPY to_tab_1 TO '/tmp/to_tab_1.bin' WITH (format binary);
COPY to_tab_10 TO '/tmp/to_tab_10.bin' WITH (format binary);
COPY to_tab_1 TO '/tmp/to_tab_1.txt' WITH (format text);
COPY to_tab_10 TO '/tmp/to_tab_10.txt' WITH (format text);
-- Tables used for COPY FROM
SELECT create_table_cols ('from_tab_1', 1);
SELECT create_table_cols ('from_tab_10', 10);
ALTER TABLE from_tab_1 SET ACCESS METHOD blackhole_am;
ALTER TABLE from_tab_10 SET ACCESS METHOD blackhole_am;

Then I have run a set of tests using HEAD, v7 and v10 with queries
like that (adapt them depending on the format and table):
COPY to_tab_1 TO '/dev/null' WITH (FORMAT text) \watch count=5
SET client_min_messages TO error; -- for blackhole_am
COPY from_tab_1 FROM '/tmp/to_tab_1.txt' with (FORMAT 'text') \watch count=5
COPY from_tab_1 FROM '/tmp/to_tab_1.bin' with (FORMAT 'binary') \watch count=5

All the patches have been compiled with -O2, without assertions, etc.
Postgres is run in tmpfs mode, on scissors, without fsync.  Unlogged
tables help a bit in focusing on the execution paths as we don't care
about WAL, of course.  I have also included v7 in the test of tests,
as this version uses more simple per-row callbacks.

And here are the results I get for text and binary (ms, average of 15
queries after discarding the three highest and three lowest values):
  test   | master |  v7  | v10  
-++--+--
 from_bin_1col   | 1575   | 1546 | 1575
 from_bin_10col  | 5364   | 5208 | 5230
 from_text_1col  | 1690   | 1715 | 1684
 from_text_10col | 4875   | 4793 | 4757
 to_bin_1col | 1717   | 1730 | 1731
 to_bin_10col| 7728   | 7707 | 7513
 to_text_1col| 1710   | 1730 | 1698
 to_text_10col   | 5998   | 5960 | 5987

I am getting an interesting trend here in terms of a speedup between
HEAD and the patches with a table that has 10 attributes filled with
integers, especially for binary and text with COPY FROM.  COPY TO
binary also gets nice numbers, while text looks rather stable.  Hmm.

These were on my buildfarm animal, but we need to be more confident
about all this.  Could more people run these tests?  I am going to do
a second session on a local machine I have at hand and see what
happens.  Will publish the numbers here, the method will be the same.

[1]: https://github.com/michaelpq/pg_plugins/tree/main/blackhole_am
--
Michael


signature.asc
Description: PGP signature

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-30 Thread Michael Paquier

On Wed, Jan 31, 2024 at 02:11:22PM +0900, Sutou Kouhei wrote:
> Ah, yes. defel->location is used in later patches. For
> example, it's used when a COPY handler for the specified
> FORMAT isn't found.

I see.

> I've prepared the v10 patch set. Could you try this?

Thanks, I'm looking into that now.

> FYI: Here are Copy{From,To}Routine in the v10 patch set. I
> think that only Copy{From,To}OneRow are minimal callbacks
> for the performance gain. But can we keep Copy{From,To}Start
> and Copy{From,To}End for consistency? We can remove a few
> {csv_mode,binary} conditions by Copy{From,To}{Start,End}. It
> doesn't depend on the number of COPY target tuples. So they
> will not affect performance.

I think I'm OK to keep the start/end callbacks.  This makes the code
more consistent as a whole, as well.
--
Michael


signature.asc
Description: PGP signature

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-30 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Tue, 30 Jan 2024 17:37:35 +0900,
  Michael Paquier  wrote:

>> We use defel->location for an error message. (We don't need
>> to set location for the default "text" DefElem.)
> 
> Yeah, but you should not need to have this error in the paths that set
> the callback routines in opts_out if the same validation happens a few
> lines before, in copy.c.

Ah, yes. defel->location is used in later patches. For
example, it's used when a COPY handler for the specified
FORMAT isn't found.

>   I am really worried about the complexities
> this thread is getting into because we are trying to shape the
> callbacks in the most generic way possible based on *two* use cases.
> This is going to be a never-ending discussion.  I'd rather get some
> simple basics, and then we can discuss if tweaking the callbacks is
> really necessary or not.  Even after introducing the pg_proc lookups
> to get custom callbacks.

I understand your concern. Let's introduce minimal callbacks
as the first step. I think that we completed our design
discussion for this feature. We can choose minimal callbacks
based on the discussion.

> The custom options don't seem like an absolute strong
> requirement for the first shot with the callbacks or even the
> possibility to retrieve the callbacks from a function call.  I mean,
> you could provide some control with SET commands and a few GUCs, at
> least, even if that would be strange.  Manipulations with a list of
> DefElems is the intuitive way to have custom options at query level,
> but we also have to guess the set of callbacks from this list of
> DefElems coming from the query.  You see my point, I am not sure 
> if it would be the best thing to process twice the options, especially
> when it comes to decide if a DefElem should be valid or not depending
> on the callbacks used.  Or we could use a kind of "special" DefElem
> where we could store a set of key:value fed to a callback :)

Interesting. Let's remove custom options support from the
initial minimal callbacks.

>> Can I prepare the v10 patch set as "the v7 patch set" + "the
>> removal of the "if" checks in NextCopyFromRawFields()"?
>> (+ reverting opts_out->{csv_mode,binary} changes in
>> ProcessCopyOptions().)
> 
> Yep, if I got it that would make sense to me.  If you can do that,
> that would help quite a bit.  :)

I've prepared the v10 patch set. Could you try this?

Changes since the v7 patch set:

0001:

* Remove CopyToProcessOption() callback
* Remove CopyToGetFormat() callback
* Revert passing CopyToState to ProcessCopyOptions()
* Revert moving "opts_out->{csv_mode,binary} = true" to
  ProcessCopyOptionFormatTo()
* Change to receive "const char *format" instead "DefElem  *defel"
  by ProcessCopyOptionFormatTo()

0002:

* Remove CopyFromProcessOption() callback
* Remove CopyFromGetFormat() callback
* Change to receive "const char *format" instead "DefElem
  *defel" by ProcessCopyOptionFormatFrom()
* Remove "if (cstate->opts.csv_mode)" branches from
  NextCopyFromRawFields()

FYI: Here are Copy{From,To}Routine in the v10 patch set. I
think that only Copy{From,To}OneRow are minimal callbacks
for the performance gain. But can we keep Copy{From,To}Start
and Copy{From,To}End for consistency? We can remove a few
{csv_mode,binary} conditions by Copy{From,To}{Start,End}. It
doesn't depend on the number of COPY target tuples. So they
will not affect performance.

/* Routines for a COPY FROM format implementation. */
typedef struct CopyFromRoutine
{
/*
 * Called when COPY FROM is started. This will initialize something and
 * receive a header.
 */
void(*CopyFromStart) (CopyFromState cstate, TupleDesc 
tupDesc);

/* Copy one row. It returns false if no more tuples. */
bool(*CopyFromOneRow) (CopyFromState cstate, ExprContext 
*econtext, Datum *values, bool *nulls);

/* Called when COPY FROM is ended. This will finalize something. */
void(*CopyFromEnd) (CopyFromState cstate);
}   CopyFromRoutine;

/* Routines for a COPY TO format implementation. */
typedef struct CopyToRoutine
{
/* Called when COPY TO is started. This will send a header. */
void(*CopyToStart) (CopyToState cstate, TupleDesc tupDesc);

/* Copy one row for COPY TO. */
void(*CopyToOneRow) (CopyToState cstate, TupleTableSlot 
*slot);

/* Called when COPY TO is ended. This will send a trailer. */
void(*CopyToEnd) (CopyToState cstate);
}   CopyToRoutine;

Thanks,
-- 
kou
>From

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-30 Thread Michael Paquier

On Tue, Jan 30, 2024 at 05:15:11PM +0900, Sutou Kouhei wrote:
> We use defel->location for an error message. (We don't need
> to set location for the default "text" DefElem.)

Yeah, but you should not need to have this error in the paths that set
the callback routines in opts_out if the same validation happens a few
lines before, in copy.c.

> Yes, sure.
> (We want to focus on the performance gains in the first
> patch set and then focus on extensibility again, right?)

Yep, exactly, the numbers are too good to just ignore.  I don't want
to hijack the thread, but I am really worried about the complexities
this thread is getting into because we are trying to shape the
callbacks in the most generic way possible based on *two* use cases.
This is going to be a never-ending discussion.  I'd rather get some
simple basics, and then we can discuss if tweaking the callbacks is
really necessary or not.  Even after introducing the pg_proc lookups
to get custom callbacks.

> For the purpose, I think that the v7 patch set is more
> suitable than the v9 patch set. The v7 patch set doesn't
> include Init() callbacks, custom options validation support
> or extra Copy{In,Out}Response support. But the v7 patch set
> misses the removal of the "if" checks in
> NextCopyFromRawFields() that exists in the v9 patch set. I'm
> not sure how much performance will improve by this but it
> may be worth a try.

Yeah..  The custom options don't seem like an absolute strong
requirement for the first shot with the callbacks or even the
possibility to retrieve the callbacks from a function call.  I mean,
you could provide some control with SET commands and a few GUCs, at
least, even if that would be strange.  Manipulations with a list of
DefElems is the intuitive way to have custom options at query level,
but we also have to guess the set of callbacks from this list of
DefElems coming from the query.  You see my point, I am not sure 
if it would be the best thing to process twice the options, especially
when it comes to decide if a DefElem should be valid or not depending
on the callbacks used.  Or we could use a kind of "special" DefElem
where we could store a set of key:value fed to a callback :)

> Can I prepare the v10 patch set as "the v7 patch set" + "the
> removal of the "if" checks in NextCopyFromRawFields()"?
> (+ reverting opts_out->{csv_mode,binary} changes in
> ProcessCopyOptions().)

Yep, if I got it that would make sense to me.  If you can do that,
that would help quite a bit.  :)
--
Michael

signature.asc
Description: PGP signature

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-30 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Tue, 30 Jan 2024 16:20:54 +0900,
  Michael Paquier  wrote:

>>> +if (!format_specified)
>>> +/* Set the default format. */
>>> +ProcessCopyOptionFormatTo(pstate, opts_out, cstate, NULL);
>>> +
>>> 
>>> I think we can pass "text" in this case instead of NULL. That way,
>>> ProcessCopyOptionFormatTo doesn't need to handle NULL case.
>> 
>> Yes, we can do it. But it needs a DefElem allocation. Is it
>> acceptable?
> 
> I don't think that there is a need for a DelElem at all here?

We use defel->location for an error message. (We don't need
to set location for the default "text" DefElem.)

>While I
> am OK with the choice of calling CopyToInit() in the
> ProcessCopyOption*() routines that exist to keep the set of callbacks
> local to copyto.c and copyfrom.c, I think that this should not bother
> about setting opts_out->csv_mode or opts_out->csv_mode but just set 
> the opts_out->{to,from}_routine callbacks.

OK. I'll keep opts_out->{csv_mode,binary} in copy.c.

>  Now, all the Init() callbacks are empty for the
> in-core callbacks, so I think that we should just remove it entirely
> for now.  Let's keep the core patch a maximum simple.  It is always
> possible to build on top of it depending on what people need.  It's
> been mentioned that JSON would want that, but this also proves that we
> just don't care about that for all the in-core callbacks, as well.  I
> would choose a minimalistic design for now.

OK. Let's remove Init() callbacks from the first patch set.

> I would be really tempted to put my hands on this patch to put into
> shape a minimal set of changes because I'm caring quite a lot about
> the performance gains reported with the removal of the "if" checks in
> the per-row callbacks, and that's one goal of this thread quite
> independent on the extensibility.  Sutou-san, would you be OK with
> that?

Yes, sure.
(We want to focus on the performance gains in the first
patch set and then focus on extensibility again, right?)

For the purpose, I think that the v7 patch set is more
suitable than the v9 patch set. The v7 patch set doesn't
include Init() callbacks, custom options validation support
or extra Copy{In,Out}Response support. But the v7 patch set
misses the removal of the "if" checks in
NextCopyFromRawFields() that exists in the v9 patch set. I'm
not sure how much performance will improve by this but it
may be worth a try.

Can I prepare the v10 patch set as "the v7 patch set" + "the
removal of the "if" checks in NextCopyFromRawFields()"?
(+ reverting opts_out->{csv_mode,binary} changes in
ProcessCopyOptions().)

Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-29 Thread Michael Paquier

On Tue, Jan 30, 2024 at 02:45:31PM +0900, Sutou Kouhei wrote:
> In 
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Tue, 30 Jan 2024 11:11:59 +0900,
>   Masahiko Sawada  wrote:
> 
>> ---
>> +if (!format_specified)
>> +/* Set the default format. */
>> +ProcessCopyOptionFormatTo(pstate, opts_out, cstate, NULL);
>> +
>> 
>> I think we can pass "text" in this case instead of NULL. That way,
>> ProcessCopyOptionFormatTo doesn't need to handle NULL case.
> 
> Yes, we can do it. But it needs a DefElem allocation. Is it
> acceptable?

I don't think that there is a need for a DelElem at all here?  While I
am OK with the choice of calling CopyToInit() in the
ProcessCopyOption*() routines that exist to keep the set of callbacks
local to copyto.c and copyfrom.c, I think that this should not bother
about setting opts_out->csv_mode or opts_out->csv_mode but just set 
the opts_out->{to,from}_routine callbacks.

>> +static void
>> +CopyToTextBasedInit(CopyToState cstate)
>> +{
>> +}
>> 
>> and
>> 
>> +static void
>> +CopyToBinaryInit(CopyToState cstate)
>> +{
>> +}
>> 
>> Do we really need separate callbacks for the same behavior? I think we
>> can have a common init function say CopyToBuitinInit() that does
>> nothing. Or we can make the init callback optional.

Keeping empty options does not strike as a bad idea, because this
forces extension developers to think about this code path rather than
just ignore it.  Now, all the Init() callbacks are empty for the
in-core callbacks, so I think that we should just remove it entirely
for now.  Let's keep the core patch a maximum simple.  It is always
possible to build on top of it depending on what people need.  It's
been mentioned that JSON would want that, but this also proves that we
just don't care about that for all the in-core callbacks, as well.  I
would choose a minimalistic design for now.

>> +/* Get info about the columns we need to process. */
>> +cstate->out_functions = (FmgrInfo *) palloc(num_phys_attrs *
>> sizeof(Fmgr\Info));
>> +foreach(cur, cstate->attnumlist)
>> +{
>> +intattnum = lfirst_int(cur);
>> +Oidout_func_oid;
>> +bool   isvarlena;
>> +Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
>> +
>> +getTypeOutputInfo(attr->atttypid, _func_oid, 
>> );
>> +fmgr_info(out_func_oid, >out_functions[attnum - 1]);
>> +}
>> 
>> Is preparing the out functions an extension's responsibility? I
>> thought the core could prepare them based on the overall format
>> specified by extensions, as long as the overall format matches the
>> actual data format to send. What do you think?
> 
> Hmm. I want to keep the preparation as an extension's
> responsibility. Because it's not needed for all formats. For
> example, Apache Arrow FORMAT doesn't need it. And JSON
> FORMAT doesn't need it too because it use
> composite_to_json().

I agree that it could be really useful for extensions to be able to
force that.  We already know that for the in-core formats we've cared
about being able to enforce the way data is handled in input and
output.

> It makes sense. I couldn't find the documentation when I
> wrote it but I found it now...:
> https://www.postgresql.org/docs/current/protocol-flow.html#PROTOCOL-COPY
> 
> Is there recommended comment style to refer a documentation?
> "See doc/src/sgml/protocol.sgml for the CopyOutResponse
> message details" is OK?

There are a couple of places in the C code where we refer to SGML docs
when it comes to specific details, so using a method like that here to
avoid a duplication with the docs sounds sensible for me.

I would be really tempted to put my hands on this patch to put into
shape a minimal set of changes because I'm caring quite a lot about
the performance gains reported with the removal of the "if" checks in
the per-row callbacks, and that's one goal of this thread quite
independent on the extensibility.  Sutou-san, would you be OK with
that?
--
Michael

signature.asc
Description: PGP signature

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-29 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Tue, 30 Jan 2024 11:11:59 +0900,
  Masahiko Sawada  wrote:

> ---
> +if (!format_specified)
> +/* Set the default format. */
> +ProcessCopyOptionFormatTo(pstate, opts_out, cstate, NULL);
> +
> 
> I think we can pass "text" in this case instead of NULL. That way,
> ProcessCopyOptionFormatTo doesn't need to handle NULL case.

Yes, we can do it. But it needs a DefElem allocation. Is it
acceptable?

> We need curly brackets for this "if branch" as follows:
> 
> if (!format_specifed)
> {
> /* Set the default format. */
> ProcessCopyOptionFormatTo(pstate, opts_out, cstate, NULL);
> }

Oh, sorry. I assumed that pgindent adjusts the style too.

> ---
> +/* Process not built-in options. */
> +foreach(option, unknown_options)
> +{
> +DefElem*defel = lfirst_node(DefElem, option);
> +bool   processed = false;
> +
> +if (!is_from)
> +processed =
> opts_out->to_routine->CopyToProcessOption(cstate, defel);
> +if (!processed)
> +ereport(ERROR,
> +(errcode(ERRCODE_SYNTAX_ERROR),
> + errmsg("option \"%s\" not 
> recognized",
> +defel->defname),
> + parser_errposition(pstate,
> defel->location)));
> +}
> +list_free(unknown_options);
> 
> I think we can check the duplicated options in the core as we discussed.

Oh, sorry. I missed the part. I'll implement it.

> ---
> +static void
> +CopyToTextBasedInit(CopyToState cstate)
> +{
> +}
> 
> and
> 
> +static void
> +CopyToBinaryInit(CopyToState cstate)
> +{
> +}
> 
> Do we really need separate callbacks for the same behavior? I think we
> can have a common init function say CopyToBuitinInit() that does
> nothing. Or we can make the init callback optional.
> 
> The same is true for process-option callback.

OK. I'll make them optional.

> ---
>  List  *convert_select; /* list of column names (can be NIL) */
> +const  CopyToRoutine *to_routine;  /* callback
> routines for COPY TO */
>  } CopyFormatOptions;
> 
> I think CopyToStateData is a better place to have CopyToRoutine.
> copy_data_dest_cb is also there.

We can do it but ProcessCopyOptions() accepts NULL
CopyToState for file_fdw. Can we create an empty
CopyToStateData internally like we did for opts_out in
ProcessCopyOptions()? (But it requires exporting
CopyToStateData. We'll export it in a later patch but it's
not yet in 0001.)

> The 0002 patch replaces the options checks with
> ProcessCopyOptionFormatFrom(). However, both
> ProcessCopyOptionFormatTo() and ProcessCOpyOptionFormatFrom() would
> set format-related options such as opts_out->csv_mode etc, which seems
> not elegant. IIUC the reason why we process only the "format" option
> first is to set the callback functions and call the init callback. So
> I think we don't necessarily need to do both setting callbacks and
> setting format-related options together. Probably we can do only the
> callback stuff first and then set format-related options in the
> original place we used to do?

If we do it, we need to write the (strcmp(format, "csv") ==
0) condition in copyto.c and copy.c. I wanted to avoid it. I
think that the duplication (setting opts_out->csv_mode in
copyto.c and copyfrom.c) is not a problem. But it's not a
strong opinion. If (strcmp(format, "csv") == 0) duplication
is better than opts_out->csv_mode = true duplication, I'll
do it.

BTW, if we want to make the CSV format implementation more
modularized, we will remove opts_out->csv_mode, move CSV
related options to CopyToCSVProcessOption() and keep CSV
related options in its opaque space. For example,
opts_out->force_quote exists in COPY TO opaque space but
doesn't exist in COPY FROM opaque space because it's not
used in COPY FROM.


> +static void
> +CopyToTextBasedFillCopyOutResponse(CopyToState cstate, StringInfoData *buf)
> +{
> +int16  format = 0;
> +intnatts = list_length(cstate->attnumlist);
> +inti;
> +
> +pq_sendbyte(buf, format);  /* overall format */
> +pq_sendint16(buf, natts);
> +for (i = 0; i < natts; i++)
> +pq_sendint16(buf, format); /* per-column formats */
> +}
> 
> This function and CopyToBinaryFillCo

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-29 Thread Masahiko Sawada

On Mon, Jan 29, 2024 at 6:45 PM Sutou Kouhei  wrote:
>
> Hi,
>
> In 
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Mon, 29 Jan 2024 11:37:07 +0800,
>   Junwang Zhao  wrote:
>
> >> > > Does it make sense to pass only non-builtin options to the custom
> >> > > format callback after parsing and evaluating the builtin options? That
> >> > > is, we parse and evaluate only the builtin options and populate
> >> > > opts_out first, then pass each rest option to the custom format
> >> > > handler callback. The callback can refer to the builtin option values.
> >>
> >> What I imagined is that while parsing the all specified options, we
> >> evaluate builtin options and we add non-builtin options to another
> >> list. Then when parsing a non-builtin option, we check if this option
> >> already exists in the list. If there is, we raise the "option %s not
> >> recognized" error.". Once we complete checking all options, we pass
> >> each option in the list to the callback.
>
> I implemented this idea and the following ideas:
>
> 1. Add init callback for initialization
> 2. Change GetFormat() to FillCopyXXXResponse()
>because JSON format always use 1 column
> 3. FROM only: Eliminate more cstate->opts.csv_mode branches
>(This is for performance.)
>
> See the attached v9 patch set for details. Changes since v7:
>
> 0001:
>
> * Move CopyToProcessOption() calls to the end of
>   ProcessCopyOptions() for easy to option validation
> * Add CopyToState::CopyToInit() and call it in
>   ProcessCopyOptionFormatTo()
> * Change CopyToState::CopyToGetFormat() to
>   CopyToState::CopyToFillCopyOutResponse() and use it in
>   SendCopyBegin()

Thank you for updating the patch! Here are comments on 0001 patch:

---
+if (!format_specified)
+/* Set the default format. */
+ProcessCopyOptionFormatTo(pstate, opts_out, cstate, NULL);
+

I think we can pass "text" in this case instead of NULL. That way,
ProcessCopyOptionFormatTo doesn't need to handle NULL case.

We need curly brackets for this "if branch" as follows:

if (!format_specifed)
{
/* Set the default format. */
ProcessCopyOptionFormatTo(pstate, opts_out, cstate, NULL);
}

---
+/* Process not built-in options. */
+foreach(option, unknown_options)
+{
+DefElem*defel = lfirst_node(DefElem, option);
+bool   processed = false;
+
+if (!is_from)
+processed =
opts_out->to_routine->CopyToProcessOption(cstate, defel);
+if (!processed)
+ereport(ERROR,
+(errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("option \"%s\" not recognized",
+defel->defname),
+ parser_errposition(pstate,
defel->location)));
+}
+list_free(unknown_options);

I think we can check the duplicated options in the core as we discussed.

---
+static void
+CopyToTextBasedInit(CopyToState cstate)
+{
+}

and

+static void
+CopyToBinaryInit(CopyToState cstate)
+{
+}

Do we really need separate callbacks for the same behavior? I think we
can have a common init function say CopyToBuitinInit() that does
nothing. Or we can make the init callback optional.

The same is true for process-option callback.

---
 List  *convert_select; /* list of column names (can be NIL) */
+const  CopyToRoutine *to_routine;  /* callback
routines for COPY TO */
 } CopyFormatOptions;

I think CopyToStateData is a better place to have CopyToRoutine.
copy_data_dest_cb is also there.

---
-if (strcmp(fmt, "text") == 0)
- /* default format */ ;
-else if (strcmp(fmt, "csv") == 0)
-opts_out->csv_mode = true;
-else if (strcmp(fmt, "binary") == 0)
-opts_out->binary = true;
+
+if (is_from)
+{
+char  *fmt = defGetString(defel);
+
+if (strcmp(fmt, "text") == 0)
+ /* default format */ ;
+else if (strcmp(fmt, "csv") == 0)
+{
+opts_out->csv_mode = true;
+}
+else if (strcmp(fmt, "binar

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-29 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Mon, 29 Jan 2024 11:37:07 +0800,
  Junwang Zhao  wrote:

>> > > Does it make sense to pass only non-builtin options to the custom
>> > > format callback after parsing and evaluating the builtin options? That
>> > > is, we parse and evaluate only the builtin options and populate
>> > > opts_out first, then pass each rest option to the custom format
>> > > handler callback. The callback can refer to the builtin option values.
>>
>> What I imagined is that while parsing the all specified options, we
>> evaluate builtin options and we add non-builtin options to another
>> list. Then when parsing a non-builtin option, we check if this option
>> already exists in the list. If there is, we raise the "option %s not
>> recognized" error.". Once we complete checking all options, we pass
>> each option in the list to the callback.

I implemented this idea and the following ideas:

1. Add init callback for initialization
2. Change GetFormat() to FillCopyXXXResponse()
   because JSON format always use 1 column
3. FROM only: Eliminate more cstate->opts.csv_mode branches
   (This is for performance.)

See the attached v9 patch set for details. Changes since v7:

0001:

* Move CopyToProcessOption() calls to the end of
  ProcessCopyOptions() for easy to option validation
* Add CopyToState::CopyToInit() and call it in
  ProcessCopyOptionFormatTo()
* Change CopyToState::CopyToGetFormat() to
  CopyToState::CopyToFillCopyOutResponse() and use it in
  SendCopyBegin()

0002:

* Move CopyFromProcessOption() calls to the end of
  ProcessCopyOptions() for easy to option validation
* Add CopyFromState::CopyFromInit() and call it in
  ProcessCopyOptionFormatFrom()
* Change CopyFromState::CopyFromGetFormat() to
  CopyFromState::CopyFromFillCopyOutResponse() and use it in
  ReceiveCopyBegin()
* Rename NextCopyFromRawFields() to
  NextCopyFromRawFieldsInternal() and pass the read
  attributes callback explicitly to eliminate more
  cstate->opts.csv_mode branches

Thanks,
-- 
kou
>From c136833f4a385574474b246a381014abeb631377 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei 
Date: Fri, 26 Jan 2024 16:46:51 +0900
Subject: [PATCH v9 1/2] Extract COPY TO format implementations

This is a part of making COPY format extendable. See also these past
discussions:
* New Copy Formats - avro/orc/parquet:
  https://www.postgresql.org/message-id/flat/20180210151304.fonjztsynewldfba%40gmail.com
* Make COPY extendable in order to support Parquet and other formats:
  https://www.postgresql.org/message-id/flat/CAJ7c6TM6Bz1c3F04Cy6%2BSzuWfKmr0kU8c_3Stnvh_8BR0D6k8Q%40mail.gmail.com

This doesn't change the current behavior. This just introduces
CopyToRoutine, which just has function pointers of format
implementation like TupleTableSlotOps, and use it for existing "text",
"csv" and "binary" format implementations.

Note that CopyToRoutine can't be used from extensions yet because
CopySend*() aren't exported yet. Extensions can't send formatted data
to a destination without CopySend*(). They will be exported by
subsequent patches.

Here is a benchmark result with/without this change because there was
a discussion that we should care about performance regression:

https://www.postgresql.org/message-id/3741749.1655952719%40sss.pgh.pa.us

> I think that step 1 ought to be to convert the existing formats into
> plug-ins, and demonstrate that there's no significant loss of
> performance.

You can see that there is no significant loss of performance:

Data: Random 32 bit integers:

CREATE TABLE data (int32 integer);
SELECT setseed(0.29);
INSERT INTO data
  SELECT random() * 1
FROM generate_series(1, ${n_records});

The number of records: 100K, 1M and 10M

100K without this change:

format,elapsed time (ms)
text,10.561
csv,10.868
binary,10.287

100K with this change:

format,elapsed time (ms)
text,9.962
csv,10.453
binary,9.473

1M without this change:

format,elapsed time (ms)
text,103.265
csv,109.789
binary,104.078

1M with this change:

format,elapsed time (ms)
text,98.612
csv,101.908
binary,94.456

10M without this change:

format,elapsed time (ms)
text,1060.614
csv,1065.272
binary,1025.875

10M with this change:

format,elapsed time (ms)
text,1020.050
csv,1031.279
binary,954.792
---
 contrib/file_fdw/file_fdw.c |   2 +-
 src/backend/commands/copy.c |  82 -
 src/backend/commands/copyfrom.c |   2 +-
 src/backend/commands/copyto.c   | 587 +++-
 src/include/commands/copy.h |   8 +-
 src/include/commands/copyapi.h  |  62 
 6 files changed, 560 insertions(+), 183 deletions(-)
 create mode 100644 src/include/commands/copyapi.h

diff --gi

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-28 Thread Junwang Zhao

On Mon, Jan 29, 2024 at 2:03 PM Sutou Kouhei  wrote:
>
> Hi,
>
> In 
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Sat, 27 Jan 2024 14:15:02 +0800,
>   Junwang Zhao  wrote:
>
> > I have been working on a *COPY TO JSON* extension since yesterday,
> > which is based on your V6 patch set, I'd like to give you more input
> > so you can make better decisions about the implementation(with only
> > pg-copy-arrow you might not get everything considered).
>
> Thanks!
>
> > 0009 is some changes made by me, I changed CopyToGetFormat to
> > CopyToSendCopyBegin because pg_copy_json need to send different bytes
> > in SendCopyBegin, get the format code along is not enough
>
> Oh, I haven't cared about the case.
> How about the following API instead?
>
> static void
> SendCopyBegin(CopyToState cstate)
> {
> StringInfoData buf;
>
> pq_beginmessage(, PqMsg_CopyOutResponse);
> cstate->opts.to_routine->CopyToFillCopyOutResponse(cstate, );
> pq_endmessage();
> cstate->copy_dest = COPY_FRONTEND;
> }
>
> static void
> CopyToJsonFillCopyOutResponse(CopyToState cstate, StringInfoData )
> {
> int16   format = 0;
>
> pq_sendbyte(, format);  /* overall format */
> /*
>  * JSON mode is always one non-binary column
>  */
> pq_sendint16(, 1);
> pq_sendint16(, format);
> }

Make sense to me.

>
> > 00010 is the pg_copy_json extension, I think this should be a good
> > case which can utilize the *extendable copy format* feature
>
> It seems that it's convenient that we have one more callback
> for initializing CopyToState::opaque. It's called only once
> when Copy{To,From}Routine is chosen:
>
> typedef struct CopyToRoutine
> {
> void(*CopyToInit) (CopyToState cstate);
> ...
> };

I like this, we can alloc private data in this hook.

>
> void
> ProcessCopyOptions(ParseState *pstate,
>CopyFormatOptions *opts_out,
>bool is_from,
>void *cstate,
>List *options)
> {
> ...
> foreach(option, options)
> {
> DefElem*defel = lfirst_node(DefElem, option);
>
> if (strcmp(defel->defname, "format") == 0)
> {
> ...
> opts_out->to_routine = 
> opts_out->to_routine->CopyToInit(cstate);
> ...
> }
> }
> ...
> }
>
>
> >  maybe we
> > should delete copy_test_format if we have this extension as an
> > example?
>
> I haven't read the COPY TO format json thread[1] carefully
> (sorry), but we may add the JSON format as a built-in
> format. If we do it, copy_test_format is useful to test the
> extension API.

Yeah, maybe, I have no strong opinion here, pg_copy_json is
just a toy extension for discussion.

>
> [1] 
> https://www.postgresql.org/message-id/flat/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
>
>
> Thanks,
> --
> kou



-- 
Regards
Junwang Zhao

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-28 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Sat, 27 Jan 2024 14:15:02 +0800,
  Junwang Zhao  wrote:

> I have been working on a *COPY TO JSON* extension since yesterday,
> which is based on your V6 patch set, I'd like to give you more input
> so you can make better decisions about the implementation(with only
> pg-copy-arrow you might not get everything considered).

Thanks!

> 0009 is some changes made by me, I changed CopyToGetFormat to
> CopyToSendCopyBegin because pg_copy_json need to send different bytes
> in SendCopyBegin, get the format code along is not enough

Oh, I haven't cared about the case.
How about the following API instead?

static void
SendCopyBegin(CopyToState cstate)
{
StringInfoData buf;

pq_beginmessage(, PqMsg_CopyOutResponse);
cstate->opts.to_routine->CopyToFillCopyOutResponse(cstate, );
pq_endmessage();
cstate->copy_dest = COPY_FRONTEND;
}

static void
CopyToJsonFillCopyOutResponse(CopyToState cstate, StringInfoData )
{
int16   format = 0;

pq_sendbyte(, format);  /* overall format */
/*
 * JSON mode is always one non-binary column
 */
pq_sendint16(, 1);
pq_sendint16(, format);
}

> 00010 is the pg_copy_json extension, I think this should be a good
> case which can utilize the *extendable copy format* feature

It seems that it's convenient that we have one more callback
for initializing CopyToState::opaque. It's called only once
when Copy{To,From}Routine is chosen:

typedef struct CopyToRoutine
{
void(*CopyToInit) (CopyToState cstate);
...
};

void
ProcessCopyOptions(ParseState *pstate,
   CopyFormatOptions *opts_out,
   bool is_from,
   void *cstate,
   List *options)
{
...
foreach(option, options)
{
DefElem*defel = lfirst_node(DefElem, option);

if (strcmp(defel->defname, "format") == 0)
{
...
opts_out->to_routine = 
opts_out->to_routine->CopyToInit(cstate);
...
}
}
...
}


>  maybe we
> should delete copy_test_format if we have this extension as an
> example?

I haven't read the COPY TO format json thread[1] carefully
(sorry), but we may add the JSON format as a built-in
format. If we do it, copy_test_format is useful to test the
extension API.

[1] 
https://www.postgresql.org/message-id/flat/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-28 Thread Junwang Zhao

On Mon, Jan 29, 2024 at 11:22 AM Masahiko Sawada  wrote:
>
> On Mon, Jan 29, 2024 at 12:10 PM Junwang Zhao  wrote:
> >
> > On Mon, Jan 29, 2024 at 10:42 AM Masahiko Sawada  
> > wrote:
> > >
> > > On Fri, Jan 26, 2024 at 6:02 PM Junwang Zhao  wrote:
> > > >
> > > > On Fri, Jan 26, 2024 at 4:55 PM Sutou Kouhei  
> > > > wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > In 
> > > > > 
> > > > >   "Re: Make COPY format extendable: Extract COPY TO format 
> > > > > implementations" on Fri, 26 Jan 2024 16:41:50 +0800,
> > > > >   Junwang Zhao  wrote:
> > > > >
> > > > > > CopyToProcessOption()/CopyFromProcessOption() can only handle
> > > > > > single option, and store the options in the opaque field,  but it 
> > > > > > can not
> > > > > > check the relation of two options, for example, considering json 
> > > > > > format,
> > > > > > the `header` option can not be handled by these two functions.
> > > > > >
> > > > > > I want to find a way when the user specifies the header option, 
> > > > > > customer
> > > > > > handler can error out.
> > > > >
> > > > > Ah, you want to use a built-in option (such as "header")
> > > > > value from a custom handler, right? Hmm, it may be better
> > > > > that we call CopyToProcessOption()/CopyFromProcessOption()
> > > > > for all options including built-in options.
> > > > >
> > > > Hmm, still I don't think it can handle all cases, since we don't know
> > > > the sequence of the options, we need all the options been parsed
> > > > before we check the compatibility of the options, or customer
> > > > handlers will need complicated logic to resolve that, which might
> > > > lead to ugly code :(
> > > >
> > >
> > > Does it make sense to pass only non-builtin options to the custom
> > > format callback after parsing and evaluating the builtin options? That
> > > is, we parse and evaluate only the builtin options and populate
> > > opts_out first, then pass each rest option to the custom format
> > > handler callback. The callback can refer to the builtin option values.
> >
> > Yeah, I think this makes sense.
> >
> > > The callback is expected to return false if the passed option is not
> > > supported. If one of the builtin formats is specified and the rest
> > > options list has at least one option, we raise "option %s not
> > > recognized" error.  IOW it's the core's responsibility to ranse the
> > > "option %s not recognized" error, which is in order to raise a
> > > consistent error message. Also, I think the core should check the
> > > redundant options including bultiin and custom options.
> >
> > It would be good that core could check all the redundant options,
> > but where should core do the book-keeping of all the options? I have
> > no idea about this, in my implementation of pg_copy_json extension,
> > I handle redundant options by adding a xxx_specified field for each
> > xxx.
>
> What I imagined is that while parsing the all specified options, we
> evaluate builtin options and we add non-builtin options to another
> list. Then when parsing a non-builtin option, we check if this option
> already exists in the list. If there is, we raise the "option %s not
> recognized" error.". Once we complete checking all options, we pass
> each option in the list to the callback.

LGTM.

>
> Regards,
>
> --
> Masahiko Sawada
> Amazon Web Services: https://aws.amazon.com



-- 
Regards
Junwang Zhao

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-28 Thread Masahiko Sawada

On Mon, Jan 29, 2024 at 12:10 PM Junwang Zhao  wrote:
>
> On Mon, Jan 29, 2024 at 10:42 AM Masahiko Sawada  
> wrote:
> >
> > On Fri, Jan 26, 2024 at 6:02 PM Junwang Zhao  wrote:
> > >
> > > On Fri, Jan 26, 2024 at 4:55 PM Sutou Kouhei  wrote:
> > > >
> > > > Hi,
> > > >
> > > > In 
> > > >   "Re: Make COPY format extendable: Extract COPY TO format 
> > > > implementations" on Fri, 26 Jan 2024 16:41:50 +0800,
> > > >   Junwang Zhao  wrote:
> > > >
> > > > > CopyToProcessOption()/CopyFromProcessOption() can only handle
> > > > > single option, and store the options in the opaque field,  but it can 
> > > > > not
> > > > > check the relation of two options, for example, considering json 
> > > > > format,
> > > > > the `header` option can not be handled by these two functions.
> > > > >
> > > > > I want to find a way when the user specifies the header option, 
> > > > > customer
> > > > > handler can error out.
> > > >
> > > > Ah, you want to use a built-in option (such as "header")
> > > > value from a custom handler, right? Hmm, it may be better
> > > > that we call CopyToProcessOption()/CopyFromProcessOption()
> > > > for all options including built-in options.
> > > >
> > > Hmm, still I don't think it can handle all cases, since we don't know
> > > the sequence of the options, we need all the options been parsed
> > > before we check the compatibility of the options, or customer
> > > handlers will need complicated logic to resolve that, which might
> > > lead to ugly code :(
> > >
> >
> > Does it make sense to pass only non-builtin options to the custom
> > format callback after parsing and evaluating the builtin options? That
> > is, we parse and evaluate only the builtin options and populate
> > opts_out first, then pass each rest option to the custom format
> > handler callback. The callback can refer to the builtin option values.
>
> Yeah, I think this makes sense.
>
> > The callback is expected to return false if the passed option is not
> > supported. If one of the builtin formats is specified and the rest
> > options list has at least one option, we raise "option %s not
> > recognized" error.  IOW it's the core's responsibility to ranse the
> > "option %s not recognized" error, which is in order to raise a
> > consistent error message. Also, I think the core should check the
> > redundant options including bultiin and custom options.
>
> It would be good that core could check all the redundant options,
> but where should core do the book-keeping of all the options? I have
> no idea about this, in my implementation of pg_copy_json extension,
> I handle redundant options by adding a xxx_specified field for each
> xxx.

What I imagined is that while parsing the all specified options, we
evaluate builtin options and we add non-builtin options to another
list. Then when parsing a non-builtin option, we check if this option
already exists in the list. If there is, we raise the "option %s not
recognized" error.". Once we complete checking all options, we pass
each option in the list to the callback.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-28 Thread Junwang Zhao

On Mon, Jan 29, 2024 at 10:42 AM Masahiko Sawada  wrote:
>
> On Fri, Jan 26, 2024 at 6:02 PM Junwang Zhao  wrote:
> >
> > On Fri, Jan 26, 2024 at 4:55 PM Sutou Kouhei  wrote:
> > >
> > > Hi,
> > >
> > > In 
> > >   "Re: Make COPY format extendable: Extract COPY TO format 
> > > implementations" on Fri, 26 Jan 2024 16:41:50 +0800,
> > >   Junwang Zhao  wrote:
> > >
> > > > CopyToProcessOption()/CopyFromProcessOption() can only handle
> > > > single option, and store the options in the opaque field,  but it can 
> > > > not
> > > > check the relation of two options, for example, considering json format,
> > > > the `header` option can not be handled by these two functions.
> > > >
> > > > I want to find a way when the user specifies the header option, customer
> > > > handler can error out.
> > >
> > > Ah, you want to use a built-in option (such as "header")
> > > value from a custom handler, right? Hmm, it may be better
> > > that we call CopyToProcessOption()/CopyFromProcessOption()
> > > for all options including built-in options.
> > >
> > Hmm, still I don't think it can handle all cases, since we don't know
> > the sequence of the options, we need all the options been parsed
> > before we check the compatibility of the options, or customer
> > handlers will need complicated logic to resolve that, which might
> > lead to ugly code :(
> >
>
> Does it make sense to pass only non-builtin options to the custom
> format callback after parsing and evaluating the builtin options? That
> is, we parse and evaluate only the builtin options and populate
> opts_out first, then pass each rest option to the custom format
> handler callback. The callback can refer to the builtin option values.

Yeah, I think this makes sense.

> The callback is expected to return false if the passed option is not
> supported. If one of the builtin formats is specified and the rest
> options list has at least one option, we raise "option %s not
> recognized" error.  IOW it's the core's responsibility to ranse the
> "option %s not recognized" error, which is in order to raise a
> consistent error message. Also, I think the core should check the
> redundant options including bultiin and custom options.

It would be good that core could check all the redundant options,
but where should core do the book-keeping of all the options? I have
no idea about this, in my implementation of pg_copy_json extension,
I handle redundant options by adding a xxx_specified field for each
xxx.

>
> Regards,
>
> --
> Masahiko Sawada
> Amazon Web Services: https://aws.amazon.com



-- 
Regards
Junwang Zhao

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-28 Thread Masahiko Sawada

On Fri, Jan 26, 2024 at 6:02 PM Junwang Zhao  wrote:
>
> On Fri, Jan 26, 2024 at 4:55 PM Sutou Kouhei  wrote:
> >
> > Hi,
> >
> > In 
> >   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> > on Fri, 26 Jan 2024 16:41:50 +0800,
> >   Junwang Zhao  wrote:
> >
> > > CopyToProcessOption()/CopyFromProcessOption() can only handle
> > > single option, and store the options in the opaque field,  but it can not
> > > check the relation of two options, for example, considering json format,
> > > the `header` option can not be handled by these two functions.
> > >
> > > I want to find a way when the user specifies the header option, customer
> > > handler can error out.
> >
> > Ah, you want to use a built-in option (such as "header")
> > value from a custom handler, right? Hmm, it may be better
> > that we call CopyToProcessOption()/CopyFromProcessOption()
> > for all options including built-in options.
> >
> Hmm, still I don't think it can handle all cases, since we don't know
> the sequence of the options, we need all the options been parsed
> before we check the compatibility of the options, or customer
> handlers will need complicated logic to resolve that, which might
> lead to ugly code :(
>

Does it make sense to pass only non-builtin options to the custom
format callback after parsing and evaluating the builtin options? That
is, we parse and evaluate only the builtin options and populate
opts_out first, then pass each rest option to the custom format
handler callback. The callback can refer to the builtin option values.
The callback is expected to return false if the passed option is not
supported. If one of the builtin formats is specified and the rest
options list has at least one option, we raise "option %s not
recognized" error.  IOW it's the core's responsibility to ranse the
"option %s not recognized" error, which is in order to raise a
consistent error message. Also, I think the core should check the
redundant options including bultiin and custom options.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-26 Thread Junwang Zhao

On Fri, Jan 26, 2024 at 4:55 PM Sutou Kouhei  wrote:
>
> Hi,
>
> In 
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Fri, 26 Jan 2024 16:41:50 +0800,
>   Junwang Zhao  wrote:
>
> > CopyToProcessOption()/CopyFromProcessOption() can only handle
> > single option, and store the options in the opaque field,  but it can not
> > check the relation of two options, for example, considering json format,
> > the `header` option can not be handled by these two functions.
> >
> > I want to find a way when the user specifies the header option, customer
> > handler can error out.
>
> Ah, you want to use a built-in option (such as "header")
> value from a custom handler, right? Hmm, it may be better
> that we call CopyToProcessOption()/CopyFromProcessOption()
> for all options including built-in options.
>
Hmm, still I don't think it can handle all cases, since we don't know
the sequence of the options, we need all the options been parsed
before we check the compatibility of the options, or customer
handlers will need complicated logic to resolve that, which might
lead to ugly code :(

>
> Thanks,
> --
> kou



-- 
Regards
Junwang Zhao

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-26 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Fri, 26 Jan 2024 16:41:50 +0800,
  Junwang Zhao  wrote:

> CopyToProcessOption()/CopyFromProcessOption() can only handle
> single option, and store the options in the opaque field,  but it can not
> check the relation of two options, for example, considering json format,
> the `header` option can not be handled by these two functions.
> 
> I want to find a way when the user specifies the header option, customer
> handler can error out.

Ah, you want to use a built-in option (such as "header")
value from a custom handler, right? Hmm, it may be better
that we call CopyToProcessOption()/CopyFromProcessOption()
for all options including built-in options.


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-26 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Fri, 26 Jan 2024 08:35:19 +0900,
  Michael Paquier  wrote:

>> OK. How about the following for the fetch function
>> signature?
>> 
>> extern CopyToRoutine *GetBuiltinCopyToRoutine(const char *format);
> 
> Or CopyToRoutineGet()?  I am not wedded to my suggestion, got a bad
> history with naming things around here.

Thanks for the suggestion.
I rethink about this and use the following:

+extern void ProcessCopyOptionFormatTo(ParseState *pstate, CopyFormatOptions 
*opts_out, DefElem *defel);

It's not a fetch function. It sets CopyToRoutine opts_out
instead. But it hides CopyToRoutine* to copyto.c. Is it
acceptable?

>> OK. I'll focus on 0001 and 0005 for now. I'll restart
>> 0002-0004/0006-0008 after 0001 and 0005 are accepted.
> 
> Once you get these, I'd be interested in re-doing an evaluation of
> COPY TO and more tests with COPY FROM while running Postgres on
> scissors.  One thing I was thinking to use here is my blackhole_am for
> COPY FROM:
> https://github.com/michaelpq/pg_plugins/tree/main/blackhole_am

Thanks!

Could you evaluate the attached patch set with COPY FROM?

I attach v7 patch set. It includes only the 0001 and 0005
parts in v6 patch set because we focus on them for now.

0001: This is based on 0001 in v6.

Changes since v6:

* Fix comment style
* Hide CopyToRoutine{Text,CSV,Binary}
* Add more comments
* Eliminate "if (cstate->opts.csv_mode)" branches from "text"
  and "csv" callbacks
* Remove CopyTo*_function typedefs
* Update benchmark results in commit message but the results
  are measured on my environment that isn't suitable for
  accurate benchmark

0002: This is based on 0005 in v6.

Changes since v6:

* Fix comment style
* Hide CopyFromRoutine{Text,CSV,Binary}
* Add more comments
* Eliminate a "if (cstate->opts.csv_mode)" branch from "text"
  and "csv" callbacks
  * NOTE: We can eliminate more "if (cstate->opts.csv_mode)" branches
such as one in NextCopyFromRawFields(). Should we do it
in this feature improvement (make COPY format
extendable)? Can we defer this as a separated improvement?
* Remove CopyFrom*_function typedefs

Thanks,
-- 
kou
>From 3e75129c2e9d9d34eebb6ef31b4fe6579f9eb02d Mon Sep 17 00:00:00 2001
From: Sutou Kouhei 
Date: Fri, 26 Jan 2024 16:46:51 +0900
Subject: [PATCH v7 1/2] Extract COPY TO format implementations

This is a part of making COPY format extendable. See also these past
discussions:
* New Copy Formats - avro/orc/parquet:
  https://www.postgresql.org/message-id/flat/20180210151304.fonjztsynewldfba%40gmail.com
* Make COPY extendable in order to support Parquet and other formats:
  https://www.postgresql.org/message-id/flat/CAJ7c6TM6Bz1c3F04Cy6%2BSzuWfKmr0kU8c_3Stnvh_8BR0D6k8Q%40mail.gmail.com

This doesn't change the current behavior. This just introduces
CopyToRoutine, which just has function pointers of format
implementation like TupleTableSlotOps, and use it for existing "text",
"csv" and "binary" format implementations.

Note that CopyToRoutine can't be used from extensions yet because
CopySend*() aren't exported yet. Extensions can't send formatted data
to a destination without CopySend*(). They will be exported by
subsequent patches.

Here is a benchmark result with/without this change because there was
a discussion that we should care about performance regression:

https://www.postgresql.org/message-id/3741749.1655952719%40sss.pgh.pa.us

> I think that step 1 ought to be to convert the existing formats into
> plug-ins, and demonstrate that there's no significant loss of
> performance.

You can see that there is no significant loss of performance:

Data: Random 32 bit integers:

CREATE TABLE data (int32 integer);
SELECT setseed(0.29);
INSERT INTO data
  SELECT random() * 1
FROM generate_series(1, ${n_records});

The number of records: 100K, 1M and 10M

100K without this change:

format,elapsed time (ms)
text,10.561
csv,10.868
binary,10.287

100K with this change:

format,elapsed time (ms)
text,9.962
csv,10.453
binary,9.473

1M without this change:

format,elapsed time (ms)
text,103.265
csv,109.789
binary,104.078

1M with this change:

format,elapsed time (ms)
text,98.612
csv,101.908
binary,94.456

10M without this change:

format,elapsed time (ms)
text,1060.614
csv,1065.272
binary,1025.875

10M with this change:

format,elapsed time (ms)
text,1020.050
csv,1031.279
binary,954.792
---
 contrib/file_fdw/file_fdw.c |   2 +-
 src/backend/commands/copy.c |  71 ++--
 src/backend/commands/copyfrom.c |   2 +-
 src/backend/commands/copyto.c   | 551 +++-
 src/include/commands/copy.h |   8 +-
 src/

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-26 Thread Junwang Zhao

On Fri, Jan 26, 2024 at 4:32 PM Sutou Kouhei  wrote:
>
> Hi,
>
> In 
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Fri, 26 Jan 2024 16:18:14 +0800,
>   Junwang Zhao  wrote:
>
> > In the current implementation, there is no way that one can check
> > incompatibility
> > options in ProcessCopyOptions, we can postpone the check in CopyFromStart
> > or CopyToStart, but I think it is a little bit late. Do you think
> > adding an extra
> > check for incompatible options hook is acceptable (PFA)?
>
> Thanks for the suggestion! But I think that a custom handler
> can do it in
> CopyToProcessOption()/CopyFromProcessOption(). What do you
> think about this? Or could you share a sample COPY TO/FROM
> WITH() SQL you think?

CopyToProcessOption()/CopyFromProcessOption() can only handle
single option, and store the options in the opaque field,  but it can not
check the relation of two options, for example, considering json format,
the `header` option can not be handled by these two functions.

I want to find a way when the user specifies the header option, customer
handler can error out.

>
>
> Thanks,
> --
> kou

-- 
Regards
Junwang Zhao

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-26 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Fri, 26 Jan 2024 16:18:14 +0800,
  Junwang Zhao  wrote:

> In the current implementation, there is no way that one can check
> incompatibility
> options in ProcessCopyOptions, we can postpone the check in CopyFromStart
> or CopyToStart, but I think it is a little bit late. Do you think
> adding an extra
> check for incompatible options hook is acceptable (PFA)?

Thanks for the suggestion! But I think that a custom handler
can do it in
CopyToProcessOption()/CopyFromProcessOption(). What do you
think about this? Or could you share a sample COPY TO/FROM
WITH() SQL you think?

Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-26 Thread Junwang Zhao

On Thu, Jan 25, 2024 at 4:52 PM Sutou Kouhei  wrote:
>
> Hi,
>
> In 
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Thu, 25 Jan 2024 13:36:03 +0900,
>   Masahiko Sawada  wrote:
>
> > I've experimented with a similar optimization for csv
> > and text format; have different callbacks for text and csv format and
> > remove "if (cstate->opts.csv_mode)" branches. I've attached a patch
> > for that. Here are results:
> >
> > HEAD w/ 0001 patch + remove branches:
> > binary 2824.502 ms
> > text 2715.264 ms
> > csv 2803.381 ms
> >
> > The numbers look better now. I'm not sure these are within a noise
> > range but it might be worth considering having different callbacks for
> > text and csv formats.
>
> Wow! Interesting. I tried the approach before but I didn't
> see any difference by the approach. But it may depend on my
> environment.
>
> I'll import the approach to the next patch set so that
> others can try the approach easily.
>
>
> Thanks,
> --
> kou

Hi Kou-san,

In the current implementation, there is no way that one can check
incompatibility
options in ProcessCopyOptions, we can postpone the check in CopyFromStart
or CopyToStart, but I think it is a little bit late. Do you think
adding an extra
check for incompatible options hook is acceptable (PFA)?


-- 
Regards
Junwang Zhao


0001-add-check-incomptiblity-options-hooks.patch
Description: Binary data

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-25 Thread Michael Paquier

On Thu, Jan 25, 2024 at 05:45:43PM +0900, Sutou Kouhei wrote:
> In 
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Thu, 25 Jan 2024 12:17:55 +0900,
>   Michael Paquier  wrote:
>> +extern CopyToRoutine CopyToRoutineText;
>> +extern CopyToRoutine CopyToRoutineCSV;
>> +extern CopyToRoutine CopyToRoutineBinary;
>> 
>> All that should IMO remain in copyto.c and copyfrom.c in the initial
>> patch doing the refactoring.  Why not using a fetch function instead
>> that uses a string in input?  Then you can call that once after
>> parsing the List of options in ProcessCopyOptions().
> 
> OK. How about the following for the fetch function
> signature?
> 
> extern CopyToRoutine *GetBuiltinCopyToRoutine(const char *format);

Or CopyToRoutineGet()?  I am not wedded to my suggestion, got a bad
history with naming things around here.

> We may introduce an enum and use it:
> 
> typedef enum CopyBuiltinFormat
> {
>   COPY_BUILTIN_FORMAT_TEXT = 0,
>   COPY_BUILTIN_FORMAT_CSV,
>   COPY_BUILTIN_FORMAT_BINARY,
> } CopyBuiltinFormat;
> 
> extern CopyToRoutine *GetBuiltinCopyToRoutine(CopyBuiltinFormat format);

I am not sure that this is necessary as the option value is a string.

> Oh, sorry. I assumed that the comment style was adjusted by
> pgindent.

No worries, that's just something we get used to.  I tend to fix a lot
of these things by myself when editing patches.

>> +getTypeBinaryOutputInfo(attr->atttypid, _func_oid, );
>> +fmgr_info(out_func_oid, >out_functions[attnum - 1]);
>> 
>> Actually, this split is interesting.  It is possible for a custom
>> format to plug in a custom set of out functions.  Did you make use of
>> something custom for your own stuff?
> 
> I didn't. My PoC custom COPY format handler for Apache Arrow
> just handles integer and text for now. It doesn't use
> cstate->out_functions because cstate->out_functions may not
> return a valid binary format value for Apache Arrow. So it
> formats each value by itself.

I mean, if you use a custom output function, you could tweak things
even more with byteas or such..  If a callback is expected to do
something, like setting the output function OIDs in the start
callback, we'd better document it rather than letting that be implied.

>>   Actually, could it make sense to
>> split the assignment of cstate->out_functions into its own callback?
> 
> Yes. Because we need to use getTypeBinaryOutputInfo() for
> "binary" and use getTypeOutputInfo() for "text" and "csv".

Okay.  After sleeping on it, a split makes sense here, because it also
reduces the presence of TupleDesc in the start callback.

>> Sure, that's part of the start phase, but at least it would make clear
>> that a custom method *has* to assign these OIDs to work.  The patch
>> implies that as a rule, without a comment that CopyToStart *must* set
>> up these OIDs.
> 
> CopyToStart doesn't need to set up them if the handler
> doesn't use cstate->out_functions.

Noted.

>> I think that 0001 and 0005 should be handled first, as pieces
>> independent of the rest.  Then we could move on with 0002~0004 and
>> 0006~0008.
> 
> OK. I'll focus on 0001 and 0005 for now. I'll restart
> 0002-0004/0006-0008 after 0001 and 0005 are accepted.

Once you get these, I'd be interested in re-doing an evaluation of
COPY TO and more tests with COPY FROM while running Postgres on
scissors.  One thing I was thinking to use here is my blackhole_am for
COPY FROM:
https://github.com/michaelpq/pg_plugins/tree/main/blackhole_am

As per its name, it does nothing on INSERT, so you could create a
table using it as access method, and stress the COPY FROM execution
paths without having to mount Postgres on a tmpfs because the data is
sent to the void.  Perhaps it does not matter, but that moves the
tests to the bottlenecks we want to stress (aka the per-row callback
for large data sets).

I've switched the patch as waiting on author for now.  Thanks for your
perseverance here.  I understand that's not easy to follow up with
patches and reviews (^_^;)
--
Michael

signature.asc
Description: PGP signature

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-25 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Thu, 25 Jan 2024 13:36:03 +0900,
  Masahiko Sawada  wrote:

> I've experimented with a similar optimization for csv
> and text format; have different callbacks for text and csv format and
> remove "if (cstate->opts.csv_mode)" branches. I've attached a patch
> for that. Here are results:
> 
> HEAD w/ 0001 patch + remove branches:
> binary 2824.502 ms
> text 2715.264 ms
> csv 2803.381 ms
> 
> The numbers look better now. I'm not sure these are within a noise
> range but it might be worth considering having different callbacks for
> text and csv formats.

Wow! Interesting. I tried the approach before but I didn't
see any difference by the approach. But it may depend on my
environment.

I'll import the approach to the next patch set so that
others can try the approach easily.


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-25 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Thu, 25 Jan 2024 12:17:55 +0900,
  Michael Paquier  wrote:

> +typedef bool (*CopyToProcessOption_function) (CopyToState cstate, DefElem 
> *defel);
> +typedef int16 (*CopyToGetFormat_function) (CopyToState cstate);
> +typedef void (*CopyToStart_function) (CopyToState cstate, TupleDesc tupDesc);
> +typedef void (*CopyToOneRow_function) (CopyToState cstate, TupleTableSlot 
> *slot);
> +typedef void (*CopyToEnd_function) (CopyToState cstate);
> 
> We don't really need a set of typedefs here, let's put the definitions
> in the CopyToRoutine struct instead.

OK. I'll do it.

> +extern CopyToRoutine CopyToRoutineText;
> +extern CopyToRoutine CopyToRoutineCSV;
> +extern CopyToRoutine CopyToRoutineBinary;
> 
> All that should IMO remain in copyto.c and copyfrom.c in the initial
> patch doing the refactoring.  Why not using a fetch function instead
> that uses a string in input?  Then you can call that once after
> parsing the List of options in ProcessCopyOptions().

OK. How about the following for the fetch function
signature?

extern CopyToRoutine *GetBuiltinCopyToRoutine(const char *format);

We may introduce an enum and use it:

typedef enum CopyBuiltinFormat
{
COPY_BUILTIN_FORMAT_TEXT = 0,
COPY_BUILTIN_FORMAT_CSV,
COPY_BUILTIN_FORMAT_BINARY,
} CopyBuiltinFormat;

extern CopyToRoutine *GetBuiltinCopyToRoutine(CopyBuiltinFormat format);

> +/* All "text" and "csv" options are parsed in ProcessCopyOptions(). We may
> + * move the code to here later. */
> Some areas, like this comment, are written in an incorrect format.

Oh, sorry. I assumed that the comment style was adjusted by
pgindent.

I'll use the following style:

/*
 * ...
 */

> +getTypeBinaryOutputInfo(attr->atttypid, _func_oid, );
> +fmgr_info(out_func_oid, >out_functions[attnum - 1]);
> 
> Actually, this split is interesting.  It is possible for a custom
> format to plug in a custom set of out functions.  Did you make use of
> something custom for your own stuff?

I didn't. My PoC custom COPY format handler for Apache Arrow
just handles integer and text for now. It doesn't use
cstate->out_functions because cstate->out_functions may not
return a valid binary format value for Apache Arrow. So it
formats each value by itself.

I'll chose one of them for a custom type (that isn't
supported by Apache Arrow, e.g. PostGIS types):

1. Report an unsupported error
2. Call output function for Apache Arrow provided by the
   custom type

>   Actually, could it make sense to
> split the assignment of cstate->out_functions into its own callback?

Yes. Because we need to use getTypeBinaryOutputInfo() for
"binary" and use getTypeOutputInfo() for "text" and "csv".

> Sure, that's part of the start phase, but at least it would make clear
> that a custom method *has* to assign these OIDs to work.  The patch
> implies that as a rule, without a comment that CopyToStart *must* set
> up these OIDs.

CopyToStart doesn't need to set up them if the handler
doesn't use cstate->out_functions.

> I think that 0001 and 0005 should be handled first, as pieces
> independent of the rest.  Then we could move on with 0002~0004 and
> 0006~0008.

OK. I'll focus on 0001 and 0005 for now. I'll restart
0002-0004/0006-0008 after 0001 and 0005 are accepted.


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-25 Thread Sutou Kouhei

Hi,

Thanks for trying these patches!

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Thu, 25 Jan 2024 10:53:58 +0800,
  jian he  wrote:

> COPY data TO '/dev/null' WITH (FORMAT csv) \watch count=5

Wow! I didn't know the "\watch count="!
I'll use it.

> Time: 668.996 ms
> Time: 596.254 ms
> Time: 592.723 ms
> Time: 591.663 ms
> Time: 590.803 ms

It seems that 5 times isn't enough for this case as Michael
said. But thanks for trying!

Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-24 Thread Masahiko Sawada

On Thu, Jan 25, 2024 at 1:53 PM Michael Paquier  wrote:
>
> On Thu, Jan 25, 2024 at 01:36:03PM +0900, Masahiko Sawada wrote:
> > Hmm I can see a similar trend that Suto-san had; the binary format got
> > slightly faster whereas both text and csv format has small regression
> > (4%~5%). I think that the improvement for binary came from the fact
> > that we removed "if (cstate->opts.binary)" branches from the original
> > CopyOneRowTo(). I've experimented with a similar optimization for csv
> > and text format; have different callbacks for text and csv format and
> > remove "if (cstate->opts.csv_mode)" branches. I've attached a patch
> > for that. Here are results:
> >
> > HEAD w/ 0001 patch + remove branches:
> > binary 2824.502 ms
> > text 2715.264 ms
> > csv 2803.381 ms
> >
> > The numbers look better now. I'm not sure these are within a noise
> > range but it might be worth considering having different callbacks for
> > text and csv formats.
>
> Interesting.
>
> Your numbers imply a 0.3% speedup for text, 0.7% speedup for csv and
> 0.9% speedup for binary, which may be around the noise range assuming
> a ~1% range.  While this does not imply a regression, that seems worth
> the duplication IMO.

Agreed. In addition to that, now that each format routine has its own
callbacks, there would be chances that we can do other optimizations
dedicated to the format type in the future if available.

>  The patch had better document the reason why the
> split is done, as well.

+1

>
> CopyFromTextOneRow() has also specific branches for binary and
> non-binary removed in 0005, so assuming that I/O is not a bottleneck,
> the operation would be faster because we would not evaluate this "if"
> condition for each row.  Wouldn't we also see improvements for COPY
> FROM with short row values, say when mounting PGDATA into a
> tmpfs/ramfs?

Probably. Seems worth evaluating.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-24 Thread Michael Paquier

On Thu, Jan 25, 2024 at 01:36:03PM +0900, Masahiko Sawada wrote:
> Hmm I can see a similar trend that Suto-san had; the binary format got
> slightly faster whereas both text and csv format has small regression
> (4%~5%). I think that the improvement for binary came from the fact
> that we removed "if (cstate->opts.binary)" branches from the original
> CopyOneRowTo(). I've experimented with a similar optimization for csv
> and text format; have different callbacks for text and csv format and
> remove "if (cstate->opts.csv_mode)" branches. I've attached a patch
> for that. Here are results:
> 
> HEAD w/ 0001 patch + remove branches:
> binary 2824.502 ms
> text 2715.264 ms
> csv 2803.381 ms
> 
> The numbers look better now. I'm not sure these are within a noise
> range but it might be worth considering having different callbacks for
> text and csv formats.

Interesting.

Your numbers imply a 0.3% speedup for text, 0.7% speedup for csv and
0.9% speedup for binary, which may be around the noise range assuming
a ~1% range.  While this does not imply a regression, that seems worth
the duplication IMO.  The patch had better document the reason why the
split is done, as well.

CopyFromTextOneRow() has also specific branches for binary and
non-binary removed in 0005, so assuming that I/O is not a bottleneck,
the operation would be faster because we would not evaluate this "if"
condition for each row.  Wouldn't we also see improvements for COPY
FROM with short row values, say when mounting PGDATA into a
tmpfs/ramfs?
--
Michael


signature.asc
Description: PGP signature

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-24 Thread Masahiko Sawada

On Wed, Jan 24, 2024 at 11:17 PM Sutou Kouhei  wrote:
>
> Hi,
>
> In <10025bac-158c-ffe7-fbec-32b426291...@dunslane.net>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Wed, 24 Jan 2024 07:15:55 -0500,
>   Andrew Dunstan  wrote:
>
> >
> > On 2024-01-24 We 03:11, Michael Paquier wrote:
> >> On Wed, Jan 24, 2024 at 02:49:36PM +0900, Sutou Kouhei wrote:
> >>> For COPY TO:
> >>>
> >>> 0001: This adds CopyToRoutine and use it for text/csv/binary
> >>> formats. No implementation change. This just move codes.
> >> 10M without this change:
> >>
> >>  format,elapsed time (ms)
> >>  text,1090.763
> >>  csv,1136.103
> >>  binary,1137.141
> >>
> >> 10M with this change:
> >>
> >>  format,elapsed time (ms)
> >>  text,1082.654
> >>  csv,1196.991
> >>  binary,1069.697
> >>
> >> These numbers point out that binary is faster by 6%, csv is slower by
> >> 5%, while text stays around what looks like noise range.  That's not
> >> negligible.  Are these numbers reproducible?  If they are, that could
> >> be a problem for anybody doing bulk-loading of large data sets.  I am
> >> not sure to understand where the improvement for binary comes from by
> >> reading the patch, but perhaps perf would tell more for each format?
> >> The loss with csv could be blamed on the extra manipulations of the
> >> function pointers, likely.
> >
> >
> > I don't think that's at all acceptable.
> >
> > We've spent quite a lot of blood sweat and tears over the years to make COPY
> > fast, and we should not sacrifice any of that lightly.
>
> These numbers aren't reproducible. Because these benchmarks
> executed on my normal machine not a machine only for
> benchmarking. The machine runs another processes such as
> editor and Web browser.
>
> For example, here are some results with master
> (94edfe250c6a200d2067b0debfe00b4122e9b11e):
>
> Format,N records,Elapsed time (ms)
> csv,1000,1073.715
> csv,1000,1022.830
> csv,1000,1073.584
> csv,1000,1090.651
> csv,1000,1052.259
>
> Here are some results with master + the 0001 patch:
>
> Format,N records,Elapsed time (ms)
> csv,1000,1025.356
> csv,1000,1067.202
> csv,1000,1014.563
> csv,1000,1032.088
> csv,1000,1058.110
>
>
> I uploaded my benchmark script so that you can run the same
> benchmark on your machine:
>
> https://gist.github.com/kou/be02e02e5072c91969469dbf137b5de5
>
> Could anyone try the benchmark with master and master+0001?
>

I've run a similar scenario:

create unlogged table test (a int);
insert into test select c from generate_series(1, 2500) c;
copy test to '/tmp/result.csv' with (format csv); -- generates 230MB file

I've run it on HEAD and HEAD+0001 patch and here are the medians of 10
executions for each format:

HEAD:
binary 2930.353 ms
text 2754.852 ms
csv 2890.012 ms

HEAD w/ 0001 patch:
binary 2814.838 ms
text 2900.845 ms
csv 3015.210 ms

Hmm I can see a similar trend that Suto-san had; the binary format got
slightly faster whereas both text and csv format has small regression
(4%~5%). I think that the improvement for binary came from the fact
that we removed "if (cstate->opts.binary)" branches from the original
CopyOneRowTo(). I've experimented with a similar optimization for csv
and text format; have different callbacks for text and csv format and
remove "if (cstate->opts.csv_mode)" branches. I've attached a patch
for that. Here are results:

HEAD w/ 0001 patch + remove branches:
binary 2824.502 ms
text 2715.264 ms
csv 2803.381 ms

The numbers look better now. I'm not sure these are within a noise
range but it might be worth considering having different callbacks for
text and csv formats.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


add_callback_for_csv_format.patch
Description: Binary data

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-24 Thread Michael Paquier

On Thu, Jan 25, 2024 at 10:53:58AM +0800, jian he wrote:
> apply your patch:
> COPY data TO '/dev/null' WITH (FORMAT csv) \watch count=5
> Time: 668.996 ms
> Time: 596.254 ms
> Time: 592.723 ms
> Time: 591.663 ms
> Time: 590.803 ms
> 
> not apply your patch, at git 729439607ad210dbb446e31754e8627d7e3f7dda
> COPY data TO '/dev/null' WITH (FORMAT csv) \watch count=5
> Time: 644.246 ms
> Time: 583.075 ms
> Time: 568.670 ms
> Time: 569.463 ms
> Time: 569.201 ms
> 
> I forgot to test other formats.

There can be some variance in the tests, so you'd better run much more
tests so as you can get a better idea of the mean.  Discarding the N
highest and lowest values also reduces slightly the effects of the
noise you would get across single runs.
--
Michael


signature.asc
Description: PGP signature

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-24 Thread Michael Paquier

On Wed, Jan 24, 2024 at 11:17:26PM +0900, Sutou Kouhei wrote:
> In <10025bac-158c-ffe7-fbec-32b426291...@dunslane.net>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Wed, 24 Jan 2024 07:15:55 -0500,
>   Andrew Dunstan  wrote:
>> We've spent quite a lot of blood sweat and tears over the years to make COPY
>> fast, and we should not sacrifice any of that lightly.

Clearly.

> I uploaded my benchmark script so that you can run the same
> benchmark on your machine:
> 
> https://gist.github.com/kou/be02e02e5072c91969469dbf137b5de5

Thanks, that saves time.  I am attaching it to this email as well, for
the sake of the archives if this link is removed in the future.

> Could anyone try the benchmark with master and master+0001?

Yep.  It is one point we need to settle before deciding what to do
with this patch set, and I've done so to reach my own conclusion.

I have a rather good machine at my disposal in the cloud, so I did a
few runs with HEAD and HEAD+0001, with PGDATA mounted on a tmpfs.
Here are some results for the 10M row case, as these should be the
least prone to noise, 5 runs each: 

master
text 10M  1732.570 1684.542 1693.430 1687.696 1714.845
csv 10M   1729.113 1724.926 1727.414 1726.237 1728.865
bin 10M   1679.097 1677.887 1676.764 1677.554 1678.120

master+0001
text 10M  1702.207 1654.818 1647.069 1690.568 1654.446
csv 10M   1764.939 1714.313 1712.444 1712.323 1716.952
bin 10M   1703.061 1702.719 1702.234 1703.346 1704.137

Hmm.  The point of contention in the patch is the change to use the
CopyToOneRow callback in CopyOneRowTo(), as we go through it for each
row and we should habe a worst-case scenario with a relation that has
a small attribute size.  The more rows, the more effect it would have.
The memory context switches and the StringInfo manipulations are
equally important, and there are a bunch of the latter, actually, with
optimizations around fe_msgbuf.

I've repeated a few runs across these two builds, and there is some
variance and noise, but I am going to agree with your point that the
effect 0001 cannot be seen.  Even HEAD is showing some noise.  So I am
discarding the concerns I had after seeing the numbers you posted
upthread.

+typedef bool (*CopyToProcessOption_function) (CopyToState cstate, DefElem 
*defel);
+typedef int16 (*CopyToGetFormat_function) (CopyToState cstate);
+typedef void (*CopyToStart_function) (CopyToState cstate, TupleDesc tupDesc);
+typedef void (*CopyToOneRow_function) (CopyToState cstate, TupleTableSlot 
*slot);
+typedef void (*CopyToEnd_function) (CopyToState cstate);

We don't really need a set of typedefs here, let's put the definitions
in the CopyToRoutine struct instead.

+extern CopyToRoutine CopyToRoutineText;
+extern CopyToRoutine CopyToRoutineCSV;
+extern CopyToRoutine CopyToRoutineBinary;

All that should IMO remain in copyto.c and copyfrom.c in the initial
patch doing the refactoring.  Why not using a fetch function instead
that uses a string in input?  Then you can call that once after
parsing the List of options in ProcessCopyOptions().

Introducing copyapi.h in the initial patch makes sense here for the TO
and FROM routines.

+/* All "text" and "csv" options are parsed in ProcessCopyOptions(). We may
+ * move the code to here later. */
Some areas, like this comment, are written in an incorrect format.

+if (cstate->opts.csv_mode)
+CopyAttributeOutCSV(cstate, colname, false,
+list_length(cstate->attnumlist) == 1);
+else
+CopyAttributeOutText(cstate, colname);

You are right that this is not worth the trouble of creating a
different set of callbacks for CSV.  This makes the result cleaner.

+getTypeBinaryOutputInfo(attr->atttypid, _func_oid, );
+fmgr_info(out_func_oid, >out_functions[attnum - 1]);

Actually, this split is interesting.  It is possible for a custom
format to plug in a custom set of out functions.  Did you make use of
something custom for your own stuff?  Actually, could it make sense to
split the assignment of cstate->out_functions into its own callback?
Sure, that's part of the start phase, but at least it would make clear
that a custom method *has* to assign these OIDs to work.  The patch
implies that as a rule, without a comment that CopyToStart *must* set
up these OIDs.

I think that 0001 and 0005 should be handled first, as pieces
independent of the rest.  Then we could move on with 0002~0004 and
0006~0008.
--
Michael
#!/bin/bash

set -eu
set -o pipefail

base_dir=$(dirname "$0")

prepare_sql()
{
  size=$1
  db_name=$2

  cat <

signature.asc
Description: PGP signature

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-24 Thread jian he

On Wed, Jan 24, 2024 at 10:17 PM Sutou Kouhei  wrote:
>
> I uploaded my benchmark script so that you can run the same
> benchmark on your machine:
>
> https://gist.github.com/kou/be02e02e5072c91969469dbf137b5de5
>
> Could anyone try the benchmark with master and master+0001?
>

sorry. I made a mistake. I applied v6, 0001 to 0008 all the patches.

my tests:
CREATE unlogged TABLE data (a bigint);
SELECT setseed(0.29);
INSERT INTO data SELECT random() * 1 FROM generate_series(1, 1e7);

my setup:
meson setup --reconfigure ${BUILD} \
-Dprefix=${PG_PREFIX} \
-Dpgport=5462 \
-Dbuildtype=release \
-Ddocs_html_style=website \
-Ddocs_pdf=disabled \
-Dllvm=disabled \
-Dextra_version=_release_build

gcc version:  PostgreSQL 17devel_release_build on x86_64-linux,
compiled by gcc-11.4.0, 64-bit

apply your patch:
COPY data TO '/dev/null' WITH (FORMAT csv) \watch count=5
Time: 668.996 ms
Time: 596.254 ms
Time: 592.723 ms
Time: 591.663 ms
Time: 590.803 ms

not apply your patch, at git 729439607ad210dbb446e31754e8627d7e3f7dda
COPY data TO '/dev/null' WITH (FORMAT csv) \watch count=5
Time: 644.246 ms
Time: 583.075 ms
Time: 568.670 ms
Time: 569.463 ms
Time: 569.201 ms

I forgot to test other formats.

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-24 Thread Sutou Kouhei

Hi,

In <20240124.144936.67229716500876806@clear-code.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Wed, 24 Jan 2024 14:49:36 +0900 (JST),
  Sutou Kouhei  wrote:

> I've implemented custom COPY format feature based on the
> current design discussion. See the attached patches for
> details.

I forgot to mention one note. Documentation isn't included
in these patches. I'll write it after all (or some) patches
are merged. Is it OK?


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-24 Thread Sutou Kouhei

Hi,

In <10025bac-158c-ffe7-fbec-32b426291...@dunslane.net>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Wed, 24 Jan 2024 07:15:55 -0500,
  Andrew Dunstan  wrote:

> 
> On 2024-01-24 We 03:11, Michael Paquier wrote:
>> On Wed, Jan 24, 2024 at 02:49:36PM +0900, Sutou Kouhei wrote:
>>> For COPY TO:
>>>
>>> 0001: This adds CopyToRoutine and use it for text/csv/binary
>>> formats. No implementation change. This just move codes.
>> 10M without this change:
>>
>>  format,elapsed time (ms)
>>  text,1090.763
>>  csv,1136.103
>>  binary,1137.141
>>
>> 10M with this change:
>>
>>  format,elapsed time (ms)
>>  text,1082.654
>>  csv,1196.991
>>  binary,1069.697
>>
>> These numbers point out that binary is faster by 6%, csv is slower by
>> 5%, while text stays around what looks like noise range.  That's not
>> negligible.  Are these numbers reproducible?  If they are, that could
>> be a problem for anybody doing bulk-loading of large data sets.  I am
>> not sure to understand where the improvement for binary comes from by
>> reading the patch, but perhaps perf would tell more for each format?
>> The loss with csv could be blamed on the extra manipulations of the
>> function pointers, likely.
> 
> 
> I don't think that's at all acceptable.
> 
> We've spent quite a lot of blood sweat and tears over the years to make COPY
> fast, and we should not sacrifice any of that lightly.

These numbers aren't reproducible. Because these benchmarks
executed on my normal machine not a machine only for
benchmarking. The machine runs another processes such as
editor and Web browser.

For example, here are some results with master
(94edfe250c6a200d2067b0debfe00b4122e9b11e):

Format,N records,Elapsed time (ms)
csv,1000,1073.715
csv,1000,1022.830
csv,1000,1073.584
csv,1000,1090.651
csv,1000,1052.259

Here are some results with master + the 0001 patch:

Format,N records,Elapsed time (ms)
csv,1000,1025.356
csv,1000,1067.202
csv,1000,1014.563
csv,1000,1032.088
csv,1000,1058.110


I uploaded my benchmark script so that you can run the same
benchmark on your machine:

https://gist.github.com/kou/be02e02e5072c91969469dbf137b5de5

Could anyone try the benchmark with master and master+0001?


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-24 Thread Andrew Dunstan




On 2024-01-24 We 03:11, Michael Paquier wrote:

On Wed, Jan 24, 2024 at 02:49:36PM +0900, Sutou Kouhei wrote:

For COPY TO:

0001: This adds CopyToRoutine and use it for text/csv/binary
formats. No implementation change. This just move codes.

10M without this change:

 format,elapsed time (ms)
 text,1090.763
 csv,1136.103
 binary,1137.141

10M with this change:

 format,elapsed time (ms)
 text,1082.654
 csv,1196.991
 binary,1069.697

These numbers point out that binary is faster by 6%, csv is slower by
5%, while text stays around what looks like noise range.  That's not
negligible.  Are these numbers reproducible?  If they are, that could
be a problem for anybody doing bulk-loading of large data sets.  I am
not sure to understand where the improvement for binary comes from by
reading the patch, but perhaps perf would tell more for each format?
The loss with csv could be blamed on the extra manipulations of the
function pointers, likely.



I don't think that's at all acceptable.

We've spent quite a lot of blood sweat and tears over the years to make 
COPY fast, and we should not sacrifice any of that lightly.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-24 Thread Michael Paquier

On Wed, Jan 24, 2024 at 02:49:36PM +0900, Sutou Kouhei wrote:
> For COPY TO:
> 
> 0001: This adds CopyToRoutine and use it for text/csv/binary
> formats. No implementation change. This just move codes.

10M without this change:

format,elapsed time (ms)
text,1090.763
csv,1136.103
binary,1137.141

10M with this change:

format,elapsed time (ms)
text,1082.654
csv,1196.991
binary,1069.697

These numbers point out that binary is faster by 6%, csv is slower by
5%, while text stays around what looks like noise range.  That's not
negligible.  Are these numbers reproducible?  If they are, that could
be a problem for anybody doing bulk-loading of large data sets.  I am
not sure to understand where the improvement for binary comes from by
reading the patch, but perhaps perf would tell more for each format?
The loss with csv could be blamed on the extra manipulations of the
function pointers, likely.
--
Michael


signature.asc
Description: PGP signature

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-23 Thread Sutou Kouhei

Hi,

I've implemented custom COPY format feature based on the
current design discussion. See the attached patches for
details.

I also implemented a PoC COPY format handler for Apache
Arrow with this implementation and it worked.
https://github.com/kou/pg-copy-arrow

The patches implement not only custom COPY TO format feature
but also custom COPY FROM format feature.

0001-0004 is for COPY TO and 0005-0008 is for COPY FROM.

For COPY TO:

0001: This adds CopyToRoutine and use it for text/csv/binary
formats. No implementation change. This just move codes.

0002: This adds support for adding custom COPY TO format by
"CREATE FUNCTION ${FORMAT_NAME}". This uses the same
approach provided by Sawada-san[1] but this doesn't
introduce a wrapper CopyRoutine struct for
Copy{To,From}Routine. Because I noticed that a wrapper
CopyRoutine struct is needless. Copy handler can just return
CopyToRoutine or CopyFromRtouine because both of them have
NodeTag. We can distinct a returned struct by the NodeTag.

[1] 
https://www.postgresql.org/message-id/cad21aocunywhird3gapzwe6s9jg1wzxj3cr6vgn36ddhegj...@mail.gmail.com

0003: This exports CopyToStateData. No implementation change
except CopyDest enum values. I changed COPY_ prefix to
COPY_DEST_ to avoid name conflict with CopySource enum
values. This just moves codes.

0004: This adds CopyToState::opaque and exports
CopySendEndOfRow(). CopySendEndOfRow() is renamed to
CopyToStateFlush().

For COPY FROM:

0005: Same as 0001 but for COPY FROM. This adds
CopyFromRoutine and use it for text/csv/binary formats. No
implementation change. This just move codes.

0006: Same as 0002 but for COPY FROM. This adds support for
adding custom COPY FROM format by "CREATE FUNCTION
${FORMAT_NAME}".

0007: Same as 0003 but for COPY FROM. This exports
CopyFromStateData. No implementation change except
CopySource enum values. I changed COPY_ prefix to
COPY_SOURCE_ to align CopyDest changes in 0003. This just
moves codes.

0008: Same as 0004 but for COPY FROM. This adds
CopyFromState::opaque and exports
CopyReadBinaryData(). CopyReadBinaryData() is renamed to
CopyFromStateRead().

Thanks,
-- 
kou

In <20240115.152702.2011620917962812379@clear-code.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Mon, 15 Jan 2024 15:27:02 +0900 (JST),
  Sutou Kouhei  wrote:

> Hi,
> 
> If there are no more comments for the current design, I'll
> start implementing this feature with the following
> approaches for "Discussing" items:
> 
>> 3.1 Should we use one function(internal) for COPY TO/FROM
>> handlers or two function(internal)s (one is for COPY TO
>> handler and another is for COPY FROM handler)?
>> [4]
> 
> I'll choose "one function(internal) for COPY TO/FROM handlers".
> 
>> 3.4 Should we export Copy{To,From}State? Or should we just
>> provide getters/setters to access Copy{To,From}State
>> internal?
>> [10]
> 
> I'll export Copy{To,From}State.
> 
> 
> Thanks,
> -- 
> kou
> 
> In <20240112.144615.157925223373344229@clear-code.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Fri, 12 Jan 2024 14:46:15 +0900 (JST),
>   Sutou Kouhei  wrote:
> 
>> Hi,
>> 
>> Here is the current summary for a this discussion to make
>> COPY format extendable. It's for reaching consensus and
>> starting implementing the feature. (I'll start implementing
>> the feature once we reach consensus.) If you have any
>> opinion, please share it.
>> 
>> Confirmed:
>> 
>> 1.1 Making COPY format extendable will not reduce performance.
>> [1]
>> 
>> Decisions:
>> 
>> 2.1 Use separated handler for COPY TO and COPY FROM because
>> our COPY TO implementation (copyto.c) and COPY FROM
>> implementation (coypfrom.c) are separated.
>> [2]
>> 
>> 2.2 Don't use system catalog for COPY TO/FROM handlers. We can
>> just use a function(internal) that returns a handler instead.
>> [3]
>> 
>> 2.3 The implementation must include documentation.
>> [5]
>> 
>> 2.4 The implementation must include test.
>> [6]
>> 
>> 2.5 The implementation should be consist of small patches
>> for easy to review.
>> [6]
>> 
>> 2.7 Copy{To,From}State must have a opaque space for
>> handlers.
>> [8]
>> 
>> 2.8 Export CopySendData() and CopySendEndOfRow() for COPY TO
>> handlers.
>> [8]
>> 
>> 2.9 Make "format" in PgMsg_CopyOutResponse message
>> extendable.
>> [9]
>> 
>> 2.10 Make makeNode() call avoidable in function(internal)
>>

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-15 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Mon, 15 Jan 2024 16:03:41 +0900,
  Masahiko Sawada  wrote:

>> Defining one more static const struct instead of providing a
>> convenient (but a bit tricky) macro may be straightforward:
>>
>> static const CopyToFormatRoutine testfmt_copyto_routine = {
>> .type = T_CopyToFormatRoutine,
>> .start_fn = testfmt_copyto_start,
>> .onerow_fn = testfmt_copyto_onerow,
>> .end_fn = testfmt_copyto_end
>> };
>>
>> static const CopyFormatRoutine testfmt_copyto_handler = {
>> .type = T_CopyFormatRoutine,
>> .is_from = false,
>> .routine = (Node *) _copyto_routine
>> };
> 
> Yeah, IIUC this is the option 2 you mentioned[1]. I think we can go
> with this idea as it's the simplest.
>
> [1] 
> https://www.postgresql.org/message-id/20240110.120034.501385498034538233.kou%40clear-code.com

Ah, you're right. I forgot it...

>  That is CopyFormatRoutine will be like:
> 
> typedef struct CopyFormatRoutine
> {
> NodeTag type;
> 
> /* either CopyToFormatRoutine or CopyFromFormatRoutine */
> Node   *routine;
> }   CopyFormatRoutine;
> 
> And the core can check the node type of the 'routine7 in the
> CopyFormatRoutine returned by extensions.

It makes sense.


If no more comments about the current design, I'll start
implementing this feature based on the current design.


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-14 Thread Masahiko Sawada

On Thu, Jan 11, 2024 at 10:24 AM Sutou Kouhei  wrote:
>
> Hi,
>
> In 
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Wed, 10 Jan 2024 16:53:48 +0900,
>   Masahiko Sawada  wrote:
>
> >> Interesting. But I feel that it introduces another (a bit)
> >> tricky mechanism...
> >
> > Right. On the other hand, I don't think the idea 3 is good for the
> > same reason Michael-san pointed out before[1][2].
> >
> > [1] https://www.postgresql.org/message-id/ZXEUIy6wl4jHy6Nm%40paquier.xyz
> > [2] https://www.postgresql.org/message-id/ZXKm9tmnSPIVrqZz%40paquier.xyz
>
> I think that the important part of the Michael-san's opinion
> is "keep COPY TO implementation and COPY FROM implementation
> separated for maintainability".
>
> The patch focused in [1][2] uses one routine for both of
> COPY TO and COPY FROM. If we use the approach, we need to
> change one common routine from copyto.c and copyfrom.c (or
> export callbacks from copyto.c and copyfrom.c and use them
> in copyto.c to construct one common routine). It's
> the problem.
>
> The idea 3 still has separated routines for COPY TO and COPY
> FROM. So I think that it still keeps COPY TO implementation
> and COPY FROM implementation separated.
>
> >> BTW, we also need to set .type:
> >>
> >>  .routine = COPYTO_ROUTINE (
> >>  .type = T_CopyToFormatRoutine,
> >>  .start_fn = testfmt_copyto_start,
> >>  .onerow_fn = testfmt_copyto_onerow,
> >>  .end_fn = testfmt_copyto_end
> >>  )
> >
> > I think it's fine as the same is true for table AM.
>
> Ah, sorry. I should have said explicitly. I don't this that
> it's not a problem. I just wanted to say that it's missing.

Thank you for pointing it out.

>
>
> Defining one more static const struct instead of providing a
> convenient (but a bit tricky) macro may be straightforward:
>
> static const CopyToFormatRoutine testfmt_copyto_routine = {
> .type = T_CopyToFormatRoutine,
> .start_fn = testfmt_copyto_start,
> .onerow_fn = testfmt_copyto_onerow,
> .end_fn = testfmt_copyto_end
> };
>
> static const CopyFormatRoutine testfmt_copyto_handler = {
> .type = T_CopyFormatRoutine,
> .is_from = false,
> .routine = (Node *) _copyto_routine
> };

Yeah, IIUC this is the option 2 you mentioned[1]. I think we can go
with this idea as it's the simplest. If we find a better way, we can
change it later. That is CopyFormatRoutine will be like:

typedef struct CopyFormatRoutine
{
NodeTag type;

/* either CopyToFormatRoutine or CopyFromFormatRoutine */
Node   *routine;
}   CopyFormatRoutine;

And the core can check the node type of the 'routine7 in the
CopyFormatRoutine returned by extensions.

Regards,

[1] 
https://www.postgresql.org/message-id/20240110.120034.501385498034538233.kou%40clear-code.com

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-14 Thread Sutou Kouhei

Hi,

If there are no more comments for the current design, I'll
start implementing this feature with the following
approaches for "Discussing" items:

> 3.1 Should we use one function(internal) for COPY TO/FROM
> handlers or two function(internal)s (one is for COPY TO
> handler and another is for COPY FROM handler)?
> [4]

I'll choose "one function(internal) for COPY TO/FROM handlers".

> 3.4 Should we export Copy{To,From}State? Or should we just
> provide getters/setters to access Copy{To,From}State
> internal?
> [10]

I'll export Copy{To,From}State.


Thanks,
-- 
kou

In <20240112.144615.157925223373344229@clear-code.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Fri, 12 Jan 2024 14:46:15 +0900 (JST),
  Sutou Kouhei  wrote:

> Hi,
> 
> Here is the current summary for a this discussion to make
> COPY format extendable. It's for reaching consensus and
> starting implementing the feature. (I'll start implementing
> the feature once we reach consensus.) If you have any
> opinion, please share it.
> 
> Confirmed:
> 
> 1.1 Making COPY format extendable will not reduce performance.
> [1]
> 
> Decisions:
> 
> 2.1 Use separated handler for COPY TO and COPY FROM because
> our COPY TO implementation (copyto.c) and COPY FROM
> implementation (coypfrom.c) are separated.
> [2]
> 
> 2.2 Don't use system catalog for COPY TO/FROM handlers. We can
> just use a function(internal) that returns a handler instead.
> [3]
> 
> 2.3 The implementation must include documentation.
> [5]
> 
> 2.4 The implementation must include test.
> [6]
> 
> 2.5 The implementation should be consist of small patches
> for easy to review.
> [6]
> 
> 2.7 Copy{To,From}State must have a opaque space for
> handlers.
> [8]
> 
> 2.8 Export CopySendData() and CopySendEndOfRow() for COPY TO
> handlers.
> [8]
> 
> 2.9 Make "format" in PgMsg_CopyOutResponse message
> extendable.
> [9]
> 
> 2.10 Make makeNode() call avoidable in function(internal)
>  that returns COPY TO/FROM handler.
>  [9]
> 
> 2.11 Custom COPY TO/FROM handlers must be able to parse
>  their options.
>  [11]
> 
> Discussing:
> 
> 3.1 Should we use one function(internal) for COPY TO/FROM
> handlers or two function(internal)s (one is for COPY TO
> handler and another is for COPY FROM handler)?
> [4]
> 
> 3.2 If we use separated function(internal) for COPY TO/FROM
> handlers, we need to define naming rule. For example,
> _to(internal) for COPY TO handler and
> _from(internal) for COPY FROM handler.
> [4]
> 
> 3.3 Should we use prefix or suffix for function(internal)
> name to avoid name conflict with other handlers such as
> tablesample handlers?
> [7]
> 
> 3.4 Should we export Copy{To,From}State? Or should we just
> provide getters/setters to access Copy{To,From}State
> internal?
> [10]
> 
> 
> [1] 
> https://www.postgresql.org/message-id/flat/20231204.153548.2126325458835528809.kou%40clear-code.com
> [2] https://www.postgresql.org/message-id/flat/ZXEUIy6wl4jHy6Nm%40paquier.xyz
> [3] 
> https://www.postgresql.org/message-id/flat/CAD21AoAhcZkAp_WDJ4sSv_%2Bg2iCGjfyMFgeu7MxjnjX_FutZAg%40mail.gmail.com
> [4] 
> https://www.postgresql.org/message-id/flat/CAD21AoDkoGL6yJ_HjNOg9cU%3DaAdW8uQ3rSQOeRS0SX85LPPNwQ%40mail.gmail.com
> [5] 
> https://www.postgresql.org/message-id/flat/TY3PR01MB9889C9234CD220A3A7075F0DF589A%40TY3PR01MB9889.jpnprd01.prod.outlook.com
> [6] https://www.postgresql.org/message-id/flat/ZXbiPNriHHyUrcTF%40paquier.xyz
> [7] 
> https://www.postgresql.org/message-id/flat/20231214.184414.2179134502876898942.kou%40clear-code.com
> [8] 
> https://www.postgresql.org/message-id/flat/20231221.183504.1240642084042888377.kou%40clear-code.com
> [9] https://www.postgresql.org/message-id/flat/ZYTfqGppMc9e_w2k%40paquier.xyz
> [10] 
> https://www.postgresql.org/message-id/flat/CAD21AoD%3DUapH4Wh06G6H5XAzPJ0iJg9YcW8r7E2UEJkZ8QsosA%40mail.gmail.com
> [11] 
> https://www.postgresql.org/message-id/flat/20240110.152023.1920937326588672387.kou%40clear-code.com
> 
> 
> Thanks,
> -- 
> kou
> 
>

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-14 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Fri, 12 Jan 2024 14:40:41 +0800,
  Junwang Zhao  wrote:

>> Could you clarify what should we discuss? We should require
>> that COPY TO/FROM handlers should use PostgreSQL's memory
>> context for all internal memory allocations?
> 
> Yes, handlers should use PostgreSQL's memory context, and I think
> creating other memory context under CopyToStateData.copycontext
> should be suggested for handler creators, so I proposed exporting
> CopyToStateData to public header.

I see.

We can provide a getter for CopyToStateData::copycontext if
we don't want to export CopyToStateData. Note that I don't
have a strong opinion whether we should export
CopyToStateData or not.


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-11 Thread Junwang Zhao

Hi,

On Wed, Jan 10, 2024 at 2:20 PM Sutou Kouhei  wrote:
>
> Hi,
>
> In 
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Fri, 22 Dec 2023 10:58:05 +0800,
>   Junwang Zhao  wrote:
>
> >> 1. Add an opaque space for custom COPY TO handler
> >>* Add CopyToState{Get,Set}Opaque()
> >>
> >> https://github.com/kou/postgres/commit/5a610b6a066243f971e029432db67152cfe5e944
> >>
> >> 2. Export CopyToState::attnumlist
> >>* Add CopyToStateGetAttNumList()
> >>
> >> https://github.com/kou/postgres/commit/15fcba8b4e95afa86edb3f677a7bdb1acb1e7688
> >>
> >> 3. Export CopySend*()
> >>* Rename CopySend*() to CopyToStateSend*() and export them
> >>* Exception: CopySendEndOfRow() to CopyToStateFlush() because
> >>  it just flushes the internal buffer now.
> >>
> >> https://github.com/kou/postgres/commit/289a5640135bde6733a1b8e2c412221ad522901e
> >>
> > I guess the purpose of these helpers is to avoid expose CopyToState to
> > copy.h,
>
> Yes.
>
> > but I
> > think expose CopyToState to user might make life easier, users might want 
> > to use
> > the memory contexts of the structure (though I agree not all the
> > fields are necessary
> > for extension handers).
>
> OK. I don't object it as I said in another e-mail:
> https://www.postgresql.org/message-id/flat/20240110.120644.1876591646729327180.kou%40clear-code.com#d923173e9625c20319750155083cbd72
>
> >> 2. Need an opaque space like IndexScanDesc::opaque does
> >>
> >>* A custom COPY TO handler needs to keep its data
> >
> > I once thought users might want to parse their own options, maybe this
> > is a use case for this opaque space.
>
> Good catch! I forgot to suggest a callback for custom format
> options. How about the following API?
>
> 
> ...
> typedef bool (*CopyToProcessOption_function) (CopyToState cstate, DefElem 
> *defel);
>
> ...
> typedef bool (*CopyFromProcessOption_function) (CopyFromState cstate, DefElem 
> *defel);
>
> typedef struct CopyToFormatRoutine
> {
> ...
> CopyToProcessOption_function process_option_fn;
> } CopyToFormatRoutine;
>
> typedef struct CopyFromFormatRoutine
> {
> ...
> CopyFromProcessOption_function process_option_fn;
> } CopyFromFormatRoutine;
> 
>
> 
> diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
> index e7597894bf..1aa8b62551 100644
> --- a/src/backend/commands/copy.c
> +++ b/src/backend/commands/copy.c
> @@ -416,6 +416,7 @@ void
>  ProcessCopyOptions(ParseState *pstate,
>CopyFormatOptions *opts_out,
>bool is_from,
> +  void *cstate, /* CopyToState* for 
> !is_from, CopyFromState* for is_from */
>List *options)
>  {
> boolformat_specified = false;
> @@ -593,11 +594,19 @@ ProcessCopyOptions(ParseState *pstate,
>  parser_errposition(pstate, 
> defel->location)));
> }
> else
> -   ereport(ERROR,
> -   (errcode(ERRCODE_SYNTAX_ERROR),
> -errmsg("option \"%s\" not 
> recognized",
> -   defel->defname),
> -parser_errposition(pstate, 
> defel->location)));
> +   {
> +   bool processed;
> +   if (is_from)
> +   processed = 
> opts_out->from_ops->process_option_fn(cstate, defel);
> +   else
> +   processed = 
> opts_out->to_ops->process_option_fn(cstate, defel);
> +   if (!processed)
> +   ereport(ERROR,
> +   
> (errcode(ERRCODE_SYNTAX_ERROR),
> +errmsg("option \"%s\" not 
> recognized",
> +   
> defel->defname),
> +parser_errposition(pstate, 
> defel->location)));
> +   }
> }
>
> /*
> 

Looks good.

>
> > For the name, I thought private_data might be a better candidate than
>

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-11 Thread Sutou Kouhei

Hi,

Here is the current summary for a this discussion to make
COPY format extendable. It's for reaching consensus and
starting implementing the feature. (I'll start implementing
the feature once we reach consensus.) If you have any
opinion, please share it.

Confirmed:

1.1 Making COPY format extendable will not reduce performance.
[1]

Decisions:

2.1 Use separated handler for COPY TO and COPY FROM because
our COPY TO implementation (copyto.c) and COPY FROM
implementation (coypfrom.c) are separated.
[2]

2.2 Don't use system catalog for COPY TO/FROM handlers. We can
just use a function(internal) that returns a handler instead.
[3]

2.3 The implementation must include documentation.
[5]

2.4 The implementation must include test.
[6]

2.5 The implementation should be consist of small patches
for easy to review.
[6]

2.7 Copy{To,From}State must have a opaque space for
handlers.
[8]

2.8 Export CopySendData() and CopySendEndOfRow() for COPY TO
handlers.
[8]

2.9 Make "format" in PgMsg_CopyOutResponse message
extendable.
[9]

2.10 Make makeNode() call avoidable in function(internal)
 that returns COPY TO/FROM handler.
 [9]

2.11 Custom COPY TO/FROM handlers must be able to parse
 their options.
 [11]

Discussing:

3.1 Should we use one function(internal) for COPY TO/FROM
handlers or two function(internal)s (one is for COPY TO
handler and another is for COPY FROM handler)?
[4]

3.2 If we use separated function(internal) for COPY TO/FROM
handlers, we need to define naming rule. For example,
_to(internal) for COPY TO handler and
_from(internal) for COPY FROM handler.
[4]

3.3 Should we use prefix or suffix for function(internal)
name to avoid name conflict with other handlers such as
tablesample handlers?
[7]

3.4 Should we export Copy{To,From}State? Or should we just
provide getters/setters to access Copy{To,From}State
internal?
[10]


[1] 
https://www.postgresql.org/message-id/flat/20231204.153548.2126325458835528809.kou%40clear-code.com
[2] https://www.postgresql.org/message-id/flat/ZXEUIy6wl4jHy6Nm%40paquier.xyz
[3] 
https://www.postgresql.org/message-id/flat/CAD21AoAhcZkAp_WDJ4sSv_%2Bg2iCGjfyMFgeu7MxjnjX_FutZAg%40mail.gmail.com
[4] 
https://www.postgresql.org/message-id/flat/CAD21AoDkoGL6yJ_HjNOg9cU%3DaAdW8uQ3rSQOeRS0SX85LPPNwQ%40mail.gmail.com
[5] 
https://www.postgresql.org/message-id/flat/TY3PR01MB9889C9234CD220A3A7075F0DF589A%40TY3PR01MB9889.jpnprd01.prod.outlook.com
[6] https://www.postgresql.org/message-id/flat/ZXbiPNriHHyUrcTF%40paquier.xyz
[7] 
https://www.postgresql.org/message-id/flat/20231214.184414.2179134502876898942.kou%40clear-code.com
[8] 
https://www.postgresql.org/message-id/flat/20231221.183504.1240642084042888377.kou%40clear-code.com
[9] https://www.postgresql.org/message-id/flat/ZYTfqGppMc9e_w2k%40paquier.xyz
[10] 
https://www.postgresql.org/message-id/flat/CAD21AoD%3DUapH4Wh06G6H5XAzPJ0iJg9YcW8r7E2UEJkZ8QsosA%40mail.gmail.com
[11] 
https://www.postgresql.org/message-id/flat/20240110.152023.1920937326588672387.kou%40clear-code.com


Thanks,
-- 
kou

1 2 >

1 - 100 of 154 matches

Mail list logo