from:"Sutou Kouhei"

Re: meson: Specify -Wformat as a common warning flag for extensions

2024-05-29 Thread Sutou Kouhei

Hi,

In <4707d4ed-f268-43c0-b4dd-cdbc7520f...@eisentraut.org>
  "Re: meson: Specify -Wformat as a common warning flag for extensions" on Tue, 
28 May 2024 23:31:05 -0700,
  Peter Eisentraut  wrote:

> On 07.04.24 18:01, Sutou Kouhei wrote:
>> +# We don't have "warning_level == 3" and "warning_level ==
>> +# 'everything'" here because we don't use these warning levels.
>> +if warning_level == '1'
>> +  common_builtin_flags += ['-Wall']
>> +elif warning_level == '2'
>> +  common_builtin_flags += ['-Wall', '-Wextra']
>> +endif
> 
> I would trim this even further and always export just '-Wall'.  The
> other options aren't really something we support.

OK. How about the v6 patch? It always uses '-Wall'.

Thanks,
-- 
kou
>From 8238adba3f3fc96d4a9e50af611b1cb3566abc0e Mon Sep 17 00:00:00 2001
From: Sutou Kouhei 
Date: Fri, 15 Mar 2024 18:27:30 +0900
Subject: [PATCH v6] meson: Restore implicit warning/debug/optimize flags for
 extensions

Meson specifies warning/debug/optimize flags such as "-Wall", "-g" and
"-O2" automatically based on "--warnlevel" and "--buildtype"
options. And we use "--warning_level=1" and
"--buildtype=debugoptimized" by default.

We don't specify warning/debug/optimize flags explicitly to build
PostgreSQL with Meson. Because Meson does it automatically as we said.
But Meson doesn't care about flags in Makefile.global and
pg_config. So we need to care about them manually.

This changes do it. They detect debug/optimize flags based on
debug/optimization option values because Meson doesn't tell us flags
Meson guessed. We always use -Wall for warning flags.
---
 meson.build | 27 +++
 src/include/meson.build |  4 ++--
 2 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/meson.build b/meson.build
index d6401fb8e30..d7239dbb114 100644
--- a/meson.build
+++ b/meson.build
@@ -1851,6 +1851,33 @@ endif
 vectorize_cflags = cc.get_supported_arguments(['-ftree-vectorize'])
 unroll_loops_cflags = cc.get_supported_arguments(['-funroll-loops'])
 
+# They aren't used for building PostgreSQL itself because Meson does
+# everything internally. They are used by extensions via pg_config or
+# Makefile.global.
+common_builtin_flags = ['-Wall']
+
+if get_option('debug')
+  common_builtin_flags += ['-g']
+endif
+
+optimization = get_option('optimization')
+if optimization == '0'
+  common_builtin_flags += ['-O0']
+elif optimization == '1'
+  common_builtin_flags += ['-O1']
+elif optimization == '2'
+  common_builtin_flags += ['-O2']
+elif optimization == '3'
+  common_builtin_flags += ['-O3']
+elif optimization == 's'
+  common_builtin_flags += ['-Os']
+endif
+
+cflags_builtin = cc.get_supported_arguments(common_builtin_flags)
+if llvm.found()
+  cxxflags_builtin = cpp.get_supported_arguments(common_builtin_flags)
+endif
+
 common_warning_flags = [
   '-Wmissing-prototypes',
   '-Wpointer-arith',
diff --git a/src/include/meson.build b/src/include/meson.build
index a28f115d867..58b7a9c1e7e 100644
--- a/src/include/meson.build
+++ b/src/include/meson.build
@@ -44,9 +44,9 @@ config_paths_data.set_quoted('MANDIR', dir_prefix / dir_man)
 
 var_cc = ' '.join(cc.cmd_array())
 var_cpp = ' '.join(cc.cmd_array() + ['-E'])
-var_cflags = ' '.join(cflags + cflags_warn + get_option('c_args'))
+var_cflags = ' '.join(cflags + cflags_builtin + cflags_warn + get_option('c_args'))
 if llvm.found()
-  var_cxxflags = ' '.join(cxxflags + cxxflags_warn + get_option('cpp_args'))
+  var_cxxflags = ' '.join(cxxflags + cxxflags_builtin + cxxflags_warn + get_option('cpp_args'))
 else
   var_cxxflags = ''
 endif
-- 
2.43.0

Re: Is it acceptable making COPY format extendable?

2024-05-06 Thread Sutou Kouhei

Hi,

In <7a978dd2-371b-4d10-b58e-cb32ef728...@eisentraut.org>
  "Re: Is it acceptable making COPY format extendable?" on Thu, 25 Apr 2024 
09:59:18 +0200,
  Peter Eisentraut  wrote:

> PostgreSQL development is currently in feature freeze for PostgreSQL
> 17 (since April 8).  So most participants will not be looking very
> closely at development projects for releases beyond that.  The next
> commitfest for patches targeting PostgreSQL 18 is in July, and I see
> your patch registered there.  It's possible that your thread is not
> going to get much attention until then.
> 
> So, in summary, you are doing everything right, you just have to be a
> bit more patient right now. :)

I see. I'll wait for the next commitfest.


Thanks,
-- 
kou

Re: 2024-05-09 release announcement draft,2024-05-09 release announcement draft

2024-05-06 Thread Sutou Kouhei

Hi,

In 
<779790c7-45d7-4010-9305-c3f9e6a60...@postgresql.org>,<779790c7-45d7-4010-9305-c3f9e6a60...@postgresql.org>
  "2024-05-09 release announcement draft,2024-05-09 release announcement draft" 
on Mon, 6 May 2024 13:44:05 -0400,
  "Jonathan S. Katz"  wrote:

> * Added optimization for certain operations where an installation has 
> thousands
> of roles.

Added ->
Add

> * [Follow @postgresql on Twitter](https://twitter.com/postgresql)

Twitter ->
X


Thanks,
-- 
kou

Re: Is it acceptable making COPY format extendable?

2024-04-24 Thread Sutou Kouhei

Hi,

Thanks for replying this.

In <02cccd36-3083-4a50-aae4-f957e96fb...@eisentraut.org>
  "Re: Is it acceptable making COPY format extendable?" on Wed, 24 Apr 2024 
09:57:38 +0200,
  Peter Eisentraut  wrote:

>> I'm proposing a patch that making COPY format extendable:
>> https://www.postgresql.org/message-id/20231204.153548.2126325458835528809.kou%40clear-code.com
>> https://commitfest.postgresql.org/48/4681/
>>
>> But my proposal stalled. It it acceptable the feature
>> request that making COPY format extendable? If it's not
>> acceptable, what are blockers? Performance?
> 
> I think that thread is getting an adequate amount of attention.  Of
> course, you can always wish for more, but "stalled" looks very
> different.

Sorry for "stalled" misuse.

I haven't got any reply since 2024-03-15:
https://www.postgresql.org/message-id/flat/20240315.173754.2049843193122381085.kou%40clear-code.com#07aefc636d8165204ddfba971dc9a490
(I sent some pings.)

So I called this status as "stalled".

I'm not familiar with the PostgreSQL's development
style. What should I do for this case? Should I just wait
for a reply from others without doing anything?

> Let's not start a new thread here to discuss the merits of the other
> thread; that discussion belongs there.

I wanted to discuss possibility for this feature request in
this thread. I wanted to use my existing thread is for how
to implement this feature request.

Should I reuse
https://www.postgresql.org/message-id/flat/CAJ7c6TM6Bz1c3F04Cy6%2BSzuWfKmr0kU8c_3Stnvh_8BR0D6k8Q%40mail.gmail.com
for it instead of this thread?

Thanks,
-- 
kou

Is it acceptable making COPY format extendable?

2024-04-22 Thread Sutou Kouhei

Hi,

I'm proposing a patch that making COPY format extendable:
https://www.postgresql.org/message-id/20231204.153548.2126325458835528809.kou%40clear-code.com
https://commitfest.postgresql.org/48/4681/

It's based on the discussion at:
https://www.postgresql.org/message-id/flat/3741749.1655952...@sss.pgh.pa.us#2bb7af4a3d2c7669f9a49808d777a20d

> > > IIUC, you want extensibility in FORMAT argument to COPY command
> > > https://www.postgresql.org/docs/current/sql-copy.html. Where the
> > > format is pluggable. That seems useful.
>
> > Agreed.
>
> Ditto.


But my proposal stalled. It it acceptable the feature
request that making COPY format extendable? If it's not
acceptable, what are blockers? Performance?


Thanks,
-- 
kou

Re: pg_stat_statements and "IN" conditions

2024-04-15 Thread Sutou Kouhei

Hi,

In <20240404143514.a26f7ttxrbdfc73a@erthalion.local>
  "Re: pg_stat_statements and "IN" conditions" on Thu, 4 Apr 2024 16:35:14 
+0200,
  Dmitry Dolgov <9erthali...@gmail.com> wrote:

> Here is the rebased version.

Thanks. I'm not familiar with this code base but I've
reviewed these patches because I'm interested in this
feature too.

0001:

> diff --git a/src/backend/nodes/queryjumblefuncs.c 
> b/src/backend/nodes/queryjumblefuncs.c
> index be823a7f8fa..e9473def361 100644
> --- a/src/backend/nodes/queryjumblefuncs.c
> +++ b/src/backend/nodes/queryjumblefuncs.c
> 
> @@ -212,15 +233,67 @@ RecordConstLocation(JumbleState *jstate, int location)
> ...
> +static bool
> +IsMergeableConstList(List *elements, Const **firstConst, Const **lastConst)
> +{
> + ListCell   *temp;
> + Node   *firstExpr = NULL;
> +
> + if (elements == NULL)

"elements == NIL" will be better for List.

> +static void
> +_jumbleElements(JumbleState *jstate, List *elements)
> +{
> + Const *first, *last;
> + if(IsMergeableConstList(elements, , ))

A space is missing between "if" and "(".

> diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
> index aa727e722cc..cf4f900d4ed 100644
> --- a/src/include/nodes/primnodes.h
> +++ b/src/include/nodes/primnodes.h
> @@ -1333,7 +1333,7 @@ typedef struct ArrayExpr
>   /* common type of array elements */
>   Oid element_typeid 
> pg_node_attr(query_jumble_ignore);
>   /* the array elements or sub-arrays */
> - List   *elements;
> + List   *elements pg_node_attr(query_jumble_merge);

Should we also update the pg_node_attr() comment for
query_jumble_merge in nodes.h?


0003:

> diff --git a/contrib/pg_stat_statements/pg_stat_statements.c 
> b/contrib/pg_stat_statements/pg_stat_statements.c
> index d7841b51cc9..00eec30feb1 100644
> --- a/contrib/pg_stat_statements/pg_stat_statements.c
> +++ b/contrib/pg_stat_statements/pg_stat_statements.c
> ...
> @@ -2883,12 +2886,22 @@ generate_normalized_query(JumbleState *jstate, const 
> char *query,
>   /* The firsts merged constant */
>   else if (!skip)
>   {
> + static const uint32 powers_of_ten[] = {
> + 1, 10, 100,
> + 1000, 1, 10,
> + 100, 1000, 1,
> + 10
> + };
> + int lower_merged = powers_of_ten[magnitude - 1];
> + int upper_merged = powers_of_ten[magnitude];

How about adding a reverse function of decimalLength32() to
numutils.h and use it here?

> - n_quer_loc += sprintf(norm_query + n_quer_loc, "...");
> + n_quer_loc += sprintf(norm_query + n_quer_loc, "... 
> [%d-%d entries]",
> +   lower_merged, 
> upper_merged - 1);

Do we still have enough space in norm_query for this change?
It seems that norm_query expects up to 10 additional bytes
per jstate->clocations[i].


It seems that we can merge 0001, 0003 and 0004 to one patch.
(Sorry. I haven't read all discussions yet. If we already
discuss this, sorry for this noise.)


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-04-10 Thread Sutou Kouhei

Hi Andres,

Could you take a look at this? I think that you don't want
to touch the current text/csv/binary implementations. The
v17 patch approach doesn't touch the current text/csv/binary
implementations. What do you think about this approach?


Thanks,
-- 
kou

In <20240320.232732.488684985873786799@clear-code.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Wed, 20 Mar 2024 23:27:32 +0900 (JST),
  Sutou Kouhei  wrote:

> Hi,
> 
> Could someone review the v17 patch to proceed this?
> 
> The v17 patch:
> https://www.postgresql.org/message-id/flat/20240305.171808.667980402249336456.kou%40clear-code.com#d2ee079b75ebcf00c410300ecc4a357a
> 
> Some profiles by Michael:
> https://www.postgresql.org/message-id/flat/ZelfYatRdVZN3FbE%40paquier.xyz#eccfd1a0131af93c48026d691cc247f4
> 
> Thanks,
> -- 
> kou
> 
> In <20240308.092254.359611633589181574@clear-code.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Fri, 08 Mar 2024 09:22:54 +0900 (JST),
>   Sutou Kouhei  wrote:
> 
>> Hi,
>> 
>> In 
>>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
>> on Thu, 7 Mar 2024 15:32:01 +0900,
>>   Michael Paquier  wrote:
>> 
>>> While on it, here are some profiles based on HEAD and v17 with the
>>> previous tests (COPY TO /dev/null, COPY FROM data sent to the void).
>>> 
>> ...
>>> 
>>> So, in short, and that's not really a surprise, there is no effect
>>> once we use the dispatching with the routines only when a format would
>>> want to plug-in with the APIs, but a custom format would still have a
>>> penalty of a few percents for both if bottlenecked on CPU.
>> 
>> Thanks for sharing these profiles!
>> I agree with you.
>> 
>> This shows that the v17 approach doesn't affect the current
>> text/csv/binary implementations. (The v17 approach just adds
>> 2 new structs, Copy{From,To}Rountine, without changing the
>> current text/csv/binary implementations.)
>> 
>> Can we push the v17 patch and proceed following
>> implementations? Could someone (especially a PostgreSQL
>> committer) take a look at this for double-check?
>> 
>> 
>> Thanks,
>> -- 
>> kou
>> 
>> 
> 
>

Re: meson: Specify -Wformat as a common warning flag for extensions

2024-04-07 Thread Sutou Kouhei

Hi Andres,

Thanks for reviewing this!

In <20240407232635.fq4kc5556laha...@awork3.anarazel.de>
  "Re: meson: Specify -Wformat as a common warning flag for extensions" on Sun, 
7 Apr 2024 16:26:35 -0700,
  Andres Freund  wrote:

> This seems like a fair amount of extra configure tests. Particularly because
> /W* isn't ever interesting for Makefile.global - they're msvc flags - because
> you can't use that with msvc.
> 
> I'm also doubtful that it's worth supporting warning_level=3/everything, you
> end up with a completely flood of warnings that way.

OK. I've removed "/W*" flags and warning_level==3/everything
cases.

How about the attached v5 patch?


Thanks,
-- 
kou
>From 205ef88c66cf1050eedfc1e72d951de93a02e53a Mon Sep 17 00:00:00 2001
From: Sutou Kouhei 
Date: Fri, 15 Mar 2024 18:27:30 +0900
Subject: [PATCH v5] meson: Restore implicit warning/debug/optimize flags for
 extensions

Meson specifies warning/debug/optimize flags such as "-Wall", "-g" and
"-O2" automatically based on "--warnlevel" and "--buildtype"
options. And we use "--warning_level=1" and
"--buildtype=debugoptimized" by default.

We don't specify warning/debug/optimize flags explicitly to build
PostgreSQL with Meson. Because Meson does it automatically as we said.
But Meson doesn't care about flags in Makefile.global and
pg_config. So we need to care about them manually.

This changes do it. They detect warning/debug/optimize flags based on
warning_level/debug/optimization option values because Meson doesn't
tell us flags Meson guessed.
---
 meson.build | 41 +
 src/include/meson.build |  4 ++--
 2 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/meson.build b/meson.build
index 5acf083ce3c..11bd56f79a7 100644
--- a/meson.build
+++ b/meson.build
@@ -1848,6 +1848,47 @@ endif
 vectorize_cflags = cc.get_supported_arguments(['-ftree-vectorize'])
 unroll_loops_cflags = cc.get_supported_arguments(['-funroll-loops'])
 
+# They aren't used for building PostgreSQL itself because Meson does
+# everything internally. They are used by extensions via pg_config or
+# Makefile.global.
+common_builtin_flags = []
+
+warning_level = get_option('warning_level')
+# See https://mesonbuild.com/Builtin-options.html#details-for-warning_level for
+# warning_level values.
+#
+# We don't use "/W*" flags here because we don't need to care about MSVC here.
+#
+# We don't have "warning_level == 3" and "warning_level ==
+# 'everything'" here because we don't use these warning levels.
+if warning_level == '1'
+  common_builtin_flags += ['-Wall']
+elif warning_level == '2'
+  common_builtin_flags += ['-Wall', '-Wextra']
+endif
+
+if get_option('debug')
+  common_builtin_flags += ['-g']
+endif
+
+optimization = get_option('optimization')
+if optimization == '0'
+  common_builtin_flags += ['-O0']
+elif optimization == '1'
+  common_builtin_flags += ['-O1']
+elif optimization == '2'
+  common_builtin_flags += ['-O2']
+elif optimization == '3'
+  common_builtin_flags += ['-O3']
+elif optimization == 's'
+  common_builtin_flags += ['-Os']
+endif
+
+cflags_builtin = cc.get_supported_arguments(common_builtin_flags)
+if llvm.found()
+  cxxflags_builtin = cpp.get_supported_arguments(common_builtin_flags)
+endif
+
 common_warning_flags = [
   '-Wmissing-prototypes',
   '-Wpointer-arith',
diff --git a/src/include/meson.build b/src/include/meson.build
index a28f115d867..58b7a9c1e7e 100644
--- a/src/include/meson.build
+++ b/src/include/meson.build
@@ -44,9 +44,9 @@ config_paths_data.set_quoted('MANDIR', dir_prefix / dir_man)
 
 var_cc = ' '.join(cc.cmd_array())
 var_cpp = ' '.join(cc.cmd_array() + ['-E'])
-var_cflags = ' '.join(cflags + cflags_warn + get_option('c_args'))
+var_cflags = ' '.join(cflags + cflags_builtin + cflags_warn + get_option('c_args'))
 if llvm.found()
-  var_cxxflags = ' '.join(cxxflags + cxxflags_warn + get_option('cpp_args'))
+  var_cxxflags = ' '.join(cxxflags + cxxflags_builtin + cxxflags_warn + get_option('cpp_args'))
 else
   var_cxxflags = ''
 endif
-- 
2.43.0

Re: Commitfest Manager for March

2024-04-02 Thread Sutou Kouhei

Hi,

In 
  "Re: Commitfest Manager for March" on Mon, 1 Apr 2024 11:57:27 +0300,
  Heikki Linnakangas  wrote:

> On 01/04/2024 09:05, Andrey M. Borodin wrote:
>> I think it makes sense to close this commitfest only after Feature
>> Freeze on April 8, 2024 at 0:00 AoE.
>> What do you think?
> 
> +1. IIRC that's how it's been done in last commitfest in previous
> years too.

Thanks for extending the period.

Could someone review my patches for this commitfest?

1. https://commitfest.postgresql.org/47/4681/
   Make COPY format extendable: Extract COPY TO format implementations
   Status: Needs review

2. https://commitfest.postgresql.org/47/4791/
   meson: Specify -Wformat as a common warning flag for extensions
   Status: Ready for Committer

Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-03-20 Thread Sutou Kouhei

Hi,

Could someone review the v17 patch to proceed this?

The v17 patch:
https://www.postgresql.org/message-id/flat/20240305.171808.667980402249336456.kou%40clear-code.com#d2ee079b75ebcf00c410300ecc4a357a

Some profiles by Michael:
https://www.postgresql.org/message-id/flat/ZelfYatRdVZN3FbE%40paquier.xyz#eccfd1a0131af93c48026d691cc247f4

Thanks,
-- 
kou

In <20240308.092254.359611633589181574@clear-code.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Fri, 08 Mar 2024 09:22:54 +0900 (JST),
  Sutou Kouhei  wrote:

> Hi,
> 
> In 
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Thu, 7 Mar 2024 15:32:01 +0900,
>   Michael Paquier  wrote:
> 
>> While on it, here are some profiles based on HEAD and v17 with the
>> previous tests (COPY TO /dev/null, COPY FROM data sent to the void).
>> 
> ...
>> 
>> So, in short, and that's not really a surprise, there is no effect
>> once we use the dispatching with the routines only when a format would
>> want to plug-in with the APIs, but a custom format would still have a
>> penalty of a few percents for both if bottlenecked on CPU.
> 
> Thanks for sharing these profiles!
> I agree with you.
> 
> This shows that the v17 approach doesn't affect the current
> text/csv/binary implementations. (The v17 approach just adds
> 2 new structs, Copy{From,To}Rountine, without changing the
> current text/csv/binary implementations.)
> 
> Can we push the v17 patch and proceed following
> implementations? Could someone (especially a PostgreSQL
> committer) take a look at this for double-check?
> 
> 
> Thanks,
> -- 
> kou
> 
>

Re: meson: Specify -Wformat as a common warning flag for extensions

2024-03-15 Thread Sutou Kouhei

Hi,

In <49e97fd0-c17e-4cbc-aeee-80ac51400...@eisentraut.org>
  "Re: meson: Specify -Wformat as a common warning flag for extensions" on Wed, 
13 Mar 2024 08:38:28 +0100,
  Peter Eisentraut  wrote:

> I think the fix then is to put -Wall into CFLAGS in
> Makefile.global. Looking at a diff of Makefile.global between an
> autoconf and a meson build, I also see that under meson, CFLAGS
> doesn't get -O2 -g (or similar, depending on settings).  This
> presumably has the same underlying issue that meson handles those
> flags internally.
> 
> For someone who wants to write a fix for this, the relevant variable
> is var_cflags in our meson scripts.  And var_cxxflags as well.

How about the attached v4 patch?


Thanks,
-- 
kou
>From a515720338dc49e764f598021200316c6d01ddf9 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei 
Date: Fri, 15 Mar 2024 18:27:30 +0900
Subject: [PATCH v4] meson: Restore implicit warning/debug/optimize flags for
 extensions

Meson specifies warning/debug/optimize flags such as "-Wall", "-g" and
"-O2" automatically based on "--warnlevel" and "--buildtype"
options. And we use "--warning_level=1" and
"--buildtype=debugoptimized" by default.

We don't specify warning/debug/optimize flags explicitly to build
PostgreSQL with Meson. Because Meson does it automatically as we said.
But Meson doesn't care about flags in Makefile.global and
pg_config. So we need to care about them manually.

This changes do it. They detect warning/debug/optimize flags based on
warning_level/debug/optimization option values because Meson doesn't
tell us flags Meson guessed.
---
 meson.build | 40 
 src/include/meson.build |  4 ++--
 2 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/meson.build b/meson.build
index c8fdfeb0ec..3c5c449a0a 100644
--- a/meson.build
+++ b/meson.build
@@ -1824,6 +1824,46 @@ endif
 vectorize_cflags = cc.get_supported_arguments(['-ftree-vectorize'])
 unroll_loops_cflags = cc.get_supported_arguments(['-funroll-loops'])
 
+# They aren't used for building PostgreSQL itself because Meson does
+# everything internally. They are used by extensions via pg_config or
+# Makefile.global.
+common_builtin_flags = []
+
+warning_level = get_option('warning_level')
+# See https://mesonbuild.com/Builtin-options.html#details-for-warning_level for
+# warning_level values.
+if warning_level == '1'
+  common_builtin_flags += ['-Wall', '/W2']
+elif warning_level == '2'
+  common_builtin_flags += ['-Wall', '-Wextra', '/W3']
+elif warning_level == '3'
+  common_builtin_flags += ['-Wall', '-Wextra', '-Wpedantic', '/W4']
+elif warning_level == 'everything'
+  common_builtin_flags += ['-Weverything', '/Wall']
+endif
+
+if get_option('debug')
+  common_builtin_flags += ['-g']
+endif
+
+optimization = get_option('optimization')
+if optimization == '0'
+  common_builtin_flags += ['-O0']
+elif optimization == '1'
+  common_builtin_flags += ['-O1']
+elif optimization == '2'
+  common_builtin_flags += ['-O2']
+elif optimization == '3'
+  common_builtin_flags += ['-O3']
+elif optimization == 's'
+  common_builtin_flags += ['-Os']
+endif
+
+cflags_builtin = cc.get_supported_arguments(common_builtin_flags)
+if llvm.found()
+  cxxflags_builtin = cpp.get_supported_arguments(common_builtin_flags)
+endif
+
 common_warning_flags = [
   '-Wmissing-prototypes',
   '-Wpointer-arith',
diff --git a/src/include/meson.build b/src/include/meson.build
index a28f115d86..58b7a9c1e7 100644
--- a/src/include/meson.build
+++ b/src/include/meson.build
@@ -44,9 +44,9 @@ config_paths_data.set_quoted('MANDIR', dir_prefix / dir_man)
 
 var_cc = ' '.join(cc.cmd_array())
 var_cpp = ' '.join(cc.cmd_array() + ['-E'])
-var_cflags = ' '.join(cflags + cflags_warn + get_option('c_args'))
+var_cflags = ' '.join(cflags + cflags_builtin + cflags_warn + get_option('c_args'))
 if llvm.found()
-  var_cxxflags = ' '.join(cxxflags + cxxflags_warn + get_option('cpp_args'))
+  var_cxxflags = ' '.join(cxxflags + cxxflags_builtin + cxxflags_warn + get_option('cpp_args'))
 else
   var_cxxflags = ''
 endif
-- 
2.43.0

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-03-15 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Wed, 13 Mar 2024 16:00:46 +0800,
  jian he  wrote:

>> More use cases will help us which callbacks are needed. We
>> will be able to collect more use cases by providing basic
>> callbacks.

> Let's say the customized format is "csv1", but it is just analogous to
> the csv format.
> people should be able to create an extension, with serval C functions,
> then they can do `copy (select 1 ) to stdout (format 'csv1');`
> but the output will be exact same as `copy (select 1 ) to stdout
> (format 'csv');`

Thanks for sharing one use case but I think that we need
real-world use cases to consider our APIs.

For example, JSON support that is currently discussing in
another thread is a real-world use case. My Apache Arrow
support is also another real-world use case.


Thanks,
-- 
kou

Re: meson: Specify -Wformat as a common warning flag for extensions

2024-03-13 Thread Sutou Kouhei

Hi,

In 
  "Re: meson: Specify -Wformat as a common warning flag for extensions" on Wed, 
13 Mar 2024 00:43:11 -0500,
  "Tristan Partin"  wrote:

> Perhaps adding some more clarification in the comments that I wrote.
> 
> -  # -Wformat-security requires -Wformat, so check for it
> + # -Wformat-secuirty requires -Wformat. We compile with -Wall in + #
> Postgres, which includes -Wformat=1. -Wformat is shorthand for + #
> -Wformat=1. The set of flags which includes -Wformat-security is + #
> persisted into pg_config --cflags, which is commonly used by + #
> PGXS-based extensions. The lack of -Wformat in the persisted flags
> + # will produce a warning on many GCC versions, so even though adding
> + # -Wformat here is a no-op for Postgres, it silences other use
> cases.
> 
> That might be too long-winded though :).

Thanks for the wording! I used it for the v3 patch.


Thanks,
-- 
kou

>From 0ba2e6dd55b00ee8a57313a629a1e5fa8c9e40cc Mon Sep 17 00:00:00 2001
From: Sutou Kouhei 
Date: Wed, 13 Mar 2024 16:10:34 +0900
Subject: [PATCH v3] Add -Wformat to common warning flags

We specify -Wformat-security as a common warning flag explicitly. GCC
requires -Wformat to be added for -Wformat-security to take effect. If
-Wformat-security is used without -Wformat, GCC shows the following
warning:

cc1: warning: '-Wformat-security' ignored without '-Wformat' [-Wformat-security]

Note that this is not needed for PostgreSQL itself because PostgreSQL
uses -Wall, which includes -Wformat=1. -Wformat is shorthand for
-Wformat=1. These flags without -Wall are persisted into "pg_config
--cflags", which is commonly used by PGXS-based extensions. So
PGXS-based extensions will get the warning without -Wformat.

Co-authored-by: Tristan Partin 
---
 configure| 99 
 configure.ac | 10 ++
 meson.build  |  9 +
 3 files changed, 118 insertions(+)

diff --git a/configure b/configure
index 70a1968003..7b0fda3825 100755
--- a/configure
+++ b/configure
@@ -6013,6 +6013,105 @@ if test x"$pgac_cv_prog_CXX_cxxflags__Wshadow_compatible_local" = x"yes"; then
 fi
 
 
+  # -Wformat-security requires -Wformat. We compile with -Wall in
+  # PostgreSQL, which includes -Wformat=1. -Wformat is shorthand for
+  # -Wformat=1. The set of flags which includes -Wformat-security is
+  # persisted into pg_config --cflags, which is commonly used by
+  # PGXS-based extensions. The lack of -Wformat in the persisted flags
+  # will produce a warning on many GCC versions, so even though adding
+  # -Wformat here is a no-op for PostgreSQL, it silences other use
+  # cases., so check for it.
+
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ${CC} supports -Wformat, for CFLAGS" >&5
+$as_echo_n "checking whether ${CC} supports -Wformat, for CFLAGS... " >&6; }
+if ${pgac_cv_prog_CC_cflags__Wformat+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  pgac_save_CFLAGS=$CFLAGS
+pgac_save_CC=$CC
+CC=${CC}
+CFLAGS="${CFLAGS} -Wformat"
+ac_save_c_werror_flag=$ac_c_werror_flag
+ac_c_werror_flag=yes
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+int
+main ()
+{
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+  pgac_cv_prog_CC_cflags__Wformat=yes
+else
+  pgac_cv_prog_CC_cflags__Wformat=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+ac_c_werror_flag=$ac_save_c_werror_flag
+CFLAGS="$pgac_save_CFLAGS"
+CC="$pgac_save_CC"
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_prog_CC_cflags__Wformat" >&5
+$as_echo "$pgac_cv_prog_CC_cflags__Wformat" >&6; }
+if test x"$pgac_cv_prog_CC_cflags__Wformat" = x"yes"; then
+  CFLAGS="${CFLAGS} -Wformat"
+fi
+
+
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ${CXX} supports -Wformat, for CXXFLAGS" >&5
+$as_echo_n "checking whether ${CXX} supports -Wformat, for CXXFLAGS... " >&6; }
+if ${pgac_cv_prog_CXX_cxxflags__Wformat+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  pgac_save_CXXFLAGS=$CXXFLAGS
+pgac_save_CXX=$CXX
+CXX=${CXX}
+CXXFLAGS="${CXXFLAGS} -Wformat"
+ac_save_cxx_werror_flag=$ac_cxx_werror_flag
+ac_cxx_werror_flag=yes
+ac_ext=cpp
+ac_cpp='$CXXCPP $CPPFLAGS'
+ac_compile='$CXX -c $CXXFLAGS $CPPFLAGS conftest.$ac_ext >&5'
+ac_link='$CXX -o conftest$ac_exeext $CXXFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5'
+ac_compiler_gnu=$ac_cv_cxx_compiler_gnu
+
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+int
+main ()
+{
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_cxx_try_compile "$LINENO"; then :
+  pgac_cv_prog_CXX_cxxflags__Wformat=yes
+else
+  pgac_cv_prog_CXX_cxxflags__Wformat=

Re: meson: Specify -Wformat as a common warning flag for extensions

2024-03-12 Thread Sutou Kouhei

Hi,

In 
  "Re: meson: Specify -Wformat as a common warning flag for extensions" on Fri, 
08 Mar 2024 10:05:27 -0600,
  "Tristan Partin"  wrote:

> Ok, I figured this out. -Wall implies -Wformat=1. We set warning_level
> to 1 in the Meson project() call, which implies -Wall, and set -Wall
> in CFLAGS for autoconf. That's the reason we don't get issues building
> Postgres. A user making use of the pg_config --cflags option, as Sutou
> is, *will* run into the aforementioned issues, since we don't
> propogate -Wall into pg_config.
> 
>   $ gcc $(pg_config --cflags) -E - < /dev/null > /dev/null
>   cc1: warning: ‘-Wformat-security’ ignored without ‘-Wformat’
>   [-Wformat-security]
>   $ gcc -Wall $(pg_config --cflags) -E - < /dev/null > /dev/null
>   (nothing printed)

Thanks for explaining this. You're right. This is the reason
why we don't need this for PostgreSQL itself but we need
this for PostgreSQL extensions. Sorry. I should have
explained this in the first e-mail...


What should we do to proceed this patch?


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-03-10 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Mon, 11 Mar 2024 08:00:00 +0800,
  jian he  wrote:

> Hi, here are my cents:
> Currently in v17, we have 3 extra functions within DoCopyTo
> CopyToStart, one time, start, doing some preliminary work.
> CopyToOneRow, doing the repetitive work, called many times, row by row.
> CopyToEnd, one time doing the closing work.
> 
> seems to need a function pointer for processing the format and other options.
> or maybe the reason is we need a one time function call before doing DoCopyTo,
> like one time initialization.

I know that JSON format wants it but can we defer it? We can
add more options later. I want to proceed this improvement
step by step.

More use cases will help us which callbacks are needed. We
will be able to collect more use cases by providing basic
callbacks.

> generally in v17, the code pattern looks like this.
> if (cstate->opts.binary)
> {
> /* handle binary format */
> }
> else if (cstate->routine)
> {
> /* custom code, make the copy format extensible */
> }
> else
> {
> /* handle non-binary, (csv or text) format */
> }
> maybe we need another bool flag like `bool buildin_format`.
> if the copy format is {csv|text|binary}  then buildin_format is true else 
> false.
> 
> so the code pattern would be:
> if (cstate->opts.binary)
> {
> /* handle binary format */
> }
> else if (cstate->routine && !buildin_format)
> {
> /* custom code, make the copy format extensible */
> }
> else
> {
> /* handle non-binary, (csv or text) format */
> }
> 
> otherwise the {CopyToRoutine| CopyFromRoutine} needs a function pointer
> to distinguish native copy format and extensible supported format,
> like I mentioned above?

Hmm. I may miss something but I think that we don't need the
bool flag. Because we don't set cstate->routine for native
copy formats. So we can distinguish native copy format and
extensible supported format by checking only
cstate->routine.


Thanks,
-- 
kou

Re: meson: Specify -Wformat as a common warning flag for extensions

2024-03-08 Thread Sutou Kouhei

Hi,

In 
  "Re: meson: Specify -Wformat as a common warning flag for extensions" on Fri, 
8 Mar 2024 15:32:22 +0900,
  Michael Paquier  wrote:

> Are there version and/or
> environment requirements to be aware of?

I'm using Debian GNU/Linux sid and I can reproduce with gcc
8-13:

$ for x in {8..13}; do; echo gcc-${x}; gcc-${x} -Wformat-security -E - < 
/dev/null > /dev/null; done
gcc-8
cc1: warning: -Wformat-security ignored without -Wformat [-Wformat-security]
gcc-9
cc1: warning: '-Wformat-security' ignored without '-Wformat' [-Wformat-security]
gcc-10
cc1: warning: '-Wformat-security' ignored without '-Wformat' [-Wformat-security]
gcc-11
cc1: warning: '-Wformat-security' ignored without '-Wformat' [-Wformat-security]
gcc-12
cc1: warning: '-Wformat-security' ignored without '-Wformat' [-Wformat-security]
gcc-13
cc1: warning: '-Wformat-security' ignored without '-Wformat' [-Wformat-security]
$

I tried this on Ubuntu 22.04 too but this isn't reproduced:

$ gcc-11 -Wformat-security -E - < /dev/null > /dev/null
$

It seems that Ubuntu enables -Wformat by default:

$ gcc-11 -Wno-format -Wformat-security -E - < /dev/null > /dev/null
cc1: warning: '-Wformat-security' ignored without '-Wformat' [-Wformat-security]

I tried this on AlmaLinux 9 too and this is reproduced:

$ gcc --version
gcc (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ gcc -Wformat-security -E - < /dev/null > /dev/null
cc1: warning: '-Wformat-security' ignored without '-Wformat' [-Wformat-security]

> Forcing -Wformat implies more stuff that can be disabled with
> -Wno-format-contains-nul, -Wno-format-extra-args, and
> -Wno-format-zero-length, but the thing is that we're usually very
> conservative with such additions in the scripts.  See also
> 8b6f5f25102f, done, I guess, as an answer to this thread:
> https://www.postgresql.org/message-id/4D431505.9010002%40dunslane.net

I think that this is not a problem. Because the comment
added by 8b6f5f25102f ("This was included in -Wall/-Wformat
in older GCC versions") implies that we want to always use
-Wformat-security. -Wformat-security isn't worked without
-Wformat:

https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wformat-security

> If -Wformat is specified, also warn about uses of format
> functions that represent possible security problems.


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-03-07 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Thu, 7 Mar 2024 15:32:01 +0900,
  Michael Paquier  wrote:

> While on it, here are some profiles based on HEAD and v17 with the
> previous tests (COPY TO /dev/null, COPY FROM data sent to the void).
> 
...
> 
> So, in short, and that's not really a surprise, there is no effect
> once we use the dispatching with the routines only when a format would
> want to plug-in with the APIs, but a custom format would still have a
> penalty of a few percents for both if bottlenecked on CPU.

Thanks for sharing these profiles!
I agree with you.

This shows that the v17 approach doesn't affect the current
text/csv/binary implementations. (The v17 approach just adds
2 new structs, Copy{From,To}Rountine, without changing the
current text/csv/binary implementations.)

Can we push the v17 patch and proceed following
implementations? Could someone (especially a PostgreSQL
committer) take a look at this for double-check?

Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-03-05 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Tue, 5 Mar 2024 15:16:33 +0900,
  Michael Paquier  wrote:

> CopyOneRowTo() could do something like that to avoid the extra
> indentation:
> if (cstate->routine)
> {
> cstate->routine->CopyToOneRow(cstate, slot);
> MemoryContextSwitchTo(oldcontext);
> return;
> }

OK. The v17 patch uses this style. Others are same as the
v16.

> I didn't know this trick.  That's indeed nice..  I may use that for
> other stuff to make patches more presentable to the eyes.  And that's
> available as well with `git diff`.

:-)

> If we basically agree about this part, how would the rest work out
> with this set of APIs and the possibility to plug in a custom value
> for FORMAT to do a pg_proc lookup, including an example of how these
> APIs can be used?

I'll send the following patches after this patch is
merged. They are based on the v6 patch[1]:

1. Add copy_handler
   * This also adds a pg_proc lookup for custom FORMAT
   * This also adds a test for copy_handler
2. Export CopyToStateData
   * We need it to implement custom copy TO handler
3. Add needed APIs to implement custom copy TO handler
   * Add CopyToStateData::opaque
   * Export CopySendEndOfRow()
4. Export CopyFromStateData
   * We need it to implement custom copy FROM handler
5. Add needed APIs to implement custom copy FROM handler
   * Add CopyFromStateData::opaque
   * Export CopyReadBinaryData()

[1] 
https://www.postgresql.org/message-id/flat/20240124.144936.67229716500876806.kou%40clear-code.com#f1ad092fc5e81fe38d3c376559efd52c


Thanks,
-- 
kou
>From a78b8ee88575e2c2873afc3acf3c8c4e535becf0 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei 
Date: Mon, 4 Mar 2024 13:52:34 +0900
Subject: [PATCH v17] Add CopyFromRoutine/CopyToRountine

They are for implementing custom COPY FROM/TO format. But this is not
enough to implement custom COPY FROM/TO format yet. We'll export some
APIs to receive/send data and add "format" option to COPY FROM/TO
later.

Existing text/csv/binary format implementations don't use
CopyFromRoutine/CopyToRoutine for now. We have a patch for it but we
defer it. Because there are some mysterious profile results in spite
of we get faster runtimes. See [1] for details.

[1] https://www.postgresql.org/message-id/ZdbtQJ-p5H1_EDwE%40paquier.xyz

Note that this doesn't change existing text/csv/binary format
implementations.
---
 src/backend/commands/copyfrom.c  |  24 +-
 src/backend/commands/copyfromparse.c |   5 ++
 src/backend/commands/copyto.c|  25 +-
 src/include/commands/copyapi.h   | 100 +++
 src/include/commands/copyfrom_internal.h |   4 +
 src/tools/pgindent/typedefs.list |   2 +
 6 files changed, 155 insertions(+), 5 deletions(-)
 create mode 100644 src/include/commands/copyapi.h

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index c3bc897028..9bf2f6497e 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1623,12 +1623,22 @@ BeginCopyFrom(ParseState *pstate,
 
 		/* Fetch the input function and typioparam info */
 		if (cstate->opts.binary)
+		{
 			getTypeBinaryInputInfo(att->atttypid,
    _func_oid, [attnum - 1]);
+			fmgr_info(in_func_oid, _functions[attnum - 1]);
+		}
+		else if (cstate->routine)
+			cstate->routine->CopyFromInFunc(cstate, att->atttypid,
+			_functions[attnum - 1],
+			[attnum - 1]);
+
 		else
+		{
 			getTypeInputInfo(att->atttypid,
 			 _func_oid, [attnum - 1]);
-		fmgr_info(in_func_oid, _functions[attnum - 1]);
+			fmgr_info(in_func_oid, _functions[attnum - 1]);
+		}
 
 		/* Get default info if available */
 		defexprs[attnum - 1] = NULL;
@@ -1768,10 +1778,13 @@ BeginCopyFrom(ParseState *pstate,
 		/* Read and verify binary header */
 		ReceiveCopyBinaryHeader(cstate);
 	}
-
-	/* create workspace for CopyReadAttributes results */
-	if (!cstate->opts.binary)
+	else if (cstate->routine)
 	{
+		cstate->routine->CopyFromStart(cstate, tupDesc);
+	}
+	else
+	{
+		/* create workspace for CopyReadAttributes results */
 		AttrNumber	attr_count = list_length(cstate->attnumlist);
 
 		cstate->max_fields = attr_count;
@@ -1789,6 +1802,9 @@ BeginCopyFrom(ParseState *pstate,
 void
 EndCopyFrom(CopyFromState cstate)
 {
+	if (cstate->routine)
+		cstate->routine->CopyFromEnd(cstate);
+
 	/* No COPY FROM related resources except memory. */
 	if (cstate->is_program)
 	{
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 7cacd0b752..8b15080585 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -978,6 +978,11 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 
 		Assert(fieldno == attr_count);
 	}
+	else if (cstate->routine)
+	{
+		if (!cstate->routin

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-03-03 Thread Sutou Kouhei

Hi,

In <20240301.154443.618034282613922707@clear-code.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Fri, 01 Mar 2024 15:44:43 +0900 (JST),
  Sutou Kouhei  wrote:

>> I guess so.  It does not make much of a difference, though.  The thing
>> is that the dispatch caused by the custom callbacks called for each
>> row is noticeable in any profiles I'm taking (not that much in the
>> worst-case scenarios, still a few percents), meaning that this impacts
>> the performance for all the in-core formats (text, csv, binary) as
>> long as we refactor text/csv/binary to use the routines of copyapi.h.
>> I don't really see a way forward, except if we don't dispatch the
>> in-core formats to not impact the default cases.  That makes the code
>> a bit less elegant, but equally efficient for the existing formats.
> 
> It's an option based on your profile result but your
> execution result also shows that v15 is faster than HEAD [1]:
> 
>> I am getting faster runtimes with v15 (6232ms in average)
>> vs HEAD (6550ms) at 5M rows with COPY TO
> 
> [1] 
> https://www.postgresql.org/message-id/flat/ZdbtQJ-p5H1_EDwE%40paquier.xyz#6439e6ad574f2d47cd7220e9bfed3889
> 
> I think that faster runtime is beneficial than mysterious
> profile for users. So I think that we can merge v15 to
> master.

If this is a blocker of making COPY format extendable, can
we defer moving the existing text/csv/binary format
implementations to Copy{From,To}Routine for now as Michael
suggested to proceed making COPY format extendable? (Can we
add Copy{From,To}Routine without changing the existing
text/csv/binary format implementations?)

I attach a patch for it.

There is a large hunk for CopyOneRowTo() that is caused by
indent change. I also attach "...-w.patch" that uses "git
-w" to remove space only changes. "...-w.patch" is only for
review. We should use .patch without -w for push.


Thanks,
-- 
kou
>From 6a5bfc8e104f0a339b421028e9fec69a4d092671 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei 
Date: Mon, 4 Mar 2024 13:52:34 +0900
Subject: [PATCH v16] Add CopyFromRoutine/CopyToRountine

They are for implementing custom COPY FROM/TO format. But this is not
enough to implement custom COPY FROM/TO format yet. We'll export some
APIs to receive/send data and add "format" option to COPY FROM/TO
later.

Existing text/csv/binary format implementations don't use
CopyFromRoutine/CopyToRoutine for now. We have a patch for it but we
defer it. Because there are some mysterious profile results in spite
of we get faster runtimes. See [1] for details.

[1] https://www.postgresql.org/message-id/ZdbtQJ-p5H1_EDwE%40paquier.xyz

Note that this doesn't change existing text/csv/binary format
implementations. There are many diffs for CopyOneRowTo() but they're
caused by indentation. They don't change implementations.
---
 src/backend/commands/copyfrom.c  |  24 +-
 src/backend/commands/copyfromparse.c |   5 ++
 src/backend/commands/copyto.c| 103 ++-
 src/include/commands/copyapi.h   | 100 ++
 src/include/commands/copyfrom_internal.h |   4 +
 src/tools/pgindent/typedefs.list |   2 +
 6 files changed, 193 insertions(+), 45 deletions(-)
 create mode 100644 src/include/commands/copyapi.h

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index c3bc897028..9bf2f6497e 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1623,12 +1623,22 @@ BeginCopyFrom(ParseState *pstate,
 
 		/* Fetch the input function and typioparam info */
 		if (cstate->opts.binary)
+		{
 			getTypeBinaryInputInfo(att->atttypid,
    _func_oid, [attnum - 1]);
+			fmgr_info(in_func_oid, _functions[attnum - 1]);
+		}
+		else if (cstate->routine)
+			cstate->routine->CopyFromInFunc(cstate, att->atttypid,
+			_functions[attnum - 1],
+			[attnum - 1]);
+
 		else
+		{
 			getTypeInputInfo(att->atttypid,
 			 _func_oid, [attnum - 1]);
-		fmgr_info(in_func_oid, _functions[attnum - 1]);
+			fmgr_info(in_func_oid, _functions[attnum - 1]);
+		}
 
 		/* Get default info if available */
 		defexprs[attnum - 1] = NULL;
@@ -1768,10 +1778,13 @@ BeginCopyFrom(ParseState *pstate,
 		/* Read and verify binary header */
 		ReceiveCopyBinaryHeader(cstate);
 	}
-
-	/* create workspace for CopyReadAttributes results */
-	if (!cstate->opts.binary)
+	else if (cstate->routine)
 	{
+		cstate->routine->CopyFromStart(cstate, tupDesc);
+	}
+	else
+	{
+		/* create workspace for CopyReadAttributes results */
 		AttrNumber	attr_count = list_length(cstate->attnumlist);
 
 		cstate->max_fields = attr_count;
@@ -1789,6 +1802,9 @@ BeginCopyFrom(ParseState *pstate,
 void
 EndCopyFrom(CopyFromState cstate)
 {
+	if (cstate->routine)

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-29 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Fri, 1 Mar 2024 14:31:38 +0900,
  Michael Paquier  wrote:

> I guess so.  It does not make much of a difference, though.  The thing
> is that the dispatch caused by the custom callbacks called for each
> row is noticeable in any profiles I'm taking (not that much in the
> worst-case scenarios, still a few percents), meaning that this impacts
> the performance for all the in-core formats (text, csv, binary) as
> long as we refactor text/csv/binary to use the routines of copyapi.h.
> I don't really see a way forward, except if we don't dispatch the
> in-core formats to not impact the default cases.  That makes the code
> a bit less elegant, but equally efficient for the existing formats.

It's an option based on your profile result but your
execution result also shows that v15 is faster than HEAD [1]:

> I am getting faster runtimes with v15 (6232ms in average)
> vs HEAD (6550ms) at 5M rows with COPY TO

[1] 
https://www.postgresql.org/message-id/flat/ZdbtQJ-p5H1_EDwE%40paquier.xyz#6439e6ad574f2d47cd7220e9bfed3889

I think that faster runtime is beneficial than mysterious
profile for users. So I think that we can merge v15 to
master.


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-29 Thread Sutou Kouhei

Hi,

In <20240222.183948.518018047578925034@clear-code.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Thu, 22 Feb 2024 18:39:48 +0900 (JST),
  Sutou Kouhei  wrote:

> How about adding "is_csv" to CopyReadline() and
> CopyReadLineText() too?

I tried this on my environment. This is a change for COPY
FROM not COPY TO but this decreases COPY TO
performance with [1]... Hmm...

master:   697.693 msec (the best case)
v15:  576.374 msec (the best case)
v15+this: 593.559 msec (the best case)

[1] COPY (SELECT 
1::int2,2::int2,3::int2,4::int2,5::int2,6::int2,7::int2,8::int2,9::int2,10::int2,11::int2,12::int2,13::int2,14::int2,15::int2,16::int2,17::int2,18::int2,19::int2,20::int2,
 generate_series(1, 100::int4)) TO '/dev/null' \watch c=15

So I think that v15 is good.


perf result of master:

# Children  Self  Command   Shared Object  Symbol   

#       .  
.
#
31.39%14.54%  postgres  postgres   [.] CopyOneRowTo
|--17.00%--CopyOneRowTo
|  |--10.61%--FunctionCall1Coll
|  |   --8.40%--int2out
|  | |--2.58%--pg_ltoa
|  | |   --0.68%--pg_ultoa_n
|  | |--1.11%--pg_ultoa_n
|  | |--0.83%--AllocSetAlloc
|  | 
|--0.69%--__memcpy_avx_unaligned_erms (inlined)
|  | |--0.58%--FunctionCall1Coll
|  |  --0.55%--memcpy@plt
|  |--3.25%--appendBinaryStringInfo
|  |   --0.56%--pg_ultoa_n
|   --0.69%--CopyAttributeOutText

perf result of v15:

# Children  Self  Command   Shared Object  Symbol   

#       .  
.
#
25.60%10.47%  postgres  postgres   [.] CopyToTextOneRow
|--15.39%--CopyToTextOneRow
|  |--10.44%--FunctionCall1Coll
|  |  |--7.25%--int2out
|  |  |  |--2.60%--pg_ltoa
|  |  |  |   --0.71%--pg_ultoa_n
|  |  |  |--0.90%--FunctionCall1Coll
|  |  |  |--0.84%--pg_ultoa_n
|  |  |   --0.66%--AllocSetAlloc
|  |  |--0.79%--ExecProjectSet
|  |   --0.68%--int4out
|  |--2.50%--appendBinaryStringInfo
|   --0.53%--CopyAttributeOutText


The profiles on Michael's environment [2] showed that
CopyOneRow() % was increased by v15. But it
(CopyToTextOneRow() % not CopyOneRow() %) wasn't increased
by v15. It's decreased instead.

[2] 
https://www.postgresql.org/message-id/flat/ZdbtQJ-p5H1_EDwE%40paquier.xyz#6439e6ad574f2d47cd7220e9bfed3889

So I think that v15 doesn't have performance regression but
my environment isn't suitable for benchmark...


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-22 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Thu, 22 Feb 2024 15:44:16 +0900,
  Michael Paquier  wrote:

> I was comparing what you have here, and what's been attached by Andres
> at [1] and the top of the changes on my development branch at [2]
> (v3-0008, mostly).  And, it strikes me that there is no need to do any
> major changes in any of the callbacks proposed up to v13 and v14 in
> this thread, as all the changes proposed want to plug in more data
> into each StateData for COPY FROM and COPY TO, the best part being
> that v3-0008 can just reuse the proposed callbacks as-is.  v1-0001
> from Sutou-san would need one slight tweak in the per-row callback,
> still that's minor.

I think so too. But I thought that some minor conflicts will
be happen with this and the v15. So I worked on this before
the v15.

We agreed that this optimization doesn't block v15: [1]
So we can work on the v15 without this optimization for now.

[1] 
https://www.postgresql.org/message-id/flat/20240219195351.5vy7cdl3wxia66kg%40awork3.anarazel.de#20f9677e074fb0f8c5bb3994ef059a15

> I have been spending more time on the patch to introduce the COPY
> APIs, leading me to the v15 attached, where I have replaced the
> previous attribute callbacks for the output representation and the
> reads with hardcoded routines that should be optimized by compilers,
> and I have done more profiling with -O2.

Thanks! I wanted to work on it but I didn't have enough time
for it in a few days...

I've reviewed the v15.


> @@ -751,8 +751,9 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int 
> nbytes)
>   *
>   * NOTE: force_not_null option are not applied to the returned fields.
>   */
> -bool
> -NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
> +static bool

"inline" is missing here.

> +NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields,
> +   bool is_csv)
>  {
>   int fldct;


How about adding "is_csv" to CopyReadline() and
CopyReadLineText() too?


diff --git a/src/backend/commands/copyfromparse.c 
b/src/backend/commands/copyfromparse.c
index 25b8d4bc52..79fabecc69 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -150,8 +150,8 @@ static const char BinarySignature[11] = 
"PGCOPY\n\377\r\n\0";
 
 
 /* non-export function prototypes */
-static bool CopyReadLine(CopyFromState cstate);
-static bool CopyReadLineText(CopyFromState cstate);
+static inline bool CopyReadLine(CopyFromState cstate, bool is_csv);
+static inline bool CopyReadLineText(CopyFromState cstate, bool is_csv);
 static inline int CopyReadAttributesText(CopyFromState cstate);
 static inline int CopyReadAttributesCSV(CopyFromState cstate);
 static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
@@ -770,7 +770,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, 
int *nfields,
tupDesc = RelationGetDescr(cstate->rel);
 
cstate->cur_lineno++;
-   done = CopyReadLine(cstate);
+   done = CopyReadLine(cstate, is_csv);
 
if (cstate->opts.header_line == COPY_HEADER_MATCH)
{
@@ -823,7 +823,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, 
int *nfields,
cstate->cur_lineno++;
 
/* Actually read the line into memory here */
-   done = CopyReadLine(cstate);
+   done = CopyReadLine(cstate, is_csv);
 
/*
 * EOF at start of line means we're done.  If we see EOF after some
@@ -1133,8 +1133,8 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
  * by newline.  The terminating newline or EOF marker is not included
  * in the final value of line_buf.
  */
-static bool
-CopyReadLine(CopyFromState cstate)
+static inline bool
+CopyReadLine(CopyFromState cstate, bool is_csv)
 {
boolresult;
 
@@ -1142,7 +1142,7 @@ CopyReadLine(CopyFromState cstate)
cstate->line_buf_valid = false;
 
/* Parse data and transfer into line_buf */
-   result = CopyReadLineText(cstate);
+   result = CopyReadLineText(cstate, is_csv);
 
if (result)
{
@@ -1209,8 +1209,8 @@ CopyReadLine(CopyFromState cstate)
 /*
  * CopyReadLineText - inner loop of CopyReadLine for text mode
  */
-static bool
-CopyReadLineText(CopyFromState cstate)
+static inline bool
+CopyReadLineText(CopyFromState cstate, bool is_csv)
 {
char   *copy_input_buf;
int input_buf_ptr;
@@ -1226,7 +1226,7 @@ CopyReadLineText(CopyFromState cstate)
charquotec = '\0';
charescapec = '\0';
 
-   if (cstate->opts.csv_mode)
+   if (is_csv)
{
quotec = cstate->opts.quote[0];
escapec = cstate->opts.escape[0];
@@ -1306,7 +1306,7 @@ CopyReadLineText(CopyFromState cstate)
prev_raw_ptr =

Re: meson: Specify -Wformat as a common warning flag for extensions

2024-02-21 Thread Sutou Kouhei

Hi,

Could someone take a look at this?

Patch is attached in the original e-mail:
https://www.postgresql.org/message-id/20240122.141139.931086145628347157.kou%40clear-code.com


Thanks,
-- 
kou

In <20240122.141139.931086145628347157@clear-code.com>
  "meson: Specify -Wformat as a common warning flag for extensions" on Mon, 22 
Jan 2024 14:11:39 +0900 (JST),
  Sutou Kouhei  wrote:

> Hi,
> 
> I'm an extension developer. If I use PostgreSQL built with
> Meson, I get the following warning:
> 
> cc1: warning: '-Wformat-security' ignored without '-Wformat' 
> [-Wformat-security]
> 
> Because "pg_config --cflags" includes -Wformat-security but
> doesn't include -Wformat.
> 
> Can we specify -Wformat as a common warning flag too? If we
> do it, "pg_config --cflags" includes both of
> -Wformat-security and -Wformat. So I don't get the warning.
> 
> 
> Thanks,
> -- 
> kou

Re: Why is pq_begintypsend so slow?

2024-02-21 Thread Sutou Kouhei

Hi,

In <20240219195351.5vy7cdl3wxia6...@awork3.anarazel.de>
  "Re: Why is pq_begintypsend so slow?" on Mon, 19 Feb 2024 11:53:51 -0800,
  Andres Freund  wrote:

>> I don't have a strong opinion how to implement this
>> optimization as I said above. It seems that you like your
>> approach. So I withdraw [1]. Could you complete this
>> optimization? Can we proceed making COPY format extensible
>> without this optimization? It seems that it'll take a little
>> time to complete this optimization because your patch is
>> still WIP. And it seems that you can work on it after making
>> COPY format extensible.
> 
> I don't think optimizing this aspect needs to block making copy extensible.

OK. I'll work on making copy extensible without this
optimization.

> I don't know how much time/energy I'll have to focus on this in the near
> term. I really just reposted this because the earlier patches were relevant
> for the discussion in another thread.  If you want to pick the COPY part up,
> feel free.

OK. I may work on this after I complete making copy
extensible if you haven't completed this yet.


Thanks,
-- 
kou

Re: Why is pq_begintypsend so slow?

2024-02-18 Thread Sutou Kouhei

Hi,

In <20240218200906.zvihkrs46yl2j...@awork3.anarazel.de>
  "Re: Why is pq_begintypsend so slow?" on Sun, 18 Feb 2024 12:09:06 -0800,
  Andres Freund  wrote:

>> [1] 
>> https://www.postgresql.org/message-id/flat/20240215.153421.96888103784986803.kou%40clear-code.com#34df359e6d255795d16814ce138cc995

>> Do we need one FunctionCallInfoBaseData for each attribute?
>> How about sharing one FunctionCallInfoBaseData by all
>> attributes like [1]?
> 
> That makes no sense to me. You're throwing away most of the possible gains by
> having to update the FunctionCallInfo fields on every call. You're saving
> neglegible amounts of memory at a substantial runtime cost.

The number of updated fields of your approach and [1] are
same:

Your approach: 6 (I think that "fcinfo->isnull = false" is
needed though.)

+   fcinfo->args[0].value = CStringGetDatum(string);
+   fcinfo->args[0].isnull = false;
+   fcinfo->args[1].value = 
ObjectIdGetDatum(attr->typioparam);
+   fcinfo->args[1].isnull = false;
+   fcinfo->args[2].value = 
Int32GetDatum(attr->typmod);
+   fcinfo->args[2].isnull = false;

[1]: 6 (including "fcinfo->isnull = false")

+   fcinfo->flinfo = flinfo;
+   fcinfo->context = escontext;
+   fcinfo->isnull = false;
+   fcinfo->args[0].value = CStringGetDatum(str);
+   fcinfo->args[1].value = ObjectIdGetDatum(typioparam);
+   fcinfo->args[2].value = Int32GetDatum(typmod);


>> Inlining InputFuncallCallSafe() here to use pre-initialized
>> fcinfo will decrease maintainability. Because we abstract
>> InputFunctionCall family in fmgr.c. If we define a
>> InputFunctionCall variant here, we need to change both of
>> fmgr.c and here when InputFunctionCall family is changed.
>> How about defining details in fmgr.c and call it here
>> instead like [1]?
> 
> I'm not sure I buy that that is a problem. It's not like my approach was
> actually bypassing fmgr abstractions alltogether - instead it just used the
> lower level APIs, because it's a performance sensitive area.

[1] provides some optimized abstractions, which are
implemented with lower level APIs, without breaking the
abstractions.

Note that I don't have a strong opinion how to implement
this optimization. If other developers think this approach
makes sense for this optimization, I don't object it.

>> @@ -1966,7 +1992,7 @@ CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo 
>> *flinfo,
>>  if (fld_size == -1)
>>  {
>>  *isnull = true;
>> -return ReceiveFunctionCall(flinfo, NULL, typioparam, typmod);
>> +return ReceiveFunctionCall(fcinfo->flinfo, NULL, 
>> attr->typioparam, attr->typmod);
>> 
>> Why pre-initialized fcinfo isn't used here?
> 
> Because it's a prototype and because I don't think it's a common path.

How about adding a comment why we don't need to optimize
this case?


I don't have a strong opinion how to implement this
optimization as I said above. It seems that you like your
approach. So I withdraw [1]. Could you complete this
optimization? Can we proceed making COPY format extensible
without this optimization? It seems that it'll take a little
time to complete this optimization because your patch is
still WIP. And it seems that you can work on it after making
COPY format extensible.


Thanks,
-- 
kou

Re: Why is pq_begintypsend so slow?

2024-02-18 Thread Sutou Kouhei

Hi,

In <20240218015955.rmw5mcmobt5hb...@awork3.anarazel.de>
  "Re: Why is pq_begintypsend so slow?" on Sat, 17 Feb 2024 17:59:55 -0800,
  Andres Freund  wrote:

> v3-0008-wip-make-in-out-send-recv-calls-in-copy.c-cheaper.patch

It seems that this is an alternative approach of [1].

[1] 
https://www.postgresql.org/message-id/flat/20240215.153421.96888103784986803.kou%40clear-code.com#34df359e6d255795d16814ce138cc995


+typedef struct CopyInAttributeInfo
+{
+   AttrNumber  num;
+   const char *name;
+
+   Oid typioparam;
+   int32   typmod;
+
+   FmgrInfoin_finfo;
+   union
+   {
+   FunctionCallInfoBaseData fcinfo;
+   charfcinfo_data[SizeForFunctionCallInfo(3)];
+   }   in_fcinfo;

Do we need one FunctionCallInfoBaseData for each attribute?
How about sharing one FunctionCallInfoBaseData by all
attributes like [1]?


@@ -956,20 +956,47 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 
values[m] = ExecEvalExpr(defexprs[m], econtext, 
[m]);
}
-
-   /*
-* If ON_ERROR is specified with IGNORE, skip rows with 
soft
-* errors
-*/
-   else if (!InputFunctionCallSafe(_functions[m],
-   
string,
-   
typioparams[m],
-   
att->atttypmod,
-   
(Node *) cstate->escontext,
-   
[m]))

Inlining InputFuncallCallSafe() here to use pre-initialized
fcinfo will decrease maintainability. Because we abstract
InputFunctionCall family in fmgr.c. If we define a
InputFunctionCall variant here, we need to change both of
fmgr.c and here when InputFunctionCall family is changed.
How about defining details in fmgr.c and call it here
instead like [1]?

+   fcinfo->args[0].value = CStringGetDatum(string);
+   fcinfo->args[0].isnull = false;
+   fcinfo->args[1].value = 
ObjectIdGetDatum(attr->typioparam);
+   fcinfo->args[1].isnull = false;
+   fcinfo->args[2].value = 
Int32GetDatum(attr->typmod);
+   fcinfo->args[2].isnull = false;

I think that "fcinfo->isnull = false;" is also needed like
[1].

@@ -1966,7 +1992,7 @@ CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo 
*flinfo,
if (fld_size == -1)
{
*isnull = true;
-   return ReceiveFunctionCall(flinfo, NULL, typioparam, typmod);
+   return ReceiveFunctionCall(fcinfo->flinfo, NULL, 
attr->typioparam, attr->typmod);

Why pre-initialized fcinfo isn't used here?


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-15 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Thu, 15 Feb 2024 17:09:20 +0800,
  jian he  wrote:

> My environment is slow (around 10x) but consistent.
> I see around 2-3 percent increase consistently.
> (with patch 7369.068 ms, without patch 7574.802 ms)

Thanks for sharing your numbers! It will help us to
determine whether these changes improve performance or not.

> the patchset looks good in my eyes, i can understand it.
> however I cannot apply it cleanly against the HEAD.

Hmm, I used 9bc1eee988c31e66a27e007d41020664df490214 as the
base version. But both patches based on the same
revision. So we may not be able to apply both patches at
once cleanly.

> +/*
> + * Prepare callinfo for InputFunctionCallSafeWithInfo to reuse one callinfo
> + * instead of initializing it for each call. This is for performance.
> + */
> +FunctionCallInfoBaseData *
> +PrepareInputFunctionCallInfo(void)
> +{
> + FunctionCallInfoBaseData *fcinfo;
> +
> + fcinfo = (FunctionCallInfoBaseData *) palloc(SizeForFunctionCallInfo(3));
> 
> just wondering, I saw other similar places using palloc0,
> do we need to use palloc0?

I think that we don't need to use palloc0() here because the
following InitFunctionCallInfoData() call initializes all
members explicitly.

Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-14 Thread Sutou Kouhei

Hi,

In <20240213.173340.1518143507526518973@clear-code.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Tue, 13 Feb 2024 17:33:40 +0900 (JST),
  Sutou Kouhei  wrote:

> I'll reply other comments later...

I've read other comments and my answers for them are same as
Michael's one.


I'll prepare the v15 patch with static inline functions and
fixed arguments after the fcinfo cache patches are merged. I
think that the v15 patch will be conflicted with fcinfo
cache patches.


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-14 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Wed, 14 Feb 2024 15:52:36 +0900,
  Michael Paquier  wrote:

>> How about InputFunctionCallSafeWithInfo(),
>> InputFunctionCallSafeInfo() or
>> InputFunctionCallInfoCallSafe()?
> 
> WithInfo() would not be a new thing.  There are a couple of APIs named
> like this when manipulating catalogs, so that sounds kind of a good
> choice from here.

Thanks for the info. Let's use InputFunctionCallSafeWithInfo().
See that attached patch:
v2-0001-Reuse-fcinfo-used-in-COPY-FROM.patch

I also attach a patch for COPY TO:
v1-0001-Reuse-fcinfo-used-in-COPY-TO.patch

I measured the COPY TO patch on my environment with:
COPY (SELECT 
1::int2,2::int2,3::int2,4::int2,5::int2,6::int2,7::int2,8::int2,9::int2,10::int2,11::int2,12::int2,13::int2,14::int2,15::int2,16::int2,17::int2,18::int2,19::int2,20::int2,
 generate_series(1, 100::int4)) TO '/dev/null' \watch c=5

master:
740.066ms
734.884ms
738.579ms
734.170ms
727.953ms

patched:
730.714ms
741.483ms
714.149ms
715.436ms
713.578ms

It seems that it improves performance a bit but my
environment isn't suitable for benchmark. So they may not
be valid numbers.


Thanks,
-- 
kou
>From b677732f46f735a5601b889f79671e91be41 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei 
Date: Thu, 15 Feb 2024 15:01:08 +0900
Subject: [PATCH v2] Reuse fcinfo used in COPY FROM

Each NextCopyFrom() calls input functions or binary-input
functions. We can reuse fcinfo for them instead of creating a local
fcinfo for each call. This will improve performance.
---
 src/backend/commands/copyfrom.c  |   4 +
 src/backend/commands/copyfromparse.c |  20 ++--
 src/backend/utils/fmgr/fmgr.c| 126 +++
 src/include/commands/copyfrom_internal.h |   1 +
 src/include/fmgr.h   |  12 +++
 5 files changed, 154 insertions(+), 9 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 1fe70b9133..ed375c012e 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1691,6 +1691,10 @@ BeginCopyFrom(ParseState *pstate,
 	/* We keep those variables in cstate. */
 	cstate->in_functions = in_functions;
 	cstate->typioparams = typioparams;
+	if (cstate->opts.binary)
+		cstate->fcinfo = PrepareReceiveFunctionCallInfo();
+	else
+		cstate->fcinfo = PrepareInputFunctionCallInfo();
 	cstate->defmap = defmap;
 	cstate->defexprs = defexprs;
 	cstate->volatile_defexprs = volatile_defexprs;
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 7cacd0b752..7907e16ea8 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -861,6 +861,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 num_defaults = cstate->num_defaults;
 	FmgrInfo   *in_functions = cstate->in_functions;
 	Oid		   *typioparams = cstate->typioparams;
+	FunctionCallInfoBaseData *fcinfo = cstate->fcinfo;
 	int			i;
 	int		   *defmap = cstate->defmap;
 	ExprState **defexprs = cstate->defexprs;
@@ -961,12 +962,13 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 			 * If ON_ERROR is specified with IGNORE, skip rows with soft
 			 * errors
 			 */
-			else if (!InputFunctionCallSafe(_functions[m],
-			string,
-			typioparams[m],
-			att->atttypmod,
-			(Node *) cstate->escontext,
-			[m]))
+			else if (!InputFunctionCallSafeWithInfo(fcinfo,
+	_functions[m],
+	string,
+	typioparams[m],
+	att->atttypmod,
+	(Node *) cstate->escontext,
+	[m]))
 			{
 cstate->num_errors++;
 return true;
@@ -1966,7 +1968,7 @@ CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
 	if (fld_size == -1)
 	{
 		*isnull = true;
-		return ReceiveFunctionCall(flinfo, NULL, typioparam, typmod);
+		return ReceiveFunctionCallWithInfo(cstate->fcinfo, flinfo, NULL, typioparam, typmod);
 	}
 	if (fld_size < 0)
 		ereport(ERROR,
@@ -1987,8 +1989,8 @@ CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
 	cstate->attribute_buf.data[fld_size] = '\0';
 
 	/* Call the column type's binary input converter */
-	result = ReceiveFunctionCall(flinfo, >attribute_buf,
- typioparam, typmod);
+	result = ReceiveFunctionCallWithInfo(cstate->fcinfo, flinfo, >attribute_buf,
+		 typioparam, typmod);
 
 	/* Trouble if it didn't eat the whole buffer */
 	if (cstate->attribute_buf.cursor != cstate->attribute_buf.len)
diff --git a/src/backend/utils/fmgr/fmgr.c b/src/backend/utils/fmgr/fmgr.c
index e48a86be54..14c3ed2bdb 100644
--- a/src/backend/utils/fmgr/fmgr.c
+++ b/src/backend/utils/fmgr/fmgr.c
@@ -1672,6 +1672,73 @@ DirectInputFunctionCallSafe(PGFunction func, char *str,
 	return true;
 }
 
+/*
+ * Prepare callinfo f

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-13 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Wed, 14 Feb 2024 12:28:38 +0900,
  Michael Paquier  wrote:

>> How about the attached patch approach? If it's a desired
>> approach, I can also write a separated patch for COPY TO.
> 
> Hmm, I have not studied that much, but my first impression was that we
> would not require any new facility in fmgr.c, but perhaps you're right
> and it's more elegant to pass a InitFunctionCallInfoData this way.

I'm not familiar with the fmgr.c related code base but it
seems that we abstract {,binary-}input function call by
fmgr.c. So I think that it's better that we follow the
design. (If there is a person who knows the fmgr.c related
code base, please help us.)

> PrepareInputFunctionCallInfo() looks OK as a name, but I'm less a fan
> of PreparedInputFunctionCallSafe() and its "Prepared" part.  How about
> something like ExecuteInputFunctionCallSafe()?

I understand the feeling. SQL uses "prepared" for "prepared
statement". There are similar function names such as
InputFunctionCall()/InputFunctionCallSafe()/DirectInputFunctionCallSafe(). They
execute (call) an input function but they use "call" not
"execute" for it... So "Execute...Call..." may be
redundant...

How about InputFunctionCallSafeWithInfo(),
InputFunctionCallSafeInfo() or
InputFunctionCallInfoCallSafe()?

> I may be able to look more at that next week, and I would surely check
> the impact of that with a simple COPY query throttled by CPU (more
> rows and more attributes the better).

Thanks!

>Note that I've reverted 06bd311bce24 for
> the moment, as this is just getting in the way of the main patch, and
> that was non-optimal once there is a per-row callback.

Thanks for sharing the information. I'll rebase on master
when I create the v15 patch.

>> diff --git a/src/backend/commands/copyfrom.c 
>> b/src/backend/commands/copyfrom.c
>> index 41f6bc43e4..a43c853e99 100644
>> --- a/src/backend/commands/copyfrom.c
>> +++ b/src/backend/commands/copyfrom.c
>> @@ -1691,6 +1691,10 @@ BeginCopyFrom(ParseState *pstate,
>>  /* We keep those variables in cstate. */
>>  cstate->in_functions = in_functions;
>>  cstate->typioparams = typioparams;
>> +if (cstate->opts.binary)
>> +cstate->fcinfo = PrepareInputFunctionCallInfo();
>> +else
>> +cstate->fcinfo = PrepareReceiveFunctionCallInfo();
> 
> Perhaps we'd better avoid more callbacks like that, for now.

I'll not use a callback for this. I'll not change this part
after we introduce Copy{To,From}Routine. cstate->fcinfo
isn't used some custom COPY format handlers such as Apache
Arrow handler like cstate->in_functions and
cstate->typioparams. But they will be always allocated. It's
a bit wasteful for those handlers but we may not care about
it. So we can always use "if (state->opts.binary)" condition
here.

BTW... This part was wrong... Sorry... It should be:

if (cstate->opts.binary)
cstate->fcinfo = PrepareReceiveFunctionCallInfo();
else
cstate->fcinfo = PrepareInputFunctionCallInfo();

Thanks,
-- 
kou

Re: confusing / inefficient "need_transcoding" handling in copy

2024-02-13 Thread Sutou Kouhei

Hi,

In 
  "Re: confusing / inefficient "need_transcoding" handling in copy" on Wed, 14 
Feb 2024 06:56:16 +0900,
  Michael Paquier  wrote:

> We have a couple of non-ASCII characters in the tests, but I suspect
> that this one will not be digested correctly everywhere, even if
> EUC_JP should be OK to use for the check.  How about writing an
> arbitrary sequence of bytes into a temporary file that gets used for 
> the COPY FROM instead?  See for example how we do that with
> abs_builddir in copy.sql.

It makes sense. How about the attached patch?


Thanks,
-- 
kou
>From 6eb9669f97c54f8b85fac63db40ad80664692d12 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei 
Date: Wed, 14 Feb 2024 11:44:13 +0900
Subject: [PATCH v2] Add a test for invalid encoding for COPY FROM

The test data use an UTF-8 character (U+3042 HIRAGANA LETTER A) but
the test specifies EUC_JP. So it's an invalid data.
---
 src/test/regress/expected/copyencoding.out | 13 +
 src/test/regress/parallel_schedule |  2 +-
 src/test/regress/sql/copyencoding.sql  | 15 +++
 3 files changed, 29 insertions(+), 1 deletion(-)
 create mode 100644 src/test/regress/expected/copyencoding.out
 create mode 100644 src/test/regress/sql/copyencoding.sql

diff --git a/src/test/regress/expected/copyencoding.out b/src/test/regress/expected/copyencoding.out
new file mode 100644
index 00..32a9d918fa
--- /dev/null
+++ b/src/test/regress/expected/copyencoding.out
@@ -0,0 +1,13 @@
+--
+-- Test cases for COPY WITH (ENCODING)
+--
+-- directory paths are passed to us in environment variables
+\getenv abs_builddir PG_ABS_BUILDDIR
+CREATE TABLE test (t text);
+\set utf8_csv :abs_builddir '/results/copyencoding_utf8.csv'
+-- U+3042 HIRAGANA LETTER A
+COPY (SELECT E'\u3042') TO :'utf8_csv' WITH (FORMAT csv, ENCODING 'UTF8');
+COPY test FROM :'utf8_csv' WITH (FORMAT csv, ENCODING 'EUC_JP');
+ERROR:  invalid byte sequence for encoding "EUC_JP": 0xe3 0x81
+CONTEXT:  COPY test, line 1
+DROP TABLE test;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 1d8a414eea..238cef28c4 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -36,7 +36,7 @@ test: geometry horology tstypes regex type_sanity opr_sanity misc_sanity comment
 # execute two copy tests in parallel, to check that copy itself
 # is concurrent safe.
 # --
-test: copy copyselect copydml insert insert_conflict
+test: copy copyselect copydml copyencoding insert insert_conflict
 
 # --
 # More groups of parallel tests
diff --git a/src/test/regress/sql/copyencoding.sql b/src/test/regress/sql/copyencoding.sql
new file mode 100644
index 00..89e2124996
--- /dev/null
+++ b/src/test/regress/sql/copyencoding.sql
@@ -0,0 +1,15 @@
+--
+-- Test cases for COPY WITH (ENCODING)
+--
+
+-- directory paths are passed to us in environment variables
+\getenv abs_builddir PG_ABS_BUILDDIR
+
+CREATE TABLE test (t text);
+
+\set utf8_csv :abs_builddir '/results/copyencoding_utf8.csv'
+-- U+3042 HIRAGANA LETTER A
+COPY (SELECT E'\u3042') TO :'utf8_csv' WITH (FORMAT csv, ENCODING 'UTF8');
+COPY test FROM :'utf8_csv' WITH (FORMAT csv, ENCODING 'EUC_JP');
+
+DROP TABLE test;
-- 
2.43.0

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-13 Thread Sutou Kouhei

Hi,

In <20240209192705.5qdilvviq3py2...@awork3.anarazel.de>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Fri, 9 Feb 2024 11:27:05 -0800,
  Andres Freund  wrote:

>> +static void
>> +CopyFromTextInFunc(CopyFromState cstate, Oid atttypid,
>> +   FmgrInfo *finfo, Oid *typioparam)
>> +{
>> +Oid func_oid;
>> +
>> +getTypeInputInfo(atttypid, _oid, typioparam);
>> +fmgr_info(func_oid, finfo);
>> +}
> 
> FWIW, we should really change the copy code to initialize FunctionCallInfoData
> instead of re-initializing that on every call, realy makes a difference
> performance wise.

How about the attached patch approach? If it's a desired
approach, I can also write a separated patch for COPY TO.

>> +cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
>> +/* Set read attribute callback */
>> +if (cstate->opts.csv_mode)
>> +cstate->copy_read_attributes = CopyReadAttributesCSV;
>> +else
>> +cstate->copy_read_attributes = CopyReadAttributesText;
>> +}
> 
> Isn't this precisely repeating the mistake of 2889fd23be56?

What do you think about the approach in my previous mail's
attachments?
https://www.postgresql.org/message-id/flat/20240209.163205.704848659612151781.kou%40clear-code.com#dbb1f8d7f2f0e8fe3c7e37a757fcfc54

If it's a desired approach, I can prepare a v15 patch set
based on the v14 patch set and the approach.


I'll reply other comments later...


Thanks,
-- 
kou
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 41f6bc43e4..a43c853e99 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1691,6 +1691,10 @@ BeginCopyFrom(ParseState *pstate,
 	/* We keep those variables in cstate. */
 	cstate->in_functions = in_functions;
 	cstate->typioparams = typioparams;
+	if (cstate->opts.binary)
+		cstate->fcinfo = PrepareInputFunctionCallInfo();
+	else
+		cstate->fcinfo = PrepareReceiveFunctionCallInfo();
 	cstate->defmap = defmap;
 	cstate->defexprs = defexprs;
 	cstate->volatile_defexprs = volatile_defexprs;
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 906756362e..e372e5efb8 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -853,6 +853,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 num_defaults = cstate->num_defaults;
 	FmgrInfo   *in_functions = cstate->in_functions;
 	Oid		   *typioparams = cstate->typioparams;
+	FunctionCallInfoBaseData *fcinfo = cstate->fcinfo;
 	int			i;
 	int		   *defmap = cstate->defmap;
 	ExprState **defexprs = cstate->defexprs;
@@ -953,12 +954,13 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 			 * If ON_ERROR is specified with IGNORE, skip rows with soft
 			 * errors
 			 */
-			else if (!InputFunctionCallSafe(_functions[m],
-			string,
-			typioparams[m],
-			att->atttypmod,
-			(Node *) cstate->escontext,
-			[m]))
+			else if (!PreparedInputFunctionCallSafe(fcinfo,
+	_functions[m],
+	string,
+	typioparams[m],
+	att->atttypmod,
+	(Node *) cstate->escontext,
+	[m]))
 			{
 cstate->num_errors++;
 return true;
@@ -1958,7 +1960,7 @@ CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
 	if (fld_size == -1)
 	{
 		*isnull = true;
-		return ReceiveFunctionCall(flinfo, NULL, typioparam, typmod);
+		return PreparedReceiveFunctionCall(cstate->fcinfo, flinfo, NULL, typioparam, typmod);
 	}
 	if (fld_size < 0)
 		ereport(ERROR,
@@ -1979,8 +1981,8 @@ CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
 	cstate->attribute_buf.data[fld_size] = '\0';
 
 	/* Call the column type's binary input converter */
-	result = ReceiveFunctionCall(flinfo, >attribute_buf,
- typioparam, typmod);
+	result = PreparedReceiveFunctionCall(cstate->fcinfo, flinfo, >attribute_buf,
+		 typioparam, typmod);
 
 	/* Trouble if it didn't eat the whole buffer */
 	if (cstate->attribute_buf.cursor != cstate->attribute_buf.len)
diff --git a/src/backend/utils/fmgr/fmgr.c b/src/backend/utils/fmgr/fmgr.c
index e48a86be54..b0b5310219 100644
--- a/src/backend/utils/fmgr/fmgr.c
+++ b/src/backend/utils/fmgr/fmgr.c
@@ -1672,6 +1672,73 @@ DirectInputFunctionCallSafe(PGFunction func, char *str,
 	return true;
 }
 
+/*
+ * Prepare callinfo for PreparedInputFunctionCallSafe to reuse one callinfo
+ * instead of initializing it for each call. This is for performance.
+ */
+FunctionCallInfoBaseData *
+PrepareInputFunctionCallInfo(void)
+{
+	FunctionCallInfoBaseData *fcinfo;
+
+	fcinfo = (FunctionCallInfoBaseData *) palloc(SizeForFunctionCallInfo(3));
+	InitFunctionCallInfoData(*fcinfo, NULL, 3, InvalidOid, NULL, NULL);
+	fcinfo->args[0].isnull = false;
+	fcinfo->args[1].isnull = false;
+	fcinfo->args[2].isnull = false;
+	return fcinfo;
+}
+
+/*
+ * Call

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-08 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Fri, 9 Feb 2024 13:21:34 +0900,
  Michael Paquier  wrote:

>> - Revisit what we have here, looking at more profiles to see how HEAD
>> an v13 compare.  It looks like we are on a good path, but let's tackle
>> things one step at a time.
> 
> And attached is a v14 that's rebased on HEAD.

Thanks!

> A next step I think we could take is to split the binary-only and the
> text/csv-only data in each cstate into their own structure to make the
> structure, with an opaque pointer that custom formats could use, but a
> lot of fields are shared as well.

It'll make COPY code base cleaner but it may decrease
performance. How about just adding an opaque pointer to each
cstate as the next step and then try the split?

My suggestion:
1. Introduce Copy{To,From}Routine
   (We can do it based on the v14 patch.)
2. Add an opaque pointer to Copy{To,From}Routine
   (This must not have performance impact.)
3.a. Split format specific data to the opaque space
3.b. Add support for registering custom format handler by
 creating a function
4. ...

>This patch is already complicated
> enough IMO, so I'm OK to leave it out for the moment, and focus on
> making this infra pluggable as a next step.

I agree with you.

> Are there any comments about this v14?  Sutou-san?

Here are my comments:


+   /* Set read attribute callback */
+   if (cstate->opts.csv_mode)
+   cstate->copy_read_attributes = CopyReadAttributesCSV;
+   else
+   cstate->copy_read_attributes = CopyReadAttributesText;

I think that we should not use this approach for
performance. We need to use "static inline" and constant
argument instead something like the attached
remove-copy-read-attributes.diff.

We have similar codes for
CopyReadLine()/CopyReadLineText(). The attached
remove-copy-read-attributes-and-optimize-copy-read-line.diff
also applies the same optimization to
CopyReadLine()/CopyReadLineText().

I hope that this improved performance of COPY FROM.

+/*
+ * Routines assigned to each format.
++

Garbage "+"

+ * CSV and text share the same implementation, at the exception of the
+ * copy_read_attributes callback.
+ */


+/*
+ * CopyToTextOneRow
+ *
+ * Process one row for text/CSV format.
+ */
+static void
+CopyToTextOneRow(CopyToState cstate,
+TupleTableSlot *slot)
+{
...
+   if (cstate->opts.csv_mode)
+   CopyAttributeOutCSV(cstate, string,
+   
cstate->opts.force_quote_flags[attnum - 1]);
+   else
+   CopyAttributeOutText(cstate, string);
...

How about use "static inline" and constant argument approach
here too?

static inline void
CopyToTextBasedOneRow(CopyToState cstate,
  TupleTableSlot *slot,
  bool csv_mode)
{
...
if (cstate->opts.csv_mode)
CopyAttributeOutCSV(cstate, string,

cstate->opts.force_quote_flags[attnum - 1]);
else
CopyAttributeOutText(cstate, string);
...
}

static void
CopyToTextOneRow(CopyToState cstate,
 TupleTableSlot *slot,
 bool csv_mode)
{
CopyToTextBasedOneRow(cstate, slot, false);
}

static void
CopyToCSVOneRow(CopyToState cstate,
TupleTableSlot *slot,
bool csv_mode)
{
CopyToTextBasedOneRow(cstate, slot, true);
}

static const CopyToRoutine CopyCSVRoutineText = {
...
.CopyToOneRow = CopyToCSVOneRow,
...
};


Thanks,
-- 
kou
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index a90b7189b5..6e244fb443 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -158,12 +158,6 @@ CopyFromTextStart(CopyFromState cstate, TupleDesc tupDesc)
 	attr_count = list_length(cstate->attnumlist);
 	cstate->max_fields = attr_count;
 	cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
-
-	/* Set read attribute callback */
-	if (cstate->opts.csv_mode)
-		cstate->copy_read_attributes = CopyReadAttributesCSV;
-	else
-		cstate->copy_read_attributes = CopyReadAttributesText;
 }
 
 /*
@@ -221,9 +215,8 @@ CopyFromBinaryEnd(CopyFromState cstate)
 
 /*
  * Routines assigned to each format.
-+
  * CSV and text share the same implementation, at the exception of the
- * copy_read_attributes callback.
+ * CopyFromOneRow callback.
  */
 static const CopyFromRoutine CopyFromRoutineText = {
 	.CopyFromInFunc = CopyFromTextInFunc,
@@ -235,7 +228,7 @@ static const CopyFromRoutine CopyFromRoutineText = {
 static

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-08 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Wed, 7 Feb 2024 13:33:18 +0900,
  Michael Paquier  wrote:

> Hmm.  That explains why I was not seeing any differences with this
> callback then.  It seems to me that the order of actions to take is
> clear, like:
> - Revert 2889fd23be56 to keep a clean state of the tree, now done with
> 1aa8324b81fa.

Done.

> - Dive into the strlen() issue, as it really looks like this can
> create more simplifications for the patch discussed on this thread
> with COPY TO.

Done: b619852086ed2b5df76631f5678f60d3bebd3745

> - Revisit what we have here, looking at more profiles to see how HEAD
> an v13 compare.  It looks like we are on a good path, but let's tackle
> things one step at a time.

Are you already working on this? Do you want me to write the
next patch based on the current master?


Thanks,
-- 
kou

Re: confusing / inefficient "need_transcoding" handling in copy

2024-02-08 Thread Sutou Kouhei

Hi,

In <20240208.172501.2177371292839763981@clear-code.com>
  "Re: confusing / inefficient "need_transcoding" handling in copy" on Thu, 08 
Feb 2024 17:25:01 +0900 (JST),
  Sutou Kouhei  wrote:

> How about the following to avoid needless transcoding?

Oh, sorry. I missed the Michael's patch:
https://www.postgresql.org/message-id/flat/ZcR9Q9hJ8GedFSCd%40paquier.xyz#e73272b042a22befac7a95f7bcb4fb9a

I withdraw my change.


Thanks,
-- 
kou

Re: confusing / inefficient "need_transcoding" handling in copy

2024-02-08 Thread Sutou Kouhei

Hi,

In <20240206222445.hzq22pb2nye7r...@awork3.anarazel.de>
  "Re: confusing / inefficient "need_transcoding" handling in copy" on Tue, 6 
Feb 2024 14:24:45 -0800,
  Andres Freund  wrote:

> One unfortunate issue: We don't have any tests verifying that COPY FROM
> catches encoding issues.

How about the attached patch for it?


How about the following to avoid needless transcoding?


diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index bd4710a79b..80ec26cafe 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -612,13 +612,14 @@ BeginCopyTo(ParseState *pstate,
cstate->file_encoding = cstate->opts.file_encoding;
 
/*
-* Set up encoding conversion info.  Even if the file and server 
encodings
-* are the same, we must apply pg_any_to_server() to validate data in
-* multibyte encodings.
+* Set up encoding conversion info. We use pg_server_to_any() for the
+* conversion. pg_server_to_any() does nothing with an encoding that
+* equals to GetDatabaseEncoding() and PG_SQL_ASCII. If
+* cstate->file_encoding equals to GetDatabaseEncoding() and 
PG_SQL_ASCII,
+* we don't need to transcode.
 */
-   cstate->need_transcoding =
-   (cstate->file_encoding != GetDatabaseEncoding() ||
-pg_database_encoding_max_length() > 1);
+   cstate->need_transcoding = !(cstate->file_encoding == 
GetDatabaseEncoding() ||
+
cstate->file_encoding == PG_SQL_ASCII);
/* See Multibyte encoding comment above */
cstate->encoding_embeds_ascii = 
PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
 


Numbers on my environment:

master:  861.646ms
patched: 697.547ms

SQL:
COPY (SELECT 
1::int2,2::int2,3::int2,4::int2,5::int2,6::int2,7::int2,8::int2,9::int2,10::int2,11::int2,12::int2,13::int2,14::int2,15::int2,16::int2,17::int2,18::int2,19::int2,20::int2,
 generate_series(1, 100::int4)) TO '/dev/null' \watch c=5


Thanks,
-- 
kou
From 387f11bde4eb928af23f6a66101aa9d63b3dcfdf Mon Sep 17 00:00:00 2001
From: Sutou Kouhei 
Date: Thu, 8 Feb 2024 17:17:25 +0900
Subject: [PATCH v1] Add a test for invalid encoding for COPY FROM

The test data uses UTF-8 "hello" in Japanese but the test specifies
EUC_JP. So it's an invalid data.
---
 src/test/regress/expected/copyencoding.out |  8 
 src/test/regress/parallel_schedule |  2 +-
 src/test/regress/sql/copyencoding.sql  | 10 ++
 3 files changed, 19 insertions(+), 1 deletion(-)
 create mode 100644 src/test/regress/expected/copyencoding.out
 create mode 100644 src/test/regress/sql/copyencoding.sql

diff --git a/src/test/regress/expected/copyencoding.out b/src/test/regress/expected/copyencoding.out
new file mode 100644
index 00..aa4ecea09e
--- /dev/null
+++ b/src/test/regress/expected/copyencoding.out
@@ -0,0 +1,8 @@
+--
+-- Test cases for COPY WITH (ENCODING)
+--
+CREATE TABLE test (t text);
+COPY test FROM stdin WITH (ENCODING 'EUC_JP');
+ERROR:  invalid byte sequence for encoding "EUC_JP": 0xe3 0x81
+CONTEXT:  COPY test, line 1
+DROP TABLE test;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 1d8a414eea..238cef28c4 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -36,7 +36,7 @@ test: geometry horology tstypes regex type_sanity opr_sanity misc_sanity comment
 # execute two copy tests in parallel, to check that copy itself
 # is concurrent safe.
 # --
-test: copy copyselect copydml insert insert_conflict
+test: copy copyselect copydml copyencoding insert insert_conflict
 
 # --
 # More groups of parallel tests
diff --git a/src/test/regress/sql/copyencoding.sql b/src/test/regress/sql/copyencoding.sql
new file mode 100644
index 00..07c85e988b
--- /dev/null
+++ b/src/test/regress/sql/copyencoding.sql
@@ -0,0 +1,10 @@
+--
+-- Test cases for COPY WITH (ENCODING)
+--
+
+CREATE TABLE test (t text);
+COPY test FROM stdin WITH (ENCODING 'EUC_JP');
+こんにちは
+\.
+
+DROP TABLE test;
-- 
2.43.0

Re: meson: catalog/syscache_ids.h isn't installed

2024-02-07 Thread Sutou Kouhei

Hi,

In <4b60e9bd-426c-4d4b-bbbd-1abd06fa0...@eisentraut.org>
  "Re: meson: catalog/syscache_ids.h isn't installed" on Mon, 5 Feb 2024 
17:53:52 +0100,
  Peter Eisentraut  wrote:

> On 05.02.24 02:29, Sutou Kouhei wrote:
>> catalog/syscache_ids.h is referred by utils/syscache.h but
>> it's not installed with Meson.
> 
> This has been fixed.  (Somebody else reported it in a different thread
> also.)

Thanks!

Commit: 1ae5ace7558ea949d2f94af2fd5eb145d5558659
Thread: 
https://www.postgresql.org/message-id/CAJ7c6TMDGmAiozDjJQ3%3DP3cd-1BidC_GpitcAuU0aqq-r1eSoQ%40mail.gmail.com

-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-05 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Mon, 5 Feb 2024 16:14:08 +0900,
  Michael Paquier  wrote:

> So, I've looked at all that today, and finished by applying two
> patches as of 2889fd23be56 and 95fb5b49024a to get some of the
> weirdness with the workhorse routines out of the way.

Thanks!

> As this is called within the OneRow routine, I can live with that.  If
> there is an opposition to that, we could just attach it within the
> routines.

I don't object the approach.

> I am attaching a v12 which is close to what I want it to be, with
> much more documentation and comments.  There are two things that I've
> changed compared to the previous versions though:
> 1) I have added a callback to set up the input and output functions
> rather than attach that in the Start callback.

I'm OK with this. I just don't use them in Apache Arrow COPY
FORMAT extension.

> - No need for plugins to think about how to allocate this data.  v11
> and other versions were doing things the wrong way by allocating this
> stuff in the wrong memory context as we switch to the COPY context
> when we are in the Start routines.

Oh, sorry. I missed it when I moved them.

> 2) I have backpedaled on the postpare callback, which did not bring
> much in clarity IMO while being a CSV-only callback.  Note that we
> have in copyfromparse.c more paths that are only for CSV but the past
> versions of the patch never cared about that.  This makes the text and
> CSV implementations much closer to each other, as a result.

Ah, sorry. I forgot to eliminate cstate->opts.csv_mode in
CopyReadLineText(). The postpare callback is for
optimization. If it doesn't improve performance, we don't
need to introduce it.

We may want to try eliminating cstate->opts.csv_mode in
CopyReadLineText() for performance. But we don't need to
do this in introducing CopyFromRoutine. We can defer it.

So I don't object removing the postpare callback.

>  Let me know if you have
> comments about all that.

Here are some comments for the patch:

+   /*
+* Called when COPY FROM is started to set up the input functions
+* associated to the relation's attributes writing to.  `fmgr_info` can 
be

fmgr_info ->
finfo

+* optionally filled to provide the catalog information of the input
+* function.  `typioparam` can be optinally filled to define the OID of

optinally ->
optionally

+* the type to pass to the input function.  `atttypid` is the OID of 
data
+* type used by the relation's attribute.
+*/
+   void(*CopyFromInFunc) (Oid atttypid, FmgrInfo *finfo,
+  Oid 
*typioparam);

How about passing CopyFromState cstate too like other
callbacks for consistency?

+   /*
+* Copy one row to a set of `values` and `nulls` of size tupDesc->natts.
+*
+* 'econtext' is used to evaluate default expression for each column 
that
+* is either not read from the file or is using the DEFAULT option of 
COPY

or is ->
or

(I'm not sure...)

+* FROM.  It is NULL if no default values are used.
+*
+* Returns false if there are no more tuples to copy.
+*/
+   bool(*CopyFromOneRow) (CopyFromState cstate, ExprContext 
*econtext,
+  Datum 
*values, bool *nulls);

+typedef struct CopyToRoutine
+{
+   /*
+* Called when COPY TO is started to set up the output functions
+* associated to the relation's attributes reading from.  `fmgr_info` 
can

fmgr_info ->
finfo

+* be optionally filled. `atttypid` is the OID of data type used by the
+* relation's attribute.
+*/
+   void(*CopyToOutFunc) (Oid atttypid, FmgrInfo *finfo);

How about passing CopyToState cstate too like other
callbacks for consistency?


@@ -200,4 +204,10 @@ extern void ReceiveCopyBinaryHeader(CopyFromState cstate);
 extern int CopyReadAttributesCSV(CopyFromState cstate);
 extern int CopyReadAttributesText(CopyFromState cstate);
 
+/* Callbacks for CopyFromRoutine->OneRow */

CopyFromRoutine->OneRow ->
CopyFromRoutine->CopyFromOneRow

+extern bool CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext,
+  Datum *values, bool 
*nulls);
+extern bool CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext,
+Datum *values, 
bool *nulls);
+
 #endif /* COPYFROM_INTERNAL_H 
*/

+/*
+ * CopyFromTextStart

CopyFromTextStart ->
CopyFromBinaryStart

+ *
+ * Start of COPY FROM for binary format.
+ */
+static void
+CopyFromBinaryStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+   /* Read and verify binary header */
+

meson: catalog/syscache_ids.h isn't installed

2024-02-04 Thread Sutou Kouhei

Hi,

catalog/syscache_ids.h is referred by utils/syscache.h but
it's not installed with Meson.

FYI:
* 9b1a6f50b91dca6610932650c8c81a3c924259f9
  It uses catalog/syscache_ids.h in utils/syscache.h but
  catalog/syscache_ids.h isn't installed.
* 6eb6086faa3842c2a38a1ee2f97bf9a42ce27610
  It changes a Makefile to install catalog/syscache_ids.h but
  it doesn't change meson.build.


diff --git a/src/include/catalog/meson.build b/src/include/catalog/meson.build
index 6be76dca1d..0bf6e112d5 100644
--- a/src/include/catalog/meson.build
+++ b/src/include/catalog/meson.build
@@ -114,7 +114,7 @@ output_install = [
   dir_data,
   dir_data,
   dir_include_server / 'catalog',
-  false,
+  dir_include_server / 'catalog',
   dir_include_server / 'catalog',
   dir_include_server / 'catalog',
 ]


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-02 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Fri, 2 Feb 2024 17:04:28 +0900,
  Michael Paquier  wrote:

> One idea I was considering is whether we should use a special value in
> the "format" DefElem, say "custom:$my_custom_format" where it would be
> possible to bypass the formay check when processing options and find
> the routines after processing all the options.  I'm not wedded to
> that, but attaching the routines to the state data is IMO the correct
> thing, because this has nothing to do with CopyFormatOptions.

Thanks for sharing your idea.
Let's discuss how to support custom options after we
complete the current performance changes.

>> I'm OK with the approach. But how about adding the extra
>> callbacks to Copy{From,To}StateData not
>> Copy{From,To}Routines like CopyToStateData::data_dest_cb and
>> CopyFromStateData::data_source_cb? They are only needed for
>> "text" and "csv". So we don't need to add them to
>> Copy{From,To}Routines to keep required callback minimum.
> 
> And set them in cstate while we are in the Start routine, right?

I imagined that it's done around the following part:

@@ -1418,6 +1579,9 @@ BeginCopyFrom(ParseState *pstate,
/* Extract options from the statement node tree */
ProcessCopyOptions(pstate, >opts, true /* is_from */ , options);
 
+   /* Set format routine */
+   cstate->routine = CopyFromGetRoutine(cstate->opts);
+
/* Process the target relation */
cstate->rel = rel;
 

Example1:

/* Set format routine */
cstate->routine = CopyFromGetRoutine(cstate->opts);
if (!cstate->opts.binary)
if (cstate->opts.csv_mode)
cstate->copy_read_attributes = CopyReadAttributesCSV;
else
cstate->copy_read_attributes = CopyReadAttributesText;

Example2:

static void
CopyFromSetRoutine(CopyFromState cstate)
{
if (cstate->opts.csv_mode)
{
cstate->routine = 
cstate->copy_read_attributes = CopyReadAttributesCSV;
}
else if (cstate.binary)
cstate->routine = 
else
{
cstate->routine = 
cstate->copy_read_attributes = CopyReadAttributesText;
}
}

BeginCopyFrom()
{
/* Set format routine */
CopyFromSetRoutine(cstate);
}


But I don't object your original approach. If we have the
extra callbacks in Copy{From,To}Routines, I just don't use
them for my custom format extension.

>> What is the better next action for us? Do you want to
>> complete the WIP v11 patch set by yourself (and commit it)?
>> Or should I take over it?
> 
> I was planning to work on that, but wanted to be sure how you felt
> about the problem with text and csv first.

OK.
My opinion is the above. I have an idea how to implement it
but it's not a strong idea. You can choose whichever you like.


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-01 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Fri, 2 Feb 2024 15:27:15 +0800,
  Junwang Zhao  wrote:

> I agree CopyToRoutine should be placed into CopyToStateData, but
> why set it after ProcessCopyOptions, the implementation of
> CopyToGetRoutine doesn't make sense if we want to support custom
> format in the future.
> 
> Seems the refactor of v11 only considered performance but not
> *extendable copy format*.

Right.
We focus on performance for now. And then we will focus on
extendability. [1]

[1] 
https://www.postgresql.org/message-id/flat/20240130.171511.2014195814665030502.kou%40clear-code.com#757a48c273f140081656ec8eb69f502b

> If V7 and V10 have no performance reduction, then I think V6 is also
> good with performance, since most of the time goes to CopyToOneRow
> and CopyFromOneRow.

Don't worry. I'll re-submit changes in the v6 patch set
again after the current patch set that focuses on
performance is merged.

> I just think we should take the *extendable* into consideration at
> the beginning.

Introducing Copy{To,From}Routine is also valuable for
extendability. We can improve extendability later. Let's
focus on only performance for now to introduce
Copy{To,From}Routine.

Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-01 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Fri, 2 Feb 2024 15:21:31 +0900,
  Michael Paquier  wrote:

> I have done a review of v10, see v11 attached which is still WIP, with
> the patches for COPY TO and COPY FROM merged together.  Note that I'm
> thinking to merge them into a single commit.

OK. I don't have a strong opinion for commit unit.

> @@ -74,11 +75,11 @@ typedef struct CopyFormatOptions
>  boolconvert_selectively;/* do selective binary conversion? */
>  CopyOnErrorChoice on_error; /* what to do when error happened */
>  List   *convert_select; /* list of column names (can be NIL) */
> +constCopyToRoutine *to_routine;/* callback routines for COPY 
> TO */
>  } CopyFormatOptions;
> 
> Adding the routines to the structure for the format options is in my
> opinion incorrect.  The elements of this structure are first processed
> in the option deparsing path, and then we need to use the options to
> guess which routines we need.

This was discussed with Sawada-san a bit before. [1][2]

[1] 
https://www.postgresql.org/message-id/flat/CAD21AoBmNiWwrspuedgAPgbAqsn7e7NoZYF6gNnYBf%2BgXEk9Mg%40mail.gmail.com#bfd19262d261c67058fdb8d64e6a723c
[2] 
https://www.postgresql.org/message-id/flat/20240130.144531.1257430878438173740.kou%40clear-code.com#fc55392d77f400fc74e42686fe7e348a

I kept the routines in CopyFormatOptions for custom option
processing. But I should have not cared about it in this
patch set because this patch set doesn't include custom
option processing.

So I'm OK that we move the routines to
Copy{From,To}StateData.

> This also led to a strange separation with
> ProcessCopyOptionFormatFrom() and ProcessCopyOptionFormatTo() to fit
> the hole in-between.

They also for custom option processing. We don't need to
care about them in this patch set too.

> copyapi.h needs more documentation, like what is expected for
> extension developers when using these, what are the arguments, etc.  I
> have added what I had in mind for now.

Thanks! I'm not good at writing documentation in English...

> +typedef char *(*PostpareColumnValue) (CopyFromState cstate, char *string, 
> int m);
> 
> CopyReadAttributes and PostpareColumnValue are also callbacks specific
> to text and csv, except that they are used within the per-row
> callbacks.  The same can be said about CopyAttributeOutHeaderFunction.
> It seems to me that it would be less confusing to store pointers to
> them in the routine structures, where the final picture involves not
> having multiple layers of APIs like CopyToCSVStart,
> CopyAttributeOutTextValue, etc.  These *have* to be documented
> properly in copyapi.h, and this is much easier now that cstate stores
> the routine pointers.  That would also make simpler function stacks.
> Note that I have not changed that in the v11 attached.
> 
> This business with the extra callbacks required for csv and text is my
> main point of contention, but I'd be OK once the model of the APIs is
> more linear, with everything in Copy{From,To}State.  The changes would
> be rather simple, and I'd be OK to put my hands on it.  Just,
> Sutou-san, would you agree with my last point about these extra
> callbacks?

I'm OK with the approach. But how about adding the extra
callbacks to Copy{From,To}StateData not
Copy{From,To}Routines like CopyToStateData::data_dest_cb and
CopyFromStateData::data_source_cb? They are only needed for
"text" and "csv". So we don't need to add them to
Copy{From,To}Routines to keep required callback minimum.

What is the better next action for us? Do you want to
complete the WIP v11 patch set by yourself (and commit it)?
Or should I take over it?


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-01 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Fri, 2 Feb 2024 06:51:02 +0900,
  Michael Paquier  wrote:

> On Fri, Feb 02, 2024 at 12:19:51AM +0900, Sutou Kouhei wrote:
>> Here are some numbers on my local machine (Note that my
>> local machine isn't suitable for benchmark as I said
>> before. Each number is median of "\watch 15" results):
>>>
>> I'll measure again on my local machine later. I'll stop
>> other processes such as Web browser, editor and so on as
>> much as possible when I do.
> 
> Thanks for compiling some numbers.  This is showing a lot of variance.
> Expecially, these two lines in table 2 are showing surprising results
> for v7:
>   direction format  n_columns master v7v10
>fromcsv  1917.973   1695.401871.991
>from binary  1841.104   1422.012820.786

Here are more numbers:

1:
 direction format  n_columns master v7v10
to   text  1   1053.844978.998956.575
tocsv  1   1091.316   1020.584   1098.314
to binary  1   1034.685969.224980.458
to   text 10   4216.264   3886.515   4111.417
tocsv 10   4649.228   4530.882   4682.988
to binary 10   4219.2284189.99   4211.942
  from   text  1851.697896.968890.458
  fromcsv  1890.229936.231 887.15
  from binary  1784.407 817.07938.736
  from   text 10   2549.056   2233.899   2630.892
  fromcsv 10   2809.441   2868.411   2895.196
  from binary 10   2985.674   3027.522 3397.5

2:
 direction format  n_columns master v7v10
to   text  1   1013.764   1011.968940.855
tocsv  1   1060.431   1065.4681040.68
to binary  1   1013.652   1009.956965.675
to   text 10   4411.484   4031.571   3896.836
tocsv 10   4739.6254715.81   4631.002
to binary 10   4374.077   4357.942   4227.215
  from   text  1955.078922.346866.222
  fromcsv  1   1040.717986.524905.657
  from binary  1849.316864.859833.152
  from   text 10   2703.209   2361.651   2533.992
  fromcsv 102990.35   3059.167   2930.632
  from binary 10   3008.375   3368.714   3055.723

3:
 direction format  n_columns master v7v10
to   text  1   1084.756   1003.822994.409
tocsv  1 1092.4   1062.536   1079.027
to binary  1   1046.774994.168993.633
to   text 104363.51   3978.205   4124.359
tocsv 10   4866.762   4616.001   4715.052
to binary 10   4382.412   4363.269   4213.456
  from   text  1852.976907.315860.749
  fromcsv  1925.187962.632897.833
  from binary  1824.997897.046828.231
  from   text 102591.07   2358.541   2540.431
  fromcsv 10   2907.033   3018.486   2915.997
  from binary 10   3069.0273209.21   3119.128

Other processes are stopped while I measure them. But I'm
not sure these numbers are more reliable than before...

> I am going to try to plug in some rusage() calls in the backend for
> the COPY paths.  I hope that gives more precision about the backend
> activity.  I'll post that with more numbers.

Thanks. It'll help us.


-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-01 Thread Sutou Kouhei

Hi,

Thanks for preparing benchmark.

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Thu, 1 Feb 2024 12:49:59 +0900,
  Michael Paquier  wrote:

> On Thu, Feb 01, 2024 at 10:57:58AM +0900, Michael Paquier wrote:
>> And here are the results I get for text and binary (ms, average of 15
>> queries after discarding the three highest and three lowest values):
>>   test   | master |  v7  | v10  
>> -++--+--
>>  from_bin_1col   | 1575   | 1546 | 1575
>>  from_bin_10col  | 5364   | 5208 | 5230
>>  from_text_1col  | 1690   | 1715 | 1684
>>  from_text_10col | 4875   | 4793 | 4757
>>  to_bin_1col | 1717   | 1730 | 1731
>>  to_bin_10col| 7728   | 7707 | 7513
>>  to_text_1col| 1710   | 1730 | 1698
>>  to_text_10col   | 5998   | 5960 | 5987
> 
> Here are some numbers from a second local machine:
>   test   | master |  v7  | v10  
> -++--+--
>  from_bin_1col   | 508| 467  | 461
>  from_bin_10col  | 2192   | 2083 | 2098
>  from_text_1col  | 510| 499  | 517
>  from_text_10col | 1970   | 1678 | 1654
>  to_bin_1col | 575| 577  | 573
>  to_bin_10col| 2680   | 2678 | 2722
>  to_text_1col| 516| 506  | 527
>  to_text_10col   | 2250   | 2245 | 2235
> 
> This is confirming a speedup with COPY FROM for both text and binary,
> with more impact with a larger number of attributes.  That's harder to
> conclude about COPY TO in both cases, but at least I'm not seeing any
> regression even with some variance caused by what looks like noise.
> We need more numbers from more people.  Sutou-san or Sawada-san, or
> any volunteers?

Here are some numbers on my local machine (Note that my
local machine isn't suitable for benchmark as I said
before. Each number is median of "\watch 15" results):

1:
 direction format  n_columns master v7v10
to   text  1   1077.254   1016.953   1028.434
tocsv  11079.88   1055.5451053.95
to binary  1   1051.2471033.931003.44
to   text 10   4373.168   3980.4423955.94
tocsv 10   4753.842 4719.2   4677.643
to binary 10   4598.374   4431.238   4285.757
  from   text  1875.729916.526869.283
  fromcsv  1909.355   1001.277918.655
  from binary  1872.943907.778859.433
  from   text 10   2594.429   2345.292   2587.603
  fromcsv 10   2968.972   3039.544   2964.468
  from binary 103072.01   3109.267   3093.983

2:
 direction format  n_columns master v7v10
to   text  1   1061.908988.768978.291
tocsv  1   1095.109   1037.015   1041.613
to binary  1   1076.992   1000.212983.318
to   text 10   4336.517   3901.833   3841.789
tocsv 10   4679.411   4640.975   4570.774
to binary 104465.04   4508.063   4261.749
  from   text  1866.689 917.54830.417
  fromcsv  1917.973   1695.401871.991
  from binary  1841.104   1422.012820.786
  from   text 10   2523.607   3147.738   2517.505
  fromcsv 10   2917.018   3042.685   2950.338
  from binary 10   2998.051   3128.542   3018.954

3:
 direction format  n_columns master v7v10
to   text  1   1021.168   1031.183962.945
tocsv  1   1076.549   1069.661   1060.258
to binary  1   1024.611   1022.143975.768
to   text 104327.24   3936.703   4049.893
tocsv 10   4620.436   4531.676   4685.672
to binary 10   4457.165   4390.992   4301.463
  from   text  1887.532907.365888.892
  fromcsv  1945.1671012.29895.921
  from binary  1 853.06854.652849.661
  from   text 10   2660.509   2304.256   2527.071
  fromcsv 10   2913.644   2968.204   2935.081
  from binary 10   3020.812   3081.162   3090.803

I'll measure again on my local machine later. I'll stop
other processes such as Web browser, editor and so on as
much as possible when I do.


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-30 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Tue, 30 Jan 2024 17:37:35 +0900,
  Michael Paquier  wrote:

>> We use defel->location for an error message. (We don't need
>> to set location for the default "text" DefElem.)
> 
> Yeah, but you should not need to have this error in the paths that set
> the callback routines in opts_out if the same validation happens a few
> lines before, in copy.c.

Ah, yes. defel->location is used in later patches. For
example, it's used when a COPY handler for the specified
FORMAT isn't found.

>   I am really worried about the complexities
> this thread is getting into because we are trying to shape the
> callbacks in the most generic way possible based on *two* use cases.
> This is going to be a never-ending discussion.  I'd rather get some
> simple basics, and then we can discuss if tweaking the callbacks is
> really necessary or not.  Even after introducing the pg_proc lookups
> to get custom callbacks.

I understand your concern. Let's introduce minimal callbacks
as the first step. I think that we completed our design
discussion for this feature. We can choose minimal callbacks
based on the discussion.

> The custom options don't seem like an absolute strong
> requirement for the first shot with the callbacks or even the
> possibility to retrieve the callbacks from a function call.  I mean,
> you could provide some control with SET commands and a few GUCs, at
> least, even if that would be strange.  Manipulations with a list of
> DefElems is the intuitive way to have custom options at query level,
> but we also have to guess the set of callbacks from this list of
> DefElems coming from the query.  You see my point, I am not sure 
> if it would be the best thing to process twice the options, especially
> when it comes to decide if a DefElem should be valid or not depending
> on the callbacks used.  Or we could use a kind of "special" DefElem
> where we could store a set of key:value fed to a callback :)

Interesting. Let's remove custom options support from the
initial minimal callbacks.

>> Can I prepare the v10 patch set as "the v7 patch set" + "the
>> removal of the "if" checks in NextCopyFromRawFields()"?
>> (+ reverting opts_out->{csv_mode,binary} changes in
>> ProcessCopyOptions().)
> 
> Yep, if I got it that would make sense to me.  If you can do that,
> that would help quite a bit.  :)

I've prepared the v10 patch set. Could you try this?

Changes since the v7 patch set:

0001:

* Remove CopyToProcessOption() callback
* Remove CopyToGetFormat() callback
* Revert passing CopyToState to ProcessCopyOptions()
* Revert moving "opts_out->{csv_mode,binary} = true" to
  ProcessCopyOptionFormatTo()
* Change to receive "const char *format" instead "DefElem  *defel"
  by ProcessCopyOptionFormatTo()

0002:

* Remove CopyFromProcessOption() callback
* Remove CopyFromGetFormat() callback
* Change to receive "const char *format" instead "DefElem
  *defel" by ProcessCopyOptionFormatFrom()
* Remove "if (cstate->opts.csv_mode)" branches from
  NextCopyFromRawFields()

FYI: Here are Copy{From,To}Routine in the v10 patch set. I
think that only Copy{From,To}OneRow are minimal callbacks
for the performance gain. But can we keep Copy{From,To}Start
and Copy{From,To}End for consistency? We can remove a few
{csv_mode,binary} conditions by Copy{From,To}{Start,End}. It
doesn't depend on the number of COPY target tuples. So they
will not affect performance.

/* Routines for a COPY FROM format implementation. */
typedef struct CopyFromRoutine
{
/*
 * Called when COPY FROM is started. This will initialize something and
 * receive a header.
 */
void(*CopyFromStart) (CopyFromState cstate, TupleDesc 
tupDesc);

/* Copy one row. It returns false if no more tuples. */
bool(*CopyFromOneRow) (CopyFromState cstate, ExprContext 
*econtext, Datum *values, bool *nulls);

/* Called when COPY FROM is ended. This will finalize something. */
void(*CopyFromEnd) (CopyFromState cstate);
}   CopyFromRoutine;

/* Routines for a COPY TO format implementation. */
typedef struct CopyToRoutine
{
/* Called when COPY TO is started. This will send a header. */
void(*CopyToStart) (CopyToState cstate, TupleDesc tupDesc);

/* Copy one row for COPY TO. */
void(*CopyToOneRow) (CopyToState cstate, TupleTableSlot 
*slot);

/* Called when COPY TO is ended. This will send a trailer. */
void(*CopyToEnd) (CopyToState cstate);
}   CopyToRoutine;

Thanks,
-- 
kou
>From

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-30 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Tue, 30 Jan 2024 16:20:54 +0900,
  Michael Paquier  wrote:

>>> +if (!format_specified)
>>> +/* Set the default format. */
>>> +ProcessCopyOptionFormatTo(pstate, opts_out, cstate, NULL);
>>> +
>>> 
>>> I think we can pass "text" in this case instead of NULL. That way,
>>> ProcessCopyOptionFormatTo doesn't need to handle NULL case.
>> 
>> Yes, we can do it. But it needs a DefElem allocation. Is it
>> acceptable?
> 
> I don't think that there is a need for a DelElem at all here?

We use defel->location for an error message. (We don't need
to set location for the default "text" DefElem.)

>While I
> am OK with the choice of calling CopyToInit() in the
> ProcessCopyOption*() routines that exist to keep the set of callbacks
> local to copyto.c and copyfrom.c, I think that this should not bother
> about setting opts_out->csv_mode or opts_out->csv_mode but just set 
> the opts_out->{to,from}_routine callbacks.

OK. I'll keep opts_out->{csv_mode,binary} in copy.c.

>  Now, all the Init() callbacks are empty for the
> in-core callbacks, so I think that we should just remove it entirely
> for now.  Let's keep the core patch a maximum simple.  It is always
> possible to build on top of it depending on what people need.  It's
> been mentioned that JSON would want that, but this also proves that we
> just don't care about that for all the in-core callbacks, as well.  I
> would choose a minimalistic design for now.

OK. Let's remove Init() callbacks from the first patch set.

> I would be really tempted to put my hands on this patch to put into
> shape a minimal set of changes because I'm caring quite a lot about
> the performance gains reported with the removal of the "if" checks in
> the per-row callbacks, and that's one goal of this thread quite
> independent on the extensibility.  Sutou-san, would you be OK with
> that?

Yes, sure.
(We want to focus on the performance gains in the first
patch set and then focus on extensibility again, right?)

For the purpose, I think that the v7 patch set is more
suitable than the v9 patch set. The v7 patch set doesn't
include Init() callbacks, custom options validation support
or extra Copy{In,Out}Response support. But the v7 patch set
misses the removal of the "if" checks in
NextCopyFromRawFields() that exists in the v9 patch set. I'm
not sure how much performance will improve by this but it
may be worth a try.

Can I prepare the v10 patch set as "the v7 patch set" + "the
removal of the "if" checks in NextCopyFromRawFields()"?
(+ reverting opts_out->{csv_mode,binary} changes in
ProcessCopyOptions().)

Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-29 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Tue, 30 Jan 2024 11:11:59 +0900,
  Masahiko Sawada  wrote:

> ---
> +if (!format_specified)
> +/* Set the default format. */
> +ProcessCopyOptionFormatTo(pstate, opts_out, cstate, NULL);
> +
> 
> I think we can pass "text" in this case instead of NULL. That way,
> ProcessCopyOptionFormatTo doesn't need to handle NULL case.

Yes, we can do it. But it needs a DefElem allocation. Is it
acceptable?

> We need curly brackets for this "if branch" as follows:
> 
> if (!format_specifed)
> {
> /* Set the default format. */
> ProcessCopyOptionFormatTo(pstate, opts_out, cstate, NULL);
> }

Oh, sorry. I assumed that pgindent adjusts the style too.

> ---
> +/* Process not built-in options. */
> +foreach(option, unknown_options)
> +{
> +DefElem*defel = lfirst_node(DefElem, option);
> +bool   processed = false;
> +
> +if (!is_from)
> +processed =
> opts_out->to_routine->CopyToProcessOption(cstate, defel);
> +if (!processed)
> +ereport(ERROR,
> +(errcode(ERRCODE_SYNTAX_ERROR),
> + errmsg("option \"%s\" not 
> recognized",
> +defel->defname),
> + parser_errposition(pstate,
> defel->location)));
> +}
> +list_free(unknown_options);
> 
> I think we can check the duplicated options in the core as we discussed.

Oh, sorry. I missed the part. I'll implement it.

> ---
> +static void
> +CopyToTextBasedInit(CopyToState cstate)
> +{
> +}
> 
> and
> 
> +static void
> +CopyToBinaryInit(CopyToState cstate)
> +{
> +}
> 
> Do we really need separate callbacks for the same behavior? I think we
> can have a common init function say CopyToBuitinInit() that does
> nothing. Or we can make the init callback optional.
> 
> The same is true for process-option callback.

OK. I'll make them optional.

> ---
>  List  *convert_select; /* list of column names (can be NIL) */
> +const  CopyToRoutine *to_routine;  /* callback
> routines for COPY TO */
>  } CopyFormatOptions;
> 
> I think CopyToStateData is a better place to have CopyToRoutine.
> copy_data_dest_cb is also there.

We can do it but ProcessCopyOptions() accepts NULL
CopyToState for file_fdw. Can we create an empty
CopyToStateData internally like we did for opts_out in
ProcessCopyOptions()? (But it requires exporting
CopyToStateData. We'll export it in a later patch but it's
not yet in 0001.)

> The 0002 patch replaces the options checks with
> ProcessCopyOptionFormatFrom(). However, both
> ProcessCopyOptionFormatTo() and ProcessCOpyOptionFormatFrom() would
> set format-related options such as opts_out->csv_mode etc, which seems
> not elegant. IIUC the reason why we process only the "format" option
> first is to set the callback functions and call the init callback. So
> I think we don't necessarily need to do both setting callbacks and
> setting format-related options together. Probably we can do only the
> callback stuff first and then set format-related options in the
> original place we used to do?

If we do it, we need to write the (strcmp(format, "csv") ==
0) condition in copyto.c and copy.c. I wanted to avoid it. I
think that the duplication (setting opts_out->csv_mode in
copyto.c and copyfrom.c) is not a problem. But it's not a
strong opinion. If (strcmp(format, "csv") == 0) duplication
is better than opts_out->csv_mode = true duplication, I'll
do it.

BTW, if we want to make the CSV format implementation more
modularized, we will remove opts_out->csv_mode, move CSV
related options to CopyToCSVProcessOption() and keep CSV
related options in its opaque space. For example,
opts_out->force_quote exists in COPY TO opaque space but
doesn't exist in COPY FROM opaque space because it's not
used in COPY FROM.


> +static void
> +CopyToTextBasedFillCopyOutResponse(CopyToState cstate, StringInfoData *buf)
> +{
> +int16  format = 0;
> +intnatts = list_length(cstate->attnumlist);
> +inti;
> +
> +pq_sendbyte(buf, format);  /* overall format */
> +pq_sendint16(buf, natts);
> +for (i = 0; i < natts; i++)
> +pq_sendint16(buf, format); /* per-column formats */
> +}
> 
> This function and CopyToBinaryFillCopyOutResponse() fill three things:
> overall format, the number of columns, and per-column formats. While
> this approach is flexible, extensions will have to understand the
> format of CopyOutResponse message. An alternative is to have one or
> more callbacks that return these three things.

Yes, we can choose the approach. I don't have a strong
opinion on which approach to

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-29 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Mon, 29 Jan 2024 11:37:07 +0800,
  Junwang Zhao  wrote:

>> > > Does it make sense to pass only non-builtin options to the custom
>> > > format callback after parsing and evaluating the builtin options? That
>> > > is, we parse and evaluate only the builtin options and populate
>> > > opts_out first, then pass each rest option to the custom format
>> > > handler callback. The callback can refer to the builtin option values.
>>
>> What I imagined is that while parsing the all specified options, we
>> evaluate builtin options and we add non-builtin options to another
>> list. Then when parsing a non-builtin option, we check if this option
>> already exists in the list. If there is, we raise the "option %s not
>> recognized" error.". Once we complete checking all options, we pass
>> each option in the list to the callback.

I implemented this idea and the following ideas:

1. Add init callback for initialization
2. Change GetFormat() to FillCopyXXXResponse()
   because JSON format always use 1 column
3. FROM only: Eliminate more cstate->opts.csv_mode branches
   (This is for performance.)

See the attached v9 patch set for details. Changes since v7:

0001:

* Move CopyToProcessOption() calls to the end of
  ProcessCopyOptions() for easy to option validation
* Add CopyToState::CopyToInit() and call it in
  ProcessCopyOptionFormatTo()
* Change CopyToState::CopyToGetFormat() to
  CopyToState::CopyToFillCopyOutResponse() and use it in
  SendCopyBegin()

0002:

* Move CopyFromProcessOption() calls to the end of
  ProcessCopyOptions() for easy to option validation
* Add CopyFromState::CopyFromInit() and call it in
  ProcessCopyOptionFormatFrom()
* Change CopyFromState::CopyFromGetFormat() to
  CopyFromState::CopyFromFillCopyOutResponse() and use it in
  ReceiveCopyBegin()
* Rename NextCopyFromRawFields() to
  NextCopyFromRawFieldsInternal() and pass the read
  attributes callback explicitly to eliminate more
  cstate->opts.csv_mode branches

Thanks,
-- 
kou
>From c136833f4a385574474b246a381014abeb631377 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei 
Date: Fri, 26 Jan 2024 16:46:51 +0900
Subject: [PATCH v9 1/2] Extract COPY TO format implementations

This is a part of making COPY format extendable. See also these past
discussions:
* New Copy Formats - avro/orc/parquet:
  https://www.postgresql.org/message-id/flat/20180210151304.fonjztsynewldfba%40gmail.com
* Make COPY extendable in order to support Parquet and other formats:
  https://www.postgresql.org/message-id/flat/CAJ7c6TM6Bz1c3F04Cy6%2BSzuWfKmr0kU8c_3Stnvh_8BR0D6k8Q%40mail.gmail.com

This doesn't change the current behavior. This just introduces
CopyToRoutine, which just has function pointers of format
implementation like TupleTableSlotOps, and use it for existing "text",
"csv" and "binary" format implementations.

Note that CopyToRoutine can't be used from extensions yet because
CopySend*() aren't exported yet. Extensions can't send formatted data
to a destination without CopySend*(). They will be exported by
subsequent patches.

Here is a benchmark result with/without this change because there was
a discussion that we should care about performance regression:

https://www.postgresql.org/message-id/3741749.1655952719%40sss.pgh.pa.us

> I think that step 1 ought to be to convert the existing formats into
> plug-ins, and demonstrate that there's no significant loss of
> performance.

You can see that there is no significant loss of performance:

Data: Random 32 bit integers:

CREATE TABLE data (int32 integer);
SELECT setseed(0.29);
INSERT INTO data
  SELECT random() * 1
FROM generate_series(1, ${n_records});

The number of records: 100K, 1M and 10M

100K without this change:

format,elapsed time (ms)
text,10.561
csv,10.868
binary,10.287

100K with this change:

format,elapsed time (ms)
text,9.962
csv,10.453
binary,9.473

1M without this change:

format,elapsed time (ms)
text,103.265
csv,109.789
binary,104.078

1M with this change:

format,elapsed time (ms)
text,98.612
csv,101.908
binary,94.456

10M without this change:

format,elapsed time (ms)
text,1060.614
csv,1065.272
binary,1025.875

10M with this change:

format,elapsed time (ms)
text,1020.050
csv,1031.279
binary,954.792
---
 contrib/file_fdw/file_fdw.c |   2 +-
 src/backend/commands/copy.c |  82 -
 src/backend/commands/copyfrom.c |   2 +-
 src/backend/commands/copyto.c   | 587 +++-
 src/include/commands/copy.h |   8 +-
 src/include/commands/copyapi.h  |  62 
 6 files changed, 560 insertions(+), 183 deletions(-)
 create mode 100644 src/include/commands/copyapi.h

diff --gi

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-28 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Sat, 27 Jan 2024 14:15:02 +0800,
  Junwang Zhao  wrote:

> I have been working on a *COPY TO JSON* extension since yesterday,
> which is based on your V6 patch set, I'd like to give you more input
> so you can make better decisions about the implementation(with only
> pg-copy-arrow you might not get everything considered).

Thanks!

> 0009 is some changes made by me, I changed CopyToGetFormat to
> CopyToSendCopyBegin because pg_copy_json need to send different bytes
> in SendCopyBegin, get the format code along is not enough

Oh, I haven't cared about the case.
How about the following API instead?

static void
SendCopyBegin(CopyToState cstate)
{
StringInfoData buf;

pq_beginmessage(, PqMsg_CopyOutResponse);
cstate->opts.to_routine->CopyToFillCopyOutResponse(cstate, );
pq_endmessage();
cstate->copy_dest = COPY_FRONTEND;
}

static void
CopyToJsonFillCopyOutResponse(CopyToState cstate, StringInfoData )
{
int16   format = 0;

pq_sendbyte(, format);  /* overall format */
/*
 * JSON mode is always one non-binary column
 */
pq_sendint16(, 1);
pq_sendint16(, format);
}

> 00010 is the pg_copy_json extension, I think this should be a good
> case which can utilize the *extendable copy format* feature

It seems that it's convenient that we have one more callback
for initializing CopyToState::opaque. It's called only once
when Copy{To,From}Routine is chosen:

typedef struct CopyToRoutine
{
void(*CopyToInit) (CopyToState cstate);
...
};

void
ProcessCopyOptions(ParseState *pstate,
   CopyFormatOptions *opts_out,
   bool is_from,
   void *cstate,
   List *options)
{
...
foreach(option, options)
{
DefElem*defel = lfirst_node(DefElem, option);

if (strcmp(defel->defname, "format") == 0)
{
...
opts_out->to_routine = 
opts_out->to_routine->CopyToInit(cstate);
...
}
}
...
}


>  maybe we
> should delete copy_test_format if we have this extension as an
> example?

I haven't read the COPY TO format json thread[1] carefully
(sorry), but we may add the JSON format as a built-in
format. If we do it, copy_test_format is useful to test the
extension API.

[1] 
https://www.postgresql.org/message-id/flat/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-26 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Fri, 26 Jan 2024 16:41:50 +0800,
  Junwang Zhao  wrote:

> CopyToProcessOption()/CopyFromProcessOption() can only handle
> single option, and store the options in the opaque field,  but it can not
> check the relation of two options, for example, considering json format,
> the `header` option can not be handled by these two functions.
> 
> I want to find a way when the user specifies the header option, customer
> handler can error out.

Ah, you want to use a built-in option (such as "header")
value from a custom handler, right? Hmm, it may be better
that we call CopyToProcessOption()/CopyFromProcessOption()
for all options including built-in options.


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-26 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Fri, 26 Jan 2024 08:35:19 +0900,
  Michael Paquier  wrote:

>> OK. How about the following for the fetch function
>> signature?
>> 
>> extern CopyToRoutine *GetBuiltinCopyToRoutine(const char *format);
> 
> Or CopyToRoutineGet()?  I am not wedded to my suggestion, got a bad
> history with naming things around here.

Thanks for the suggestion.
I rethink about this and use the following:

+extern void ProcessCopyOptionFormatTo(ParseState *pstate, CopyFormatOptions 
*opts_out, DefElem *defel);

It's not a fetch function. It sets CopyToRoutine opts_out
instead. But it hides CopyToRoutine* to copyto.c. Is it
acceptable?

>> OK. I'll focus on 0001 and 0005 for now. I'll restart
>> 0002-0004/0006-0008 after 0001 and 0005 are accepted.
> 
> Once you get these, I'd be interested in re-doing an evaluation of
> COPY TO and more tests with COPY FROM while running Postgres on
> scissors.  One thing I was thinking to use here is my blackhole_am for
> COPY FROM:
> https://github.com/michaelpq/pg_plugins/tree/main/blackhole_am

Thanks!

Could you evaluate the attached patch set with COPY FROM?

I attach v7 patch set. It includes only the 0001 and 0005
parts in v6 patch set because we focus on them for now.

0001: This is based on 0001 in v6.

Changes since v6:

* Fix comment style
* Hide CopyToRoutine{Text,CSV,Binary}
* Add more comments
* Eliminate "if (cstate->opts.csv_mode)" branches from "text"
  and "csv" callbacks
* Remove CopyTo*_function typedefs
* Update benchmark results in commit message but the results
  are measured on my environment that isn't suitable for
  accurate benchmark

0002: This is based on 0005 in v6.

Changes since v6:

* Fix comment style
* Hide CopyFromRoutine{Text,CSV,Binary}
* Add more comments
* Eliminate a "if (cstate->opts.csv_mode)" branch from "text"
  and "csv" callbacks
  * NOTE: We can eliminate more "if (cstate->opts.csv_mode)" branches
such as one in NextCopyFromRawFields(). Should we do it
in this feature improvement (make COPY format
extendable)? Can we defer this as a separated improvement?
* Remove CopyFrom*_function typedefs

Thanks,
-- 
kou
>From 3e75129c2e9d9d34eebb6ef31b4fe6579f9eb02d Mon Sep 17 00:00:00 2001
From: Sutou Kouhei 
Date: Fri, 26 Jan 2024 16:46:51 +0900
Subject: [PATCH v7 1/2] Extract COPY TO format implementations

This is a part of making COPY format extendable. See also these past
discussions:
* New Copy Formats - avro/orc/parquet:
  https://www.postgresql.org/message-id/flat/20180210151304.fonjztsynewldfba%40gmail.com
* Make COPY extendable in order to support Parquet and other formats:
  https://www.postgresql.org/message-id/flat/CAJ7c6TM6Bz1c3F04Cy6%2BSzuWfKmr0kU8c_3Stnvh_8BR0D6k8Q%40mail.gmail.com

This doesn't change the current behavior. This just introduces
CopyToRoutine, which just has function pointers of format
implementation like TupleTableSlotOps, and use it for existing "text",
"csv" and "binary" format implementations.

Note that CopyToRoutine can't be used from extensions yet because
CopySend*() aren't exported yet. Extensions can't send formatted data
to a destination without CopySend*(). They will be exported by
subsequent patches.

Here is a benchmark result with/without this change because there was
a discussion that we should care about performance regression:

https://www.postgresql.org/message-id/3741749.1655952719%40sss.pgh.pa.us

> I think that step 1 ought to be to convert the existing formats into
> plug-ins, and demonstrate that there's no significant loss of
> performance.

You can see that there is no significant loss of performance:

Data: Random 32 bit integers:

CREATE TABLE data (int32 integer);
SELECT setseed(0.29);
INSERT INTO data
  SELECT random() * 1
FROM generate_series(1, ${n_records});

The number of records: 100K, 1M and 10M

100K without this change:

format,elapsed time (ms)
text,10.561
csv,10.868
binary,10.287

100K with this change:

format,elapsed time (ms)
text,9.962
csv,10.453
binary,9.473

1M without this change:

format,elapsed time (ms)
text,103.265
csv,109.789
binary,104.078

1M with this change:

format,elapsed time (ms)
text,98.612
csv,101.908
binary,94.456

10M without this change:

format,elapsed time (ms)
text,1060.614
csv,1065.272
binary,1025.875

10M with this change:

format,elapsed time (ms)
text,1020.050
csv,1031.279
binary,954.792
---
 contrib/file_fdw/file_fdw.c |   2 +-
 src/backend/commands/copy.c |  71 ++--
 src/backend/commands/copyfrom.c |   2 +-
 src/backend/commands/copyto.c   | 551 +++-
 src/include/commands/copy.h |   8 +-
 src/

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-26 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Fri, 26 Jan 2024 16:18:14 +0800,
  Junwang Zhao  wrote:

> In the current implementation, there is no way that one can check
> incompatibility
> options in ProcessCopyOptions, we can postpone the check in CopyFromStart
> or CopyToStart, but I think it is a little bit late. Do you think
> adding an extra
> check for incompatible options hook is acceptable (PFA)?

Thanks for the suggestion! But I think that a custom handler
can do it in
CopyToProcessOption()/CopyFromProcessOption(). What do you
think about this? Or could you share a sample COPY TO/FROM
WITH() SQL you think?

Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-25 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Thu, 25 Jan 2024 13:36:03 +0900,
  Masahiko Sawada  wrote:

> I've experimented with a similar optimization for csv
> and text format; have different callbacks for text and csv format and
> remove "if (cstate->opts.csv_mode)" branches. I've attached a patch
> for that. Here are results:
> 
> HEAD w/ 0001 patch + remove branches:
> binary 2824.502 ms
> text 2715.264 ms
> csv 2803.381 ms
> 
> The numbers look better now. I'm not sure these are within a noise
> range but it might be worth considering having different callbacks for
> text and csv formats.

Wow! Interesting. I tried the approach before but I didn't
see any difference by the approach. But it may depend on my
environment.

I'll import the approach to the next patch set so that
others can try the approach easily.


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-25 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Thu, 25 Jan 2024 12:17:55 +0900,
  Michael Paquier  wrote:

> +typedef bool (*CopyToProcessOption_function) (CopyToState cstate, DefElem 
> *defel);
> +typedef int16 (*CopyToGetFormat_function) (CopyToState cstate);
> +typedef void (*CopyToStart_function) (CopyToState cstate, TupleDesc tupDesc);
> +typedef void (*CopyToOneRow_function) (CopyToState cstate, TupleTableSlot 
> *slot);
> +typedef void (*CopyToEnd_function) (CopyToState cstate);
> 
> We don't really need a set of typedefs here, let's put the definitions
> in the CopyToRoutine struct instead.

OK. I'll do it.

> +extern CopyToRoutine CopyToRoutineText;
> +extern CopyToRoutine CopyToRoutineCSV;
> +extern CopyToRoutine CopyToRoutineBinary;
> 
> All that should IMO remain in copyto.c and copyfrom.c in the initial
> patch doing the refactoring.  Why not using a fetch function instead
> that uses a string in input?  Then you can call that once after
> parsing the List of options in ProcessCopyOptions().

OK. How about the following for the fetch function
signature?

extern CopyToRoutine *GetBuiltinCopyToRoutine(const char *format);

We may introduce an enum and use it:

typedef enum CopyBuiltinFormat
{
COPY_BUILTIN_FORMAT_TEXT = 0,
COPY_BUILTIN_FORMAT_CSV,
COPY_BUILTIN_FORMAT_BINARY,
} CopyBuiltinFormat;

extern CopyToRoutine *GetBuiltinCopyToRoutine(CopyBuiltinFormat format);

> +/* All "text" and "csv" options are parsed in ProcessCopyOptions(). We may
> + * move the code to here later. */
> Some areas, like this comment, are written in an incorrect format.

Oh, sorry. I assumed that the comment style was adjusted by
pgindent.

I'll use the following style:

/*
 * ...
 */

> +getTypeBinaryOutputInfo(attr->atttypid, _func_oid, );
> +fmgr_info(out_func_oid, >out_functions[attnum - 1]);
> 
> Actually, this split is interesting.  It is possible for a custom
> format to plug in a custom set of out functions.  Did you make use of
> something custom for your own stuff?

I didn't. My PoC custom COPY format handler for Apache Arrow
just handles integer and text for now. It doesn't use
cstate->out_functions because cstate->out_functions may not
return a valid binary format value for Apache Arrow. So it
formats each value by itself.

I'll chose one of them for a custom type (that isn't
supported by Apache Arrow, e.g. PostGIS types):

1. Report an unsupported error
2. Call output function for Apache Arrow provided by the
   custom type

>   Actually, could it make sense to
> split the assignment of cstate->out_functions into its own callback?

Yes. Because we need to use getTypeBinaryOutputInfo() for
"binary" and use getTypeOutputInfo() for "text" and "csv".

> Sure, that's part of the start phase, but at least it would make clear
> that a custom method *has* to assign these OIDs to work.  The patch
> implies that as a rule, without a comment that CopyToStart *must* set
> up these OIDs.

CopyToStart doesn't need to set up them if the handler
doesn't use cstate->out_functions.

> I think that 0001 and 0005 should be handled first, as pieces
> independent of the rest.  Then we could move on with 0002~0004 and
> 0006~0008.

OK. I'll focus on 0001 and 0005 for now. I'll restart
0002-0004/0006-0008 after 0001 and 0005 are accepted.


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-25 Thread Sutou Kouhei

Hi,

Thanks for trying these patches!

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Thu, 25 Jan 2024 10:53:58 +0800,
  jian he  wrote:

> COPY data TO '/dev/null' WITH (FORMAT csv) \watch count=5

Wow! I didn't know the "\watch count="!
I'll use it.

> Time: 668.996 ms
> Time: 596.254 ms
> Time: 592.723 ms
> Time: 591.663 ms
> Time: 590.803 ms

It seems that 5 times isn't enough for this case as Michael
said. But thanks for trying!

Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-24 Thread Sutou Kouhei

Hi,

In <20240124.144936.67229716500876806@clear-code.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Wed, 24 Jan 2024 14:49:36 +0900 (JST),
  Sutou Kouhei  wrote:

> I've implemented custom COPY format feature based on the
> current design discussion. See the attached patches for
> details.

I forgot to mention one note. Documentation isn't included
in these patches. I'll write it after all (or some) patches
are merged. Is it OK?


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-24 Thread Sutou Kouhei

Hi,

In <10025bac-158c-ffe7-fbec-32b426291...@dunslane.net>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Wed, 24 Jan 2024 07:15:55 -0500,
  Andrew Dunstan  wrote:

> 
> On 2024-01-24 We 03:11, Michael Paquier wrote:
>> On Wed, Jan 24, 2024 at 02:49:36PM +0900, Sutou Kouhei wrote:
>>> For COPY TO:
>>>
>>> 0001: This adds CopyToRoutine and use it for text/csv/binary
>>> formats. No implementation change. This just move codes.
>> 10M without this change:
>>
>>  format,elapsed time (ms)
>>  text,1090.763
>>  csv,1136.103
>>  binary,1137.141
>>
>> 10M with this change:
>>
>>  format,elapsed time (ms)
>>  text,1082.654
>>  csv,1196.991
>>  binary,1069.697
>>
>> These numbers point out that binary is faster by 6%, csv is slower by
>> 5%, while text stays around what looks like noise range.  That's not
>> negligible.  Are these numbers reproducible?  If they are, that could
>> be a problem for anybody doing bulk-loading of large data sets.  I am
>> not sure to understand where the improvement for binary comes from by
>> reading the patch, but perhaps perf would tell more for each format?
>> The loss with csv could be blamed on the extra manipulations of the
>> function pointers, likely.
> 
> 
> I don't think that's at all acceptable.
> 
> We've spent quite a lot of blood sweat and tears over the years to make COPY
> fast, and we should not sacrifice any of that lightly.

These numbers aren't reproducible. Because these benchmarks
executed on my normal machine not a machine only for
benchmarking. The machine runs another processes such as
editor and Web browser.

For example, here are some results with master
(94edfe250c6a200d2067b0debfe00b4122e9b11e):

Format,N records,Elapsed time (ms)
csv,1000,1073.715
csv,1000,1022.830
csv,1000,1073.584
csv,1000,1090.651
csv,1000,1052.259

Here are some results with master + the 0001 patch:

Format,N records,Elapsed time (ms)
csv,1000,1025.356
csv,1000,1067.202
csv,1000,1014.563
csv,1000,1032.088
csv,1000,1058.110


I uploaded my benchmark script so that you can run the same
benchmark on your machine:

https://gist.github.com/kou/be02e02e5072c91969469dbf137b5de5

Could anyone try the benchmark with master and master+0001?


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-23 Thread Sutou Kouhei

Hi,

I've implemented custom COPY format feature based on the
current design discussion. See the attached patches for
details.

I also implemented a PoC COPY format handler for Apache
Arrow with this implementation and it worked.
https://github.com/kou/pg-copy-arrow

The patches implement not only custom COPY TO format feature
but also custom COPY FROM format feature.

0001-0004 is for COPY TO and 0005-0008 is for COPY FROM.

For COPY TO:

0001: This adds CopyToRoutine and use it for text/csv/binary
formats. No implementation change. This just move codes.

0002: This adds support for adding custom COPY TO format by
"CREATE FUNCTION ${FORMAT_NAME}". This uses the same
approach provided by Sawada-san[1] but this doesn't
introduce a wrapper CopyRoutine struct for
Copy{To,From}Routine. Because I noticed that a wrapper
CopyRoutine struct is needless. Copy handler can just return
CopyToRoutine or CopyFromRtouine because both of them have
NodeTag. We can distinct a returned struct by the NodeTag.

[1] 
https://www.postgresql.org/message-id/cad21aocunywhird3gapzwe6s9jg1wzxj3cr6vgn36ddhegj...@mail.gmail.com

0003: This exports CopyToStateData. No implementation change
except CopyDest enum values. I changed COPY_ prefix to
COPY_DEST_ to avoid name conflict with CopySource enum
values. This just moves codes.

0004: This adds CopyToState::opaque and exports
CopySendEndOfRow(). CopySendEndOfRow() is renamed to
CopyToStateFlush().

For COPY FROM:

0005: Same as 0001 but for COPY FROM. This adds
CopyFromRoutine and use it for text/csv/binary formats. No
implementation change. This just move codes.

0006: Same as 0002 but for COPY FROM. This adds support for
adding custom COPY FROM format by "CREATE FUNCTION
${FORMAT_NAME}".

0007: Same as 0003 but for COPY FROM. This exports
CopyFromStateData. No implementation change except
CopySource enum values. I changed COPY_ prefix to
COPY_SOURCE_ to align CopyDest changes in 0003. This just
moves codes.

0008: Same as 0004 but for COPY FROM. This adds
CopyFromState::opaque and exports
CopyReadBinaryData(). CopyReadBinaryData() is renamed to
CopyFromStateRead().

Thanks,
-- 
kou

In <20240115.152702.2011620917962812379@clear-code.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Mon, 15 Jan 2024 15:27:02 +0900 (JST),
  Sutou Kouhei  wrote:

> Hi,
> 
> If there are no more comments for the current design, I'll
> start implementing this feature with the following
> approaches for "Discussing" items:
> 
>> 3.1 Should we use one function(internal) for COPY TO/FROM
>> handlers or two function(internal)s (one is for COPY TO
>> handler and another is for COPY FROM handler)?
>> [4]
> 
> I'll choose "one function(internal) for COPY TO/FROM handlers".
> 
>> 3.4 Should we export Copy{To,From}State? Or should we just
>> provide getters/setters to access Copy{To,From}State
>> internal?
>> [10]
> 
> I'll export Copy{To,From}State.
> 
> 
> Thanks,
> -- 
> kou
> 
> In <20240112.144615.157925223373344229@clear-code.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Fri, 12 Jan 2024 14:46:15 +0900 (JST),
>   Sutou Kouhei  wrote:
> 
>> Hi,
>> 
>> Here is the current summary for a this discussion to make
>> COPY format extendable. It's for reaching consensus and
>> starting implementing the feature. (I'll start implementing
>> the feature once we reach consensus.) If you have any
>> opinion, please share it.
>> 
>> Confirmed:
>> 
>> 1.1 Making COPY format extendable will not reduce performance.
>> [1]
>> 
>> Decisions:
>> 
>> 2.1 Use separated handler for COPY TO and COPY FROM because
>> our COPY TO implementation (copyto.c) and COPY FROM
>> implementation (coypfrom.c) are separated.
>> [2]
>> 
>> 2.2 Don't use system catalog for COPY TO/FROM handlers. We can
>> just use a function(internal) that returns a handler instead.
>> [3]
>> 
>> 2.3 The implementation must include documentation.
>> [5]
>> 
>> 2.4 The implementation must include test.
>> [6]
>> 
>> 2.5 The implementation should be consist of small patches
>> for easy to review.
>> [6]
>> 
>> 2.7 Copy{To,From}State must have a opaque space for
>> handlers.
>> [8]
>> 
>> 2.8 Export CopySendData() and CopySendEndOfRow() for COPY TO
>> handlers.
>> [8]
>> 
>> 2.9 Make "format" in PgMsg_CopyOutResponse message
>> extendable.
>> [9]
>> 
>> 2.10 Make makeNode() call avoidable in function(internal)
>>

meson: Specify -Wformat as a common warning flag for extensions

2024-01-21 Thread Sutou Kouhei

Hi,

I'm an extension developer. If I use PostgreSQL built with
Meson, I get the following warning:

cc1: warning: '-Wformat-security' ignored without '-Wformat' 
[-Wformat-security]

Because "pg_config --cflags" includes -Wformat-security but
doesn't include -Wformat.

Can we specify -Wformat as a common warning flag too? If we
do it, "pg_config --cflags" includes both of
-Wformat-security and -Wformat. So I don't get the warning.


Thanks,
-- 
kou
>From 0913033512c9b75ee3d2941c89ff8696f3c5f53b Mon Sep 17 00:00:00 2001
From: Sutou Kouhei 
Date: Mon, 22 Jan 2024 13:51:58 +0900
Subject: [PATCH v1] meson: Specify -Wformat explicitly for extensions

We specify -Wformat-security as a common warning flag explicitly. Our
common warning flags are used by extensions via
pgxs/src/Makefile.global or "pg_config --cflags".

If -Wformat-security is used without -Wall/-Wformat, GCC shows the
following warning:

cc1: warning: '-Wformat-security' ignored without '-Wformat' [-Wformat-security]

We can't assume that all extensions use -Wall/-Wformat. So specifying
only -Wformat-security may cause the warning.

If we specify -Wformat explicitly, the warning isn't shown.
---
 meson.build | 8 
 1 file changed, 8 insertions(+)

diff --git a/meson.build b/meson.build
index 55184db248..de6ce778fc 100644
--- a/meson.build
+++ b/meson.build
@@ -1822,6 +1822,14 @@ common_warning_flags = [
   '-Wimplicit-fallthrough=3',
   '-Wcast-function-type',
   '-Wshadow=compatible-local',
+  # This is for preventing the "cc1: warning: '-Wformat-security'
+  # ignored without '-Wformat' [-Wformat-security]" warning. We don't
+  # need this for PostgreSQL itself. This is just for
+  # extensions. Extensions use "pg_config --cflags" to build
+  # themselves. If extensions use only -Wformat-security, the warning
+  # is shown. If we have this here, extensions use both of -Wformat
+  # and -Wformat-security. So the warning isn't shown.
+  '-Wformat',
   # This was included in -Wall/-Wformat in older GCC versions
   '-Wformat-security',
 ]
-- 
2.43.0

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-15 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Mon, 15 Jan 2024 16:03:41 +0900,
  Masahiko Sawada  wrote:

>> Defining one more static const struct instead of providing a
>> convenient (but a bit tricky) macro may be straightforward:
>>
>> static const CopyToFormatRoutine testfmt_copyto_routine = {
>> .type = T_CopyToFormatRoutine,
>> .start_fn = testfmt_copyto_start,
>> .onerow_fn = testfmt_copyto_onerow,
>> .end_fn = testfmt_copyto_end
>> };
>>
>> static const CopyFormatRoutine testfmt_copyto_handler = {
>> .type = T_CopyFormatRoutine,
>> .is_from = false,
>> .routine = (Node *) _copyto_routine
>> };
> 
> Yeah, IIUC this is the option 2 you mentioned[1]. I think we can go
> with this idea as it's the simplest.
>
> [1] 
> https://www.postgresql.org/message-id/20240110.120034.501385498034538233.kou%40clear-code.com

Ah, you're right. I forgot it...

>  That is CopyFormatRoutine will be like:
> 
> typedef struct CopyFormatRoutine
> {
> NodeTag type;
> 
> /* either CopyToFormatRoutine or CopyFromFormatRoutine */
> Node   *routine;
> }   CopyFormatRoutine;
> 
> And the core can check the node type of the 'routine7 in the
> CopyFormatRoutine returned by extensions.

It makes sense.


If no more comments about the current design, I'll start
implementing this feature based on the current design.


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-14 Thread Sutou Kouhei

Hi,

If there are no more comments for the current design, I'll
start implementing this feature with the following
approaches for "Discussing" items:

> 3.1 Should we use one function(internal) for COPY TO/FROM
> handlers or two function(internal)s (one is for COPY TO
> handler and another is for COPY FROM handler)?
> [4]

I'll choose "one function(internal) for COPY TO/FROM handlers".

> 3.4 Should we export Copy{To,From}State? Or should we just
> provide getters/setters to access Copy{To,From}State
> internal?
> [10]

I'll export Copy{To,From}State.


Thanks,
-- 
kou

In <20240112.144615.157925223373344229@clear-code.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Fri, 12 Jan 2024 14:46:15 +0900 (JST),
  Sutou Kouhei  wrote:

> Hi,
> 
> Here is the current summary for a this discussion to make
> COPY format extendable. It's for reaching consensus and
> starting implementing the feature. (I'll start implementing
> the feature once we reach consensus.) If you have any
> opinion, please share it.
> 
> Confirmed:
> 
> 1.1 Making COPY format extendable will not reduce performance.
> [1]
> 
> Decisions:
> 
> 2.1 Use separated handler for COPY TO and COPY FROM because
> our COPY TO implementation (copyto.c) and COPY FROM
> implementation (coypfrom.c) are separated.
> [2]
> 
> 2.2 Don't use system catalog for COPY TO/FROM handlers. We can
> just use a function(internal) that returns a handler instead.
> [3]
> 
> 2.3 The implementation must include documentation.
> [5]
> 
> 2.4 The implementation must include test.
> [6]
> 
> 2.5 The implementation should be consist of small patches
> for easy to review.
> [6]
> 
> 2.7 Copy{To,From}State must have a opaque space for
> handlers.
> [8]
> 
> 2.8 Export CopySendData() and CopySendEndOfRow() for COPY TO
> handlers.
> [8]
> 
> 2.9 Make "format" in PgMsg_CopyOutResponse message
> extendable.
> [9]
> 
> 2.10 Make makeNode() call avoidable in function(internal)
>  that returns COPY TO/FROM handler.
>  [9]
> 
> 2.11 Custom COPY TO/FROM handlers must be able to parse
>  their options.
>  [11]
> 
> Discussing:
> 
> 3.1 Should we use one function(internal) for COPY TO/FROM
> handlers or two function(internal)s (one is for COPY TO
> handler and another is for COPY FROM handler)?
> [4]
> 
> 3.2 If we use separated function(internal) for COPY TO/FROM
> handlers, we need to define naming rule. For example,
> _to(internal) for COPY TO handler and
> _from(internal) for COPY FROM handler.
> [4]
> 
> 3.3 Should we use prefix or suffix for function(internal)
> name to avoid name conflict with other handlers such as
> tablesample handlers?
> [7]
> 
> 3.4 Should we export Copy{To,From}State? Or should we just
> provide getters/setters to access Copy{To,From}State
> internal?
> [10]
> 
> 
> [1] 
> https://www.postgresql.org/message-id/flat/20231204.153548.2126325458835528809.kou%40clear-code.com
> [2] https://www.postgresql.org/message-id/flat/ZXEUIy6wl4jHy6Nm%40paquier.xyz
> [3] 
> https://www.postgresql.org/message-id/flat/CAD21AoAhcZkAp_WDJ4sSv_%2Bg2iCGjfyMFgeu7MxjnjX_FutZAg%40mail.gmail.com
> [4] 
> https://www.postgresql.org/message-id/flat/CAD21AoDkoGL6yJ_HjNOg9cU%3DaAdW8uQ3rSQOeRS0SX85LPPNwQ%40mail.gmail.com
> [5] 
> https://www.postgresql.org/message-id/flat/TY3PR01MB9889C9234CD220A3A7075F0DF589A%40TY3PR01MB9889.jpnprd01.prod.outlook.com
> [6] https://www.postgresql.org/message-id/flat/ZXbiPNriHHyUrcTF%40paquier.xyz
> [7] 
> https://www.postgresql.org/message-id/flat/20231214.184414.2179134502876898942.kou%40clear-code.com
> [8] 
> https://www.postgresql.org/message-id/flat/20231221.183504.1240642084042888377.kou%40clear-code.com
> [9] https://www.postgresql.org/message-id/flat/ZYTfqGppMc9e_w2k%40paquier.xyz
> [10] 
> https://www.postgresql.org/message-id/flat/CAD21AoD%3DUapH4Wh06G6H5XAzPJ0iJg9YcW8r7E2UEJkZ8QsosA%40mail.gmail.com
> [11] 
> https://www.postgresql.org/message-id/flat/20240110.152023.1920937326588672387.kou%40clear-code.com
> 
> 
> Thanks,
> -- 
> kou
> 
>

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-14 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Fri, 12 Jan 2024 14:40:41 +0800,
  Junwang Zhao  wrote:

>> Could you clarify what should we discuss? We should require
>> that COPY TO/FROM handlers should use PostgreSQL's memory
>> context for all internal memory allocations?
> 
> Yes, handlers should use PostgreSQL's memory context, and I think
> creating other memory context under CopyToStateData.copycontext
> should be suggested for handler creators, so I proposed exporting
> CopyToStateData to public header.

I see.

We can provide a getter for CopyToStateData::copycontext if
we don't want to export CopyToStateData. Note that I don't
have a strong opinion whether we should export
CopyToStateData or not.


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-11 Thread Sutou Kouhei

Hi,

Here is the current summary for a this discussion to make
COPY format extendable. It's for reaching consensus and
starting implementing the feature. (I'll start implementing
the feature once we reach consensus.) If you have any
opinion, please share it.

Confirmed:

1.1 Making COPY format extendable will not reduce performance.
[1]

Decisions:

2.1 Use separated handler for COPY TO and COPY FROM because
our COPY TO implementation (copyto.c) and COPY FROM
implementation (coypfrom.c) are separated.
[2]

2.2 Don't use system catalog for COPY TO/FROM handlers. We can
just use a function(internal) that returns a handler instead.
[3]

2.3 The implementation must include documentation.
[5]

2.4 The implementation must include test.
[6]

2.5 The implementation should be consist of small patches
for easy to review.
[6]

2.7 Copy{To,From}State must have a opaque space for
handlers.
[8]

2.8 Export CopySendData() and CopySendEndOfRow() for COPY TO
handlers.
[8]

2.9 Make "format" in PgMsg_CopyOutResponse message
extendable.
[9]

2.10 Make makeNode() call avoidable in function(internal)
 that returns COPY TO/FROM handler.
 [9]

2.11 Custom COPY TO/FROM handlers must be able to parse
 their options.
 [11]

Discussing:

3.1 Should we use one function(internal) for COPY TO/FROM
handlers or two function(internal)s (one is for COPY TO
handler and another is for COPY FROM handler)?
[4]

3.2 If we use separated function(internal) for COPY TO/FROM
handlers, we need to define naming rule. For example,
_to(internal) for COPY TO handler and
_from(internal) for COPY FROM handler.
[4]

3.3 Should we use prefix or suffix for function(internal)
name to avoid name conflict with other handlers such as
tablesample handlers?
[7]

3.4 Should we export Copy{To,From}State? Or should we just
provide getters/setters to access Copy{To,From}State
internal?
[10]


[1] 
https://www.postgresql.org/message-id/flat/20231204.153548.2126325458835528809.kou%40clear-code.com
[2] https://www.postgresql.org/message-id/flat/ZXEUIy6wl4jHy6Nm%40paquier.xyz
[3] 
https://www.postgresql.org/message-id/flat/CAD21AoAhcZkAp_WDJ4sSv_%2Bg2iCGjfyMFgeu7MxjnjX_FutZAg%40mail.gmail.com
[4] 
https://www.postgresql.org/message-id/flat/CAD21AoDkoGL6yJ_HjNOg9cU%3DaAdW8uQ3rSQOeRS0SX85LPPNwQ%40mail.gmail.com
[5] 
https://www.postgresql.org/message-id/flat/TY3PR01MB9889C9234CD220A3A7075F0DF589A%40TY3PR01MB9889.jpnprd01.prod.outlook.com
[6] https://www.postgresql.org/message-id/flat/ZXbiPNriHHyUrcTF%40paquier.xyz
[7] 
https://www.postgresql.org/message-id/flat/20231214.184414.2179134502876898942.kou%40clear-code.com
[8] 
https://www.postgresql.org/message-id/flat/20231221.183504.1240642084042888377.kou%40clear-code.com
[9] https://www.postgresql.org/message-id/flat/ZYTfqGppMc9e_w2k%40paquier.xyz
[10] 
https://www.postgresql.org/message-id/flat/CAD21AoD%3DUapH4Wh06G6H5XAzPJ0iJg9YcW8r7E2UEJkZ8QsosA%40mail.gmail.com
[11] 
https://www.postgresql.org/message-id/flat/20240110.152023.1920937326588672387.kou%40clear-code.com


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-10 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Wed, 10 Jan 2024 16:53:48 +0900,
  Masahiko Sawada  wrote:

>> Interesting. But I feel that it introduces another (a bit)
>> tricky mechanism...
> 
> Right. On the other hand, I don't think the idea 3 is good for the
> same reason Michael-san pointed out before[1][2].
>
> [1] https://www.postgresql.org/message-id/ZXEUIy6wl4jHy6Nm%40paquier.xyz
> [2] https://www.postgresql.org/message-id/ZXKm9tmnSPIVrqZz%40paquier.xyz

I think that the important part of the Michael-san's opinion
is "keep COPY TO implementation and COPY FROM implementation
separated for maintainability".

The patch focused in [1][2] uses one routine for both of
COPY TO and COPY FROM. If we use the approach, we need to
change one common routine from copyto.c and copyfrom.c (or
export callbacks from copyto.c and copyfrom.c and use them
in copyto.c to construct one common routine). It's
the problem.

The idea 3 still has separated routines for COPY TO and COPY
FROM. So I think that it still keeps COPY TO implementation
and COPY FROM implementation separated.

>> BTW, we also need to set .type:
>>
>>  .routine = COPYTO_ROUTINE (
>>  .type = T_CopyToFormatRoutine,
>>  .start_fn = testfmt_copyto_start,
>>  .onerow_fn = testfmt_copyto_onerow,
>>  .end_fn = testfmt_copyto_end
>>  )
> 
> I think it's fine as the same is true for table AM.

Ah, sorry. I should have said explicitly. I don't this that
it's not a problem. I just wanted to say that it's missing.

Defining one more static const struct instead of providing a
convenient (but a bit tricky) macro may be straightforward:

static const CopyToFormatRoutine testfmt_copyto_routine = {
.type = T_CopyToFormatRoutine,
.start_fn = testfmt_copyto_start,
.onerow_fn = testfmt_copyto_onerow,
.end_fn = testfmt_copyto_end
};

static const CopyFormatRoutine testfmt_copyto_handler = {
.type = T_CopyFormatRoutine,
.is_from = false,
.routine = (Node *) _copyto_routine
};

Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-09 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Wed, 10 Jan 2024 15:33:22 +0900,
  Masahiko Sawada  wrote:

>> We can use the satic const struct approach by choosing one
>> of the followings:
>>
>> ...
>>
>> 4. ... other idea?
> 
> It's a just idea but the fourth idea is to provide a convenient macro
> to make it easy to construct the CopyFormatRoutine. For example,
> 
> #define COPYTO_ROUTINE(...) (Node *) &(CopyToFormatRoutine) {__VA_ARGS__}
> 
> static const CopyFormatRoutine testfmt_copyto_handler = {
> .type = T_CopyFormatRoutine,
> .is_from = true,
> .routine = COPYTO_ROUTINE (
> .start_fn = testfmt_copyto_start,
> .onerow_fn = testfmt_copyto_onerow,
> .end_fn = testfmt_copyto_end
> )
> };
> 
> Datum
> copy_testfmt_handler(PG_FUNCTION_ARGS)
> {
> PG_RETURN_POINTER(& testfmt_copyto_handler);
> }

Interesting. But I feel that it introduces another (a bit)
tricky mechanism...

BTW, we also need to set .type:

 .routine = COPYTO_ROUTINE (
 .type = T_CopyToFormatRoutine,
 .start_fn = testfmt_copyto_start,
 .onerow_fn = testfmt_copyto_onerow,
 .end_fn = testfmt_copyto_end
 )


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-09 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Fri, 22 Dec 2023 10:58:05 +0800,
  Junwang Zhao  wrote:

>> 1. Add an opaque space for custom COPY TO handler
>>* Add CopyToState{Get,Set}Opaque()
>>
>> https://github.com/kou/postgres/commit/5a610b6a066243f971e029432db67152cfe5e944
>>
>> 2. Export CopyToState::attnumlist
>>* Add CopyToStateGetAttNumList()
>>
>> https://github.com/kou/postgres/commit/15fcba8b4e95afa86edb3f677a7bdb1acb1e7688
>>
>> 3. Export CopySend*()
>>* Rename CopySend*() to CopyToStateSend*() and export them
>>* Exception: CopySendEndOfRow() to CopyToStateFlush() because
>>  it just flushes the internal buffer now.
>>
>> https://github.com/kou/postgres/commit/289a5640135bde6733a1b8e2c412221ad522901e
>>
> I guess the purpose of these helpers is to avoid expose CopyToState to
> copy.h,

Yes.

> but I
> think expose CopyToState to user might make life easier, users might want to 
> use
> the memory contexts of the structure (though I agree not all the
> fields are necessary
> for extension handers).

OK. I don't object it as I said in another e-mail:
https://www.postgresql.org/message-id/flat/20240110.120644.1876591646729327180.kou%40clear-code.com#d923173e9625c20319750155083cbd72

>> 2. Need an opaque space like IndexScanDesc::opaque does
>>
>>* A custom COPY TO handler needs to keep its data
> 
> I once thought users might want to parse their own options, maybe this
> is a use case for this opaque space.

Good catch! I forgot to suggest a callback for custom format
options. How about the following API?


...
typedef bool (*CopyToProcessOption_function) (CopyToState cstate, DefElem 
*defel);

...
typedef bool (*CopyFromProcessOption_function) (CopyFromState cstate, DefElem 
*defel);

typedef struct CopyToFormatRoutine
{
...
CopyToProcessOption_function process_option_fn;
} CopyToFormatRoutine;

typedef struct CopyFromFormatRoutine
{
...
CopyFromProcessOption_function process_option_fn;
} CopyFromFormatRoutine;



diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index e7597894bf..1aa8b62551 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -416,6 +416,7 @@ void
 ProcessCopyOptions(ParseState *pstate,
   CopyFormatOptions *opts_out,
   bool is_from,
+  void *cstate, /* CopyToState* for !is_from, 
CopyFromState* for is_from */
   List *options)
 {
boolformat_specified = false;
@@ -593,11 +594,19 @@ ProcessCopyOptions(ParseState *pstate,
 parser_errposition(pstate, 
defel->location)));
}
else
-   ereport(ERROR,
-   (errcode(ERRCODE_SYNTAX_ERROR),
-errmsg("option \"%s\" not recognized",
-   defel->defname),
-parser_errposition(pstate, 
defel->location)));
+   {
+   bool processed;
+   if (is_from)
+   processed = 
opts_out->from_ops->process_option_fn(cstate, defel);
+   else
+   processed = 
opts_out->to_ops->process_option_fn(cstate, defel);
+   if (!processed)
+   ereport(ERROR,
+   (errcode(ERRCODE_SYNTAX_ERROR),
+errmsg("option \"%s\" not 
recognized",
+   defel->defname),
+parser_errposition(pstate, 
defel->location)));
+   }
}
 
/*


> For the name, I thought private_data might be a better candidate than
> opaque, but I do not insist.

I don't have a strong opinion for this. Here are the number
of headers that use "private_data" and "opaque":

$ grep -r private_data --files-with-matches src/include | wc -l
6
$ grep -r opaque --files-with-matches src/include | wc -l
38

It seems that we use "opaque" than "private_data" in general.

but it seems that we use
"opaque" than "private_data" in our code.

> Do you use the arrow library to control the memory?

Yes.

> Is there a way that
> we can let the arrow use postgres' memory context?

Yes. Apache Arrow C++ provides a memory pool feature and we
can implement PostgreSQL's memory context based memory pool
for this. (But this is a custom COPY TO/FROM handler's
implementation details.)

>I'm not sure this
> is necessary, just raise the question for discussion.

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-09 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Fri, 22 Dec 2023 10:48:18 +0900,
  Masahiko Sawada  wrote:

>> I needed to extend the patch:
>>
>> 1. Add an opaque space for custom COPY TO handler
>>* Add CopyToState{Get,Set}Opaque()
>>
>> https://github.com/kou/postgres/commit/5a610b6a066243f971e029432db67152cfe5e944
>>
>> 2. Export CopyToState::attnumlist
>>* Add CopyToStateGetAttNumList()
>>
>> https://github.com/kou/postgres/commit/15fcba8b4e95afa86edb3f677a7bdb1acb1e7688
> 
> I think we can move CopyToState to copy.h and we don't need to have
> set/get functions for its fields.

I don't object the idea if other PostgreSQL developers
prefer the approach. Is there any PostgreSQL developer who
objects that we export Copy{To,From}StateData as public API?


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-09 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Fri, 22 Dec 2023 10:00:24 +0900,
  Michael Paquier  wrote:

>> 3. Export CopySend*()
>> 
>>* If we like minimum API, we just need to export
>>  CopySendData() and CopySendEndOfRow(). But
>>  CopySend{String,Char,Int32,Int16}() will be convenient
>>  custom COPY TO handlers. (A custom COPY TO handler for
>>  Apache Arrow doesn't need them.)
> 
> Hmm.  Not sure on this one.  This may come down to externalize the
> manipulation of fe_msgbuf.  Particularly, could it be possible that
> some custom formats don't care at all about the network order?

It means that all custom formats should control byte order
by themselves instead of using CopySendInt*() that always
use network byte order, right? It makes sense. Let's export
only CopySendData() and CopySendEndOfRow().


>> 1. What value should be used for "format" in
>>PgMsg_CopyOutResponse message?
>> 
>>It's 1 for binary format and 0 for text/csv format.
>> 
>>Should we make it customizable by custom COPY TO handler?
>>If so, what value should be used for this?
> 
> Interesting point.  It looks very tempting to give more flexibility to
> people who'd like to use their own code as we have one byte in the
> protocol but just use 0/1.  Hence it feels natural to have a callback
> for that.

OK. Let's add a callback something like:

typedef int16 (*CopyToGetFormat_function) (CopyToState cstate);

> It also means that we may want to think harder about copy_is_binary in
> libpq in the future step.  Now, having a backend implementation does
> not need any libpq bits, either, because a client stack may just want
> to speak the Postgres protocol directly.  Perhaps a custom COPY
> implementation would be OK with how things are in libpq, as well,
> tweaking its way through with just text or binary.

Can we defer this discussion after we commit a basic custom
COPY format handler mechanism?

>> 2. Do we need more tries for design discussion for the first
>>implementation? If we need, what should we try?
> 
> A makeNode() is used with an allocation in the current memory context
> in the function returning the handler.  I would have assume that this
> stuff returns a handler as a const struct like table AMs.

If we use this approach, we can't use the Sawada-san's
idea[1] that provides a convenient API to hide
CopyFormatRoutine internal. The idea provides
MakeCopy{To,From}FormatRoutine(). They return a new
CopyFormatRoutine* with suitable is_from member. They can't
use static const CopyFormatRoutine because they may be called
multiple times in the same process.

We can use the satic const struct approach by choosing one
of the followings:

1. Use separated function for COPY {TO,FROM} format handlers
   as I suggested.

2. Don't provide convenient API. Developers construct
   CopyFormatRoutine by themselves. But it may be a bit
   tricky.

3. Similar to 2. but don't use a bit tricky approach (don't
   embed Copy{To,From}FormatRoutine nodes into
   CopyFormatRoutine).

   Use unified function for COPY {TO,FROM} format handlers
   but CopyFormatRoutine always have both of COPY {TO,FROM}
   format routines and these routines aren't nodes:

   typedef struct CopyToFormatRoutine
   {
   CopyToStart_function start_fn;
   CopyToOneRow_function onerow_fn;
   CopyToEnd_function end_fn;
   } CopyToFormatRoutine;

   /* XXX: just copied from COPY TO routines */
   typedef struct CopyFromFormatRoutine
   {
   CopyFromStart_function start_fn;
   CopyFromOneRow_function onerow_fn;
   CopyFromEnd_function end_fn;
   } CopyFromFormatRoutine;

   typedef struct CopyFormatRoutine
   {
   NodeTag  type;

   CopyToFormatRoutine to_routine;
   CopyFromFormatRoutine   from_routine;
   } CopyFormatRoutine;

   

   static const CopyFormatRoutine testfmt_handler = {
   .type = T_CopyFormatRoutine,
   .to_routine = {
   .start_fn = testfmt_copyto_start,
   .onerow_fn = testfmt_copyto_onerow,
   .end_fn = testfmt_copyto_end,
   },
   .from_routine = {
   .start_fn = testfmt_copyfrom_start,
   .onerow_fn = testfmt_copyfrom_onerow,
   .end_fn = testfmt_copyfrom_end,
   },
   };

   PG_FUNCTION_INFO_V1(copy_testfmt_handler);
   Datum
   copy_testfmt_handler(PG_FUNCTION_ARGS)
   {
   PG_RETURN_POINTER(_handler);
   }

4. ... other idea?


[1] 
https://www.postgresql.org/message-id/flat/CAD21AoDs9cOjuVbA_krGizAdc50KE%2BFjAuEXWF0NZwbMnc7F3Q%40mail.gmail.com#71bb03d9237252382b245dd33e705a3a


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2023-12-21 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Mon, 11 Dec 2023 23:31:29 +0900,
  Masahiko Sawada  wrote:

> I've sketched the above idea including a test module in
> src/test/module/test_copy_format, based on v2 patch. It's not splitted
> and is dirty so just for discussion.

I implemented a sample COPY TO handler for Apache Arrow that
supports only integer and text.

I needed to extend the patch:

1. Add an opaque space for custom COPY TO handler
   * Add CopyToState{Get,Set}Opaque()

https://github.com/kou/postgres/commit/5a610b6a066243f971e029432db67152cfe5e944

2. Export CopyToState::attnumlist
   * Add CopyToStateGetAttNumList()

https://github.com/kou/postgres/commit/15fcba8b4e95afa86edb3f677a7bdb1acb1e7688

3. Export CopySend*()
   * Rename CopySend*() to CopyToStateSend*() and export them
   * Exception: CopySendEndOfRow() to CopyToStateFlush() because
 it just flushes the internal buffer now.

https://github.com/kou/postgres/commit/289a5640135bde6733a1b8e2c412221ad522901e

The attached patch is based on the Sawada-san's patch and
includes the above changes. Note that this patch is also
dirty so just for discussion.

My suggestions from this experience:

1. Split COPY handler to COPY TO handler and COPY FROM handler

   * CopyFormatRoutine is a bit tricky. An extension needs
 to create a CopyFormatRoutine node and
 a CopyToFormatRoutine node.

   * If we just require "copy_to_${FORMAT}(internal)"
 function and "copy_from_${FORMAT}(internal)" function,
 we can remove the tricky approach. And it also avoid
 name collisions with other handler such as tablesample
 handler.
 See also:

https://www.postgresql.org/message-id/flat/20231214.184414.2179134502876898942.kou%40clear-code.com#af71f364d0a9f5c144e45b447e5c16c9

2. Need an opaque space like IndexScanDesc::opaque does

   * A custom COPY TO handler needs to keep its data

3. Export CopySend*()

   * If we like minimum API, we just need to export
 CopySendData() and CopySendEndOfRow(). But
 CopySend{String,Char,Int32,Int16}() will be convenient
 custom COPY TO handlers. (A custom COPY TO handler for
 Apache Arrow doesn't need them.)

Questions:

1. What value should be used for "format" in
   PgMsg_CopyOutResponse message?

https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/commands/copyto.c;h=c66a047c4a79cc614784610f385f1cd0935350f3;hb=9ca6e7b9411e36488ef539a2c1f6846ac92a7072#l144

   It's 1 for binary format and 0 for text/csv format.

   Should we make it customizable by custom COPY TO handler?
   If so, what value should be used for this?

2. Do we need more tries for design discussion for the first
   implementation? If we need, what should we try?

Thanks,
-- 
kou
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfad47b562..e7597894bf 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -23,6 +23,7 @@
 #include "access/xact.h"
 #include "catalog/pg_authid.h"
 #include "commands/copy.h"
+#include "commands/copyapi.h"
 #include "commands/defrem.h"
 #include "executor/executor.h"
 #include "mb/pg_wchar.h"
@@ -32,6 +33,7 @@
 #include "parser/parse_coerce.h"
 #include "parser/parse_collate.h"
 #include "parser/parse_expr.h"
+#include "parser/parse_func.h"
 #include "parser/parse_relation.h"
 #include "rewrite/rewriteHandler.h"
 #include "utils/acl.h"
@@ -427,6 +429,8 @@ ProcessCopyOptions(ParseState *pstate,

 	opts_out->file_encoding = -1;

+	/* Text is the default format. */
+	opts_out->to_ops = 
 	/* Extract options from the statement node tree */
 	foreach(option, options)
 	{
@@ -442,9 +446,26 @@ ProcessCopyOptions(ParseState *pstate,
 			if (strcmp(fmt, "text") == 0)
  /* default format */ ;
 			else if (strcmp(fmt, "csv") == 0)
+			{
 opts_out->csv_mode = true;
+opts_out->to_ops = 
+			}
 			else if (strcmp(fmt, "binary") == 0)
+			{
 opts_out->binary = true;
+opts_out->to_ops = 
+			}
+			else if (!is_from)
+			{
+/*
+ * XXX: Currently we support custom COPY format only for COPY
+ * TO.
+ *
+ * XXX: need to check the combination of the existing options
+ * and a custom format (e.g., FREEZE)?
+ */
+opts_out->to_ops = GetCopyToFormatRoutine(fmt);
+			}
 			else
 ereport(ERROR,
 		(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -864,3 +885,62 @@ CopyGetAttnums(TupleDesc tupDesc, Relation rel, List *attnamelist)

 	return attnums;
 }
+
+static CopyFormatRoutine *
+GetCopyFormatRoutine(char *format_name, bool is_from)
+{
+	Oid			handlerOid;
+	Oid			funcargtypes[1];
+	CopyFormatRoutine *cp;
+	Datum		datum;
+
+	funcargtypes[0] = INTERNALOID;
+	handlerOid = LookupFuncName(list_make1(makeString(format_name)), 1,
+funcargtypes, true);
+
+	if (!OidIsValid(handlerOid))
+		ereport(ERROR,
+(errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("COPY format \"%s\" not recognized", format_name)));
+
+	datum =

Re: Make COPY format extendable: Extract COPY TO format implementations

2023-12-14 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Fri, 15 Dec 2023 11:27:30 +0800,
  Junwang Zhao  wrote:

>> > Adding a prefix or suffix would be one option but to give extensions
>> > more flexibility, another option would be to support format = 'custom'
>> > and add the "handler" option to specify a copy handler function name
>> > to call. For example, COPY ... FROM ... WITH (FORMAT = 'custom',
>> > HANDLER = 'arrow_copy_handler').
>>
> I like the prefix/suffix idea, easy to implement. *custom* is not a FORMAT,
> and user has to know the name of the specific handler names, not
> intuitive.

Ah! I misunderstood this idea. "custom" is the special
format to use "HANDLER". I thought that we can use it like

   (FORMAT = 'arrow', HANDLER = 'arrow_copy_handler_impl1')

and

   (FORMAT = 'arrow', HANDLER = 'arrow_copy_handler_impl2')

.

>> Interesting. If we use this option, users can choose an COPY
>> FORMAT implementation they like from multiple
>> implementations. For example, a developer may implement a
>> COPY FROM FORMAT = 'json' handler with PostgreSQL's JSON
>> related API and another developer may implement a handler
>> with simdjson[1] which is a fast JSON parser. Users can
>> choose whichever they like.
> Not sure about this, why not move Json copy handler to contrib
> as an example for others, any extensions share the same format
> function name and just install one? No bound would implement
> another CSV or TEXT copy handler IMHO.

I should have used a different format not JSON as an example
for easy to understand. I just wanted to say that extension
developers can implement another implementation without
conflicting another implementation.


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2023-12-14 Thread Sutou Kouhei

Hi,

In 
 

  "RE: Make COPY format extendable: Extract COPY TO format implementations" on 
Tue, 12 Dec 2023 02:31:53 +,
  "Hayato Kuroda (Fujitsu)"  wrote:

>> Can we discuss how to proceed this improvement?
>> 
>> There are 2 approaches for it:
>> 
>> 1. Do the followings concurrently:
>>a. Implementing small changes that got a consensus and
>>   merge them step-by-step
>>   (e.g. We got a consensus that we need to extract the
>>   current format related routines.)
>>b. Discuss design
>> 
>>(v1-v3 patches use this approach.)
>> 
>> 2. Implement one (large) complete patch set with design
>>discussion and merge it
>> 
>>(v4- patches use this approach.)
>> 
>> Which approach is preferred? (Or should we choose another
>> approach?)
>> 
>> I thought that 1. is preferred because it will reduce review
>> cost. So I chose 1.
> 
> I'm ok to use approach 1, but could you please divide a large patch? E.g.,
> 
> 0001. defines an infrastructure for copy-API
> 0002. adjusts current codes to use APIs
> 0003. adds a test module in src/test/modules or contrib.
> ...
> 
> This approach helps reviewers to see patches deeper. Separated patches can be
> combined when they are close to committable.

It seems that I should have chosen another approach based on
comments so far:

3. Do the followings in order:
   a. Implement a workable (but maybe dirty and/or incomplete)
  implementation to discuss design like [1], discuss
  design with it and get a consensus on design
   b. Implement small patches based on the design

[1]: 
https://www.postgresql.org/message-id/CAD21AoCunywHird3GaPzWe6s9JG1wzxj3Cr6vGN36DDheGjOjA%40mail.gmail.com
 

I'll implement a custom COPY FORMAT handler with [1] and
provide a feedback with the experience. (It's for a.)


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2023-12-14 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Fri, 15 Dec 2023 05:19:43 +0900,
  Masahiko Sawada  wrote:

> To avoid collisions, extensions can be created in a
> different schema than public.

Thanks. I didn't notice it.

> And note that built-in format copy handler doesn't need to
> declare its handler function.

Right. I know it.

> Adding a prefix or suffix would be one option but to give extensions
> more flexibility, another option would be to support format = 'custom'
> and add the "handler" option to specify a copy handler function name
> to call. For example, COPY ... FROM ... WITH (FORMAT = 'custom',
> HANDLER = 'arrow_copy_handler').

Interesting. If we use this option, users can choose an COPY
FORMAT implementation they like from multiple
implementations. For example, a developer may implement a
COPY FROM FORMAT = 'json' handler with PostgreSQL's JSON
related API and another developer may implement a handler
with simdjson[1] which is a fast JSON parser. Users can
choose whichever they like.

But specifying HANDLER = '...' explicitly is a bit
inconvenient. Because only one handler will be installed in
most use cases. In the case, users don't need to choose one
handler.

If we choose this option, it may be better that we also
provide a mechanism that can work without HANDLER. Searching
a function by name like tablesample method does is an option.

[1]: https://github.com/simdjson/simdjson

Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2023-12-14 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Mon, 11 Dec 2023 10:57:15 +0900,
  Masahiko Sawada  wrote:

> IIUC we cannot create two same name functions with the same arguments
> but a different return value type in the first place. It seems to me
> to be an overkill to change such a design.

Oh, sorry. I didn't notice it.

> Another idea is to encapsulate copy_to/from_handler by a super class
> like copy_handler. The handler function is called with an argument,
> say copyto, and returns copy_handler encapsulating either
> copy_to/from_handler depending on the argument.

It's for using "${copy_format_name}" such as "json" and
"parquet" as a function name, right? If we use the
"${copy_format_name}" approach, we can't use function names
that are already used by tablesample method handler such as
"system" and "bernoulli" for COPY FORMAT name. Because both
of tablesample method handler function and COPY FORMAT
handler function use "(internal)" as arguments.

I think that tablesample method names and COPY FORMAT names
will not be conflicted but the limitation (using the same
namespace for tablesample method and COPY FORMAT) is
unnecessary limitation.

How about using prefix ("copy_to_${copy_format_name}" or
something) or suffix ("${copy_format_name}_copy_to" or
something) for function names? For example,
"copy_to_json"/"copy_from_json" for "json" COPY FORMAT.

("copy_${copy_format_name}" that returns copy_handler
encapsulating either copy_to/from_handler depending on the
argument may be an option.)

Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2023-12-09 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Sat, 9 Dec 2023 12:38:46 +0100,
  Hannu Krosing  wrote:

> Please also see my presentation slides from last years PostgreSQL
> Conference in Berlin (attached)

Thanks for sharing your idea here.

> The main Idea is to make not just "format", but also "transport" and
> "stream processing" extendable via virtual function tables.

"Transport" and "stream processing" are out of scope in this
thread. How about starting new threads for them and discuss
them there?

> Btw, will any of you here be in Prague next week ?

Sorry. No.


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2023-12-09 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Fri, 8 Dec 2023 15:42:06 +0900,
  Masahiko Sawada  wrote:

> So a custom COPY extension would not be able to define SQL functions
> just like arrow(internal) for example. We might need to define a rule
> like the function returning copy_in/out_handler must be defined as
> _to(internal) and _from(internal).

We may not need to add "_to"/"_from" suffix by checking both
of argument type and return type. Because we use different
return type for copy_in/out_handler.

But the current LookupFuncName() family doesn't check return
type. If we use this approach, we need to improve the
current LookupFuncName() family too.

Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2023-12-09 Thread Sutou Kouhei

Hi,

Thanks for reviewing our latest patch!

In 
 

  "RE: Make COPY format extendable: Extract COPY TO format implementations" on 
Sat, 9 Dec 2023 02:43:49 +,
  "Hayato Kuroda (Fujitsu)"  wrote:

> (I remember that this theme was talked at Japan PostgreSQL conference)

Yes. I should have talked to you more at the conference...
I will do it next time!


Can we discuss how to proceed this improvement?

There are 2 approaches for it:

1. Do the followings concurrently:
   a. Implementing small changes that got a consensus and
  merge them step-by-step
  (e.g. We got a consensus that we need to extract the
  current format related routines.)
   b. Discuss design

   (v1-v3 patches use this approach.)

2. Implement one (large) complete patch set with design
   discussion and merge it

   (v4- patches use this approach.)

Which approach is preferred? (Or should we choose another
approach?)

I thought that 1. is preferred because it will reduce review
cost. So I chose 1.

If 2. is preferred, I'll use 2. (I'll add more changes to
the latest patch.)


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2023-12-06 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Wed, 6 Dec 2023 22:07:51 +0800,
  Junwang Zhao  wrote:

> Should we extract both *copy to* and *copy from* for the first step, in that
> case we can add the pg_copy_handler catalog smoothly later.

I don't object it (mixing TO/FROM changes to one patch) but
it may make review difficult. Is it acceptable?

FYI: I planed that I implement TO part, and then FROM part,
and then unify TO/FROM parts if needed. [1]

> Attached V4 adds 'extract copy from' and it passed the cirrus ci,
> please take a look.

Thanks. Here are my comments:

> + /*
> + * Error is relevant to a particular line.
> + *
> + * If line_buf still contains the correct line, print it.
> + */
> + if (cstate->line_buf_valid)

We need to fix the indentation.

> +CopyFromFormatBinaryStart(CopyFromState cstate, TupleDesc tupDesc)
> +{
> + FmgrInfo   *in_functions;
> + Oid*typioparams;
> + Oid in_func_oid;
> + AttrNumber  num_phys_attrs;
> +
> + /*
> +  * Pick up the required catalog information for each attribute in the
> +  * relation, including the input function, the element type (to pass to
> +  * the input function), and info about defaults and constraints. (Which
> +  * input function we use depends on text/binary format choice.)
> +  */
> + num_phys_attrs = tupDesc->natts;
> + in_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
> + typioparams = (Oid *) palloc(num_phys_attrs * sizeof(Oid));

We need to update the comment because defaults and
constraints aren't picked up here.

> +CopyFromFormatTextStart(CopyFromState cstate, TupleDesc tupDesc)
...
> + /*
> +  * Pick up the required catalog information for each attribute in the
> +  * relation, including the input function, the element type (to pass to
> +  * the input function), and info about defaults and constraints. (Which
> +  * input function we use depends on text/binary format choice.)
> +  */
> + in_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
> + typioparams = (Oid *) palloc(num_phys_attrs * sizeof(Oid));

ditto.


> @@ -1716,15 +1776,6 @@ BeginCopyFrom(ParseState *pstate,
>   ReceiveCopyBinaryHeader(cstate);
>   }

I think that this block should be moved to
CopyFromFormatBinaryStart() too. But we need to run it after
we setup inputs such as data_source_cb, pipe and filename...

+/* Routines for a COPY HANDLER implementation. */
+typedef struct CopyHandlerOps
+{
+   /* Called when COPY TO is started. This will send a header. */
+   void(*copy_to_start) (CopyToState cstate, TupleDesc 
tupDesc);
+
+   /* Copy one row for COPY TO. */
+   void(*copy_to_one_row) (CopyToState cstate, TupleTableSlot 
*slot);
+
+   /* Called when COPY TO is ended. This will send a trailer. */
+   void(*copy_to_end) (CopyToState cstate);
+
+   void(*copy_from_start) (CopyFromState cstate, TupleDesc 
tupDesc);
+   bool(*copy_from_next) (CopyFromState cstate, ExprContext 
*econtext,
+  Datum 
*values, bool *nulls);
+   void(*copy_from_error_callback) (CopyFromState cstate);
+   void(*copy_from_end) (CopyFromState cstate);
+} CopyHandlerOps;

It seems that "copy_" prefix is redundant. Should we use
"to_start" instead of "copy_to_start" and so on?

BTW, it seems that "COPY FROM (FORMAT json)" may not be implemented. [2]
We may need to care about NULL copy_from_* cases.


> I added a hook *copy_from_end* but this might be removed later if not used.

It may be useful to clean up resources for COPY FROM but the
patch doesn't call the copy_from_end. How about removing it
for now? We can add it and call it from EndCopyFrom() later?
Because it's not needed for now.

I think that we should focus on refactoring instead of
adding a new feature in this patch.


[1]: 
https://www.postgresql.org/message-id/20231204.153548.2126325458835528809.kou%40clear-code.com
[2]: 
https://www.postgresql.org/message-id/flat/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com


Thanks,
-- 
kou

Re: Make COPY format extendable: Extract COPY TO format implementations

2023-12-05 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Wed, 6 Dec 2023 15:11:34 +0800,
  Junwang Zhao  wrote:

> For the extra curly braces, I mean the following code block in
> CopyToFormatBinaryStart:
> 
> + {<-- I thought this is useless?
> + /* Generate header for a binary copy */
> + int32 tmp;
> +
> + /* Signature */
> + CopySendData(cstate, BinarySignature, 11);
> + /* Flags field */
> + tmp = 0;
> + CopySendInt32(cstate, tmp);
> + /* No header extension */
> + tmp = 0;
> + CopySendInt32(cstate, tmp);
> + }

Oh, I see. I've removed and attach the v3 patch. In general,
I don't change variable name and so on in this patch. I just
move codes in this patch. But I also removed the "tmp"
variable for this case because I think that the name isn't
suitable for larger scope. (I think that "tmp" is acceptable
in a small scope like the above code.)

New code:

/* Generate header for a binary copy */
/* Signature */
CopySendData(cstate, BinarySignature, 11);
/* Flags field */
CopySendInt32(cstate, 0);
/* No header extension */
CopySendInt32(cstate, 0);

Thanks,
-- 
kou
>From 9fe0087d9a6a79a7d1a7d0af63eb16abadbf0d4a Mon Sep 17 00:00:00 2001
From: Sutou Kouhei 
Date: Mon, 4 Dec 2023 12:32:54 +0900
Subject: [PATCH v3] Extract COPY TO format implementations

This is a part of making COPY format extendable. See also these past
discussions:
* New Copy Formats - avro/orc/parquet:
  https://www.postgresql.org/message-id/flat/20180210151304.fonjztsynewldfba%40gmail.com
* Make COPY extendable in order to support Parquet and other formats:
  https://www.postgresql.org/message-id/flat/CAJ7c6TM6Bz1c3F04Cy6%2BSzuWfKmr0kU8c_3Stnvh_8BR0D6k8Q%40mail.gmail.com

This doesn't change the current behavior. This just introduces
CopyToFormatOps, which just has function pointers of format
implementation like TupleTableSlotOps, and use it for existing "text",
"csv" and "binary" format implementations.

Note that CopyToFormatOps can't be used from extensions yet because
CopySend*() aren't exported yet. Extensions can't send formatted data
to a destination without CopySend*(). They will be exported by
subsequent patches.

Here is a benchmark result with/without this change because there was
a discussion that we should care about performance regression:

https://www.postgresql.org/message-id/3741749.1655952719%40sss.pgh.pa.us

> I think that step 1 ought to be to convert the existing formats into
> plug-ins, and demonstrate that there's no significant loss of
> performance.

You can see that there is no significant loss of performance:

Data: Random 32 bit integers:

CREATE TABLE data (int32 integer);
INSERT INTO data
  SELECT random() * 1
FROM generate_series(1, ${n_records});

The number of records: 100K, 1M and 10M

100K without this change:

format,elapsed time (ms)
text,22.527
csv,23.822
binary,24.806

100K with this change:

format,elapsed time (ms)
text,22.919
csv,24.643
binary,24.705

1M without this change:

format,elapsed time (ms)
text,223.457
csv,233.583
binary,242.687

1M with this change:

format,elapsed time (ms)
text,224.591
csv,233.964
binary,247.164

10M without this change:

format,elapsed time (ms)
text,2330.383
csv,2411.394
binary,2590.817

10M with this change:

format,elapsed time (ms)
text,2231.307
csv,2408.067
binary,2473.617
---
 src/backend/commands/copy.c   |   8 +
 src/backend/commands/copyto.c | 377 --
 src/include/commands/copy.h   |  27 ++-
 3 files changed, 256 insertions(+), 156 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfad47b562..27a1add456 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -427,6 +427,8 @@ ProcessCopyOptions(ParseState *pstate,

 	opts_out->file_encoding = -1;

+	/* Text is the default format. */
+	opts_out->to_ops = CopyToFormatOpsText;
 	/* Extract options from the statement node tree */
 	foreach(option, options)
 	{
@@ -442,9 +444,15 @@ ProcessCopyOptions(ParseState *pstate,
 			if (strcmp(fmt, "text") == 0)
  /* default format */ ;
 			else if (strcmp(fmt, "csv") == 0)
+			{
 opts_out->csv_mode = true;
+opts_out->to_ops = CopyToFormatOpsCSV;
+			}
 			else if (strcmp(fmt, "binary") == 0)
+			{
 opts_out->binary = true;
+opts_out->to_ops = CopyToFormatOpsBinary;
+			}
 			else
 ereport(ERROR,
 		(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index c66a047c4a..8f51090a03 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -131,6 +131,228 @@ static void CopySendEndOfRow(CopyToState cstate);
 static void CopySen

Re: Make COPY format extendable: Extract COPY TO format implementations

2023-12-05 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Wed, 6 Dec 2023 11:18:35 +0800,
  Junwang Zhao  wrote:

> For the modern formats(parquet, orc, avro, etc.), will they be
> implemented as extensions or in core?

I think that they should be implemented as extensions
because they will depend of external libraries and may not
use C. For example, C++ will be used for Apache Parquet
because the official Apache Parquet C++ implementation
exists but the C implementation doesn't.

(I can implement an extension for Apache Parquet after we
complete this feature. I'll implement an extension for
Apache Arrow with the official Apache Arrow C++
implementation. And it's easy that we convert Apache Arrow
data to Apache Parquet with the official Apache Parquet
implementation.)

> The patch looks good except for a pair of extra curly braces.

Thanks for the review! I attach the v2 patch that removes
extra curly braces for "if (isnull)".

Thanks,
-- 
kou
>From 2cd0d344d68667db71b621a8c94f376ddf1707c3 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei 
Date: Mon, 4 Dec 2023 12:32:54 +0900
Subject: [PATCH v2] Extract COPY TO format implementations

This is a part of making COPY format extendable. See also these past
discussions:
* New Copy Formats - avro/orc/parquet:
  https://www.postgresql.org/message-id/flat/20180210151304.fonjztsynewldfba%40gmail.com
* Make COPY extendable in order to support Parquet and other formats:
  https://www.postgresql.org/message-id/flat/CAJ7c6TM6Bz1c3F04Cy6%2BSzuWfKmr0kU8c_3Stnvh_8BR0D6k8Q%40mail.gmail.com

This doesn't change the current behavior. This just introduces
CopyToFormatOps, which just has function pointers of format
implementation like TupleTableSlotOps, and use it for existing "text",
"csv" and "binary" format implementations.

Note that CopyToFormatOps can't be used from extensions yet because
CopySend*() aren't exported yet. Extensions can't send formatted data
to a destination without CopySend*(). They will be exported by
subsequent patches.

Here is a benchmark result with/without this change because there was
a discussion that we should care about performance regression:

https://www.postgresql.org/message-id/3741749.1655952719%40sss.pgh.pa.us

> I think that step 1 ought to be to convert the existing formats into
> plug-ins, and demonstrate that there's no significant loss of
> performance.

You can see that there is no significant loss of performance:

Data: Random 32 bit integers:

CREATE TABLE data (int32 integer);
INSERT INTO data
  SELECT random() * 1
FROM generate_series(1, ${n_records});

The number of records: 100K, 1M and 10M

100K without this change:

format,elapsed time (ms)
text,22.527
csv,23.822
binary,24.806

100K with this change:

format,elapsed time (ms)
text,22.919
csv,24.643
binary,24.705

1M without this change:

format,elapsed time (ms)
text,223.457
csv,233.583
binary,242.687

1M with this change:

format,elapsed time (ms)
text,224.591
csv,233.964
binary,247.164

10M without this change:

format,elapsed time (ms)
text,2330.383
csv,2411.394
binary,2590.817

10M with this change:

format,elapsed time (ms)
text,2231.307
csv,2408.067
binary,2473.617
---
 src/backend/commands/copy.c   |   8 +
 src/backend/commands/copyto.c | 383 --
 src/include/commands/copy.h   |  27 ++-
 3 files changed, 262 insertions(+), 156 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfad47b562..27a1add456 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -427,6 +427,8 @@ ProcessCopyOptions(ParseState *pstate,

 	opts_out->file_encoding = -1;

+	/* Text is the default format. */
+	opts_out->to_ops = CopyToFormatOpsText;
 	/* Extract options from the statement node tree */
 	foreach(option, options)
 	{
@@ -442,9 +444,15 @@ ProcessCopyOptions(ParseState *pstate,
 			if (strcmp(fmt, "text") == 0)
  /* default format */ ;
 			else if (strcmp(fmt, "csv") == 0)
+			{
 opts_out->csv_mode = true;
+opts_out->to_ops = CopyToFormatOpsCSV;
+			}
 			else if (strcmp(fmt, "binary") == 0)
+			{
 opts_out->binary = true;
+opts_out->to_ops = CopyToFormatOpsBinary;
+			}
 			else
 ereport(ERROR,
 		(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index c66a047c4a..79806b9a1b 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -131,6 +131,234 @@ static void CopySendEndOfRow(CopyToState cstate);
 static void CopySendInt32(CopyToState cstate, int32 val);
 static void CopySendInt16(CopyToState cstate, int16 val);

+/*
+ * CopyToFormatOps implementations.
+ */
+
+/*
+ * CopyToFormatOps implement

Re: Make COPY format extendable: Extract COPY TO format implementations

2023-12-05 Thread Sutou Kouhei

Hi,

Thanks for replying to this proposal!

In <20231205182458.GC2757816@nathanxps13>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Tue, 5 Dec 2023 12:24:58 -0600,
  Nathan Bossart  wrote:

> I think it makes sense to do this part independently, but we should be
> careful to design this with the follow-up tasks in mind.

OK. I'll keep updating the "TODOs" section in the original
e-mail. It also includes design in the follow-up tasks. We
can discuss the design separately from the patches
submitting. (The current submitted patch just focuses on
refactoring but we can discuss the final design.)

> I assume the performance concerns stem from the use of
> function pointers.  Or was there something else?

I think so too.

The original e-mail that mentioned the performance concern
[1] didn't say about the reason but the use of function
pointers might be concerned.

If the currently supported formats ("text", "csv" and
"binary") are implemented as an extension, it may have more
concerns but we will keep them as built-in formats for
compatibility. So I think that no more concerns exist for
these formats.

[1]: 
https://www.postgresql.org/message-id/flat/3741749.1655952719%40sss.pgh.pa.us#2bb7af4a3d2c7669f9a49808d777a20d

Thanks,
-- 
kou

Make COPY format extendable: Extract COPY TO format implementations

2023-12-03 Thread Sutou Kouhei

Hi,

I want to work on making COPY format extendable. I attach
the first patch for it. I'll send more patches after this is
merged.


Background:

Currently, COPY TO/FROM supports only "text", "csv" and
"binary" formats. There are some requests to support more
COPY formats. For example:

* 2023-11: JSON and JSON lines [1]
* 2022-04: Apache Arrow [2]
* 2018-02: Apache Avro, Apache Parquet and Apache ORC [3]

(FYI: I want to add support for Apache Arrow.)

There were discussions how to add support for more formats. [3][4]
In these discussions, we got a consensus about making COPY
format extendable.

But it seems that nobody works on this yet. So I want to
work on this. (If there is anyone who wants to work on this
together, I'm happy.)


Summary:

The attached patch introduces CopyToFormatOps struct that is
similar to TupleTableSlotOps for TupleTableSlot but
CopyToFormatOps is for COPY TO format. CopyToFormatOps has
routines to implement a COPY TO format.

The attached patch doesn't change:

* the current behavior (all existing tests are still passed
  without changing them)
* the existing "text", "csv" and "binary" format output
  implementations including local variable names (the
  attached patch just move them and adjust indent)
* performance (no significant loss of performance)

In other words, this is just a refactoring for further
changes to make COPY format extendable. If I use "complete
the task and then request reviews for it" approach, it will
be difficult to review because changes for it will be
large. So I want to work on this step by step. Is it
acceptable?

TODOs that should be done in subsequent patches:

* Add some CopyToState readers such as CopyToStateGetDest(),
  CopyToStateGetAttnums() and CopyToStateGetOpts()
  (We will need to consider which APIs should be exported.)
  (This is for implemeing COPY TO format by extension.)
* Export CopySend*() in src/backend/commands/copyto.c
  (This is for implemeing COPY TO format by extension.)
* Add API to register a new COPY TO format implementation
* Add "CREATE XXX" to register a new COPY TO format (or COPY
  TO/FROM format) implementation
  ("CREATE COPY HANDLER" was suggested in [5].)
* Same for COPY FROM


Performance:

We got a consensus about making COPY format extendable but
we should care about performance. [6]

> I think that step 1 ought to be to convert the existing
> formats into plug-ins, and demonstrate that there's no
> significant loss of performance.

So I measured COPY TO time with/without this change. You can
see there is no significant loss of performance.

Data: Random 32 bit integers:

CREATE TABLE data (int32 integer);
INSERT INTO data
  SELECT random() * 1
FROM generate_series(1, ${n_records});

The number of records: 100K, 1M and 10M

100K without this change:

format,elapsed time (ms)
text,22.527
csv,23.822
binary,24.806

100K with this change:

format,elapsed time (ms)
text,22.919
csv,24.643
binary,24.705

1M without this change:

format,elapsed time (ms)
text,223.457
csv,233.583
binary,242.687

1M with this change:

format,elapsed time (ms)
text,224.591
csv,233.964
binary,247.164

10M without this change:

format,elapsed time (ms)
text,2330.383
csv,2411.394
binary,2590.817

10M with this change:

format,elapsed time (ms)
text,2231.307
csv,2408.067
binary,2473.617


[1]: 
https://www.postgresql.org/message-id/flat/24e3ee88-ec1e-421b-89ae-8a47ee0d2df1%40joeconway.com#a5e6b8829f9a74dfc835f6f29f2e44c5
[2]: 
https://www.postgresql.org/message-id/flat/CAGrfaBVyfm0wPzXVqm0%3Dh5uArYh9N_ij%2BsVpUtDHqkB%3DVyB3jw%40mail.gmail.com
[3]: 
https://www.postgresql.org/message-id/flat/20180210151304.fonjztsynewldfba%40gmail.com
[4]: 
https://www.postgresql.org/message-id/flat/3741749.1655952719%40sss.pgh.pa.us#2bb7af4a3d2c7669f9a49808d777a20d
[5]: 
https://www.postgresql.org/message-id/20180211211235.5x3jywe5z3lkgcsr%40alap3.anarazel.de
[6]: https://www.postgresql.org/message-id/3741749.1655952719%40sss.pgh.pa.us


Thanks,
-- 
kou
>From 7f00b2b0fb878ae1c687c151dd751512d02ed83e Mon Sep 17 00:00:00 2001
From: Sutou Kouhei 
Date: Mon, 4 Dec 2023 12:32:54 +0900
Subject: [PATCH v1] Extract COPY TO format implementations

This is a part of making COPY format extendable. See also these past
discussions:
* New Copy Formats - avro/orc/parquet:
  https://www.postgresql.org/message-id/flat/20180210151304.fonjztsynewldfba%40gmail.com
* Make COPY extendable in order to support Parquet and other formats:
  https://www.postgresql.org/message-id/flat/CAJ7c6TM6Bz1c3F04Cy6%2BSzuWfKmr0kU8c_3Stnvh_8BR0D6k8Q%40mail.gmail.com

This doesn't change the current behavior. This just introduces
CopyToFormatOps, which just has function pointers of format
implementation like TupleTableSlotOps, and use it for existing "text",
"cs

81 matches

Mail list logo