Re: proposal: possibility to read dumped table's name from file

2023-11-30 Thread Pavel Stehule
čt 30. 11. 2023 v 14:05 odesílatel Daniel Gustafsson 
napsal:

> > On 30 Nov 2023, at 07:13, Pavel Stehule  wrote:
> > čt 30. 11. 2023 v 4:40 odesílatel Tom Lane  t...@sss.pgh.pa.us>> napsal:
> > Daniel Gustafsson mailto:dan...@yesql.se>> writes:
> > > I took another look at this, found some more polish that was needed,
> added
> > > another testcase and ended up pushing it.
> >
> > mamba is unhappy because this uses  functions without
> > casting their arguments to unsigned char:
>
> Thanks for the heads-up.
>
> > here is a patch
>
> I agree with this fix, and have applied it.
>

Thank you

Pavel

>
> --
> Daniel Gustafsson
>
>


Re: proposal: possibility to read dumped table's name from file

2023-11-30 Thread Daniel Gustafsson
> On 30 Nov 2023, at 07:13, Pavel Stehule  wrote:
> čt 30. 11. 2023 v 4:40 odesílatel Tom Lane  > napsal:
> Daniel Gustafsson mailto:dan...@yesql.se>> writes:
> > I took another look at this, found some more polish that was needed, added
> > another testcase and ended up pushing it.
> 
> mamba is unhappy because this uses  functions without
> casting their arguments to unsigned char:

Thanks for the heads-up.

> here is a patch

I agree with this fix, and have applied it.

--
Daniel Gustafsson





Re: proposal: possibility to read dumped table's name from file

2023-11-29 Thread Pavel Stehule
Hi

čt 30. 11. 2023 v 4:40 odesílatel Tom Lane  napsal:

> Daniel Gustafsson  writes:
> > I took another look at this, found some more polish that was needed,
> added
> > another testcase and ended up pushing it.
>
> mamba is unhappy because this uses  functions without
> casting their arguments to unsigned char:
>
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mamba=2023-11-30%2002%3A53%3A25
>
> (I had not realized that we still had buildfarm animals that would
> complain about this ... but I'm glad we do, because it's a hazard.
> POSIX is quite clear that the behavior is undefined for signed chars.)
>

here is a patch

Regards

Pavel


>
> regards, tom lane
>
commit 29a570f6b003513ef93691d92e01ea80fab11844
Author: ok...@github.com 
Date:   Thu Nov 30 07:10:58 2023 +0100

fix warning array subscript has type 'char'

diff --git a/src/bin/pg_dump/filter.c b/src/bin/pg_dump/filter.c
index dff871c7cd..d79b6da27c 100644
--- a/src/bin/pg_dump/filter.c
+++ b/src/bin/pg_dump/filter.c
@@ -185,14 +185,14 @@ filter_get_keyword(const char **line, int *size)
 	*size = 0;
 
 	/* Skip initial whitespace */
-	while (isspace(*ptr))
+	while (isspace((unsigned char) *ptr))
 		ptr++;
 
-	if (isalpha(*ptr))
+	if (isalpha((unsigned char) *ptr))
 	{
 		result = ptr++;
 
-		while (isalpha(*ptr) || *ptr == '_')
+		while (isalpha((unsigned char) *ptr) || *ptr == '_')
 			ptr++;
 
 		*size = ptr - result;
@@ -301,7 +301,7 @@ read_pattern(FilterStateData *fstate, const char *str, PQExpBuffer pattern)
 	bool		found_space = false;
 
 	/* Skip initial whitespace */
-	while (isspace(*str))
+	while (isspace((unsigned char) *str))
 		str++;
 
 	if (*str == '\0')
@@ -312,7 +312,7 @@ read_pattern(FilterStateData *fstate, const char *str, PQExpBuffer pattern)
 
 	while (*str && *str != '#')
 	{
-		while (*str && !isspace(*str) && !strchr("#,.()\"", *str))
+		while (*str && !isspace((unsigned char) *str) && !strchr("#,.()\"", *str))
 		{
 			/*
 			 * Append space only when it is allowed, and when it was found in
@@ -351,7 +351,7 @@ read_pattern(FilterStateData *fstate, const char *str, PQExpBuffer pattern)
 		found_space = false;
 
 		/* skip ending whitespaces */
-		while (isspace(*str))
+		while (isspace((unsigned char) *str))
 		{
 			found_space = true;
 			str++;
@@ -400,7 +400,7 @@ filter_read_item(FilterStateData *fstate,
 		fstate->lineno++;
 
 		/* Skip initial white spaces */
-		while (isspace(*str))
+		while (isspace((unsigned char) *str))
 			str++;
 
 		/*


Re: proposal: possibility to read dumped table's name from file

2023-11-29 Thread Tom Lane
Daniel Gustafsson  writes:
> I took another look at this, found some more polish that was needed, added
> another testcase and ended up pushing it.

mamba is unhappy because this uses  functions without
casting their arguments to unsigned char:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mamba=2023-11-30%2002%3A53%3A25

(I had not realized that we still had buildfarm animals that would
complain about this ... but I'm glad we do, because it's a hazard.
POSIX is quite clear that the behavior is undefined for signed chars.)

regards, tom lane




Re: proposal: possibility to read dumped table's name from file

2023-11-29 Thread Pavel Stehule
Hi

st 29. 11. 2023 v 15:44 odesílatel Daniel Gustafsson 
napsal:

> > On 22 Nov 2023, at 05:27, Erik Rijkers  wrote:
> >
> > Op 11/21/23 om 22:10 schreef Daniel Gustafsson:
> >>> On 20 Nov 2023, at 06:20, Pavel Stehule 
> wrote:
> >
> >> The attached is pretty close to a committable patch IMO, review is
> welcome on
> >> both the patch and commit message.  I tried to identify all reviewers
> over the
> >> past 3+ years but I might have missed someone.
>
> I took another look at this, found some more polish that was needed, added
> another testcase and ended up pushing it.
>
> > I've tested this, albeit mostly in the initial iterations  (*shrug* but
> a mention is nice)
>
> As I mentioned above it's easy to miss when reviewing three years worth of
> emails, no-one was intentionally left out.  I went back and looked and
> added
> you as a reviewer. Thanks for letting me know.
>

Thank you very much

Regards

Pavel


> --
> Daniel Gustafsson
>
>


Re: proposal: possibility to read dumped table's name from file

2023-11-29 Thread Daniel Gustafsson
> On 22 Nov 2023, at 05:27, Erik Rijkers  wrote:
> 
> Op 11/21/23 om 22:10 schreef Daniel Gustafsson:
>>> On 20 Nov 2023, at 06:20, Pavel Stehule  wrote:
> 
>> The attached is pretty close to a committable patch IMO, review is welcome on
>> both the patch and commit message.  I tried to identify all reviewers over 
>> the
>> past 3+ years but I might have missed someone.

I took another look at this, found some more polish that was needed, added
another testcase and ended up pushing it.

> I've tested this, albeit mostly in the initial iterations  (*shrug* but a 
> mention is nice)

As I mentioned above it's easy to miss when reviewing three years worth of
emails, no-one was intentionally left out.  I went back and looked and added
you as a reviewer. Thanks for letting me know.

--
Daniel Gustafsson





Re: proposal: possibility to read dumped table's name from file

2023-11-21 Thread Erik Rijkers

Op 11/21/23 om 22:10 schreef Daniel Gustafsson:

On 20 Nov 2023, at 06:20, Pavel Stehule  wrote:





The attached is pretty close to a committable patch IMO, review is welcome on
both the patch and commit message.  I tried to identify all reviewers over the
past 3+ years but I might have missed someone.


I've tested this, albeit mostly in the initial iterations  (*shrug* but 
a mention is nice)


Erik Rijkers



--
Daniel Gustafsson







Re: proposal: possibility to read dumped table's name from file

2023-11-21 Thread Daniel Gustafsson
> On 20 Nov 2023, at 06:20, Pavel Stehule  wrote:

> I was pondering replacing the is_include handling with returning an enum for
> the operation, to keep things more future proof in case we add more operations
> (and also a bit less magic IMHO).
> 
> +1
> 
> I did it.

Nice, I think it's an improvement.

+   extension: data on foreign servers, works like
+   --extension. This keyword can only be
+   used with the include keyword.
This seems like a copy-pasteo, fixed in the attached.

I've spent some time polishing this version of the patch, among other things
trying to make the docs and --help screen consistent across the tools.  I've
added the diff as a txt file to this email (to keep the CFbot from applying
it), it's mainly reformatting a few comments and making things consistent.

The attached is pretty close to a committable patch IMO, review is welcome on
both the patch and commit message.  I tried to identify all reviewers over the
past 3+ years but I might have missed someone.

--
Daniel Gustafsson

commit 4a3c0bdaf3fd21b75e17244691fbeb9340e960e1
Author: Daniel Gustafsson 
Date:   Tue Nov 21 15:08:27 2023 +0100

Fixups and tweaks

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index e2f100d552..0e5ba4f712 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -873,49 +873,52 @@ PostgreSQL documentation
 
  
   
-   extension: data on foreign servers, works like
-   --extension. This keyword can only be
+   extension: extensions, works like the
+   --extension option. This keyword can only be
used with the include keyword.
   
  
  
   
foreign_data: data on foreign servers, works like
-   --include-foreign-data. This keyword can only be
-   used with the include keyword.
+   the --include-foreign-data option. This keyword can
+   only be used with the include keyword.
   
  
  
   
-   table: tables, works like
-   -t/--table
+   table: tables, works like the
+   -t/--table option.
   
  
  
   
-   table_and_children: tables, works like
-   --table-and-children
+   table_and_children: tables including any 
partitions
+   or inheritance child tables, works like the
+   --table-and-children option.
   
  
  
   
-   table_data: table data, works like
-   --exclude-table-data. This keyword can only be
-   used with the exclude keyword.
+   table_data: table data of any tables matching
+   pattern, works like the
+   --exclude-table-data option. This keyword can only
+   be used with the exclude keyword.
   
  
  
   
-   table_data_and_children: table data of any
-   partitions or inheritance child, works like
-   --exclude-table-data-and-children. This keyword 
can only be
-   used with the exclude keyword.
+   table_data_and_children: table data of any tables
+   matching pattern as well as any 
partitions
+   or inheritance children of the table(s), works like the
+   --exclude-table-data-and-children option. This
+   keyword can only be used with the exclude 
keyword.
   
  
  
   
-   schema: schemas, works like
-   -n/--schema
+   schema: schemas, works like the
+   -n/--schema option.
   
  
 
@@ -923,9 +926,9 @@ PostgreSQL documentation
 

 Lines starting with # are considered comments and
-ignored. Comments can be placed after filter as well. Blank lines
-are also ignored. See  for how to
-perform quoting in patterns.
+ignored. Comments can be placed after an object pattern row as well.
+Blank lines are also ignored. See 
+for how to perform quoting in patterns.

 

@@ -1715,8 +1718,8 @@ CREATE DATABASE foo WITH TEMPLATE template0;
 
 
   
-   To dump all tables with names starting with mytable, except for table
-   mytable2, specify a filter file
+   To dump all tables whose names start with mytable, except
+   for table mytable2, specify a filter file
filter.txt like:
 
 include table mytable*
diff --git a/doc/src/sgml/ref/pg_dumpall.sgml b/doc/src/sgml/ref/pg_dumpall.sgml
index 75ba03f3ad..4d7c046468 100644
--- a/doc/src/sgml/ref/pg_dumpall.sgml
+++ b/doc/src/sgml/ref/pg_dumpall.sgml
@@ -130,20 +130,29 @@ PostgreSQL documentation
   

 Specify a filename from which to read patterns for databases excluded
-from the dump. The patterns are interpretted according to the same 
rules
+from the dump. The patterns are interpreted according to the 

Re: proposal: possibility to read dumped table's name from file

2023-11-19 Thread Pavel Stehule
Hi

I was pondering replacing the is_include handling with returning an enum for
>> the operation, to keep things more future proof in case we add more
>> operations
>> (and also a bit less magic IMHO).
>>
>
> +1
>

I did it.

Regards

Pavel


>
> Pavel
>
>
>> --
>> Daniel Gustafsson
>>
>>
From 24631830e190cce814ee60a8050126ebc29b7ffa Mon Sep 17 00:00:00 2001
From: "ok...@github.com" 
Date: Sat, 11 Nov 2023 20:34:34 +0100
Subject: [PATCH] possibility to read options for dump from file

---
 doc/src/sgml/ref/pg_dump.sgml   | 114 +++
 doc/src/sgml/ref/pg_dumpall.sgml|  22 +
 doc/src/sgml/ref/pg_restore.sgml|  25 +
 src/bin/pg_dump/Makefile|   5 +-
 src/bin/pg_dump/filter.c| 472 
 src/bin/pg_dump/filter.h|  70 ++
 src/bin/pg_dump/meson.build |   2 +
 src/bin/pg_dump/pg_dump.c   | 119 ++-
 src/bin/pg_dump/pg_dumpall.c|  68 +-
 src/bin/pg_dump/pg_restore.c| 108 +++
 src/bin/pg_dump/t/005_pg_dump_filterfile.pl | 773 
 src/tools/msvc/Mkvcbuild.pm |   1 +
 12 files changed, 1775 insertions(+), 4 deletions(-)
 create mode 100644 src/bin/pg_dump/filter.c
 create mode 100644 src/bin/pg_dump/filter.h
 create mode 100644 src/bin/pg_dump/t/005_pg_dump_filterfile.pl

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 8695571045..e2f100d552 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -836,6 +836,106 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Specify a filename from which to read patterns for objects to include
+or exclude from the dump. The patterns are interpreted according to the
+same rules as the corresponding options:
+-t/--table,
+--table-and-children,
+--exclude-table-and-children or
+-T for tables,
+-n/--schema for schemas,
+--include-foreign-data for data on foreign servers and
+--exclude-table-data,
+--exclude-table-data-and-children for table data,
+-e/--extension for extensions.
+To read from STDIN, use - as the
+filename.  The --filter option can be specified in
+conjunction with the above listed options for including or excluding
+objects, and can also be specified more than once for multiple filter
+files.
+   
+
+   
+The file lists one object pattern per row, with the following format:
+
+{ include | exclude } { extension | foreign_data | table | table_and_children | table_data | table_data_and_children | schema } PATTERN
+
+   
+
+   
+The first keyword specifies whether the objects matched by the pattern
+are to be included or excluded. The second keyword specifies the type
+of object to be filtered using the pattern:
+
+ 
+  
+   extension: data on foreign servers, works like
+   --extension. This keyword can only be
+   used with the include keyword.
+  
+ 
+ 
+  
+   foreign_data: data on foreign servers, works like
+   --include-foreign-data. This keyword can only be
+   used with the include keyword.
+  
+ 
+ 
+  
+   table: tables, works like
+   -t/--table
+  
+ 
+ 
+  
+   table_and_children: tables, works like
+   --table-and-children
+  
+ 
+ 
+  
+   table_data: table data, works like
+   --exclude-table-data. This keyword can only be
+   used with the exclude keyword.
+  
+ 
+ 
+  
+   table_data_and_children: table data of any
+   partitions or inheritance child, works like
+   --exclude-table-data-and-children. This keyword can only be
+   used with the exclude keyword.
+  
+ 
+ 
+  
+   schema: schemas, works like
+   -n/--schema
+  
+ 
+
+   
+
+   
+Lines starting with # are considered comments and
+ignored. Comments can be placed after filter as well. Blank lines
+are also ignored. See  for how to
+perform quoting in patterns.
+   
+
+   
+Example files are listed below in the 
+section.
+   
+
+  
+ 
+
  
   --if-exists
   
@@ -1168,6 +1268,7 @@ PostgreSQL documentation
 schema (-n/--schema) and
 table (-t/--table) pattern
 match at least one extension/schema/table in the database to be dumped.
+This also applies to filters used with --filter.
 Note that if none of the extension/schema/table patterns find
 matches, pg_dump will generate an error
 even 

Re: proposal: possibility to read dumped table's name from file

2023-11-13 Thread Pavel Stehule
po 13. 11. 2023 v 14:39 odesílatel Daniel Gustafsson 
napsal:

> > On 13 Nov 2023, at 14:15, Pavel Stehule  wrote:
> >
> > Hi
> >
> > ne 12. 11. 2023 v 14:17 odesílatel Pavel Stehule <
> pavel.steh...@gmail.com > napsal:
> > Hi
> >
> >
> > What are your thoughts on this version?  It's not in a committable state
> as it
> > needs a bit more comments here and there and a triplecheck that nothing
> was
> > missed in changing this, but I prefer to get your thoughts before
> spending the
> > extra time.
> >
> > I think using pointer to exit function is an elegant solution. I checked
> the code and I found only one issue. I fixed warning
> >
> > [13:57:22.578] time make -s -j${BUILD_JOBS} world-bin
> > [13:58:20.858] filter.c: In function ‘pg_log_filter_error’:
> > [13:58:20.858] filter.c:161:2: error: function ‘pg_log_filter_error’
> might be a candidate for ‘gnu_printf’ format attribute
> [-Werror=suggest-attribute=format]
> > [13:58:20.858] 161 | vsnprintf(buf, sizeof(buf), fmt, argp);
> > [13:58:20.858] | ^
> > [13:58:20.858] cc1: all warnings being treated as errors
> >
> > and probably copy/paste bug
> >
> > diff --git a/src/bin/pg_dump/pg_restore.c b/src/bin/pg_dump/pg_restore.c
> > index f647bde28d..ab2abedf5f 100644
> > --- a/src/bin/pg_dump/pg_restore.c
> > +++ b/src/bin/pg_dump/pg_restore.c
> > @@ -535,7 +535,7 @@ read_restore_filters(const char *filename,
> RestoreOptions *opts)
> > case FILTER_OBJECT_TYPE_EXTENSION:
> > case FILTER_OBJECT_TYPE_FOREIGN_DATA:
> > pg_log_filter_error(, _("%s filter for \"%s\"
> is not allowed."),
> > -   "exclude",
> > +   "include",
> >
>  filter_object_type_name(objtype));
> > exit_nicely(1);
> >
> > Regards
> >
> > Pavel
> >
> > next update - fix used, but uninitialized  "is_include" variable, when
> filter is of FILTER_OBJECT_TYPE_NONE
>
> Thanks, the posted patchset was indeed a bit of a sketch, thanks for
> fixing up
> these.  I'll go over it again too to clean it up and try to make into
> something
> committable.
>
> I was pondering replacing the is_include handling with returning an enum
> for
> the operation, to keep things more future proof in case we add more
> operations
> (and also a bit less magic IMHO).
>

+1

Pavel


> --
> Daniel Gustafsson
>
>


Re: proposal: possibility to read dumped table's name from file

2023-11-13 Thread Daniel Gustafsson
> On 13 Nov 2023, at 14:15, Pavel Stehule  wrote:
> 
> Hi
> 
> ne 12. 11. 2023 v 14:17 odesílatel Pavel Stehule  > napsal:
> Hi
> 
> 
> What are your thoughts on this version?  It's not in a committable state as it
> needs a bit more comments here and there and a triplecheck that nothing was
> missed in changing this, but I prefer to get your thoughts before spending the
> extra time. 
> 
> I think using pointer to exit function is an elegant solution. I checked the 
> code and I found only one issue. I fixed warning
> 
> [13:57:22.578] time make -s -j${BUILD_JOBS} world-bin
> [13:58:20.858] filter.c: In function ‘pg_log_filter_error’:
> [13:58:20.858] filter.c:161:2: error: function ‘pg_log_filter_error’ might be 
> a candidate for ‘gnu_printf’ format attribute 
> [-Werror=suggest-attribute=format]
> [13:58:20.858] 161 | vsnprintf(buf, sizeof(buf), fmt, argp);
> [13:58:20.858] | ^
> [13:58:20.858] cc1: all warnings being treated as errors
> 
> and probably copy/paste bug
> 
> diff --git a/src/bin/pg_dump/pg_restore.c b/src/bin/pg_dump/pg_restore.c
> index f647bde28d..ab2abedf5f 100644
> --- a/src/bin/pg_dump/pg_restore.c
> +++ b/src/bin/pg_dump/pg_restore.c
> @@ -535,7 +535,7 @@ read_restore_filters(const char *filename, RestoreOptions 
> *opts)
> case FILTER_OBJECT_TYPE_EXTENSION:
> case FILTER_OBJECT_TYPE_FOREIGN_DATA:
> pg_log_filter_error(, _("%s filter for \"%s\" is 
> not allowed."),
> -   "exclude",
> +   "include",
> filter_object_type_name(objtype));
> exit_nicely(1);
> 
> Regards
> 
> Pavel
> 
> next update - fix used, but uninitialized  "is_include" variable, when filter 
> is of FILTER_OBJECT_TYPE_NONE

Thanks, the posted patchset was indeed a bit of a sketch, thanks for fixing up
these.  I'll go over it again too to clean it up and try to make into something
committable.

I was pondering replacing the is_include handling with returning an enum for
the operation, to keep things more future proof in case we add more operations
(and also a bit less magic IMHO).

--
Daniel Gustafsson





Re: proposal: possibility to read dumped table's name from file

2023-11-13 Thread Pavel Stehule
Hi

ne 12. 11. 2023 v 14:17 odesílatel Pavel Stehule 
napsal:

> Hi
>
>
>> What are your thoughts on this version?  It's not in a committable state
>> as it
>> needs a bit more comments here and there and a triplecheck that nothing
>> was
>> missed in changing this, but I prefer to get your thoughts before
>> spending the
>> extra time.
>>
>
> I think using pointer to exit function is an elegant solution. I checked
> the code and I found only one issue. I fixed warning
>
> [13:57:22.578] time make -s -j${BUILD_JOBS} world-bin
> [13:58:20.858] filter.c: In function ‘pg_log_filter_error’:
> [13:58:20.858] filter.c:161:2: error: function ‘pg_log_filter_error’ might
> be a candidate for ‘gnu_printf’ format attribute
> [-Werror=suggest-attribute=format]
> [13:58:20.858] 161 | vsnprintf(buf, sizeof(buf), fmt, argp);
> [13:58:20.858] | ^
> [13:58:20.858] cc1: all warnings being treated as errors
>
> and probably copy/paste bug
>
> diff --git a/src/bin/pg_dump/pg_restore.c b/src/bin/pg_dump/pg_restore.c
> index f647bde28d..ab2abedf5f 100644
> --- a/src/bin/pg_dump/pg_restore.c
> +++ b/src/bin/pg_dump/pg_restore.c
> @@ -535,7 +535,7 @@ read_restore_filters(const char *filename,
> RestoreOptions *opts)
> case FILTER_OBJECT_TYPE_EXTENSION:
> case FILTER_OBJECT_TYPE_FOREIGN_DATA:
> pg_log_filter_error(, _("%s filter for \"%s\"
> is not allowed."),
> -   "exclude",
> +   "include",
> filter_object_type_name(objtype));
> exit_nicely(1);
>
> Regards
>
> Pavel
>

next update - fix used, but uninitialized  "is_include" variable, when
filter is of FILTER_OBJECT_TYPE_NONE

fix crash

# Running: pg_ctl -w -D
/tmp/cirrus-ci-build/build-32/testrun/pg_dump/005_pg_dump_filterfile/data/t_005_pg_dump_filterfile_main_data/pgdata
-l 
/tmp/cirrus-ci-build/build-32/testrun/pg_dump/005_pg_dump_filterfile/log/005_pg_dump_filterfile_main.log
-o --cluster-name=main start
waiting for server to start done
server started
# Postmaster PID for node "main" is 71352
# Running: pg_dump -p 65454 -f
/tmp/cirrus-ci-build/build-32/testrun/pg_dump/005_pg_dump_filterfile/data/t_005_pg_dump_filterfile_main_data/backup/plain.sql
--filter=/tmp/cirrus-ci-build/build-32/testrun/pg_dump/005_pg_dump_filterfile/data/tmp_test_0mO3/inputfile.txt
postgres
../src/bin/pg_dump/pg_dump.c:18800:7: runtime error: load of value 86,
which is not a valid value for type '_Bool'
==71579==Using libbacktrace symbolizer.
#0 0x566302cd in read_dump_filters ../src/bin/pg_dump/pg_dump.c:18800
#1 0x56663429 in main ../src/bin/pg_dump/pg_dump.c:670
#2 0xf7694e45 in __libc_start_main (/lib/i386-linux-gnu/libc.so.6+0x1ae45)
#3 0x56624d50 in _start
(/tmp/cirrus-ci-build/build-32/tmp_install/usr/local/pgsql/bin/pg_dump+0x1ad50)

Regards

Pavel

>
>
>>
>> --
>> Daniel Gustafsson
>>
>>
From 235bcd7944e99468c3fe6a65ca5490883b983a70 Mon Sep 17 00:00:00 2001
From: "ok...@github.com" 
Date: Mon, 13 Nov 2023 14:10:49 +0100
Subject: [PATCH 4/4] fix uninitialize is_include variable, runtime error: load
 of value 86, which is not a valid value for type '_Bool'

---
 src/bin/pg_dump/filter.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/bin/pg_dump/filter.c b/src/bin/pg_dump/filter.c
index 46970a57e3..2d724b1b35 100644
--- a/src/bin/pg_dump/filter.c
+++ b/src/bin/pg_dump/filter.c
@@ -455,6 +455,7 @@ filter_read_item(FilterStateData *fstate,
 		else
 		{
 			*objname = NULL;
+			*is_include = false;
 			*objtype = FILTER_OBJECT_TYPE_NONE;
 		}
 
-- 
2.41.0

From 13984bc6ca65b20c4a23a65602304ab0bf11c955 Mon Sep 17 00:00:00 2001
From: "ok...@github.com" 
Date: Mon, 13 Nov 2023 08:26:22 +0100
Subject: [PATCH 3/4] add more tests related to unsupported options of
 pg_restore

---
 src/bin/pg_dump/t/005_pg_dump_filterfile.pl | 58 -
 1 file changed, 57 insertions(+), 1 deletion(-)

diff --git a/src/bin/pg_dump/t/005_pg_dump_filterfile.pl b/src/bin/pg_dump/t/005_pg_dump_filterfile.pl
index a0aee12543..09d3262b8b 100644
--- a/src/bin/pg_dump/t/005_pg_dump_filterfile.pl
+++ b/src/bin/pg_dump/t/005_pg_dump_filterfile.pl
@@ -6,7 +6,7 @@ use warnings;
 
 use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
-use Test::More tests => 98;
+use Test::More tests => 106;
 
 my $tempdir = PostgreSQL::Test::Utils::tempdir;;
 my $inputfile;
@@ -535,6 +535,62 @@ $dump = slurp_file($plainfile);
 ok($dump =~ qr/^CREATE TABLE public\.table_two/m, "wanted table restored");
 ok($dump !~ qr/^CREATE TABLE public\.table_one/m, "unwanted table is not restored");
 
+open $inputfile, '>', "$tempdir/inputfile.txt"
+  or die "unable to open filterfile for writing";
+print $inputfile "include table_data xxx";
+close $inputfile;
+
+command_fails_like(
+	[
+		'pg_restore', '-p', $port, '-f', $plainfile,
+		"--filter=$tempdir/inputfile.txt"
+	],
+	qr/include filter for "table data" is not 

Re: proposal: possibility to read dumped table's name from file

2023-11-12 Thread Pavel Stehule
Hi


> What are your thoughts on this version?  It's not in a committable state
> as it
> needs a bit more comments here and there and a triplecheck that nothing was
> missed in changing this, but I prefer to get your thoughts before spending
> the
> extra time.
>

I think using pointer to exit function is an elegant solution. I checked
the code and I found only one issue. I fixed warning

[13:57:22.578] time make -s -j${BUILD_JOBS} world-bin
[13:58:20.858] filter.c: In function ‘pg_log_filter_error’:
[13:58:20.858] filter.c:161:2: error: function ‘pg_log_filter_error’ might
be a candidate for ‘gnu_printf’ format attribute
[-Werror=suggest-attribute=format]
[13:58:20.858] 161 | vsnprintf(buf, sizeof(buf), fmt, argp);
[13:58:20.858] | ^
[13:58:20.858] cc1: all warnings being treated as errors

and probably copy/paste bug

diff --git a/src/bin/pg_dump/pg_restore.c b/src/bin/pg_dump/pg_restore.c
index f647bde28d..ab2abedf5f 100644
--- a/src/bin/pg_dump/pg_restore.c
+++ b/src/bin/pg_dump/pg_restore.c
@@ -535,7 +535,7 @@ read_restore_filters(const char *filename,
RestoreOptions *opts)
case FILTER_OBJECT_TYPE_EXTENSION:
case FILTER_OBJECT_TYPE_FOREIGN_DATA:
pg_log_filter_error(, _("%s filter for \"%s\" is
not allowed."),
-   "exclude",
+   "include",
filter_object_type_name(objtype));
exit_nicely(1);

Regards

Pavel


>
> --
> Daniel Gustafsson
>
>
From 9be8fb14a3fe75aa4203a059e8372986bf5e6615 Mon Sep 17 00:00:00 2001
From: "ok...@github.com" 
Date: Sun, 12 Nov 2023 13:54:26 +0100
Subject: [PATCH 2/2] fix err message

---
 src/bin/pg_dump/pg_restore.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/bin/pg_dump/pg_restore.c b/src/bin/pg_dump/pg_restore.c
index f647bde28d..ab2abedf5f 100644
--- a/src/bin/pg_dump/pg_restore.c
+++ b/src/bin/pg_dump/pg_restore.c
@@ -535,7 +535,7 @@ read_restore_filters(const char *filename, RestoreOptions *opts)
 case FILTER_OBJECT_TYPE_EXTENSION:
 case FILTER_OBJECT_TYPE_FOREIGN_DATA:
 	pg_log_filter_error(, _("%s filter for \"%s\" is not allowed."),
-		"exclude",
+		"include",
 		filter_object_type_name(objtype));
 	exit_nicely(1);
 
-- 
2.41.0

From cbaae854eca0cc88bb0886abfd45416ad13cffc7 Mon Sep 17 00:00:00 2001
From: "ok...@github.com" 
Date: Sat, 11 Nov 2023 20:34:34 +0100
Subject: [PATCH 1/2] possibility to read options for dump from file

---
 doc/src/sgml/ref/pg_dump.sgml   | 114 
 doc/src/sgml/ref/pg_dumpall.sgml|  22 +
 doc/src/sgml/ref/pg_restore.sgml|  25 +
 src/bin/pg_dump/Makefile|   5 +-
 src/bin/pg_dump/filter.c| 471 +
 src/bin/pg_dump/filter.h|  60 ++
 src/bin/pg_dump/meson.build |   2 +
 src/bin/pg_dump/pg_dump.c   | 114 +++-
 src/bin/pg_dump/pg_dumpall.c|  68 +-
 src/bin/pg_dump/pg_restore.c| 103 +++
 src/bin/pg_dump/t/005_pg_dump_filterfile.pl | 717 
 src/tools/msvc/Mkvcbuild.pm |   1 +
 12 files changed, 1698 insertions(+), 4 deletions(-)
 create mode 100644 src/bin/pg_dump/filter.c
 create mode 100644 src/bin/pg_dump/filter.h
 create mode 100644 src/bin/pg_dump/t/005_pg_dump_filterfile.pl

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 8695571045..e2f100d552 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -836,6 +836,106 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Specify a filename from which to read patterns for objects to include
+or exclude from the dump. The patterns are interpreted according to the
+same rules as the corresponding options:
+-t/--table,
+--table-and-children,
+--exclude-table-and-children or
+-T for tables,
+-n/--schema for schemas,
+--include-foreign-data for data on foreign servers and
+--exclude-table-data,
+--exclude-table-data-and-children for table data,
+-e/--extension for extensions.
+To read from STDIN, use - as the
+filename.  The --filter option can be specified in
+conjunction with the above listed options for including or excluding
+objects, and can also be specified more than once for multiple filter
+files.
+   
+
+   
+The file lists one object pattern per row, with the following format:
+
+{ include | exclude } { extension | foreign_data | table | table_and_children | table_data | table_data_and_children | schema } PATTERN
+
+   
+
+   
+The first keyword specifies whether the objects matched by the pattern
+are to be included or excluded. The second keyword specifies the 

Re: proposal: possibility to read dumped table's name from file

2023-11-09 Thread Daniel Gustafsson
I went and had another look at this.  The patch has been around for 18
commitfests and is widely considered to add a good feature, so it seems about
time to get reach closure.

As I've mentioned in the past I'm not a big fan of the parser, but the thread
has overruled on that.  Another thing I think is a bit overcomplicated is the
layered error handling for printing log messages, and bubbling up of errors to
get around not being able to call exit_nicely.

In the attached version I've boiled down the error logging into a single new
function pg_log_filter_error() which takes a variable format string.  This
removes a fair bit of the extra calls and makes logging easier.  I've also
added a function pointer to the FilterStateData for passing the exit function
via filter_init.  This allows the filtering code to exit gracefully regardless
of which application is using it.  Finally, I've also reimplemented the logic
for checking the parsed tokens into switch statements without defaults in order
to get the compilerwarning on a missed case.  It's easy to miss adding code to
handle a state, especially when adding new ones, and this should help highlight
that.

Overall, this does shave a bit off the patch in size for what IMHO is better
readability and maintainability.  (I've also made a pgindent pass over it of
course).

What are your thoughts on this version?  It's not in a committable state as it
needs a bit more comments here and there and a triplecheck that nothing was
missed in changing this, but I prefer to get your thoughts before spending the
extra time.

--
Daniel Gustafsson



v20231109-0001-possibility-to-read-options-for-dump-from-.patch
Description: Binary data


Re: proposal: possibility to read dumped table's name from file

2023-09-11 Thread Pavel Stehule
Hi

po 11. 9. 2023 v 6:57 odesílatel Pavel Stehule 
napsal:

> Hi
>
> po 11. 9. 2023 v 6:34 odesílatel Pavel Stehule 
> napsal:
>
>> Hi
>>
>> only rebase
>>
>
> Unfortunately this rebase was not correct. I am sorry.
>
> fixed version
>

and fixed forgotten "break" in switch

Regards

Pavel



>
> Regards
>
> Pavel
>
>
>>
>> Regards
>>
>> Pavel
>>
>>
From 1f0738992d69d0d748bd4494a9244c353c224ace Mon Sep 17 00:00:00 2001
From: "ok...@github.com" 
Date: Thu, 16 Mar 2023 08:18:08 +0100
Subject: [PATCH] possibility to read options for dump from file

---
 doc/src/sgml/ref/pg_dump.sgml   | 114 
 doc/src/sgml/ref/pg_dumpall.sgml|  22 +
 doc/src/sgml/ref/pg_restore.sgml|  25 +
 src/bin/pg_dump/Makefile|   5 +-
 src/bin/pg_dump/filter.c| 530 +++
 src/bin/pg_dump/filter.h|  58 ++
 src/bin/pg_dump/meson.build |   2 +
 src/bin/pg_dump/pg_dump.c   | 128 +++-
 src/bin/pg_dump/pg_dumpall.c|  60 +-
 src/bin/pg_dump/pg_restore.c| 110 +++
 src/bin/pg_dump/t/005_pg_dump_filterfile.pl | 717 
 src/tools/msvc/Mkvcbuild.pm |   1 +
 12 files changed, 1768 insertions(+), 4 deletions(-)
 create mode 100644 src/bin/pg_dump/filter.c
 create mode 100644 src/bin/pg_dump/filter.h
 create mode 100644 src/bin/pg_dump/t/005_pg_dump_filterfile.pl

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index c1e2220b3c..ae1cc522a8 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -835,6 +835,106 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Specify a filename from which to read patterns for objects to include
+or exclude from the dump. The patterns are interpreted according to the
+same rules as the corresponding options:
+-t/--table,
+--table-and-children,
+--exclude-table-and-children or
+-T for tables,
+-n/--schema for schemas,
+--include-foreign-data for data on foreign servers and
+--exclude-table-data,
+--exclude-table-data-and-children for table data,
+-e/--extension for extensions.
+To read from STDIN, use - as the
+filename.  The --filter option can be specified in
+conjunction with the above listed options for including or excluding
+objects, and can also be specified more than once for multiple filter
+files.
+   
+
+   
+The file lists one object pattern per row, with the following format:
+
+{ include | exclude } { extension | foreign_data | table | table_and_children | table_data | table_data_and_children | schema } PATTERN
+
+   
+
+   
+The first keyword specifies whether the objects matched by the pattern
+are to be included or excluded. The second keyword specifies the type
+of object to be filtered using the pattern:
+
+ 
+  
+   extension: data on foreign servers, works like
+   --extension. This keyword can only be
+   used with the include keyword.
+  
+ 
+ 
+  
+   foreign_data: data on foreign servers, works like
+   --include-foreign-data. This keyword can only be
+   used with the include keyword.
+  
+ 
+ 
+  
+   table: tables, works like
+   -t/--table
+  
+ 
+ 
+  
+   table_and_children: tables, works like
+   --table-and-children
+  
+ 
+ 
+  
+   table_data: table data, works like
+   --exclude-table-data. This keyword can only be
+   used with the exclude keyword.
+  
+ 
+ 
+  
+   table_data_and_children: table data of any
+   partitions or inheritance child, works like
+   --exclude-table-data-and-children. This keyword can only be
+   used with the exclude keyword.
+  
+ 
+ 
+  
+   schema: schemas, works like
+   -n/--schema
+  
+ 
+
+   
+
+   
+Lines starting with # are considered comments and
+ignored. Comments can be placed after filter as well. Blank lines
+are also ignored. See  for how to
+perform quoting in patterns.
+   
+
+   
+Example files are listed below in the 
+section.
+   
+
+  
+ 
+
  
   --if-exists
   
@@ -1165,6 +1265,7 @@ PostgreSQL documentation
 schema (-n/--schema) and
 table (-t/--table) pattern
 match at least one extension/schema/table in the database to be dumped.
+This also applies to filters used with --filter.
 Note that if none of the extension/schema/table patterns find
  

Re: proposal: possibility to read dumped table's name from file

2023-09-10 Thread Pavel Stehule
Hi

po 11. 9. 2023 v 6:34 odesílatel Pavel Stehule 
napsal:

> Hi
>
> only rebase
>

Unfortunately this rebase was not correct. I am sorry.

fixed version

Regards

Pavel


>
> Regards
>
> Pavel
>
>
From 32ea3d180ccd976b266bb48c5445c96a1aaf7a54 Mon Sep 17 00:00:00 2001
From: "ok...@github.com" 
Date: Thu, 16 Mar 2023 08:18:08 +0100
Subject: [PATCH] possibility to read options for dump from file

---
 doc/src/sgml/ref/pg_dump.sgml   | 114 
 doc/src/sgml/ref/pg_dumpall.sgml|  22 +
 doc/src/sgml/ref/pg_restore.sgml|  25 +
 src/bin/pg_dump/Makefile|   5 +-
 src/bin/pg_dump/filter.c| 530 +++
 src/bin/pg_dump/filter.h|  58 ++
 src/bin/pg_dump/meson.build |   2 +
 src/bin/pg_dump/pg_dump.c   | 127 +++-
 src/bin/pg_dump/pg_dumpall.c|  60 +-
 src/bin/pg_dump/pg_restore.c| 110 +++
 src/bin/pg_dump/t/005_pg_dump_filterfile.pl | 717 
 src/tools/msvc/Mkvcbuild.pm |   1 +
 12 files changed, 1767 insertions(+), 4 deletions(-)
 create mode 100644 src/bin/pg_dump/filter.c
 create mode 100644 src/bin/pg_dump/filter.h
 create mode 100644 src/bin/pg_dump/t/005_pg_dump_filterfile.pl

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index c1e2220b3c..ae1cc522a8 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -835,6 +835,106 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Specify a filename from which to read patterns for objects to include
+or exclude from the dump. The patterns are interpreted according to the
+same rules as the corresponding options:
+-t/--table,
+--table-and-children,
+--exclude-table-and-children or
+-T for tables,
+-n/--schema for schemas,
+--include-foreign-data for data on foreign servers and
+--exclude-table-data,
+--exclude-table-data-and-children for table data,
+-e/--extension for extensions.
+To read from STDIN, use - as the
+filename.  The --filter option can be specified in
+conjunction with the above listed options for including or excluding
+objects, and can also be specified more than once for multiple filter
+files.
+   
+
+   
+The file lists one object pattern per row, with the following format:
+
+{ include | exclude } { extension | foreign_data | table | table_and_children | table_data | table_data_and_children | schema } PATTERN
+
+   
+
+   
+The first keyword specifies whether the objects matched by the pattern
+are to be included or excluded. The second keyword specifies the type
+of object to be filtered using the pattern:
+
+ 
+  
+   extension: data on foreign servers, works like
+   --extension. This keyword can only be
+   used with the include keyword.
+  
+ 
+ 
+  
+   foreign_data: data on foreign servers, works like
+   --include-foreign-data. This keyword can only be
+   used with the include keyword.
+  
+ 
+ 
+  
+   table: tables, works like
+   -t/--table
+  
+ 
+ 
+  
+   table_and_children: tables, works like
+   --table-and-children
+  
+ 
+ 
+  
+   table_data: table data, works like
+   --exclude-table-data. This keyword can only be
+   used with the exclude keyword.
+  
+ 
+ 
+  
+   table_data_and_children: table data of any
+   partitions or inheritance child, works like
+   --exclude-table-data-and-children. This keyword can only be
+   used with the exclude keyword.
+  
+ 
+ 
+  
+   schema: schemas, works like
+   -n/--schema
+  
+ 
+
+   
+
+   
+Lines starting with # are considered comments and
+ignored. Comments can be placed after filter as well. Blank lines
+are also ignored. See  for how to
+perform quoting in patterns.
+   
+
+   
+Example files are listed below in the 
+section.
+   
+
+  
+ 
+
  
   --if-exists
   
@@ -1165,6 +1265,7 @@ PostgreSQL documentation
 schema (-n/--schema) and
 table (-t/--table) pattern
 match at least one extension/schema/table in the database to be dumped.
+This also applies to filters used with --filter.
 Note that if none of the extension/schema/table patterns find
 matches, pg_dump will generate an error
 even without --strict-names.
@@ -1608,6 +1709,19 @@ CREATE DATABASE foo WITH TEMPLATE template0;

Re: proposal: possibility to read dumped table's name from file

2023-09-10 Thread Pavel Stehule
Hi

only rebase

Regards

Pavel
From b62d99f51da03b1fb8dae577fc49420bf36a3bad Mon Sep 17 00:00:00 2001
From: "ok...@github.com" 
Date: Thu, 16 Mar 2023 08:18:08 +0100
Subject: [PATCH] possibility to read options for dump from file

---
 doc/src/sgml/ref/pg_dump.sgml   | 114 
 doc/src/sgml/ref/pg_dumpall.sgml|  22 +
 doc/src/sgml/ref/pg_restore.sgml|  25 +
 src/bin/pg_dump/Makefile|   5 +-
 src/bin/pg_dump/filter.c| 530 +++
 src/bin/pg_dump/filter.h|  58 ++
 src/bin/pg_dump/meson.build |   2 +
 src/bin/pg_dump/pg_dump.c   | 126 
 src/bin/pg_dump/pg_dumpall.c|  60 +-
 src/bin/pg_dump/pg_restore.c| 110 +++
 src/bin/pg_dump/t/005_pg_dump_filterfile.pl | 717 
 src/tools/msvc/Mkvcbuild.pm |   1 +
 12 files changed, 1767 insertions(+), 3 deletions(-)
 create mode 100644 src/bin/pg_dump/filter.c
 create mode 100644 src/bin/pg_dump/filter.h
 create mode 100644 src/bin/pg_dump/t/005_pg_dump_filterfile.pl

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index c1e2220b3c..ae1cc522a8 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -835,6 +835,106 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Specify a filename from which to read patterns for objects to include
+or exclude from the dump. The patterns are interpreted according to the
+same rules as the corresponding options:
+-t/--table,
+--table-and-children,
+--exclude-table-and-children or
+-T for tables,
+-n/--schema for schemas,
+--include-foreign-data for data on foreign servers and
+--exclude-table-data,
+--exclude-table-data-and-children for table data,
+-e/--extension for extensions.
+To read from STDIN, use - as the
+filename.  The --filter option can be specified in
+conjunction with the above listed options for including or excluding
+objects, and can also be specified more than once for multiple filter
+files.
+   
+
+   
+The file lists one object pattern per row, with the following format:
+
+{ include | exclude } { extension | foreign_data | table | table_and_children | table_data | table_data_and_children | schema } PATTERN
+
+   
+
+   
+The first keyword specifies whether the objects matched by the pattern
+are to be included or excluded. The second keyword specifies the type
+of object to be filtered using the pattern:
+
+ 
+  
+   extension: data on foreign servers, works like
+   --extension. This keyword can only be
+   used with the include keyword.
+  
+ 
+ 
+  
+   foreign_data: data on foreign servers, works like
+   --include-foreign-data. This keyword can only be
+   used with the include keyword.
+  
+ 
+ 
+  
+   table: tables, works like
+   -t/--table
+  
+ 
+ 
+  
+   table_and_children: tables, works like
+   --table-and-children
+  
+ 
+ 
+  
+   table_data: table data, works like
+   --exclude-table-data. This keyword can only be
+   used with the exclude keyword.
+  
+ 
+ 
+  
+   table_data_and_children: table data of any
+   partitions or inheritance child, works like
+   --exclude-table-data-and-children. This keyword can only be
+   used with the exclude keyword.
+  
+ 
+ 
+  
+   schema: schemas, works like
+   -n/--schema
+  
+ 
+
+   
+
+   
+Lines starting with # are considered comments and
+ignored. Comments can be placed after filter as well. Blank lines
+are also ignored. See  for how to
+perform quoting in patterns.
+   
+
+   
+Example files are listed below in the 
+section.
+   
+
+  
+ 
+
  
   --if-exists
   
@@ -1165,6 +1265,7 @@ PostgreSQL documentation
 schema (-n/--schema) and
 table (-t/--table) pattern
 match at least one extension/schema/table in the database to be dumped.
+This also applies to filters used with --filter.
 Note that if none of the extension/schema/table patterns find
 matches, pg_dump will generate an error
 even without --strict-names.
@@ -1608,6 +1709,19 @@ CREATE DATABASE foo WITH TEMPLATE template0;
 
 
 $ pg_dump -t "\"MixedCaseName\"" mydb  mytab.sql
+
+
+  
+   To dump all tables with names starting with mytable, except for table
+   mytable2, specify a filter 

Re: proposal: possibility to read dumped table's name from file

2023-03-21 Thread Pavel Stehule
út 21. 3. 2023 v 16:32 odesílatel Justin Pryzby 
napsal:

> On Mon, Mar 20, 2023 at 08:01:13AM +0100, Pavel Stehule wrote:
> > ne 19. 3. 2023 v 15:01 odesílatel Justin Pryzby 
> napsal:
> >
> > > On Thu, Mar 16, 2023 at 01:05:41PM +0100, Pavel Stehule wrote:
> > > > rebase + enhancing about related option from a563c24
> > >
> > > Thanks.
> > >
> > > It looks like this doesn't currently handle extensions, which were
> added
> > > at 6568cef26e.
>
> What about this part ?  Should extension filters be supported ?
>

should be fixed


>
> I think the comment that I'd patched that lists all the filter types
> should be minimized, rather than duplicating the list of all the
> possible filters that's already in the usrr-facing documentation.
>

I modified this comment. Please, check

>
> One new typo: childrent
>

fixed

Regards

Pavel
From f91f525f32f51f7c5784dd7af57fe2b692db5e7f Mon Sep 17 00:00:00 2001
From: "ok...@github.com" 
Date: Thu, 16 Mar 2023 08:18:08 +0100
Subject: [PATCH] possibility to read options for dump from file

---
 doc/src/sgml/ref/pg_dump.sgml   | 114 
 doc/src/sgml/ref/pg_dumpall.sgml|  22 +
 doc/src/sgml/ref/pg_restore.sgml|  25 +
 src/bin/pg_dump/Makefile|   5 +-
 src/bin/pg_dump/filter.c| 530 +++
 src/bin/pg_dump/filter.h|  58 ++
 src/bin/pg_dump/meson.build |   2 +
 src/bin/pg_dump/pg_dump.c   | 127 
 src/bin/pg_dump/pg_dumpall.c|  60 +-
 src/bin/pg_dump/pg_restore.c| 110 +++
 src/bin/pg_dump/t/005_pg_dump_filterfile.pl | 717 
 src/tools/msvc/Mkvcbuild.pm |   1 +
 12 files changed, 1768 insertions(+), 3 deletions(-)
 create mode 100644 src/bin/pg_dump/filter.c
 create mode 100644 src/bin/pg_dump/filter.h
 create mode 100644 src/bin/pg_dump/t/005_pg_dump_filterfile.pl

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index d6b1faa804..17bfc661a9 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -829,6 +829,106 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Specify a filename from which to read patterns for objects to include
+or exclude from the dump. The patterns are interpreted according to the
+same rules as the corresponding options:
+-t/--table,
+--table-and-children,
+--exclude-table-and-children or
+-T for tables,
+-n/--schema for schemas,
+--include-foreign-data for data on foreign servers and
+--exclude-table-data,
+--exclude-table-data-and-children for table data,
+-e/--extension for extensions.
+To read from STDIN, use - as the
+filename.  The --filter option can be specified in
+conjunction with the above listed options for including or excluding
+objects, and can also be specified more than once for multiple filter
+files.
+   
+
+   
+The file lists one object pattern per row, with the following format:
+
+{ include | exclude } { extension | foreign_data | table | table_and_children | table_data | table_data_and_children | schema } PATTERN
+
+   
+
+   
+The first keyword specifies whether the objects matched by the pattern
+are to be included or excluded. The second keyword specifies the type
+of object to be filtered using the pattern:
+
+ 
+  
+   extension: data on foreign servers, works like
+   --extension. This keyword can only be
+   used with the include keyword.
+  
+ 
+ 
+  
+   foreign_data: data on foreign servers, works like
+   --include-foreign-data. This keyword can only be
+   used with the include keyword.
+  
+ 
+ 
+  
+   table: tables, works like
+   -t/--table
+  
+ 
+ 
+  
+   table_and_children: tables, works like
+   --table-and-children
+  
+ 
+ 
+  
+   table_data: table data, works like
+   --exclude-table-data. This keyword can only be
+   used with the exclude keyword.
+  
+ 
+ 
+  
+   table_data_and_children: table data of any
+   partitions or inheritance child, works like
+   --exclude-table-data-and-children. This keyword can only be
+   used with the exclude keyword.
+  
+ 
+ 
+  
+   schema: schemas, works like
+   -n/--schema
+  
+ 
+
+   
+
+   
+Lines starting with # are considered comments and
+ignored. Comments can be placed after filter as well. Blank lines
+are also ignored. See  for how to
+perform quoting 

Re: proposal: possibility to read dumped table's name from file

2023-03-21 Thread Pavel Stehule
út 21. 3. 2023 v 16:32 odesílatel Justin Pryzby 
napsal:

> On Mon, Mar 20, 2023 at 08:01:13AM +0100, Pavel Stehule wrote:
> > ne 19. 3. 2023 v 15:01 odesílatel Justin Pryzby 
> napsal:
> >
> > > On Thu, Mar 16, 2023 at 01:05:41PM +0100, Pavel Stehule wrote:
> > > > rebase + enhancing about related option from a563c24
> > >
> > > Thanks.
> > >
> > > It looks like this doesn't currently handle extensions, which were
> added
> > > at 6568cef26e.
>
> What about this part ?  Should extension filters be supported ?
>

I missed this, yes, it should be supported.



>
> I think the comment that I'd patched that lists all the filter types
> should be minimized, rather than duplicating the list of all the
> possible filters that's already in the usrr-facing documentation.
>
> One new typo: childrent
>


Re: proposal: possibility to read dumped table's name from file

2023-03-21 Thread Justin Pryzby
On Mon, Mar 20, 2023 at 08:01:13AM +0100, Pavel Stehule wrote:
> ne 19. 3. 2023 v 15:01 odesílatel Justin Pryzby  napsal:
> 
> > On Thu, Mar 16, 2023 at 01:05:41PM +0100, Pavel Stehule wrote:
> > > rebase + enhancing about related option from a563c24
> >
> > Thanks.
> >
> > It looks like this doesn't currently handle extensions, which were added
> > at 6568cef26e.

What about this part ?  Should extension filters be supported ?

I think the comment that I'd patched that lists all the filter types
should be minimized, rather than duplicating the list of all the
possible filters that's already in the usrr-facing documentation.

One new typo: childrent




Re: proposal: possibility to read dumped table's name from file

2023-03-20 Thread Pavel Stehule
ne 19. 3. 2023 v 15:01 odesílatel Justin Pryzby 
napsal:

> On Thu, Mar 16, 2023 at 01:05:41PM +0100, Pavel Stehule wrote:
> > rebase + enhancing about related option from a563c24
>
> Thanks.
>
> It looks like this doesn't currently handle extensions, which were added
> at 6568cef26e.
>
> > +   table_and_children: tables, works like
> > +   -t/--table, except that
> > +   it also includes any partitions or inheritance child
> > +   tables of the table(s) matching the
> > +   pattern.
>
> Why doesn't this just say "works like --table-and-children" ?
>

changed


>
> I think as you wrote log_invalid_filter_format(), the messages wouldn't
> be translated, because they're printed via %s.  One option is to call
> _() on the message.
>

fixed


>
> > +ok($dump !=~ qr/^CREATE TABLE public\.bootab/m,   "exclude dumped
> children table");
>
> !=~ is being interpretted as as numeric "!=" and throwing warnings.
> It should be a !~ b, right ?
> It'd be nice if perl warnings during the tests were less easy to miss.
>

should be fixed by you


>
> > + * char is not alpha. The char '_' is allowed too (exclude first
> position).
>



>
> Why is it treated specially?  Could it be treated the same as alpha?
>

It is usual behaviour in Postgres for keywords. Important is the complete
sentence "Returns NULL when the buffer is empty or the first char is not
alpha."

In this case this implementation has no big impact on behaviour - probably
you got a message "unknown keyword" instead of "missing keyword". But I
would
implement behaviour consistent with other places. My opinion in this case
is not extra strong - we can define the form of keywords like we want, just
this is consistent
with other parsers in Postgres.



>
> > + log_invalid_filter_format(,
> > +
>"\"include\" table data filter is not allowed");
> > + log_invalid_filter_format(,
> > +
>"\"include\" table data and children filter is not allowed");
>
> For these, it might be better to write the literal option:
>
> > +
>"include filter for \"table_data_and_children\" is not allowed");
>
> Because the option is a literal and shouldn't be translated.
> And it's probably better to write that using %s, like:
>
> > +
>"include filter for \"%s\" is not allowed");
>

done



>
> That makes shorter and fewer strings.
>
> Find attached a bunch of other corrections as 0002.txt
>

merged

Regards

Pavel
From 844b54e0dcdb65dbf7f86a75b977db16dad07254 Mon Sep 17 00:00:00 2001
From: "ok...@github.com" 
Date: Thu, 16 Mar 2023 08:18:08 +0100
Subject: [PATCH] possibility to read options for dump from file

---
 doc/src/sgml/ref/pg_dump.sgml   | 107 +++
 doc/src/sgml/ref/pg_dumpall.sgml|  22 +
 doc/src/sgml/ref/pg_restore.sgml|  25 +
 src/bin/pg_dump/Makefile|   5 +-
 src/bin/pg_dump/filter.c| 509 ++
 src/bin/pg_dump/filter.h|  57 ++
 src/bin/pg_dump/meson.build |   2 +
 src/bin/pg_dump/pg_dump.c   | 114 
 src/bin/pg_dump/pg_dumpall.c|  60 +-
 src/bin/pg_dump/pg_restore.c| 110 +++
 src/bin/pg_dump/t/005_pg_dump_filterfile.pl | 701 
 src/tools/msvc/Mkvcbuild.pm |   1 +
 12 files changed, 1710 insertions(+), 3 deletions(-)
 create mode 100644 src/bin/pg_dump/filter.c
 create mode 100644 src/bin/pg_dump/filter.h
 create mode 100644 src/bin/pg_dump/t/005_pg_dump_filterfile.pl

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index d6b1faa804..f3e287b75a 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -829,6 +829,99 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Specify a filename from which to read patterns for objects to include
+or exclude from the dump. The patterns are interpreted according to the
+same rules as the corresponding options:
+-t/--table,
+--table-and-children,
+--exclude-table-and-children or
+-T for tables,
+-n/--schema for schemas,
+--include-foreign-data for data on foreign servers and
+--exclude-table-data,
+--exclude-table-data-and-children for table data.
+To read from STDIN, use - as the
+filename.  The --filter option can be specified in
+conjunction with the above listed options for including or excluding
+objects, and can also be specified more than once for multiple filter
+files.
+   
+
+   
+The file lists one object pattern per row, with the following format:
+
+{ include | exclude } { table | table_and_children | schema | foreign_data | table_data | table_data_and_children } PATTERN
+
+   
+
+   
+The first keyword specifies whether the objects matched by the 

Re: proposal: possibility to read dumped table's name from file

2023-03-19 Thread Pavel Stehule
ne 19. 3. 2023 v 15:01 odesílatel Justin Pryzby 
napsal:

> On Thu, Mar 16, 2023 at 01:05:41PM +0100, Pavel Stehule wrote:
> > rebase + enhancing about related option from a563c24
>
> Thanks.
>
> It looks like this doesn't currently handle extensions, which were added
> at 6568cef26e.
>
> > +   table_and_children: tables, works like
> > +   -t/--table, except that
> > +   it also includes any partitions or inheritance child
> > +   tables of the table(s) matching the
> > +   pattern.
>
> Why doesn't this just say "works like --table-and-children" ?
>
> I think as you wrote log_invalid_filter_format(), the messages wouldn't
> be translated, because they're printed via %s.  One option is to call
> _() on the message.
>
> > +ok($dump !=~ qr/^CREATE TABLE public\.bootab/m,   "exclude dumped
> children table");
>
> !=~ is being interpretted as as numeric "!=" and throwing warnings.
> It should be a !~ b, right ?
> It'd be nice if perl warnings during the tests were less easy to miss.
>
> > + * char is not alpha. The char '_' is allowed too (exclude first
> position).
>
> Why is it treated specially?  Could it be treated the same as alpha?
>
> > + log_invalid_filter_format(,
> > +
>"\"include\" table data filter is not allowed");
> > + log_invalid_filter_format(,
> > +
>"\"include\" table data and children filter is not allowed");
>
> For these, it might be better to write the literal option:
>
> > +
>"include filter for \"table_data_and_children\" is not allowed");
>
> Because the option is a literal and shouldn't be translated.
> And it's probably better to write that using %s, like:
>
> > +
>"include filter for \"%s\" is not allowed");
>
> That makes shorter and fewer strings.
>
> Find attached a bunch of other corrections as 0002.txt
>

Thank you very much - I'll recheck the mentioned points tomorrow.


>
> I also dug up what I'd started in november, trying to reduce the code
> duplication betwen pg_restore/dump/all.  This isn't done, but I might
> never finish it, so I'll at least show what I have in case you think
> it's a good idea.  This passes tests on CI, except for autoconf, due to
> using exit_nicely() differently.
>

Your implementation reduced 60 lines, but the interface and code is more
complex. I cannot say what is significantly better. Personally, in this
case, I prefer my variant, because I think it is a little bit more
readable, and possible modification can be more simple. But this is just my
opinion, and I have no problem accepting other opinions. I can imagine to
define some configuration array like getopt, but it looks like over
engineering

Regards

Pavel


> --
> Justin
>


Re: proposal: possibility to read dumped table's name from file

2023-03-19 Thread Justin Pryzby
On Thu, Mar 16, 2023 at 01:05:41PM +0100, Pavel Stehule wrote:
> rebase + enhancing about related option from a563c24

Thanks.

It looks like this doesn't currently handle extensions, which were added
at 6568cef26e.

> +   table_and_children: tables, works like
> +   -t/--table, except that
> +   it also includes any partitions or inheritance child
> +   tables of the table(s) matching the
> +   pattern.

Why doesn't this just say "works like --table-and-children" ?

I think as you wrote log_invalid_filter_format(), the messages wouldn't
be translated, because they're printed via %s.  One option is to call
_() on the message.

> +ok($dump !=~ qr/^CREATE TABLE public\.bootab/m,   "exclude dumped children 
> table");

!=~ is being interpretted as as numeric "!=" and throwing warnings.
It should be a !~ b, right ?
It'd be nice if perl warnings during the tests were less easy to miss.

> + * char is not alpha. The char '_' is allowed too (exclude first position).

Why is it treated specially?  Could it be treated the same as alpha?

> + log_invalid_filter_format(,
> + 
>   "\"include\" table data filter is not allowed");
> + log_invalid_filter_format(,
> + 
>   "\"include\" table data and children filter is not allowed");

For these, it might be better to write the literal option:

> + 
>   "include filter for \"table_data_and_children\" is not allowed");

Because the option is a literal and shouldn't be translated.
And it's probably better to write that using %s, like:

> + 
>   "include filter for \"%s\" is not allowed");

That makes shorter and fewer strings.

Find attached a bunch of other corrections as 0002.txt

I also dug up what I'd started in november, trying to reduce the code
duplication betwen pg_restore/dump/all.  This isn't done, but I might
never finish it, so I'll at least show what I have in case you think
it's a good idea.  This passes tests on CI, except for autoconf, due to
using exit_nicely() differently.

-- 
Justin
>From e86d4dca9b4754185a1d559fdea7bf4ee2de8b1a Mon Sep 17 00:00:00 2001
From: "ok...@github.com" 
Date: Thu, 16 Mar 2023 08:18:08 +0100
Subject: [PATCH 1/3] possibility to read options for dump from file

---
 doc/src/sgml/ref/pg_dump.sgml   | 110 +++
 doc/src/sgml/ref/pg_dumpall.sgml|  22 +
 doc/src/sgml/ref/pg_restore.sgml|  25 +
 src/bin/pg_dump/Makefile|   5 +-
 src/bin/pg_dump/filter.c| 489 ++
 src/bin/pg_dump/filter.h|  56 ++
 src/bin/pg_dump/meson.build |   2 +
 src/bin/pg_dump/pg_dump.c   | 117 
 src/bin/pg_dump/pg_dumpall.c|  61 +-
 src/bin/pg_dump/pg_restore.c| 114 
 src/bin/pg_dump/t/005_pg_dump_filterfile.pl | 701 
 src/tools/msvc/Mkvcbuild.pm |   1 +
 12 files changed, 1700 insertions(+), 3 deletions(-)
 create mode 100644 src/bin/pg_dump/filter.c
 create mode 100644 src/bin/pg_dump/filter.h
 create mode 100644 src/bin/pg_dump/t/005_pg_dump_filterfile.pl

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index d6b1faa8042..5672fbb273b 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -829,6 +829,102 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Specify a filename from which to read patterns for objects to include
+or exclude from the dump. The patterns are interpreted according to the
+same rules as the corresponding options:
+-t/--table,
+--table-and-children,
+--exclude-table-and-children or
+-T for tables,
+-n/--schema for schemas,
+--include-foreign-data for data on foreign servers and
+--exclude-table-data,
+--exclude-table-data-and-and-children for table data.
+To read from STDIN, use - as the
+filename.  The --filter option can be specified in
+conjunction with the above listed options for including or excluding
+objects, and can also be specified more than once for multiple filter
+files.
+   
+
+   
+The file lists one object pattern per row, with the following format:
+
+{ include | exclude } { table | table_and_children | schema | foreign_data | table_data | table_data_and_children } PATTERN
+
+   
+
+   
+The first keyword specifies whether the objects matched by the pattern
+are to be included or excluded. The second keyword specifies the type
+of object to be filtered 

Re: proposal: possibility to read dumped table's name from file

2023-03-18 Thread Pavel Stehule
Hi

fresh rebase

regards

Pavel
From 09c64d9a5f2ed033c2691ca25ca6bec23320e1b0 Mon Sep 17 00:00:00 2001
From: "ok...@github.com" 
Date: Thu, 16 Mar 2023 08:18:08 +0100
Subject: [PATCH] possibility to read options for dump from file

---
 doc/src/sgml/ref/pg_dump.sgml   | 110 +++
 doc/src/sgml/ref/pg_dumpall.sgml|  22 +
 doc/src/sgml/ref/pg_restore.sgml|  25 +
 src/bin/pg_dump/Makefile|   5 +-
 src/bin/pg_dump/filter.c| 489 ++
 src/bin/pg_dump/filter.h|  56 ++
 src/bin/pg_dump/meson.build |   2 +
 src/bin/pg_dump/pg_dump.c   | 117 
 src/bin/pg_dump/pg_dumpall.c|  61 +-
 src/bin/pg_dump/pg_restore.c| 114 
 src/bin/pg_dump/t/005_pg_dump_filterfile.pl | 701 
 src/tools/msvc/Mkvcbuild.pm |   1 +
 12 files changed, 1700 insertions(+), 3 deletions(-)
 create mode 100644 src/bin/pg_dump/filter.c
 create mode 100644 src/bin/pg_dump/filter.h
 create mode 100644 src/bin/pg_dump/t/005_pg_dump_filterfile.pl

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index d6b1faa804..5672fbb273 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -829,6 +829,102 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Specify a filename from which to read patterns for objects to include
+or exclude from the dump. The patterns are interpreted according to the
+same rules as the corresponding options:
+-t/--table,
+--table-and-children,
+--exclude-table-and-children or
+-T for tables,
+-n/--schema for schemas,
+--include-foreign-data for data on foreign servers and
+--exclude-table-data,
+--exclude-table-data-and-and-children for table data.
+To read from STDIN, use - as the
+filename.  The --filter option can be specified in
+conjunction with the above listed options for including or excluding
+objects, and can also be specified more than once for multiple filter
+files.
+   
+
+   
+The file lists one object pattern per row, with the following format:
+
+{ include | exclude } { table | table_and_children | schema | foreign_data | table_data | table_data_and_children } PATTERN
+
+   
+
+   
+The first keyword specifies whether the objects matched by the pattern
+are to be included or excluded. The second keyword specifies the type
+of object to be filtered using the pattern:
+
+ 
+  
+   table: tables, works like
+   -t/--table
+  
+ 
+ 
+  
+   table_and_children: tables, works like
+   -t/--table, except that
+   it also includes any partitions or inheritance child
+   tables of the table(s) matching the
+   pattern.
+  
+ 
+ 
+  
+   schema: schemas, works like
+   -n/--schema
+  
+ 
+ 
+  
+   foreign_data: data on foreign servers, works like
+   --include-foreign-data. This keyword can only be
+   used with the include keyword.
+  
+ 
+ 
+  
+   table_data: table data, works like
+   --exclude-table-data. This keyword can only be
+   used with the exclude keyword.
+  
+ 
+ 
+  
+   table_data_and_children: table data of any
+   partitions or inheritance child, works like
+   --exclude-table-data-and-children. This keyword can only be
+   used with the exclude keyword.
+  
+ 
+
+
+   
+
+   
+Lines starting with # are considered comments and
+ignored. Comments can be placed after filter as well. Blank lines
+are also ignored. See  for how to
+perform quoting in patterns.
+   
+
+   
+Example files are listed below in the 
+section.
+   
+
+  
+ 
+
  
   --if-exists
   
@@ -1159,6 +1255,7 @@ PostgreSQL documentation
 schema (-n/--schema) and
 table (-t/--table) pattern
 match at least one extension/schema/table in the database to be dumped.
+This also applies to filters used with --filter.
 Note that if none of the extension/schema/table patterns find
 matches, pg_dump will generate an error
 even without --strict-names.
@@ -1581,6 +1678,19 @@ CREATE DATABASE foo WITH TEMPLATE template0;
 
 
 $ pg_dump -t "\"MixedCaseName\"" mydb  mytab.sql
+
+
+  
+   To dump all tables with names starting with mytable, except for table
+   mytable2, specify a filter file
+   filter.txt like:
+
+include table mytable*
+exclude table mytable2
+
+
+
+$ pg_dump 

Re: proposal: possibility to read dumped table's name from file

2023-03-16 Thread Pavel Stehule
út 7. 3. 2023 v 3:47 odesílatel Julien Rouhaud  napsal:

> Hi,
>
> On Mon, Mar 06, 2023 at 10:20:32PM +0100, Daniel Gustafsson wrote:
> > > On 6 Mar 2023, at 21:45, Gregory Stark (as CFM) 
> wrote:
> > >
> > > So This patch has been through a lot of commitfests. And it really
> > > doesn't seem that hard to resolve -- Pavel has seemingly been willing
> > > to go along whichever way the wind has been blowing but honestly it
> > > kind of seems like he's just gotten drive-by suggestions and he's put
> > > a lot of work into trying to satisfy them.
> >
> > Agreed.
>
> Indeed, I'm not sure I would have had that much patience.
>
> > > He implemented --include-tables-from-file=... etc. Then he implemented
> > > a hand-written parser for a DSL to select objects, then he implemented
> > > a bison parser, then he went back to the hand-written parser.
> >
> > Well, kind of.  I was trying to take the patch to the finishing line but
> was
> > uncomfortable with the hand written parser so I implemented a parser in
> Bison
> > to replace it with.  Not that hand-written parsers are bad per se (or
> that my
> > bison parser was perfect), but reading quoted identifiers across line
> > boundaries tend to require a fair amount of handwritten code.  Pavel did
> not
> > object to this version, but it was objected to by two other committers.
> >
> > At this point [0] I stepped down from trying to finish it as the
> approach I was
> > comfortable didn't gain traction (which is totally fine).
> >
> > Downthread from this the patch got a lot of reviews from Julien with the
> old
> > parser back in place.
>
> Yeah, and the current state seems quite good to me.
>
> > > Can we get some consensus on whether the DSL looks right
> >
> > I would consider this pretty settled.
>
> Agreed.
>

rebase + enhancing about related option from a563c24

Regards

Pavel
From 76f224be5f88b2a33f9bdc4a38a5bd49c86a7d3b Mon Sep 17 00:00:00 2001
From: "ok...@github.com" 
Date: Thu, 16 Mar 2023 08:18:08 +0100
Subject: [PATCH] possibility to read options for dump from file

---
 doc/src/sgml/ref/pg_dump.sgml   | 110 +++
 doc/src/sgml/ref/pg_dumpall.sgml|  22 +
 doc/src/sgml/ref/pg_restore.sgml|  25 +
 src/bin/pg_dump/Makefile|   5 +-
 src/bin/pg_dump/filter.c| 489 ++
 src/bin/pg_dump/filter.h|  56 ++
 src/bin/pg_dump/meson.build |   2 +
 src/bin/pg_dump/pg_dump.c   | 117 
 src/bin/pg_dump/pg_dumpall.c|  61 +-
 src/bin/pg_dump/pg_restore.c| 114 
 src/bin/pg_dump/t/004_pg_dump_filterfile.pl | 701 
 src/tools/msvc/Mkvcbuild.pm |   1 +
 12 files changed, 1700 insertions(+), 3 deletions(-)
 create mode 100644 src/bin/pg_dump/filter.c
 create mode 100644 src/bin/pg_dump/filter.h
 create mode 100644 src/bin/pg_dump/t/004_pg_dump_filterfile.pl

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index e6b003bf10..2a8cbe41bc 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -829,6 +829,102 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Specify a filename from which to read patterns for objects to include
+or exclude from the dump. The patterns are interpreted according to the
+same rules as the corresponding options:
+-t/--table,
+--table-and-children,
+--exclude-table-and-children or
+-T for tables,
+-n/--schema for schemas,
+--include-foreign-data for data on foreign servers and
+--exclude-table-data,
+--exclude-table-data-and-and-children for table data.
+To read from STDIN, use - as the
+filename.  The --filter option can be specified in
+conjunction with the above listed options for including or excluding
+objects, and can also be specified more than once for multiple filter
+files.
+   
+
+   
+The file lists one object pattern per row, with the following format:
+
+{ include | exclude } { table | table_and_children | schema | foreign_data | table_data | table_data_and_children } PATTERN
+
+   
+
+   
+The first keyword specifies whether the objects matched by the pattern
+are to be included or excluded. The second keyword specifies the type
+of object to be filtered using the pattern:
+
+ 
+  
+   table: tables, works like
+   -t/--table
+  
+ 
+ 
+  
+   table_and_children: tables, works like
+   -t/--table, except that
+   it also includes any partitions or inheritance child
+   tables of the table(s) matching the
+   pattern.
+  
+ 
+ 
+  
+   schema: schemas, works like
+   -n/--schema
+  
+ 

Re: proposal: possibility to read dumped table's name from file

2023-03-06 Thread Julien Rouhaud
Hi,

On Mon, Mar 06, 2023 at 10:20:32PM +0100, Daniel Gustafsson wrote:
> > On 6 Mar 2023, at 21:45, Gregory Stark (as CFM)  wrote:
> >
> > So This patch has been through a lot of commitfests. And it really
> > doesn't seem that hard to resolve -- Pavel has seemingly been willing
> > to go along whichever way the wind has been blowing but honestly it
> > kind of seems like he's just gotten drive-by suggestions and he's put
> > a lot of work into trying to satisfy them.
>
> Agreed.

Indeed, I'm not sure I would have had that much patience.

> > He implemented --include-tables-from-file=... etc. Then he implemented
> > a hand-written parser for a DSL to select objects, then he implemented
> > a bison parser, then he went back to the hand-written parser.
>
> Well, kind of.  I was trying to take the patch to the finishing line but was
> uncomfortable with the hand written parser so I implemented a parser in Bison
> to replace it with.  Not that hand-written parsers are bad per se (or that my
> bison parser was perfect), but reading quoted identifiers across line
> boundaries tend to require a fair amount of handwritten code.  Pavel did not
> object to this version, but it was objected to by two other committers.
>
> At this point [0] I stepped down from trying to finish it as the approach I 
> was
> comfortable didn't gain traction (which is totally fine).
>
> Downthread from this the patch got a lot of reviews from Julien with the old
> parser back in place.

Yeah, and the current state seems quite good to me.

> > Can we get some consensus on whether the DSL looks right
>
> I would consider this pretty settled.

Agreed.




Re: proposal: possibility to read dumped table's name from file

2023-03-06 Thread Daniel Gustafsson
> On 6 Mar 2023, at 21:45, Gregory Stark (as CFM)  wrote:
> 
> So This patch has been through a lot of commitfests. And it really
> doesn't seem that hard to resolve -- Pavel has seemingly been willing
> to go along whichever way the wind has been blowing but honestly it
> kind of seems like he's just gotten drive-by suggestions and he's put
> a lot of work into trying to satisfy them.

Agreed.

> He implemented --include-tables-from-file=... etc. Then he implemented
> a hand-written parser for a DSL to select objects, then he implemented
> a bison parser, then he went back to the hand-written parser.

Well, kind of.  I was trying to take the patch to the finishing line but was
uncomfortable with the hand written parser so I implemented a parser in Bison
to replace it with.  Not that hand-written parsers are bad per se (or that my
bison parser was perfect), but reading quoted identifiers across line
boundaries tend to require a fair amount of handwritten code.  Pavel did not
object to this version, but it was objected to by two other committers.

At this point [0] I stepped down from trying to finish it as the approach I was
comfortable didn't gain traction (which is totally fine).

Downthread from this the patch got a lot of reviews from Julien with the old
parser back in place.

> Can we get some consensus on whether the DSL looks right

I would consider this pretty settled.

> and whether the hand-written parser is sensible.

This is the part where a committer who wants to pursue the hand-written parser
need to step up. With the amount of review received it's hopefully pretty close.

--
Daniel Gustafsson

[0] 098531e1-fba9-4b7d-884e-0a4363eee...@yesql.se





Re: proposal: possibility to read dumped table's name from file

2023-03-06 Thread Gregory Stark (as CFM)
So This patch has been through a lot of commitfests. And it really
doesn't seem that hard to resolve -- Pavel has seemingly been willing
to go along whichever way the wind has been blowing but honestly it
kind of seems like he's just gotten drive-by suggestions and he's put
a lot of work into trying to satisfy them.

He implemented --include-tables-from-file=... etc. Then he implemented
a hand-written parser for a DSL to select objects, then he implemented
a bison parser, then he went back to the hand-written parser.

Can we get some consensus on whether the DSL looks right and whether
the hand-written parser is sensible. And if so then can a committer
step up to actual review and commit the patch? The last review said it
might need a native English speaker to tweak some wording but
otherwise looked good.

-- 
Gregory Stark
As Commitfest Manager




Re: proposal: possibility to read dumped table's name from file

2022-11-22 Thread Pavel Stehule
Hi

út 22. 11. 2022 v 8:39 odesílatel Andres Freund  napsal:

> Hi,
>
> On 2022-11-13 20:32:47 +0100, Pavel Stehule wrote:
> > updated patch attached
>
> It fails with address sanitizer that's now part of CI:
>
> https://cirrus-ci.com/task/6031397744279552?logs=test_world#L2659
>
> [06:33:11.271] # ==31965==ERROR: AddressSanitizer: heap-buffer-overflow on
> address 0x61900480 at pc 0x559f1ac40822 bp 0x7ffea83e1ad0 sp
> 0x7ffea83e1ac8
> [06:33:11.271] # READ of size 1 at 0x61900480 thread T0
> [06:33:11.271] # #0 0x559f1ac40821 in read_pattern
> /tmp/cirrus-ci-build/src/bin/pg_dump/filter.c:302
> [06:33:11.271] # #1 0x559f1ac40e4d in filter_read_item
> /tmp/cirrus-ci-build/src/bin/pg_dump/filter.c:459
> [06:33:11.271] # #2 0x559f1abe6fa5 in read_dump_filters
> /tmp/cirrus-ci-build/src/bin/pg_dump/pg_dump.c:18229
> [06:33:11.271] # #3 0x559f1ac2bb1b in main
> /tmp/cirrus-ci-build/src/bin/pg_dump/pg_dump.c:630
> [06:33:11.271] # #4 0x7fd91fabfd09 in __libc_start_main
> (/lib/x86_64-linux-gnu/libc.so.6+0x23d09)
> [06:33:11.271] # #5 0x559f1abe5d29 in _start
> (/tmp/cirrus-ci-build/tmp_install/usr/local/pgsql/bin/pg_dump+0x39d29)
> [06:33:11.271] #
> [06:33:11.271] # 0x61900480 is located 0 bytes to the right of
> 1024-byte region [0x61900080,0x61900480)
> [06:33:11.271] # allocated by thread T0 here:
> [06:33:11.271] # #0 0x7fd91fe14e8f in __interceptor_malloc
> ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
> [06:33:11.271] # #1 0x559f1ac69f35 in pg_malloc_internal
> /tmp/cirrus-ci-build/src/common/fe_memutils.c:30
> [06:33:11.271] # #2 0x559f1ac69f35 in palloc
> /tmp/cirrus-ci-build/src/common/fe_memutils.c:117
> [06:33:11.271] #
> [06:33:11.271] # SUMMARY: AddressSanitizer: heap-buffer-overflow
> /tmp/cirrus-ci-build/src/bin/pg_dump/filter.c:302 in read_pattern
>
>
should be fixed in attached patch

I found and fix small memleak 24bytes per filter row (PQExpBufferData)

Regards

Pavel


>
> Greetings,
>
> Andres Freund
>
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 8b9d9f4cad..d5a6e2c7ee 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -779,6 +779,80 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Specify a filename from which to read patterns for objects to include
+or exclude from the dump. The patterns are interpreted according to the
+same rules as the corresponding options:
+-t/--table for tables,
+-n/--schema for schemas,
+--include-foreign-data for data on foreign servers and
+--exclude-table-data for table data.
+To read from STDIN, use - as the
+filename.  The --filter option can be specified in
+conjunction with the above listed options for including or excluding
+objects, and can also be specified more than once for multiple filter
+files.
+   
+
+   
+The file lists one object pattern per row, with the following format:
+
+{ include | exclude } { table | schema | foreign_data | table_data } PATTERN
+
+   
+
+   
+The first keyword specifies whether the objects matched by the pattern
+are to be included or excluded. The second keyword specifies the type
+of object to be filtered using the pattern:
+
+ 
+  
+   table: tables, works like
+   -t/--table
+  
+ 
+ 
+  
+   schema: schemas, works like
+   -n/--schema
+  
+ 
+ 
+  
+   foreign_data: data on foreign servers, works like
+   --include-foreign-data. This keyword can only be
+   used with the include keyword.
+  
+ 
+ 
+  
+   table_data: table data, works like
+   --exclude-table-data. This keyword can only be
+   used with the exclude keyword.
+  
+ 
+
+   
+
+   
+Lines starting with # are considered comments and
+ignored. Comments can be placed after filter as well. Blank lines
+are also ignored. See  for how to
+perform quoting in patterns.
+   
+
+   
+Example files are listed below in the 
+section.
+   
+
+  
+ 
+
  
   --if-exists
   
@@ -1119,6 +1193,7 @@ PostgreSQL documentation
 schema (-n/--schema) and
 table (-t/--table) qualifier
 match at least one extension/schema/table in the database to be dumped.
+This also applies to filters used with --filter.
 Note that if none of the extension/schema/table qualifiers find
 matches, pg_dump will generate an error
 even without --strict-names.
@@ -1528,6 +1603,19 @@ CREATE DATABASE foo WITH TEMPLATE template0;
 
 
 $ pg_dump -t "\"MixedCaseName\"" mydb  mytab.sql
+
+
+  
+   To dump all tables with 

Re: proposal: possibility to read dumped table's name from file

2022-11-21 Thread Pavel Stehule
út 22. 11. 2022 v 8:39 odesílatel Andres Freund  napsal:

> Hi,
>
> On 2022-11-13 20:32:47 +0100, Pavel Stehule wrote:
> > updated patch attached
>
> It fails with address sanitizer that's now part of CI:
>
> https://cirrus-ci.com/task/6031397744279552?logs=test_world#L2659
>
> [06:33:11.271] # ==31965==ERROR: AddressSanitizer: heap-buffer-overflow on
> address 0x61900480 at pc 0x559f1ac40822 bp 0x7ffea83e1ad0 sp
> 0x7ffea83e1ac8
> [06:33:11.271] # READ of size 1 at 0x61900480 thread T0
> [06:33:11.271] # #0 0x559f1ac40821 in read_pattern
> /tmp/cirrus-ci-build/src/bin/pg_dump/filter.c:302
> [06:33:11.271] # #1 0x559f1ac40e4d in filter_read_item
> /tmp/cirrus-ci-build/src/bin/pg_dump/filter.c:459
> [06:33:11.271] # #2 0x559f1abe6fa5 in read_dump_filters
> /tmp/cirrus-ci-build/src/bin/pg_dump/pg_dump.c:18229
> [06:33:11.271] # #3 0x559f1ac2bb1b in main
> /tmp/cirrus-ci-build/src/bin/pg_dump/pg_dump.c:630
> [06:33:11.271] # #4 0x7fd91fabfd09 in __libc_start_main
> (/lib/x86_64-linux-gnu/libc.so.6+0x23d09)
> [06:33:11.271] # #5 0x559f1abe5d29 in _start
> (/tmp/cirrus-ci-build/tmp_install/usr/local/pgsql/bin/pg_dump+0x39d29)
> [06:33:11.271] #
> [06:33:11.271] # 0x61900480 is located 0 bytes to the right of
> 1024-byte region [0x61900080,0x61900480)
> [06:33:11.271] # allocated by thread T0 here:
> [06:33:11.271] # #0 0x7fd91fe14e8f in __interceptor_malloc
> ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
> [06:33:11.271] # #1 0x559f1ac69f35 in pg_malloc_internal
> /tmp/cirrus-ci-build/src/common/fe_memutils.c:30
> [06:33:11.271] # #2 0x559f1ac69f35 in palloc
> /tmp/cirrus-ci-build/src/common/fe_memutils.c:117
> [06:33:11.271] #
> [06:33:11.271] # SUMMARY: AddressSanitizer: heap-buffer-overflow
> /tmp/cirrus-ci-build/src/bin/pg_dump/filter.c:302 in read_pattern
>

I'll check it


>
>
> Greetings,
>
> Andres Freund
>


Re: proposal: possibility to read dumped table's name from file

2022-11-21 Thread Andres Freund
Hi,

On 2022-11-13 20:32:47 +0100, Pavel Stehule wrote:
> updated patch attached

It fails with address sanitizer that's now part of CI:

https://cirrus-ci.com/task/6031397744279552?logs=test_world#L2659

[06:33:11.271] # ==31965==ERROR: AddressSanitizer: heap-buffer-overflow on 
address 0x61900480 at pc 0x559f1ac40822 bp 0x7ffea83e1ad0 sp 0x7ffea83e1ac8
[06:33:11.271] # READ of size 1 at 0x61900480 thread T0
[06:33:11.271] # #0 0x559f1ac40821 in read_pattern 
/tmp/cirrus-ci-build/src/bin/pg_dump/filter.c:302
[06:33:11.271] # #1 0x559f1ac40e4d in filter_read_item 
/tmp/cirrus-ci-build/src/bin/pg_dump/filter.c:459
[06:33:11.271] # #2 0x559f1abe6fa5 in read_dump_filters 
/tmp/cirrus-ci-build/src/bin/pg_dump/pg_dump.c:18229
[06:33:11.271] # #3 0x559f1ac2bb1b in main 
/tmp/cirrus-ci-build/src/bin/pg_dump/pg_dump.c:630
[06:33:11.271] # #4 0x7fd91fabfd09 in __libc_start_main 
(/lib/x86_64-linux-gnu/libc.so.6+0x23d09)
[06:33:11.271] # #5 0x559f1abe5d29 in _start 
(/tmp/cirrus-ci-build/tmp_install/usr/local/pgsql/bin/pg_dump+0x39d29)
[06:33:11.271] # 
[06:33:11.271] # 0x61900480 is located 0 bytes to the right of 1024-byte 
region [0x61900080,0x61900480)
[06:33:11.271] # allocated by thread T0 here:
[06:33:11.271] # #0 0x7fd91fe14e8f in __interceptor_malloc 
../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
[06:33:11.271] # #1 0x559f1ac69f35 in pg_malloc_internal 
/tmp/cirrus-ci-build/src/common/fe_memutils.c:30
[06:33:11.271] # #2 0x559f1ac69f35 in palloc 
/tmp/cirrus-ci-build/src/common/fe_memutils.c:117
[06:33:11.271] # 
[06:33:11.271] # SUMMARY: AddressSanitizer: heap-buffer-overflow 
/tmp/cirrus-ci-build/src/bin/pg_dump/filter.c:302 in read_pattern


Greetings,

Andres Freund




Re: proposal: possibility to read dumped table's name from file

2022-11-21 Thread Pavel Stehule
út 22. 11. 2022 v 6:26 odesílatel Julien Rouhaud 
napsal:

> Hi,
>
> On Sun, Nov 13, 2022 at 08:32:47PM +0100, Pavel Stehule wrote:
> >
> > updated patch attached
>
> Thanks!
>
> Some enhancement could probably be done by a native english speaker, but
> apart
> from that it looks good to me, so hearing no other complaints I'm marking
> the
> CF entry as Ready for Committer!
>

Thank you very much for check and help

Regards

Pavel


Re: proposal: possibility to read dumped table's name from file

2022-11-21 Thread Julien Rouhaud
Hi,

On Sun, Nov 13, 2022 at 08:32:47PM +0100, Pavel Stehule wrote:
>
> updated patch attached

Thanks!

Some enhancement could probably be done by a native english speaker, but apart
from that it looks good to me, so hearing no other complaints I'm marking the
CF entry as Ready for Committer!




Re: proposal: possibility to read dumped table's name from file

2022-11-13 Thread Pavel Stehule
ne 13. 11. 2022 v 9:58 odesílatel Julien Rouhaud 
napsal:

> On Sat, Nov 12, 2022 at 09:35:59PM +0100, Pavel Stehule wrote:
>
> Thanks for the updated patch.  Apart from the function comment it looks
> good to
> me.
>
> Justin, did you have any other comment on the patch?
>
> > > I don't fully understand the part about subpatterns, but is that
> necessary
> > > to
> > > describe it?  Simply saying that any valid and possibly-quoted
> identifier
> > > can
> > > be parsed should make it clear that identifiers containing \n
> characters
> > > should
> > > work too.  Maybe also just mention that whitespaces are removed and
> special
> > > care is taken to output routines in exactly the same way calling code
> will
> > > expect it (that is comma-and-single-space type delimiter).
> > >
> >
> > In this case I hit the limits of my English language skills.
> >
> > I rewrote this comment, but it needs more care. Please, can you look at
> it?
>
> I'm also not a native English speaker so I'm far for writing perfect
> comments
> myself :)
>

far better than mine :)

Thank you very much

updated patch attached

Regards

Pavel


>
> Maybe something like
>
> /*
>  * read_pattern - reads on object pattern from input
>  *
>  * This function will parse any valid identifier (quoted or not, qualified
> or
>  * not), which can also includes the full signature for routines.
>  * Note that this function takes special care to sanitize the detected
>  * identifier (removing extraneous whitespaces or other unnecessary
>  * characters).  This is necessary as most backup/restore filtering
> functions
>  * only recognize identifiers if they are written exactly way as they are
>  * regenerated.
>  * Returns a pointer to next character after the found identifier, or NULL
> on
>  * error.
>  */
>
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 8b9d9f4cad..d5a6e2c7ee 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -779,6 +779,80 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Specify a filename from which to read patterns for objects to include
+or exclude from the dump. The patterns are interpreted according to the
+same rules as the corresponding options:
+-t/--table for tables,
+-n/--schema for schemas,
+--include-foreign-data for data on foreign servers and
+--exclude-table-data for table data.
+To read from STDIN, use - as the
+filename.  The --filter option can be specified in
+conjunction with the above listed options for including or excluding
+objects, and can also be specified more than once for multiple filter
+files.
+   
+
+   
+The file lists one object pattern per row, with the following format:
+
+{ include | exclude } { table | schema | foreign_data | table_data } PATTERN
+
+   
+
+   
+The first keyword specifies whether the objects matched by the pattern
+are to be included or excluded. The second keyword specifies the type
+of object to be filtered using the pattern:
+
+ 
+  
+   table: tables, works like
+   -t/--table
+  
+ 
+ 
+  
+   schema: schemas, works like
+   -n/--schema
+  
+ 
+ 
+  
+   foreign_data: data on foreign servers, works like
+   --include-foreign-data. This keyword can only be
+   used with the include keyword.
+  
+ 
+ 
+  
+   table_data: table data, works like
+   --exclude-table-data. This keyword can only be
+   used with the exclude keyword.
+  
+ 
+
+   
+
+   
+Lines starting with # are considered comments and
+ignored. Comments can be placed after filter as well. Blank lines
+are also ignored. See  for how to
+perform quoting in patterns.
+   
+
+   
+Example files are listed below in the 
+section.
+   
+
+  
+ 
+
  
   --if-exists
   
@@ -1119,6 +1193,7 @@ PostgreSQL documentation
 schema (-n/--schema) and
 table (-t/--table) qualifier
 match at least one extension/schema/table in the database to be dumped.
+This also applies to filters used with --filter.
 Note that if none of the extension/schema/table qualifiers find
 matches, pg_dump will generate an error
 even without --strict-names.
@@ -1528,6 +1603,19 @@ CREATE DATABASE foo WITH TEMPLATE template0;
 
 
 $ pg_dump -t "\"MixedCaseName\"" mydb  mytab.sql
+
+
+  
+   To dump all tables with names starting with mytable, except for table
+   mytable2, specify a filter file
+   filter.txt like:
+
+include table mytable*
+exclude table mytable2
+
+
+
+$ pg_dump --filter=filter.txt mydb  db.sql
 
 
  
diff --git 

Re: proposal: possibility to read dumped table's name from file

2022-11-13 Thread Julien Rouhaud
On Sat, Nov 12, 2022 at 09:35:59PM +0100, Pavel Stehule wrote:

Thanks for the updated patch.  Apart from the function comment it looks good to
me.

Justin, did you have any other comment on the patch?

> > I don't fully understand the part about subpatterns, but is that necessary
> > to
> > describe it?  Simply saying that any valid and possibly-quoted identifier
> > can
> > be parsed should make it clear that identifiers containing \n characters
> > should
> > work too.  Maybe also just mention that whitespaces are removed and special
> > care is taken to output routines in exactly the same way calling code will
> > expect it (that is comma-and-single-space type delimiter).
> >
>
> In this case I hit the limits of my English language skills.
>
> I rewrote this comment, but it needs more care. Please, can you look at it?

I'm also not a native English speaker so I'm far for writing perfect comments
myself :)

Maybe something like

/*
 * read_pattern - reads on object pattern from input
 *
 * This function will parse any valid identifier (quoted or not, qualified or
 * not), which can also includes the full signature for routines.
 * Note that this function takes special care to sanitize the detected
 * identifier (removing extraneous whitespaces or other unnecessary
 * characters).  This is necessary as most backup/restore filtering functions
 * only recognize identifiers if they are written exactly way as they are
 * regenerated.
 * Returns a pointer to next character after the found identifier, or NULL on
 * error.
 */




Re: proposal: possibility to read dumped table's name from file

2022-11-12 Thread Pavel Stehule
Hi

and updated patch

Regards

Pavel
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 8b9d9f4cad..d5a6e2c7ee 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -779,6 +779,80 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Specify a filename from which to read patterns for objects to include
+or exclude from the dump. The patterns are interpreted according to the
+same rules as the corresponding options:
+-t/--table for tables,
+-n/--schema for schemas,
+--include-foreign-data for data on foreign servers and
+--exclude-table-data for table data.
+To read from STDIN, use - as the
+filename.  The --filter option can be specified in
+conjunction with the above listed options for including or excluding
+objects, and can also be specified more than once for multiple filter
+files.
+   
+
+   
+The file lists one object pattern per row, with the following format:
+
+{ include | exclude } { table | schema | foreign_data | table_data } PATTERN
+
+   
+
+   
+The first keyword specifies whether the objects matched by the pattern
+are to be included or excluded. The second keyword specifies the type
+of object to be filtered using the pattern:
+
+ 
+  
+   table: tables, works like
+   -t/--table
+  
+ 
+ 
+  
+   schema: schemas, works like
+   -n/--schema
+  
+ 
+ 
+  
+   foreign_data: data on foreign servers, works like
+   --include-foreign-data. This keyword can only be
+   used with the include keyword.
+  
+ 
+ 
+  
+   table_data: table data, works like
+   --exclude-table-data. This keyword can only be
+   used with the exclude keyword.
+  
+ 
+
+   
+
+   
+Lines starting with # are considered comments and
+ignored. Comments can be placed after filter as well. Blank lines
+are also ignored. See  for how to
+perform quoting in patterns.
+   
+
+   
+Example files are listed below in the 
+section.
+   
+
+  
+ 
+
  
   --if-exists
   
@@ -1119,6 +1193,7 @@ PostgreSQL documentation
 schema (-n/--schema) and
 table (-t/--table) qualifier
 match at least one extension/schema/table in the database to be dumped.
+This also applies to filters used with --filter.
 Note that if none of the extension/schema/table qualifiers find
 matches, pg_dump will generate an error
 even without --strict-names.
@@ -1528,6 +1603,19 @@ CREATE DATABASE foo WITH TEMPLATE template0;
 
 
 $ pg_dump -t "\"MixedCaseName\"" mydb  mytab.sql
+
+
+  
+   To dump all tables with names starting with mytable, except for table
+   mytable2, specify a filter file
+   filter.txt like:
+
+include table mytable*
+exclude table mytable2
+
+
+
+$ pg_dump --filter=filter.txt mydb  db.sql
 
 
  
diff --git a/doc/src/sgml/ref/pg_dumpall.sgml b/doc/src/sgml/ref/pg_dumpall.sgml
index e62d05e5ab..9cad26bbe6 100644
--- a/doc/src/sgml/ref/pg_dumpall.sgml
+++ b/doc/src/sgml/ref/pg_dumpall.sgml
@@ -122,6 +122,28 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Specify a filename from which to read patterns for databases excluded
+from dump. The patterns are interpretted according to the same rules
+as --exclude-database.
+To read from STDIN, use - as the
+filename.  The --filter option can be specified in
+conjunction with the above listed options for excluding databases,
+and can also be specified more than once for multiple filter files.
+   
+
+   
+The file lists one database pattern per row, with the following format:
+
+exclude database  PATTERN
+
+   
+  
+ 
+
  
   -g
   --globals-only
diff --git a/doc/src/sgml/ref/pg_restore.sgml b/doc/src/sgml/ref/pg_restore.sgml
index 47bd7dbda0..ffeb564c52 100644
--- a/doc/src/sgml/ref/pg_restore.sgml
+++ b/doc/src/sgml/ref/pg_restore.sgml
@@ -188,6 +188,31 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Specify a filename from which to read patterns for objects excluded
+or included from restore. The patterns are interpretted according to the
+same rules as --schema, --exclude-schema,
+--function, --index, --table
+or --trigger.
+To read from STDIN, use - as the
+filename.  The --filter option can be specified in
+conjunction with the above listed options for including or excluding
+objects, and can also be specified more than once for multiple 

Re: proposal: possibility to read dumped table's name from file

2022-11-12 Thread Pavel Stehule
pá 11. 11. 2022 v 9:11 odesílatel Julien Rouhaud 
napsal:

> Hi,
>
> On Sat, Nov 05, 2022 at 08:54:57PM +0100, Pavel Stehule wrote:
> >
> > pá 4. 11. 2022 v 14:28 odesílatel Julien Rouhaud 
> > napsal:
> >
> > > > But one thing I noticed is that "optarg" looks wrong here:
> > > >
> > > > simple_string_list_append(>triggerNames, optarg);
> > >
> > > Ah indeed, good catch!  Maybe there should be an explicit test for
> every
> > > (include|exclude) / objtype combination?  It would be a bit verbose
> (and
> > > possibly hard to maintain).
> > >
> >
> > yes - pg_restore is not well covered by  tests, fixed
> >
> > I found another issue. The pg_restore requires a full signature of the
> > function and it is pretty sensitive on white spaces (pg_restore).
>
> Argh, indeed.  It's a good thing to have expanded the regression tests :)
>
> > I made a
> > mistake when I partially parsed patterns like SQL identifiers. It can
> work
> > for simple cases, but when I parse the function's signature it stops
> > working. So I rewrote the parsing pattern part. Now, I just read an input
> > string and I try to reduce spaces. Still multiline identifiers are
> > supported. Against the previous method of pattern parsing, I needed to
> > change just one regress test - now I am not able to detect garbage after
> > pattern :-/.
>
> I'm not sure it's really problematic.  It looks POLA-violation compatible
> with
> regular pg_dump options, for instance:
>
> $ echo "include table t1()" | pg_dump --filter - | ag CREATE
> CREATE TABLE public.t1 (
>
> $ pg_dump -t "t1()" | ag CREATE
> CREATE TABLE public.t1 (
>
> $ echo "include table t1()blabla" | pg_dump --filter - | ag CREATE
> pg_dump: error: no matching tables were found
>
> $ pg_dump -t "t1()blabla" | ag CREATE
> pg_dump: error: no matching tables were found
>
> I don't think the file parsing code should try to be smart about checking
> the
> found patterns.
>
> > * function's filtering doesn't support schema - when the name of function
> > is specified with schema, then the function is not found
>
> Ah I didn't know that.  Indeed it only expect a non-qualified identifier,
> and
> would restore any function that matches the name (and arguments), possibly
> multiple ones if there are variants in different schema.  That's unrelated
> to
> this patch though.
>
> > * the function has to be specified with an argument type list - the
> > separator has to be exactly ", " string. Without space or with one space
> > more, the filtering doesn't work (new implementation of pattern parsing
> > reduces white spaces sensitivity). This is not a bug, but it is not well
> > documented.
>
> Agreed.
>
> > attached updated patch
>
> It looks overall good to me!  I just have a few minor nitpicking
> complaints:
>
> - you removed the pg_strip_clrf() calls and declared everything as "const
> char
>   *", so there's no need to explicitly cast the filter_get_keyword()
> arguments
>   anymore
>

removed


>
> Note also that the code now relies on the fact that there are some non-zero
> bytes after a pattern to know that no errors happened.  It's not a problem
> as
> you should find an EOF marker anyway if CLRF were stripped.
>

I am not sure if I understand this note well?


>
> + * Following routines are called from pg_dump, pg_dumpall and pg_restore.
> + * Unfortunately, implementation of exit_nicely in pg_dump and pg_restore
> + * is different from implementation of this routine in pg_dumpall. So
> instead
> + * of directly calling exit_nicely we have to return some error flag (in
> this
> + * case NULL), and exit_nicelly will be executed from caller's routine.
>
> Slight improvement:
> [...]
> Unfortunately, the implementation of exit_nicely in pg_dump and pg_restore
> is
> different from the one in pg_dumpall, so instead of...
>
> + * read_pattern - reads an pattern from input. The pattern can be mix of
> + * single line or multi line subpatterns. Single line subpattern starts
> first
> + * non white space char, and ending last non space char on line or by char
> + * '#'. The white spaces inside are removed (around char ".()"), or
> reformated
> + * around char ',' or reduced (the multiple spaces are replaced by one).
> + * Multiline subpattern starts by double quote and ending by this char
> too.
> + * The escape rules are same like for SQL quoted literal.
> + *
> + * Routine signalizes error by returning NULL. Otherwise returns pointer
> + * to next char after last processed char in input string.
>
>
> typo: reads "a" pattern from input...
>

fixed


>
> I don't fully understand the part about subpatterns, but is that necessary
> to
> describe it?  Simply saying that any valid and possibly-quoted identifier
> can
> be parsed should make it clear that identifiers containing \n characters
> should
> work too.  Maybe also just mention that whitespaces are removed and special
> care is taken to output routines in exactly the same way calling code will
> expect it (that is comma-and-single-space type 

Re: proposal: possibility to read dumped table's name from file

2022-11-11 Thread Julien Rouhaud
Hi,

On Sat, Nov 05, 2022 at 08:54:57PM +0100, Pavel Stehule wrote:
>
> pá 4. 11. 2022 v 14:28 odesílatel Julien Rouhaud 
> napsal:
>
> > > But one thing I noticed is that "optarg" looks wrong here:
> > >
> > > simple_string_list_append(>triggerNames, optarg);
> >
> > Ah indeed, good catch!  Maybe there should be an explicit test for every
> > (include|exclude) / objtype combination?  It would be a bit verbose (and
> > possibly hard to maintain).
> >
>
> yes - pg_restore is not well covered by  tests, fixed
>
> I found another issue. The pg_restore requires a full signature of the
> function and it is pretty sensitive on white spaces (pg_restore).

Argh, indeed.  It's a good thing to have expanded the regression tests :)

> I made a
> mistake when I partially parsed patterns like SQL identifiers. It can work
> for simple cases, but when I parse the function's signature it stops
> working. So I rewrote the parsing pattern part. Now, I just read an input
> string and I try to reduce spaces. Still multiline identifiers are
> supported. Against the previous method of pattern parsing, I needed to
> change just one regress test - now I am not able to detect garbage after
> pattern :-/.

I'm not sure it's really problematic.  It looks POLA-violation compatible with
regular pg_dump options, for instance:

$ echo "include table t1()" | pg_dump --filter - | ag CREATE
CREATE TABLE public.t1 (

$ pg_dump -t "t1()" | ag CREATE
CREATE TABLE public.t1 (

$ echo "include table t1()blabla" | pg_dump --filter - | ag CREATE
pg_dump: error: no matching tables were found

$ pg_dump -t "t1()blabla" | ag CREATE
pg_dump: error: no matching tables were found

I don't think the file parsing code should try to be smart about checking the
found patterns.

> * function's filtering doesn't support schema - when the name of function
> is specified with schema, then the function is not found

Ah I didn't know that.  Indeed it only expect a non-qualified identifier, and
would restore any function that matches the name (and arguments), possibly
multiple ones if there are variants in different schema.  That's unrelated to
this patch though.

> * the function has to be specified with an argument type list - the
> separator has to be exactly ", " string. Without space or with one space
> more, the filtering doesn't work (new implementation of pattern parsing
> reduces white spaces sensitivity). This is not a bug, but it is not well
> documented.

Agreed.

> attached updated patch

It looks overall good to me!  I just have a few minor nitpicking complaints:

- you removed the pg_strip_clrf() calls and declared everything as "const char
  *", so there's no need to explicitly cast the filter_get_keyword() arguments
  anymore

Note also that the code now relies on the fact that there are some non-zero
bytes after a pattern to know that no errors happened.  It's not a problem as
you should find an EOF marker anyway if CLRF were stripped.

+ * Following routines are called from pg_dump, pg_dumpall and pg_restore.
+ * Unfortunately, implementation of exit_nicely in pg_dump and pg_restore
+ * is different from implementation of this routine in pg_dumpall. So instead
+ * of directly calling exit_nicely we have to return some error flag (in this
+ * case NULL), and exit_nicelly will be executed from caller's routine.

Slight improvement:
[...]
Unfortunately, the implementation of exit_nicely in pg_dump and pg_restore is
different from the one in pg_dumpall, so instead of...

+ * read_pattern - reads an pattern from input. The pattern can be mix of
+ * single line or multi line subpatterns. Single line subpattern starts first
+ * non white space char, and ending last non space char on line or by char
+ * '#'. The white spaces inside are removed (around char ".()"), or reformated
+ * around char ',' or reduced (the multiple spaces are replaced by one).
+ * Multiline subpattern starts by double quote and ending by this char too.
+ * The escape rules are same like for SQL quoted literal.
+ *
+ * Routine signalizes error by returning NULL. Otherwise returns pointer
+ * to next char after last processed char in input string.


typo: reads "a" pattern from input...

I don't fully understand the part about subpatterns, but is that necessary to
describe it?  Simply saying that any valid and possibly-quoted identifier can
be parsed should make it clear that identifiers containing \n characters should
work too.  Maybe also just mention that whitespaces are removed and special
care is taken to output routines in exactly the same way calling code will
expect it (that is comma-and-single-space type delimiter).




Re: proposal: possibility to read dumped table's name from file

2022-11-05 Thread Pavel Stehule
Hi

pá 4. 11. 2022 v 14:28 odesílatel Julien Rouhaud 
napsal:

> On Fri, Nov 04, 2022 at 07:19:27AM -0500, Justin Pryzby wrote:
> > On Fri, Nov 04, 2022 at 07:59:01PM +0800, Julien Rouhaud wrote:
> > > On Thu, Nov 03, 2022 at 10:22:15PM +0100, Pavel Stehule wrote:
> > > > čt 3. 11. 2022 v 5:09 odesílatel Julien Rouhaud 
> napsal:
> > > > updated patch attached
> > > >
> > > > big thanks for these comments and tips
> > >
> > > Thanks for the updated patch!  As far as I'm concerned the patch is in
> a good
> > > shape, passes the CI and I don't have anything more to say so I'm
> marking it as
> > > Ready for Committer!
> >
> > +1
> >
> > I started looking to see if it's possible to simplify the patch at all,
> > but nothing to show yet.
> >
> > But one thing I noticed is that "optarg" looks wrong here:
> >
> > simple_string_list_append(>triggerNames, optarg);
>
> Ah indeed, good catch!  Maybe there should be an explicit test for every
> (include|exclude) / objtype combination?  It would be a bit verbose (and
> possibly hard to maintain).
>

yes - pg_restore is not well covered by  tests, fixed

I found another issue. The pg_restore requires a full signature of the
function and it is pretty sensitive on white spaces (pg_restore). I made a
mistake when I partially parsed patterns like SQL identifiers. It can work
for simple cases, but when I parse the function's signature it stops
working. So I rewrote the parsing pattern part. Now, I just read an input
string and I try to reduce spaces. Still multiline identifiers are
supported. Against the previous method of pattern parsing, I needed to
change just one regress test - now I am not able to detect garbage after
pattern :-/. It is possible to enter types like "double precision" or
"timestamp with time zone", without needing to check it on the server side.

When I wroted regress tests I found some  issues of pg_restore filtering
options (not related to this patch)

* function's filtering doesn't support schema - when the name of function
is specified with schema, then the function is not found

* the function has to be specified with an argument type list - the
separator has to be exactly ", " string. Without space or with one space
more, the filtering doesn't work (new implementation of pattern parsing
reduces white spaces sensitivity). This is not a bug, but it is not well
documented.

* the trigger filtering is probably broken (on pg_restore side). The name
should be entered in form "tablename triggername"

attached updated patch

Regards

Pavel
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 8b9d9f4cad..d5a6e2c7ee 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -779,6 +779,80 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Specify a filename from which to read patterns for objects to include
+or exclude from the dump. The patterns are interpreted according to the
+same rules as the corresponding options:
+-t/--table for tables,
+-n/--schema for schemas,
+--include-foreign-data for data on foreign servers and
+--exclude-table-data for table data.
+To read from STDIN, use - as the
+filename.  The --filter option can be specified in
+conjunction with the above listed options for including or excluding
+objects, and can also be specified more than once for multiple filter
+files.
+   
+
+   
+The file lists one object pattern per row, with the following format:
+
+{ include | exclude } { table | schema | foreign_data | table_data } PATTERN
+
+   
+
+   
+The first keyword specifies whether the objects matched by the pattern
+are to be included or excluded. The second keyword specifies the type
+of object to be filtered using the pattern:
+
+ 
+  
+   table: tables, works like
+   -t/--table
+  
+ 
+ 
+  
+   schema: schemas, works like
+   -n/--schema
+  
+ 
+ 
+  
+   foreign_data: data on foreign servers, works like
+   --include-foreign-data. This keyword can only be
+   used with the include keyword.
+  
+ 
+ 
+  
+   table_data: table data, works like
+   --exclude-table-data. This keyword can only be
+   used with the exclude keyword.
+  
+ 
+
+   
+
+   
+Lines starting with # are considered comments and
+ignored. Comments can be placed after filter as well. Blank lines
+are also ignored. See  for how to
+perform quoting in patterns.
+   
+
+   
+Example files are listed below in the 
+section.
+   
+
+  
+ 
+
  
   --if-exists
   
@@ -1119,6 +1193,7 @@ PostgreSQL documentation
 schema 

Re: proposal: possibility to read dumped table's name from file

2022-11-04 Thread Julien Rouhaud
On Fri, Nov 04, 2022 at 02:37:05PM +0100, Pavel Stehule wrote:
> pá 4. 11. 2022 v 14:28 odesílatel Julien Rouhaud 
> napsal:
>
> > >
> > > But one thing I noticed is that "optarg" looks wrong here:
> > >
> > > simple_string_list_append(>triggerNames, optarg);
> >
> > Ah indeed, good catch!  Maybe there should be an explicit test for every
> > (include|exclude) / objtype combination?  It would be a bit verbose (and
> > possibly hard to maintain).
> >
>
> I'll do it

Thanks a lot Pavel!  I switched the CF entry back to "Waiting on Author" in the
meantime.




Re: proposal: possibility to read dumped table's name from file

2022-11-04 Thread Pavel Stehule
pá 4. 11. 2022 v 14:28 odesílatel Julien Rouhaud 
napsal:

> On Fri, Nov 04, 2022 at 07:19:27AM -0500, Justin Pryzby wrote:
> > On Fri, Nov 04, 2022 at 07:59:01PM +0800, Julien Rouhaud wrote:
> > > On Thu, Nov 03, 2022 at 10:22:15PM +0100, Pavel Stehule wrote:
> > > > čt 3. 11. 2022 v 5:09 odesílatel Julien Rouhaud 
> napsal:
> > > > updated patch attached
> > > >
> > > > big thanks for these comments and tips
> > >
> > > Thanks for the updated patch!  As far as I'm concerned the patch is in
> a good
> > > shape, passes the CI and I don't have anything more to say so I'm
> marking it as
> > > Ready for Committer!
> >
> > +1
> >
> > I started looking to see if it's possible to simplify the patch at all,
> > but nothing to show yet.
> >
> > But one thing I noticed is that "optarg" looks wrong here:
> >
> > simple_string_list_append(>triggerNames, optarg);
>
> Ah indeed, good catch!  Maybe there should be an explicit test for every
> (include|exclude) / objtype combination?  It would be a bit verbose (and
> possibly hard to maintain).
>

I'll do it


Re: proposal: possibility to read dumped table's name from file

2022-11-04 Thread Julien Rouhaud
On Fri, Nov 04, 2022 at 07:19:27AM -0500, Justin Pryzby wrote:
> On Fri, Nov 04, 2022 at 07:59:01PM +0800, Julien Rouhaud wrote:
> > On Thu, Nov 03, 2022 at 10:22:15PM +0100, Pavel Stehule wrote:
> > > čt 3. 11. 2022 v 5:09 odesílatel Julien Rouhaud  
> > > napsal:
> > > updated patch attached
> > >
> > > big thanks for these comments and tips
> > 
> > Thanks for the updated patch!  As far as I'm concerned the patch is in a 
> > good
> > shape, passes the CI and I don't have anything more to say so I'm marking 
> > it as
> > Ready for Committer!
> 
> +1
> 
> I started looking to see if it's possible to simplify the patch at all,
> but nothing to show yet.
> 
> But one thing I noticed is that "optarg" looks wrong here:
> 
> simple_string_list_append(>triggerNames, optarg);

Ah indeed, good catch!  Maybe there should be an explicit test for every
(include|exclude) / objtype combination?  It would be a bit verbose (and
possibly hard to maintain).




Re: proposal: possibility to read dumped table's name from file

2022-11-04 Thread Justin Pryzby
On Fri, Nov 04, 2022 at 07:59:01PM +0800, Julien Rouhaud wrote:
> On Thu, Nov 03, 2022 at 10:22:15PM +0100, Pavel Stehule wrote:
> > čt 3. 11. 2022 v 5:09 odesílatel Julien Rouhaud  napsal:
> > updated patch attached
> >
> > big thanks for these comments and tips
> 
> Thanks for the updated patch!  As far as I'm concerned the patch is in a good
> shape, passes the CI and I don't have anything more to say so I'm marking it 
> as
> Ready for Committer!

+1

I started looking to see if it's possible to simplify the patch at all,
but nothing to show yet.

But one thing I noticed is that "optarg" looks wrong here:

simple_string_list_append(>triggerNames, optarg);

-- 
Justin




Re: proposal: possibility to read dumped table's name from file

2022-11-04 Thread Julien Rouhaud
On Thu, Nov 03, 2022 at 10:22:15PM +0100, Pavel Stehule wrote:
> čt 3. 11. 2022 v 5:09 odesílatel Julien Rouhaud  napsal:
>
> >
> > Is there any interest with the initial pg_strip_crlf?  AFAICT all the rest
> > of
> > the code will ignore such caracters using isspace() so it wouldn't change
> > anything.
> >
>
> I think reading multiline identifiers is a little bit easier, because I
> don't need to check the ending \n and \r
> When I read multiline identifiers, I cannot ignore white spaces.

Ok.  I don't have a strong objection to it.

>
> updated patch attached
>
> big thanks for these comments and tips

Thanks for the updated patch!  As far as I'm concerned the patch is in a good
shape, passes the CI and I don't have anything more to say so I'm marking it as
Ready for Committer!




Re: proposal: possibility to read dumped table's name from file

2022-11-03 Thread Pavel Stehule
čt 3. 11. 2022 v 5:09 odesílatel Julien Rouhaud  napsal:

> Hi,
>
> On Wed, Oct 26, 2022 at 06:26:26AM +0200, Pavel Stehule wrote:
> >
> > út 18. 10. 2022 v 11:33 odesílatel Julien Rouhaud 
> > napsal:
> >
> > >
> > > I'm wondering if psql's parse_identifier() could be exported and reused
> > > here
> > > rather than creating yet another version.
> > >
> >
> > I looked there, and I don't think this parser is usable for this purpose.
> > It is  very sensitive on white spaces, and doesn't support multi-lines.
> It
> > is designed for support readline tab complete, it is designed for
> > simplicity not for correctness.
>
> Ah, sorry I should have checked more thoroughly.  I guess it's ok to have
> another identifier parser for the include file then, as this new one
> wouldn't
> really fit the tab-completion use case.
>
> > > also, is there any reason why this function doesn't call exit_nicely in
> > > case of
> > > error rather than letting each caller do it without any other cleanup?
> > >
> >
> > It is commented few lines up
> >
> > /*
> >  * Following routines are called from pg_dump, pg_dumpall and pg_restore.
> >  * Unfortunatelly, implementation of exit_nicely in pg_dump and
> pg_restore
> >  * is different from implementation of this rutine in pg_dumpall. So
> instead
> >  * direct calling exit_nicely we have to return some error flag (in this
> >  * case NULL), and exit_nicelly will be executed from caller's routine.
> >  */
>
> Oh right, I totally missed it sorry about that!
>
> About the new version, I didn't find any problem with the feature itself so
> it's a good thing!
>
> I still have a few comments about the patch.  First, about the behavior:
>
> - is that ok to have just "data" pattern instead of "table_data" or
> something
>   like that, since it's supposed to match --exclude-table-data option?
>

done


>
> - the error message are sometimes not super helpful.  For instance:
>
> $ echo "include data t1" | pg_dump --filter -
> pg_dump: error: invalid format of filter on line 1: include filter is not
> allowed for this type of object
>
> It would be nice if the error message mentioned "data" rather than a
> generic
> "this type of object".  Also, maybe we should quote "include" to outline
> that
> we found this keyword?
>
>
done



> About the patch itself:
> filter.c:
>
> +#include "postgres_fe.h"
> +
> +#include "filter.h"
> +
> +#include "common/logging.h"
>
> the filter.h inclusion should be done with the rest of the includes, in
> alphabetical order.
>
>
done




> +#defineis_keyword_str(cstr, str, bytes) \
> +   ((strlen(cstr) == bytes) && (pg_strncasecmp(cstr, str, bytes) ==
> 0))
>
> nit: our guidline is to protect macro arguments with parenthesis.  Some
> arguments can be evaluated multiple times but I don't think it's worth
> adding a
> comment for that.
>
>
done


> + * Unfortunatelly, implementation of exit_nicely in pg_dump and pg_restore
> + * is different from implementation of this rutine in pg_dumpall. So
> instead
> + * direct calling exit_nicely we have to return some error flag (in this
>
> typos: s/Unfortunatelly/Unfortunately/ and s/rutine/routine/
> Also, it would probably be better to say "instead of directly calling..."
>
>
done


> +static const char *
> +filter_object_type_name(FilterObjectType fot)
> +{
> +   switch (fot)
> +   {
> +   case FILTER_OBJECT_TYPE_NONE:
> +   return "comment or empty line";
> +[...]
> +   }
> +
> +   return "unknown object type";
> +}
>
> I'm wondering if we should add a pg_unreachable() there, some compilers
> might
> complain otherwise.  See CreateDestReceiver() for instance for similar
> pattern.
>

done


>
> + * Emit error message "invalid format of filter file ..."
> + *
> + * This is mostly a convenience routine to avoid duplicating file closing
> code
> + * in multiple callsites.
> + */
> +void
> +log_invalid_filter_format(FilterStateData *fstate, char *message)
>
> nit: invalid format *in* filter file...?
>

changed


>
> +void
> +log_unsupported_filter_object_type(FilterStateData *fstate,
> +
>  const char *appname,
> +
>  FilterObjectType fot)
> +{
> +   PQExpBuffer str = createPQExpBuffer();
> +
> +   printfPQExpBuffer(str,
> + "The application \"%s\" doesn't
> support filter for object type \"%s\".",
>
> nit: there shouldn't be uppercase in error messages, especially since this
> will
> be appended to another message by log_invalid_filter_format.  I would just
> just
> drop "The application" entirely for brevity.
>

changed


>
> +/*
> + * Release allocated resources for filter
> + */
> +void
> +filter_free(FilterStateData *fstate)
>
> nit: Release allocated resources for *the given* filter?
>

changed

>
> + * Search for keywords (limited to ascii alphabetic characters) in
> + * the passed in line buffer. Returns NULL, when the buffer is empty or
> first
> + * char is not alpha. The length of the found 

Re: proposal: possibility to read dumped table's name from file

2022-11-02 Thread Julien Rouhaud
Hi,

On Wed, Oct 26, 2022 at 06:26:26AM +0200, Pavel Stehule wrote:
>
> út 18. 10. 2022 v 11:33 odesílatel Julien Rouhaud 
> napsal:
>
> >
> > I'm wondering if psql's parse_identifier() could be exported and reused
> > here
> > rather than creating yet another version.
> >
>
> I looked there, and I don't think this parser is usable for this purpose.
> It is  very sensitive on white spaces, and doesn't support multi-lines. It
> is designed for support readline tab complete, it is designed for
> simplicity not for correctness.

Ah, sorry I should have checked more thoroughly.  I guess it's ok to have
another identifier parser for the include file then, as this new one wouldn't
really fit the tab-completion use case.

> > also, is there any reason why this function doesn't call exit_nicely in
> > case of
> > error rather than letting each caller do it without any other cleanup?
> >
>
> It is commented few lines up
>
> /*
>  * Following routines are called from pg_dump, pg_dumpall and pg_restore.
>  * Unfortunatelly, implementation of exit_nicely in pg_dump and pg_restore
>  * is different from implementation of this rutine in pg_dumpall. So instead
>  * direct calling exit_nicely we have to return some error flag (in this
>  * case NULL), and exit_nicelly will be executed from caller's routine.
>  */

Oh right, I totally missed it sorry about that!

About the new version, I didn't find any problem with the feature itself so
it's a good thing!

I still have a few comments about the patch.  First, about the behavior:

- is that ok to have just "data" pattern instead of "table_data" or something
  like that, since it's supposed to match --exclude-table-data option?

- the error message are sometimes not super helpful.  For instance:

$ echo "include data t1" | pg_dump --filter -
pg_dump: error: invalid format of filter on line 1: include filter is not 
allowed for this type of object

It would be nice if the error message mentioned "data" rather than a generic
"this type of object".  Also, maybe we should quote "include" to outline that
we found this keyword?

About the patch itself:
filter.c:

+#include "postgres_fe.h"
+
+#include "filter.h"
+
+#include "common/logging.h"

the filter.h inclusion should be done with the rest of the includes, in
alphabetical order.

+#defineis_keyword_str(cstr, str, bytes) \
+   ((strlen(cstr) == bytes) && (pg_strncasecmp(cstr, str, bytes) == 0))

nit: our guidline is to protect macro arguments with parenthesis.  Some
arguments can be evaluated multiple times but I don't think it's worth adding a
comment for that.

+ * Unfortunatelly, implementation of exit_nicely in pg_dump and pg_restore
+ * is different from implementation of this rutine in pg_dumpall. So instead
+ * direct calling exit_nicely we have to return some error flag (in this

typos: s/Unfortunatelly/Unfortunately/ and s/rutine/routine/
Also, it would probably be better to say "instead of directly calling..."

+static const char *
+filter_object_type_name(FilterObjectType fot)
+{
+   switch (fot)
+   {
+   case FILTER_OBJECT_TYPE_NONE:
+   return "comment or empty line";
+[...]
+   }
+
+   return "unknown object type";
+}

I'm wondering if we should add a pg_unreachable() there, some compilers might
complain otherwise.  See CreateDestReceiver() for instance for similar pattern.

+ * Emit error message "invalid format of filter file ..."
+ *
+ * This is mostly a convenience routine to avoid duplicating file closing code
+ * in multiple callsites.
+ */
+void
+log_invalid_filter_format(FilterStateData *fstate, char *message)

nit: invalid format *in* filter file...?

+void
+log_unsupported_filter_object_type(FilterStateData *fstate,
+   const 
char *appname,
+   
FilterObjectType fot)
+{
+   PQExpBuffer str = createPQExpBuffer();
+
+   printfPQExpBuffer(str,
+ "The application \"%s\" doesn't 
support filter for object type \"%s\".",

nit: there shouldn't be uppercase in error messages, especially since this will
be appended to another message by log_invalid_filter_format.  I would just just
drop "The application" entirely for brevity.

+/*
+ * Release allocated resources for filter
+ */
+void
+filter_free(FilterStateData *fstate)

nit: Release allocated resources for *the given* filter?

+ * Search for keywords (limited to ascii alphabetic characters) in
+ * the passed in line buffer. Returns NULL, when the buffer is empty or first
+ * char is not alpha. The length of the found keyword is returned in the size
+ * parameter.
+ */
+static const char *
+filter_get_keyword(const char **line, int *size)
+{
+   [...]
+   if (isascii(*ptr) && isalpha(*ptr))
+   {
+   result = ptr++;
+
+   while (isascii(*ptr) && (isalpha(*ptr) || *ptr == '_'))
+   

Re: proposal: possibility to read dumped table's name from file

2022-10-25 Thread Pavel Stehule
Hi


út 18. 10. 2022 v 11:33 odesílatel Julien Rouhaud 
napsal:

> On Thu, Oct 13, 2022 at 11:46:34AM -0700, Andres Freund wrote:
> > Hi,
> >
> > On 2022-10-07 07:26:08 +0200, Pavel Stehule wrote:
> > > I am sending version with handy written parser and meson support
> >
> > Given this is a new approach it seems inaccurate to have the CF entry
> marked
> > ready-for-committer. I've updated it to needs-review.
>
> I just had a quick look at the rest of the patch.
>
> For the parser, it seems that filter_get_pattern is reimplementing an
> identifier parsing function but isn't entirely correct.  It can correctly
> parse
> quoted non-qualified identifiers and non-quoted qualified identifiers, but
> not
> quoted and qualified ones.  For instance:
>
> $ echo 'include table nsp.tbl' | pg_dump --filter - >/dev/null
> $echo $?
> 0
>
> $ echo 'include table "TBL"' | pg_dump --filter - >/dev/null
> $echo $?
> 0
>
> $ echo 'include table "NSP"."TBL"' | pg_dump --filter - >/dev/null
> pg_dump: error: invalid format of filter on line 1: unexpected extra data
> after pattern
>

fixed


>
> This should also be covered in the regression tests.
>

done


>
> I'm wondering if psql's parse_identifier() could be exported and reused
> here
> rather than creating yet another version.
>

I looked there, and I don't think this parser is usable for this purpose.
It is  very sensitive on white spaces, and doesn't support multi-lines. It
is designed for support readline tab complete, it is designed for
simplicity not for correctness.


>
> Nitpicking: the comments needs some improvements:
>
> + /*
> +  * Simple routines - just don't repeat same code
> +  *
> +  * Returns true, when filter's file is opened
> +  */
> + bool
> + filter_init(FilterStateData *fstate, const char *filename)
>

done


>
> also, is there any reason why this function doesn't call exit_nicely in
> case of
> error rather than letting each caller do it without any other cleanup?
>

It is commented few lines up

/*
 * Following routines are called from pg_dump, pg_dumpall and pg_restore.
 * Unfortunatelly, implementation of exit_nicely in pg_dump and pg_restore
 * is different from implementation of this rutine in pg_dumpall. So instead
 * direct calling exit_nicely we have to return some error flag (in this
 * case NULL), and exit_nicelly will be executed from caller's routine.
 */


>
> + /*
> +  * Release allocated sources for filter
> +  */
> + void
> + filter_free_sources(FilterStateData *fstate)
>
> I'm assuming "ressources" not "sources"?
>

changed


>
> + /*
> +  * log_format_error - Emit error message
> +  *
> +  * This is mostly a convenience routine to avoid duplicating file
> closing code
> +  * in multiple callsites.
> +  */
> + void
> + log_invalid_filter_format(FilterStateData *fstate, char *message)
>
> mismatch between comment and function name (same for filter_read_item)
>

fixes


>
> + static const char *
> + filter_object_type_name(FilterObjectType fot)
>
> No description.
>
>
fixed


> /*
>  * Helper routine to reduce duplicated code
>  */
> void
> log_unsupported_filter_object_type(FilterStateData *fstate,
>
> const char *appname,
>
> FilterObjectType fot)
>
> Need more helpful comment.
>

fixed

Thank you for comments

attached updated patch

Regards

Pavel
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 8b9d9f4cad..955bfcfdad 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -779,6 +779,80 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Specify a filename from which to read patterns for objects to include
+or exclude from the dump. The patterns are interpreted according to the
+same rules as the corresponding options:
+-t/--table for tables,
+-n/--schema for schemas,
+--include-foreign-data for data on foreign servers and
+--exclude-table-data for table data.
+To read from STDIN, use - as the
+filename.  The --filter option can be specified in
+conjunction with the above listed options for including or excluding
+objects, and can also be specified more than once for multiple filter
+files.
+   
+
+   
+The file lists one object pattern per row, with the following format:
+
+{ include | exclude } { table | schema | foreign_data | data } PATTERN
+
+   
+
+   
+The first keyword specifies whether the objects matched by the pattern
+are to be included or excluded. The second keyword specifies the type
+of object to be filtered using the pattern:
+
+ 
+  
+   table: tables, works like
+   -t/--table
+  
+ 
+ 
+  
+   schema: schemas, works like
+   -n/--schema
+  
+ 
+ 
+  
+   foreign_data: data on foreign servers, works like
+   --include-foreign-data. 

Re: proposal: possibility to read dumped table's name from file

2022-10-18 Thread Julien Rouhaud
On Thu, Oct 13, 2022 at 11:46:34AM -0700, Andres Freund wrote:
> Hi,
> 
> On 2022-10-07 07:26:08 +0200, Pavel Stehule wrote:
> > I am sending version with handy written parser and meson support
> 
> Given this is a new approach it seems inaccurate to have the CF entry marked
> ready-for-committer. I've updated it to needs-review.

I just had a quick look at the rest of the patch.

For the parser, it seems that filter_get_pattern is reimplementing an
identifier parsing function but isn't entirely correct.  It can correctly parse
quoted non-qualified identifiers and non-quoted qualified identifiers, but not
quoted and qualified ones.  For instance:

$ echo 'include table nsp.tbl' | pg_dump --filter - >/dev/null
$echo $?
0

$ echo 'include table "TBL"' | pg_dump --filter - >/dev/null
$echo $?
0

$ echo 'include table "NSP"."TBL"' | pg_dump --filter - >/dev/null
pg_dump: error: invalid format of filter on line 1: unexpected extra data after 
pattern

This should also be covered in the regression tests.

I'm wondering if psql's parse_identifier() could be exported and reused here
rather than creating yet another version.  


Nitpicking: the comments needs some improvements:

+ /*
+  * Simple routines - just don't repeat same code
+  *
+  * Returns true, when filter's file is opened
+  */
+ bool
+ filter_init(FilterStateData *fstate, const char *filename)

also, is there any reason why this function doesn't call exit_nicely in case of
error rather than letting each caller do it without any other cleanup?

+ /*
+  * Release allocated sources for filter
+  */
+ void
+ filter_free_sources(FilterStateData *fstate)

I'm assuming "ressources" not "sources"?

+ /*
+  * log_format_error - Emit error message
+  *
+  * This is mostly a convenience routine to avoid duplicating file closing code
+  * in multiple callsites.
+  */
+ void
+ log_invalid_filter_format(FilterStateData *fstate, char *message)

mismatch between comment and function name (same for filter_read_item)

+ static const char *
+ filter_object_type_name(FilterObjectType fot)

No description.

/*
 * Helper routine to reduce duplicated code
 */
void
log_unsupported_filter_object_type(FilterStateData *fstate,
const 
char *appname,

FilterObjectType fot)

Need more helpful comment.




Re: proposal: possibility to read dumped table's name from file

2022-10-13 Thread Andres Freund
Hi,

On 2022-10-07 07:26:08 +0200, Pavel Stehule wrote:
> I am sending version with handy written parser and meson support

Given this is a new approach it seems inaccurate to have the CF entry marked
ready-for-committer. I've updated it to needs-review.

Greetings,

Andres Freund




Re: proposal: possibility to read dumped table's name from file

2022-10-07 Thread Pavel Stehule
pá 7. 10. 2022 v 16:03 odesílatel Julien Rouhaud 
napsal:

> On Fri, Oct 07, 2022 at 07:26:08AM +0200, Pavel Stehule wrote:
> >
> > I checked this code again, and I don't think some refactoring is easy.
> > getFiltersFromFile is not duplicated. It is just probably badly named.
> >
> > These routines are used from pg_dump, pg_dumpall and pg_restore. There
> are
> > significant differences in supported objects and in types used for
> returned
> > lists (dumpOptions, SimpleStringList, and RestoreOptions). If I have one
> > routine, then I need to implement some mechanism for specification of
> > supported objects, and a special type that can be used as a proxy between
> > caller and parser to hold lists of parsed values. To be names less
> > confusing I renamed them to read_dump_filters, read_dumpall_filters and
> > read_restore_filters
>
> Ah right, I missed the different argument types.  Now that the functions
> have
> improved names it looks way clearer, and it seems just fine!
>

Thank you for check

Pavel


Re: proposal: possibility to read dumped table's name from file

2022-10-07 Thread Julien Rouhaud
On Fri, Oct 07, 2022 at 07:26:08AM +0200, Pavel Stehule wrote:
>
> I checked this code again, and I don't think some refactoring is easy.
> getFiltersFromFile is not duplicated. It is just probably badly named.
>
> These routines are used from pg_dump, pg_dumpall and pg_restore. There are
> significant differences in supported objects and in types used for returned
> lists (dumpOptions, SimpleStringList, and RestoreOptions). If I have one
> routine, then I need to implement some mechanism for specification of
> supported objects, and a special type that can be used as a proxy between
> caller and parser to hold lists of parsed values. To be names less
> confusing I renamed them to read_dump_filters, read_dumpall_filters and
> read_restore_filters

Ah right, I missed the different argument types.  Now that the functions have
improved names it looks way clearer, and it seems just fine!




Re: proposal: possibility to read dumped table's name from file

2022-10-06 Thread Pavel Stehule
Hi

I am sending version with handy written parser and meson support

po 3. 10. 2022 v 6:34 odesílatel Julien Rouhaud  napsal:

> Hi,
>
> > You started rewriting it, but you didn't finish it.
> >
> > Unfortunately, there is not a clean opinion on using bison's parser for
> > this purpose. I understand that the complexity of this language is too
> low,
> > so the benefit of using bison's gramatic is low too. Personally, I have
> not
> > any problem using bison for this purpose. For this case, I think we
> compare
> > two similarly long ways, but unfortunately, customers that have a problem
> > with long command lines still have this problem.
> >
> > Can we go forward? Daniel is strongly against handwritten parser. Is
> there
> > somebody strongly against bison's based parser? There is not any other
> way.
>
> I don't have a strong opinion either, but it seems that 2 people argued
> against
> a bison parser (vs only 1 arguing for) and the fact that the current habit
> is
> to rely on hand written parsers for simple cases (e.g. jsonapi.c /
> pg_parse_json()), it seems that we should go back to Pavel's original
> parser.
>
> I only had a quick look but it indeed seems trivial, it just maybe need a
> bit
> of refactoring to avoid some code duplication (getFiltersFromFile is
> duplicated, and getDatabaseExcludeFiltersFromFile could be removed if
> getFiltersFromFile knew about the 2 patterns).
>

I checked this code again, and I don't think some refactoring is easy.
getFiltersFromFile is not duplicated. It is just probably badly named.

These routines are used from pg_dump, pg_dumpall and pg_restore. There are
significant differences in supported objects and in types used for returned
lists (dumpOptions, SimpleStringList, and RestoreOptions). If I have one
routine, then I need to implement some mechanism for specification of
supported objects, and a special type that can be used as a proxy between
caller and parser to hold lists of parsed values. To be names less
confusing I renamed them to read_dump_filters, read_dumpall_filters and
read_restore_filters

Regards

Pavel
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 8b9d9f4cad..955bfcfdad 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -779,6 +779,80 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Specify a filename from which to read patterns for objects to include
+or exclude from the dump. The patterns are interpreted according to the
+same rules as the corresponding options:
+-t/--table for tables,
+-n/--schema for schemas,
+--include-foreign-data for data on foreign servers and
+--exclude-table-data for table data.
+To read from STDIN, use - as the
+filename.  The --filter option can be specified in
+conjunction with the above listed options for including or excluding
+objects, and can also be specified more than once for multiple filter
+files.
+   
+
+   
+The file lists one object pattern per row, with the following format:
+
+{ include | exclude } { table | schema | foreign_data | data } PATTERN
+
+   
+
+   
+The first keyword specifies whether the objects matched by the pattern
+are to be included or excluded. The second keyword specifies the type
+of object to be filtered using the pattern:
+
+ 
+  
+   table: tables, works like
+   -t/--table
+  
+ 
+ 
+  
+   schema: schemas, works like
+   -n/--schema
+  
+ 
+ 
+  
+   foreign_data: data on foreign servers, works like
+   --include-foreign-data. This keyword can only be
+   used with the include keyword.
+  
+ 
+ 
+  
+   data: table data, works like
+   --exclude-table-data. This keyword can only be
+   used with the exclude keyword.
+  
+ 
+
+   
+
+   
+Lines starting with # are considered comments and
+ignored. Comments can be placed after filter as well. Blank lines
+are also ignored. See  for how to
+perform quoting in patterns.
+   
+
+   
+Example files are listed below in the 
+section.
+   
+
+  
+ 
+
  
   --if-exists
   
@@ -1119,6 +1193,7 @@ PostgreSQL documentation
 schema (-n/--schema) and
 table (-t/--table) qualifier
 match at least one extension/schema/table in the database to be dumped.
+This also applies to filters used with --filter.
 Note that if none of the extension/schema/table qualifiers find
 matches, pg_dump will generate an error
 even without --strict-names.
@@ -1528,6 +1603,19 @@ CREATE DATABASE foo WITH TEMPLATE template0;
 
 
 $ pg_dump -t 

Re: proposal: possibility to read dumped table's name from file

2022-10-02 Thread Julien Rouhaud
Hi,

On Mon, Oct 03, 2022 at 06:00:12AM +0200, Pavel Stehule wrote:
> ne 2. 10. 2022 v 22:52 odesílatel Daniel Gustafsson 
> napsal:
> 
> > > On 2 Oct 2022, at 18:04, Andres Freund  wrote:
> > > On 2022-10-02 00:19:59 -0700, Andres Freund wrote:
> > >> On 2022-10-01 23:56:59 -0700, Andres Freund wrote:
> > >>> On 2022-09-12 09:58:37 +0200, Daniel Gustafsson wrote:
> >  Correct, fixed in the attached.
> > >>>
> > >>> Updated patch adding meson compatibility attached.
> > >>
> > >> Err, forgot to amend one hunk :(
> > >
> > > That fixed it on all platforms but windows, due to copy-pasto. I really
> > should
> > > have stopped earlier yesterday...
> >
> > Thanks for updating the patch!
> >
> > The parser in the original submission was -1'd by me, and the current
> > version
> > proposed as an alternative.  This was subsequently -1'd as well but no
> > updated
> > patch with a rewritten parser has been posted.  So this is now stalled
> > again.
> >
> 
> You started rewriting it, but you didn't finish it.
> 
> Unfortunately, there is not a clean opinion on using bison's parser for
> this purpose. I understand that the complexity of this language is too low,
> so the benefit of using bison's gramatic is low too. Personally, I have not
> any problem using bison for this purpose. For this case, I think we compare
> two similarly long ways, but unfortunately, customers that have a problem
> with long command lines still have this problem.
> 
> Can we go forward? Daniel is strongly against handwritten parser. Is there
> somebody strongly against bison's based parser? There is not any other way.

I don't have a strong opinion either, but it seems that 2 people argued against
a bison parser (vs only 1 arguing for) and the fact that the current habit is
to rely on hand written parsers for simple cases (e.g. jsonapi.c /
pg_parse_json()), it seems that we should go back to Pavel's original parser.

I only had a quick look but it indeed seems trivial, it just maybe need a bit
of refactoring to avoid some code duplication (getFiltersFromFile is
duplicated, and getDatabaseExcludeFiltersFromFile could be removed if
getFiltersFromFile knew about the 2 patterns).




Re: proposal: possibility to read dumped table's name from file

2022-10-02 Thread Pavel Stehule
ne 2. 10. 2022 v 22:52 odesílatel Daniel Gustafsson 
napsal:

> > On 2 Oct 2022, at 18:04, Andres Freund  wrote:
> > On 2022-10-02 00:19:59 -0700, Andres Freund wrote:
> >> On 2022-10-01 23:56:59 -0700, Andres Freund wrote:
> >>> On 2022-09-12 09:58:37 +0200, Daniel Gustafsson wrote:
>  Correct, fixed in the attached.
> >>>
> >>> Updated patch adding meson compatibility attached.
> >>
> >> Err, forgot to amend one hunk :(
> >
> > That fixed it on all platforms but windows, due to copy-pasto. I really
> should
> > have stopped earlier yesterday...
>
> Thanks for updating the patch!
>
> The parser in the original submission was -1'd by me, and the current
> version
> proposed as an alternative.  This was subsequently -1'd as well but no
> updated
> patch with a rewritten parser has been posted.  So this is now stalled
> again.
>

You started rewriting it, but you didn't finish it.

Unfortunately, there is not a clean opinion on using bison's parser for
this purpose. I understand that the complexity of this language is too low,
so the benefit of using bison's gramatic is low too. Personally, I have not
any problem using bison for this purpose. For this case, I think we compare
two similarly long ways, but unfortunately, customers that have a problem
with long command lines still have this problem.

Can we go forward? Daniel is strongly against handwritten parser. Is there
somebody strongly against bison's based parser? There is not any other way.

I am able to complete Daniel's patch, if there will not be objections.

Regards

Pavel







> Having been around in 12 commitfests without a committer feeling confident
> about pushing this I plan to mark it returned with feedback, and if a new
> parser materializes itc can be readded instead of being dragged along.
>
> > c.h (and postgres.h, postgres_fe.h) shouldn't be included in headers.
> >
> > This is a common enough mistake that I'm wondering if we could automate
> > warning about it somehow.
>
> Maybe we can add a simple git grep invocation in the CompilerWarnings CI
> job to
> catch this in the CFBot?  If something like the below sketch matches then
> we
> can throw an error. (only for illustration, all three files needs to
> checked).
>
> git grep "\"c\.h" -- *.h :^src/include/postgres*.h;
>
> --
> Daniel Gustafsson   https://vmware.com/
>
>


Re: proposal: possibility to read dumped table's name from file

2022-10-02 Thread Andres Freund
On 2022-10-02 22:52:33 +0200, Daniel Gustafsson wrote:
> The parser in the original submission was -1'd by me, and the current version
> proposed as an alternative.  This was subsequently -1'd as well but no updated
> patch with a rewritten parser has been posted.  So this is now stalled again.
> 
> Having been around in 12 commitfests without a committer feeling confident
> about pushing this I plan to mark it returned with feedback, and if a new
> parser materializes itc can be readded instead of being dragged along.

Makes sense to me.




Re: proposal: possibility to read dumped table's name from file

2022-10-02 Thread Tom Lane
Daniel Gustafsson  writes:
> On 2 Oct 2022, at 18:04, Andres Freund  wrote:
>> c.h (and postgres.h, postgres_fe.h) shouldn't be included in headers.
>> This is a common enough mistake that I'm wondering if we could automate
>> warning about it somehow.

> Maybe we can add a simple git grep invocation in the CompilerWarnings CI job 
> to
> catch this in the CFBot?

I'd be inclined to teach headerscheck or cpluspluscheck about it.

regards, tom lane




Re: proposal: possibility to read dumped table's name from file

2022-10-02 Thread Daniel Gustafsson
> On 2 Oct 2022, at 18:04, Andres Freund  wrote:
> On 2022-10-02 00:19:59 -0700, Andres Freund wrote:
>> On 2022-10-01 23:56:59 -0700, Andres Freund wrote:
>>> On 2022-09-12 09:58:37 +0200, Daniel Gustafsson wrote:
 Correct, fixed in the attached.
>>> 
>>> Updated patch adding meson compatibility attached.
>> 
>> Err, forgot to amend one hunk :(
> 
> That fixed it on all platforms but windows, due to copy-pasto. I really should
> have stopped earlier yesterday...

Thanks for updating the patch!

The parser in the original submission was -1'd by me, and the current version
proposed as an alternative.  This was subsequently -1'd as well but no updated
patch with a rewritten parser has been posted.  So this is now stalled again.

Having been around in 12 commitfests without a committer feeling confident
about pushing this I plan to mark it returned with feedback, and if a new
parser materializes itc can be readded instead of being dragged along.

> c.h (and postgres.h, postgres_fe.h) shouldn't be included in headers.
> 
> This is a common enough mistake that I'm wondering if we could automate
> warning about it somehow.

Maybe we can add a simple git grep invocation in the CompilerWarnings CI job to
catch this in the CFBot?  If something like the below sketch matches then we
can throw an error. (only for illustration, all three files needs to checked).

git grep "\"c\.h" -- *.h :^src/include/postgres*.h;

--
Daniel Gustafsson   https://vmware.com/





Re: proposal: possibility to read dumped table's name from file

2022-10-02 Thread Andres Freund
Hi,

On 2022-10-02 00:19:59 -0700, Andres Freund wrote:
> On 2022-10-01 23:56:59 -0700, Andres Freund wrote:
> > On 2022-09-12 09:58:37 +0200, Daniel Gustafsson wrote:
> > > Correct, fixed in the attached.
> > 
> > Updated patch adding meson compatibility attached.
> 
> Err, forgot to amend one hunk :(

That fixed it on all platforms but windows, due to copy-pasto. I really should
have stopped earlier yesterday...


> +/*-
> + *
> + * filter.h
> + * Common header file for the parser of filter file
> + *
> + * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
> + * Portions Copyright (c) 1994, Regents of the University of California
> + *
> + * src/bin/pg_dump/filter.h
> + *
> + *-
> + */
> +#ifndef FILTER_H
> +#define FILTER_H
> +#include "c.h"

c.h (and postgres.h, postgres_fe.h) shouldn't be included in headers.


This is a common enough mistake that I'm wondering if we could automate
warning about it somehow.

Greetings,

Andres Freund
>From e0dff3c3275a69b53fdf1c628c204365bb96852f Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson 
Date: Wed, 7 Sep 2022 15:22:05 +0200
Subject: [PATCH v7] Add include/exclude filtering via file in pg_dump

Author: Pavel Stehule 
Discussion: https://postgr.es/m/CAFj8pRB10wvW0CC9Xq=1XDs=zcqxer3cblcnza+qix4cuh-...@mail.gmail.com
---
 src/bin/pg_dump/.gitignore|   4 +
 src/bin/pg_dump/Makefile  |  17 +++-
 src/bin/pg_dump/filter.h  |  44 ++
 src/bin/pg_dump/filterparse.y |  64 ++
 src/bin/pg_dump/filterscan.l  | 139 ++
 src/bin/pg_dump/meson.build   |  17 
 src/bin/pg_dump/pg_backup_utils.c |  33 +++
 src/bin/pg_dump/pg_backup_utils.h |   1 +
 src/bin/pg_dump/pg_dump.c |  56 
 src/bin/pg_dump/pg_dumpall.c  |  65 +-
 src/bin/pg_dump/pg_restore.c  |  58 +
 doc/src/sgml/ref/pg_dump.sgml |  88 +++
 doc/src/sgml/ref/pg_dumpall.sgml  |  22 +
 doc/src/sgml/ref/pg_restore.sgml  |  25 ++
 src/tools/msvc/Mkvcbuild.pm   |   4 +-
 15 files changed, 633 insertions(+), 4 deletions(-)
 create mode 100644 src/bin/pg_dump/filter.h
 create mode 100644 src/bin/pg_dump/filterparse.y
 create mode 100644 src/bin/pg_dump/filterscan.l

diff --git a/src/bin/pg_dump/.gitignore b/src/bin/pg_dump/.gitignore
index e6d78127793..11f2d68bea0 100644
--- a/src/bin/pg_dump/.gitignore
+++ b/src/bin/pg_dump/.gitignore
@@ -2,4 +2,8 @@
 /pg_dumpall
 /pg_restore
 
+# Local generated source files
+/filterparse.c
+/filterscan.c
+
 /tmp_check/
diff --git a/src/bin/pg_dump/Makefile b/src/bin/pg_dump/Makefile
index 9dc5a784dd2..e3befdc9b1f 100644
--- a/src/bin/pg_dump/Makefile
+++ b/src/bin/pg_dump/Makefile
@@ -26,6 +26,8 @@ OBJS = \
 	$(WIN32RES) \
 	compress_io.o \
 	dumputils.o \
+	filterparse.o \
+	filterscan.o \
 	parallel.o \
 	pg_backup_archiver.o \
 	pg_backup_custom.o \
@@ -37,14 +39,23 @@ OBJS = \
 
 all: pg_dump pg_restore pg_dumpall
 
+# See notes in src/backend/parser/Makefile about the following two rules
+filterparse.h: filterparse.c
+	touch $@
+
+filterparse.c: BISONFLAGS += -d
+
+# Force these dependencies to be known even without dependency info built:
+filterparse.o filterscan.o: filterparse.h
+
 pg_dump: pg_dump.o common.o pg_dump_sort.o $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils
 	$(CC) $(CFLAGS) pg_dump.o common.o pg_dump_sort.o $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
 
 pg_restore: pg_restore.o $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils
 	$(CC) $(CFLAGS) pg_restore.o $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
 
-pg_dumpall: pg_dumpall.o dumputils.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
-	$(CC) $(CFLAGS) pg_dumpall.o dumputils.o $(WIN32RES) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
+pg_dumpall: pg_dumpall.o dumputils.o filterparse.o filterscan.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
+	$(CC) $(CFLAGS) pg_dumpall.o dumputils.o filterparse.o filterscan.o $(WIN32RES) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
 
 install: all installdirs
 	$(INSTALL_PROGRAM) pg_dump$(X) '$(DESTDIR)$(bindir)'/pg_dump$(X)
@@ -63,6 +74,8 @@ installcheck:
 uninstall:
 	rm -f $(addprefix '$(DESTDIR)$(bindir)'/, pg_dump$(X) pg_restore$(X) pg_dumpall$(X))
 
+distprep: filterparse.c filterscan.c
+
 clean distclean maintainer-clean:
 	rm -f pg_dump$(X) pg_restore$(X) pg_dumpall$(X) $(OBJS) pg_dump.o common.o pg_dump_sort.o pg_restore.o pg_dumpall.o
 	rm -rf tmp_check
diff --git a/src/bin/pg_dump/filter.h b/src/bin/pg_dump/filter.h
new file mode 100644
index 000..5dff4161f02
--- /dev/null
+++ b/src/bin/pg_dump/filter.h
@@ -0,0 +1,44 @@
+/*-
+ *
+ * filter.h
+ *	  Common header file for the parser 

Re: proposal: possibility to read dumped table's name from file

2022-10-02 Thread Andres Freund
Hi,

On 2022-10-01 23:56:59 -0700, Andres Freund wrote:
> On 2022-09-12 09:58:37 +0200, Daniel Gustafsson wrote:
> > Correct, fixed in the attached.
> 
> Updated patch adding meson compatibility attached.

Err, forgot to amend one hunk :(

Greetings,

Andres Freund
>From fe2926ccd49a460cbaa39a8916a4dd097463b294 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson 
Date: Wed, 7 Sep 2022 15:22:05 +0200
Subject: [PATCH v6] Add include/exclude filtering via file in pg_dump

Author: Pavel Stehule 
Discussion: https://postgr.es/m/CAFj8pRB10wvW0CC9Xq=1XDs=zcqxer3cblcnza+qix4cuh-...@mail.gmail.com
---
 src/bin/pg_dump/.gitignore|   4 +
 src/bin/pg_dump/Makefile  |  17 +++-
 src/bin/pg_dump/filter.h  |  44 ++
 src/bin/pg_dump/filterparse.y |  64 ++
 src/bin/pg_dump/filterscan.l  | 139 ++
 src/bin/pg_dump/meson.build   |  18 
 src/bin/pg_dump/pg_backup_utils.c |  33 +++
 src/bin/pg_dump/pg_backup_utils.h |   1 +
 src/bin/pg_dump/pg_dump.c |  56 
 src/bin/pg_dump/pg_dumpall.c  |  65 +-
 src/bin/pg_dump/pg_restore.c  |  58 +
 doc/src/sgml/ref/pg_dump.sgml |  88 +++
 doc/src/sgml/ref/pg_dumpall.sgml  |  22 +
 doc/src/sgml/ref/pg_restore.sgml  |  25 ++
 src/tools/msvc/Mkvcbuild.pm   |   4 +-
 15 files changed, 634 insertions(+), 4 deletions(-)
 create mode 100644 src/bin/pg_dump/filter.h
 create mode 100644 src/bin/pg_dump/filterparse.y
 create mode 100644 src/bin/pg_dump/filterscan.l

diff --git a/src/bin/pg_dump/.gitignore b/src/bin/pg_dump/.gitignore
index e6d78127793..11f2d68bea0 100644
--- a/src/bin/pg_dump/.gitignore
+++ b/src/bin/pg_dump/.gitignore
@@ -2,4 +2,8 @@
 /pg_dumpall
 /pg_restore
 
+# Local generated source files
+/filterparse.c
+/filterscan.c
+
 /tmp_check/
diff --git a/src/bin/pg_dump/Makefile b/src/bin/pg_dump/Makefile
index 9dc5a784dd2..e3befdc9b1f 100644
--- a/src/bin/pg_dump/Makefile
+++ b/src/bin/pg_dump/Makefile
@@ -26,6 +26,8 @@ OBJS = \
 	$(WIN32RES) \
 	compress_io.o \
 	dumputils.o \
+	filterparse.o \
+	filterscan.o \
 	parallel.o \
 	pg_backup_archiver.o \
 	pg_backup_custom.o \
@@ -37,14 +39,23 @@ OBJS = \
 
 all: pg_dump pg_restore pg_dumpall
 
+# See notes in src/backend/parser/Makefile about the following two rules
+filterparse.h: filterparse.c
+	touch $@
+
+filterparse.c: BISONFLAGS += -d
+
+# Force these dependencies to be known even without dependency info built:
+filterparse.o filterscan.o: filterparse.h
+
 pg_dump: pg_dump.o common.o pg_dump_sort.o $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils
 	$(CC) $(CFLAGS) pg_dump.o common.o pg_dump_sort.o $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
 
 pg_restore: pg_restore.o $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils
 	$(CC) $(CFLAGS) pg_restore.o $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
 
-pg_dumpall: pg_dumpall.o dumputils.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
-	$(CC) $(CFLAGS) pg_dumpall.o dumputils.o $(WIN32RES) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
+pg_dumpall: pg_dumpall.o dumputils.o filterparse.o filterscan.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
+	$(CC) $(CFLAGS) pg_dumpall.o dumputils.o filterparse.o filterscan.o $(WIN32RES) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
 
 install: all installdirs
 	$(INSTALL_PROGRAM) pg_dump$(X) '$(DESTDIR)$(bindir)'/pg_dump$(X)
@@ -63,6 +74,8 @@ installcheck:
 uninstall:
 	rm -f $(addprefix '$(DESTDIR)$(bindir)'/, pg_dump$(X) pg_restore$(X) pg_dumpall$(X))
 
+distprep: filterparse.c filterscan.c
+
 clean distclean maintainer-clean:
 	rm -f pg_dump$(X) pg_restore$(X) pg_dumpall$(X) $(OBJS) pg_dump.o common.o pg_dump_sort.o pg_restore.o pg_dumpall.o
 	rm -rf tmp_check
diff --git a/src/bin/pg_dump/filter.h b/src/bin/pg_dump/filter.h
new file mode 100644
index 000..5dff4161f02
--- /dev/null
+++ b/src/bin/pg_dump/filter.h
@@ -0,0 +1,44 @@
+/*-
+ *
+ * filter.h
+ *	  Common header file for the parser of filter file
+ *
+ * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/bin/pg_dump/filter.h
+ *
+ *-
+ */
+#ifndef FILTER_H
+#define FILTER_H
+#include "c.h"
+
+/*
+ * List of objects that can be specified in filter file
+ */
+typedef enum
+{
+	FILTER_OBJECT_TYPE_NONE,
+	FILTER_OBJECT_TYPE_DATA,
+	FILTER_OBJECT_TYPE_DATABASE,
+	FILTER_OBJECT_TYPE_FOREIGN_DATA,
+	FILTER_OBJECT_TYPE_FUNCTION,
+	FILTER_OBJECT_TYPE_INDEX,
+	FILTER_OBJECT_TYPE_SCHEMA,
+	FILTER_OBJECT_TYPE_TABLE,
+	FILTER_OBJECT_TYPE_TRIGGER,
+}			FilterObjectType;
+
+extern FILE *filter_yyin;
+
+extern int filter_yylex(void);
+extern void filter_yyerror(void *priv, const char *msg);
+extern void 

Re: proposal: possibility to read dumped table's name from file

2022-10-02 Thread Andres Freund
Hi,

On 2022-09-12 09:58:37 +0200, Daniel Gustafsson wrote:
> Correct, fixed in the attached.

Updated patch adding meson compatibility attached.

Greetings,

Andres Freund
>From 5d3ba4d2d6567626ccc0019208ea4c0ea91ac866 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson 
Date: Wed, 7 Sep 2022 15:22:05 +0200
Subject: [PATCH v5] Add include/exclude filtering via file in pg_dump

Author: Pavel Stehule 
Discussion: https://postgr.es/m/CAFj8pRB10wvW0CC9Xq=1XDs=zcqxer3cblcnza+qix4cuh-...@mail.gmail.com
---
 src/bin/pg_dump/.gitignore|   4 +
 src/bin/pg_dump/Makefile  |  17 +++-
 src/bin/pg_dump/filter.h  |  44 ++
 src/bin/pg_dump/filterparse.y |  64 ++
 src/bin/pg_dump/filterscan.l  | 139 ++
 src/bin/pg_dump/meson.build   |  18 
 src/bin/pg_dump/pg_backup_utils.c |  33 +++
 src/bin/pg_dump/pg_backup_utils.h |   1 +
 src/bin/pg_dump/pg_dump.c |  56 
 src/bin/pg_dump/pg_dumpall.c  |  65 +-
 src/bin/pg_dump/pg_restore.c  |  58 +
 doc/src/sgml/ref/pg_dump.sgml |  88 +++
 doc/src/sgml/ref/pg_dumpall.sgml  |  22 +
 doc/src/sgml/ref/pg_restore.sgml  |  25 ++
 src/tools/msvc/Mkvcbuild.pm   |   4 +-
 15 files changed, 634 insertions(+), 4 deletions(-)
 create mode 100644 src/bin/pg_dump/filter.h
 create mode 100644 src/bin/pg_dump/filterparse.y
 create mode 100644 src/bin/pg_dump/filterscan.l

diff --git a/src/bin/pg_dump/.gitignore b/src/bin/pg_dump/.gitignore
index e6d78127793..11f2d68bea0 100644
--- a/src/bin/pg_dump/.gitignore
+++ b/src/bin/pg_dump/.gitignore
@@ -2,4 +2,8 @@
 /pg_dumpall
 /pg_restore
 
+# Local generated source files
+/filterparse.c
+/filterscan.c
+
 /tmp_check/
diff --git a/src/bin/pg_dump/Makefile b/src/bin/pg_dump/Makefile
index 9dc5a784dd2..e3befdc9b1f 100644
--- a/src/bin/pg_dump/Makefile
+++ b/src/bin/pg_dump/Makefile
@@ -26,6 +26,8 @@ OBJS = \
 	$(WIN32RES) \
 	compress_io.o \
 	dumputils.o \
+	filterparse.o \
+	filterscan.o \
 	parallel.o \
 	pg_backup_archiver.o \
 	pg_backup_custom.o \
@@ -37,14 +39,23 @@ OBJS = \
 
 all: pg_dump pg_restore pg_dumpall
 
+# See notes in src/backend/parser/Makefile about the following two rules
+filterparse.h: filterparse.c
+	touch $@
+
+filterparse.c: BISONFLAGS += -d
+
+# Force these dependencies to be known even without dependency info built:
+filterparse.o filterscan.o: filterparse.h
+
 pg_dump: pg_dump.o common.o pg_dump_sort.o $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils
 	$(CC) $(CFLAGS) pg_dump.o common.o pg_dump_sort.o $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
 
 pg_restore: pg_restore.o $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils
 	$(CC) $(CFLAGS) pg_restore.o $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
 
-pg_dumpall: pg_dumpall.o dumputils.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
-	$(CC) $(CFLAGS) pg_dumpall.o dumputils.o $(WIN32RES) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
+pg_dumpall: pg_dumpall.o dumputils.o filterparse.o filterscan.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
+	$(CC) $(CFLAGS) pg_dumpall.o dumputils.o filterparse.o filterscan.o $(WIN32RES) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
 
 install: all installdirs
 	$(INSTALL_PROGRAM) pg_dump$(X) '$(DESTDIR)$(bindir)'/pg_dump$(X)
@@ -63,6 +74,8 @@ installcheck:
 uninstall:
 	rm -f $(addprefix '$(DESTDIR)$(bindir)'/, pg_dump$(X) pg_restore$(X) pg_dumpall$(X))
 
+distprep: filterparse.c filterscan.c
+
 clean distclean maintainer-clean:
 	rm -f pg_dump$(X) pg_restore$(X) pg_dumpall$(X) $(OBJS) pg_dump.o common.o pg_dump_sort.o pg_restore.o pg_dumpall.o
 	rm -rf tmp_check
diff --git a/src/bin/pg_dump/filter.h b/src/bin/pg_dump/filter.h
new file mode 100644
index 000..5dff4161f02
--- /dev/null
+++ b/src/bin/pg_dump/filter.h
@@ -0,0 +1,44 @@
+/*-
+ *
+ * filter.h
+ *	  Common header file for the parser of filter file
+ *
+ * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/bin/pg_dump/filter.h
+ *
+ *-
+ */
+#ifndef FILTER_H
+#define FILTER_H
+#include "c.h"
+
+/*
+ * List of objects that can be specified in filter file
+ */
+typedef enum
+{
+	FILTER_OBJECT_TYPE_NONE,
+	FILTER_OBJECT_TYPE_DATA,
+	FILTER_OBJECT_TYPE_DATABASE,
+	FILTER_OBJECT_TYPE_FOREIGN_DATA,
+	FILTER_OBJECT_TYPE_FUNCTION,
+	FILTER_OBJECT_TYPE_INDEX,
+	FILTER_OBJECT_TYPE_SCHEMA,
+	FILTER_OBJECT_TYPE_TABLE,
+	FILTER_OBJECT_TYPE_TRIGGER,
+}			FilterObjectType;
+
+extern FILE *filter_yyin;
+
+extern int filter_yylex(void);
+extern void filter_yyerror(void *priv, const char *msg);
+extern void filter_scanner_init(void);
+extern void filter_scanner_finish(void);
+extern int filter_yyparse(void 

Re: proposal: possibility to read dumped table's name from file

2022-09-22 Thread Andres Freund
Hi,

On 2022-09-12 09:58:37 +0200, Daniel Gustafsson wrote:
> > On 9 Sep 2022, at 11:00, Andrew Dunstan  wrote:
> > 
> >> On Sep 9, 2022, at 5:53 PM, John Naylor  
> >> wrote:
> >> 
> >> Note that the grammar has shift-reduce conflicts. 
> 
> > Looks like the last rule for Filters should not be there.
> 
> Correct, fixed in the attached.

Due to the merge of the meson build, this patch now needs to adjust the
relevant meson.build. This is the cause of the failures at:
https://cirrus-ci.com/build/5788292678418432

See e.g. src/bin/pgbench/meson.build

Greetings,

Andres Freund




Re: proposal: possibility to read dumped table's name from file

2022-09-13 Thread Pavel Stehule
út 13. 9. 2022 v 10:46 odesílatel John Naylor 
napsal:

> On Mon, Sep 12, 2022 at 8:10 PM Pavel Stehule 
> wrote:
> >
> > po 12. 9. 2022 v 9:59 odesílatel Daniel Gustafsson 
> napsal:
> >> I don't the capabilities of the tool is all that interesting compared
> to the
> >> long term maintainability and readability of the source code.
>
> With make distprep and maintainer-clean, separate makefile and MSVC
> build logic a short time before converting to Meson, I'm not sure that
> even the short term maintainability here is a good trade off for what
> we're getting.
>
> > The parser in bison/flex does the same work and it is true, so code is
> more readable. Although for this case, a handy written parser was trivial
> too.
>
> If the hand-written version is trivial, then we should prefer it.
>

Please, can you check and compare both versions? My view is subjective.

Regards

Pavel




> --
> John Naylor
> EDB: http://www.enterprisedb.com
>


Re: proposal: possibility to read dumped table's name from file

2022-09-13 Thread John Naylor
On Mon, Sep 12, 2022 at 8:10 PM Pavel Stehule  wrote:
>
> po 12. 9. 2022 v 9:59 odesílatel Daniel Gustafsson  napsal:
>> I don't the capabilities of the tool is all that interesting compared to the
>> long term maintainability and readability of the source code.

With make distprep and maintainer-clean, separate makefile and MSVC
build logic a short time before converting to Meson, I'm not sure that
even the short term maintainability here is a good trade off for what
we're getting.

> The parser in bison/flex does the same work and it is true, so code is more 
> readable. Although for this case, a handy written parser was trivial too.

If the hand-written version is trivial, then we should prefer it.
--
John Naylor
EDB: http://www.enterprisedb.com




Re: proposal: possibility to read dumped table's name from file

2022-09-12 Thread Erik Rijkers




Op 12-09-2022 om 16:00 schreef Erik Rijkers:

Op 12-09-2022 om 09:58 schreef Daniel Gustafsson:

On 9 Sep 2022, at 11:00, Andrew Dunstan  wrote:

On Sep 9, 2022, at 5:53 PM, John Naylor 
 wrote:



[v4-0001-Add-include-exclude-filtering-via-file-in-pg_dump.patch]


I noticed that pg_restore --filter cannot, or at last not always, be 
used with the same filter-file that was used to produce a dump with 
pg_dump --filter.


Is that as designed?  It seems a bit counterintuitive.  It'd be nice if 
that could be fixed.  Admittedly, the 'same' problem in pg_restore -t, 
also less than ideal.


(A messy bashdemo below)


I hope the issue is still clear, even though in the bash I sent, I 
messed up the dumpfile name (i.e., in the bash that I sent the pg_dump 
creates another dump name than what is given to pg_restore. They should 
use the same dumpname, obviously)




thanks,

Erik Rijkers





Re: proposal: possibility to read dumped table's name from file

2022-09-12 Thread Erik Rijkers

Op 12-09-2022 om 09:58 schreef Daniel Gustafsson:

On 9 Sep 2022, at 11:00, Andrew Dunstan  wrote:


On Sep 9, 2022, at 5:53 PM, John Naylor  wrote:



[v4-0001-Add-include-exclude-filtering-via-file-in-pg_dump.patch]


I noticed that pg_restore --filter cannot, or at last not always, be 
used with the same filter-file that was used to produce a dump with 
pg_dump --filter.


Is that as designed?  It seems a bit counterintuitive.  It'd be nice if 
that could be fixed.  Admittedly, the 'same' problem in pg_restore -t, 
also less than ideal.


(A messy bashdemo below)

thanks,

Erik Rijkers


#! /bin/bash
db2='testdb2' db3='testdb3'
db2='testdb_source' db3='testdb_target'
sql_dropdb="drop database if exists $db2; drop database if exists $db3;"
sql_createdb="create database $db2; create database $db3;"
schema1=s1  table1=table1  t1=$schema1.$table1
schema2=s2  table2=table2  t2=$schema2.$table2
sql_schema_init="create schema if not exists $schema1; create schema if 
not exists $schema2;"
sql_test="select '$t1', n from $t1 order by n; select '$t2', n from $t2 
order by n;"


function sqltest()
{
  for database_name in $db2 $db3 ;do
port_used=$( echo "show port" |psql -qtAX -d $database_name )
echo -n "-- $database_name ($port_used):  "
echo "$sql_test" | psql -qtAX -a -d $database_name | md5sum
  done
  echo
}

echo "setting up orig db $db2, target db $db3"
echo "$sql_dropdb"| psql -qtAX
echo "$sql_createdb"  | psql -qtAX

psql -X -d $db2 << SQL
$sql_schema_init
create table $t1 as select n from generate_series(1, (10^1)::int) as f(n);
create table $t2 as select n from generate_series(2, (10^2)::int) as f(n);
SQL
echo "
include table $t1
include table $t2
# include schema $s1
# include schema $s2
" > inputfile1.txt

# in filter; out plain
echo "-- pg_dump -F p -f plainfile1 --filter=inputfile1.txt -d $db2"
 pg_dump -F p -f plainfile1 --filter=inputfile1.txt -d $db2

echo "$sql_schema_init" | psql -qX -d $db3
echo  "-- pg_restore -d $db3 dumpfile1"
  pg_restore -d $db3 dumpfile1
  rc=$?
echo "-- pg_restore returned [$rc]  -- pg_restore without --filter"
sqltest

# enable this to see it fail
if [[ 1 -eq 1 ]]
then

# clean out
echo "drop schema $schema1 cascade; drop schema $schema2 cascade; " | 
psql -qtAXad $db3


--filter=inputfile1.txt"
echo "$sql_schema_init" | psql -qX -d $db3
echo "-- pg_restore -d $db3 --filter=inputfile1.txt dumpfile1"
 pg_restore -d $db3 --filter=inputfile1.txt dumpfile1
 rc=$?
echo "-- pg_restore returned [$rc]  -- pg_restore without --filter"
sqltest

fi





Re: proposal: possibility to read dumped table's name from file

2022-09-12 Thread Pavel Stehule
po 12. 9. 2022 v 9:59 odesílatel Daniel Gustafsson  napsal:

> > On 9 Sep 2022, at 11:00, Andrew Dunstan  wrote:
> >
> >> On Sep 9, 2022, at 5:53 PM, John Naylor 
> wrote:
> >>
> >> Note that the grammar has shift-reduce conflicts.
>
> > Looks like the last rule for Filters should not be there.
>
> Correct, fixed in the attached.
>
> > I do wonder whether we should be using bison/flex here, seems like using
> a
> > sledgehammer to crack a nut.
>
>
> I don't the capabilities of the tool is all that interesting compared to
> the
> long term maintainability and readability of the source code.  Personally I
> think a simple Bison/Flex parser is easier to read and reason about than
> the
> corresponding written in C.
>

When this work is done, then there is no reason to throw it. The parser in
bison/flex does the same work and it is true, so code is more readable.
Although for this case, a handy written parser was trivial too.

Regards

Pavel



>
> --
> Daniel Gustafsson   https://vmware.com/
>
>


Re: proposal: possibility to read dumped table's name from file

2022-09-12 Thread Daniel Gustafsson
> On 9 Sep 2022, at 11:00, Andrew Dunstan  wrote:
> 
>> On Sep 9, 2022, at 5:53 PM, John Naylor  wrote:
>> 
>> Note that the grammar has shift-reduce conflicts. 

> Looks like the last rule for Filters should not be there.

Correct, fixed in the attached.

> I do wonder whether we should be using bison/flex here, seems like using a
> sledgehammer to crack a nut.


I don't the capabilities of the tool is all that interesting compared to the
long term maintainability and readability of the source code.  Personally I
think a simple Bison/Flex parser is easier to read and reason about than the
corresponding written in C.

--
Daniel Gustafsson   https://vmware.com/



v4-0001-Add-include-exclude-filtering-via-file-in-pg_dump.patch
Description: Binary data


Re: proposal: possibility to read dumped table's name from file

2022-09-09 Thread Andrew Dunstan





> On Sep 9, 2022, at 5:53 PM, John Naylor  wrote:
> 
> On Thu, Sep 8, 2022 at 7:32 PM Daniel Gustafsson  wrote:
>> [v3]
> 
> Note that the grammar has shift-reduce conflicts. If you run a fairly
> recent Bison, you can show them like this:
> 
> bison -Wno-deprecated -Wcounterexamples -d -o filterparse.c filterparse.y
> 
> filterparse.y: warning: 2 shift/reduce conflicts [-Wconflicts-sr]
> filterparse.y: warning: shift/reduce conflict on token C_INCLUDE
> [-Wcounterexamples]
>  Example: • C_INCLUDE include_object pattern
>  Shift derivation
>Filters
>↳ 3: Filter
> ↳ 4: • C_INCLUDE include_object pattern
>  Reduce derivation
>Filters
>↳ 2: Filters  Filter
> ↳ 1: ε • ↳ 4: C_INCLUDE include_object pattern
> filterparse.y: warning: shift/reduce conflict on token C_EXCLUDE
> [-Wcounterexamples]
>  Example: • C_EXCLUDE exclude_object pattern
>  Shift derivation
>Filters
>↳ 3: Filter
> ↳ 5: • C_EXCLUDE exclude_object pattern
>  Reduce derivation
>Filters
>↳ 2: Filters  Filter
> ↳ 1: ε • ↳ 5: C_EXCLUDE exclude_object pattern
> 


Looks like the last rule for Filters should not be there. I do wonder whether 
we should be using bison/flex here, seems like using a sledgehammer to crack a 
nut.

Cheers

Andrew



Re: proposal: possibility to read dumped table's name from file

2022-09-09 Thread John Naylor
On Thu, Sep 8, 2022 at 7:32 PM Daniel Gustafsson  wrote:
> [v3]

Note that the grammar has shift-reduce conflicts. If you run a fairly
recent Bison, you can show them like this:

bison -Wno-deprecated -Wcounterexamples -d -o filterparse.c filterparse.y

filterparse.y: warning: 2 shift/reduce conflicts [-Wconflicts-sr]
filterparse.y: warning: shift/reduce conflict on token C_INCLUDE
[-Wcounterexamples]
  Example: • C_INCLUDE include_object pattern
  Shift derivation
Filters
↳ 3: Filter
 ↳ 4: • C_INCLUDE include_object pattern
  Reduce derivation
Filters
↳ 2: Filters  Filter
 ↳ 1: ε • ↳ 4: C_INCLUDE include_object pattern
filterparse.y: warning: shift/reduce conflict on token C_EXCLUDE
[-Wcounterexamples]
  Example: • C_EXCLUDE exclude_object pattern
  Shift derivation
Filters
↳ 3: Filter
 ↳ 5: • C_EXCLUDE exclude_object pattern
  Reduce derivation
Filters
↳ 2: Filters  Filter
 ↳ 1: ε • ↳ 5: C_EXCLUDE exclude_object pattern

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: proposal: possibility to read dumped table's name from file

2022-09-08 Thread Pavel Stehule
Hi


st 7. 9. 2022 v 21:46 odesílatel Daniel Gustafsson  napsal:

> As noted upthread at some point, I'm not overly excited about the parser in
> filter.c, for maintainability and readability reasons.  So, I've
> reimplemented
> the parser in Flex/Bison in the attached patch, which IMHO provides a
> clear(er)
> picture of the grammar and is more per project standards.  This version of
> the
> patch is your latest version with just the parser replaced (at a reduction
> in
> size as a side benefit).
>
> All features supported in your latest patch version are present, and it
> passes
> all the tests added by this patch.  It's been an undisclosed amount of
> years
> since I wrote a Bison parser (well, yacc really) from scratch so I don't
> rule
> out having made silly mistakes.  I would very much appreciate review from
> those
> more well versed in this area.
>
> One thing this patchversion currently lacks is refined error messaging,
> but if
> we feel that this approach is a viable path then that can be tweaked.  The
> function which starts the parser can also be refactored to be shared across
> pg_dump, pg_dumpall and pg_restore but I've kept it simple for now.
>
> Thoughts?  It would be nice to get this patch across the finishline during
> this
> commitfest.
>
>
I have no objections to this, and thank you so you try to move this patch
forward.

Regards

Pavel

--
> Daniel Gustafsson   https://vmware.com/
>
>


Re: proposal: possibility to read dumped table's name from file

2022-09-08 Thread Daniel Gustafsson
> On 8 Sep 2022, at 13:44, Julien Rouhaud  wrote:
> On Thu, Sep 08, 2022 at 01:38:42PM +0200, Daniel Gustafsson wrote:
>>> On 8 Sep 2022, at 12:00, Erik Rijkers  wrote:

>>> I did notice one peculiarity (in your patch) where for each table a few 
>>> spaces are omitted by pg_dump.
>> 
>> Right, I had that on my TODO to fix before submitting but clearly forgot.  It
>> boils down to consuming the space between commands and object types and 
>> object
>> patterns. The attached v2 fixes that.
> 
> I only had a quick look at the parser,

Thanks for looking!

> .. and one thing that strikes me is:
> 
> +Patterns:
> + /* EMPTY */
> + | Patterns Pattern
> + | Pattern
> + ;
> +
> +Pattern:
> + C_INCLUDE include_object pattern { include_item(priv, $2, $3); }
> 
> It seems confusing to mix Pattern(s) (the rules) and pattern (the token).
> Maybe instead using Include(s) or Item(s) on the bison side, and/or
> name_pattern on the lexer side?

That makes a lot of sense, I renamed the rules in the parser but kept them in
the lexer since that seemed like the clearest scheme.

Also in the attached is a small refactoring to share parser init between
pg_dump and pg_restore (pg_dumpall shares little with these so not there for
now), buffer resize overflow calculation and some error message tweaking.

--
Daniel Gustafsson   https://vmware.com/



v3-0001-Add-include-exclude-filtering-via-file-in-pg_dump.patch
Description: Binary data




Re: proposal: possibility to read dumped table's name from file

2022-09-08 Thread Julien Rouhaud
Hi,

On Thu, Sep 08, 2022 at 01:38:42PM +0200, Daniel Gustafsson wrote:
> > On 8 Sep 2022, at 12:00, Erik Rijkers  wrote:
> > 
> > Op 07-09-2022 om 21:45 schreef Daniel Gustafsson:
> >> One thing this patchversion currently lacks is refined error messaging, 
> >> but if
> >> we feel that this approach is a viable path then that can be tweaked.  The
> >> function which starts the parser can also be refactored to be shared across
> >> pg_dump, pg_dumpall and pg_restore but I've kept it simple for now.
> >> Thoughts?  It would be nice to get this patch across the finishline during 
> >> this
> >> commitfest.
> > 
> > > [0001-Add-include-exclude-filtering-via-file-in-pg_dump.patch]
> > 
> > This seems to dump & restore well (as Pavels patch does).
> 
> Thanks for looking!
> 
> > I did notice one peculiarity (in your patch) where for each table a few 
> > spaces are omitted by pg_dump.
> 
> Right, I had that on my TODO to fix before submitting but clearly forgot.  It
> boils down to consuming the space between commands and object types and object
> patterns. The attached v2 fixes that.

I only had a quick look at the parser, and one thing that strikes me is:

+Patterns:
+   /* EMPTY */
+   | Patterns Pattern
+   | Pattern
+   ;
+
+Pattern:
+   C_INCLUDE include_object pattern { include_item(priv, $2, $3); }

It seems confusing to mix Pattern(s) (the rules) and pattern (the token).
Maybe instead using Include(s) or Item(s) on the bison side, and/or
name_pattern on the lexer side?




Re: proposal: possibility to read dumped table's name from file

2022-09-08 Thread Daniel Gustafsson
> On 8 Sep 2022, at 12:00, Erik Rijkers  wrote:
> 
> Op 07-09-2022 om 21:45 schreef Daniel Gustafsson:
>> One thing this patchversion currently lacks is refined error messaging, but 
>> if
>> we feel that this approach is a viable path then that can be tweaked.  The
>> function which starts the parser can also be refactored to be shared across
>> pg_dump, pg_dumpall and pg_restore but I've kept it simple for now.
>> Thoughts?  It would be nice to get this patch across the finishline during 
>> this
>> commitfest.
> 
> > [0001-Add-include-exclude-filtering-via-file-in-pg_dump.patch]
> 
> This seems to dump & restore well (as Pavels patch does).

Thanks for looking!

> I did notice one peculiarity (in your patch) where for each table a few 
> spaces are omitted by pg_dump.

Right, I had that on my TODO to fix before submitting but clearly forgot.  It
boils down to consuming the space between commands and object types and object
patterns. The attached v2 fixes that.

--
Daniel Gustafsson   https://vmware.com/



v2-0001-Add-include-exclude-filtering-via-file-in-pg_dump.patch
Description: Binary data


Re: proposal: possibility to read dumped table's name from file

2022-09-08 Thread Erik Rijkers

Op 07-09-2022 om 21:45 schreef Daniel Gustafsson:


One thing this patchversion currently lacks is refined error messaging, but if
we feel that this approach is a viable path then that can be tweaked.  The
function which starts the parser can also be refactored to be shared across
pg_dump, pg_dumpall and pg_restore but I've kept it simple for now.

Thoughts?  It would be nice to get this patch across the finishline during this
commitfest.


> [0001-Add-include-exclude-filtering-via-file-in-pg_dump.patch]

This seems to dump & restore well (as Pavels patch does).

I did notice one peculiarity (in your patch) where for each table a few 
spaces are omitted by pg_dump.


-
#! /bin/bash

psql -qXc "drop database if exists testdb2"
psql -qXc "create database testdb2"

echo "
create schema if not exists test;
create table table0 (id integer);
create table table1 (id integer);
insert into table0 select n from generate_series(1,2) as f(n);
insert into table1 select n from generate_series(1,2) as f(n);
" | psql -qXad testdb2

echo "include table table0" > inputfile1.txt

echo "include table table0
include table table1" > inputfile2.txt

# 1 table, emits 2 spaces
echo -ne ">"
pg_dump -F p -f plainfile1 --filter=inputfile1.txt -d testdb2
echo "<"

# 2 tables, emits 4 space
echo -ne ">"
pg_dump -F p -f plainfile2 --filter=inputfile2.txt -d testdb2
echo "<"

# dump without filter emits no spaces
echo -ne ">"
pg_dump -F c -f plainfile3 -t table0 -table1 -d testdb2
echo "<"
-

It's probably a small thing -- but I didn't find it.

thanks,

Erik Rijkers


--
Daniel Gustafsson   https://vmware.com/






Re: proposal: possibility to read dumped table's name from file

2022-09-07 Thread Daniel Gustafsson
As noted upthread at some point, I'm not overly excited about the parser in
filter.c, for maintainability and readability reasons.  So, I've reimplemented
the parser in Flex/Bison in the attached patch, which IMHO provides a clear(er)
picture of the grammar and is more per project standards.  This version of the
patch is your latest version with just the parser replaced (at a reduction in
size as a side benefit).

All features supported in your latest patch version are present, and it passes
all the tests added by this patch.  It's been an undisclosed amount of years
since I wrote a Bison parser (well, yacc really) from scratch so I don't rule
out having made silly mistakes.  I would very much appreciate review from those
more well versed in this area.

One thing this patchversion currently lacks is refined error messaging, but if
we feel that this approach is a viable path then that can be tweaked.  The
function which starts the parser can also be refactored to be shared across
pg_dump, pg_dumpall and pg_restore but I've kept it simple for now.

Thoughts?  It would be nice to get this patch across the finishline during this
commitfest.

--
Daniel Gustafsson   https://vmware.com/



0001-Add-include-exclude-filtering-via-file-in-pg_dump.patch
Description: Binary data


Re: proposal: possibility to read dumped table's name from file

2022-08-22 Thread Pavel Stehule
Hi

I am sending fresh rebase + enhancing tests for pg_dumpall and pg_restore

Regards

Pavel
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index c08276bc0a..b64bae6987 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -779,6 +779,80 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Specify a filename from which to read patterns for objects to include
+or exclude from the dump. The patterns are interpreted according to the
+same rules as the corresponding options:
+-t/--table for tables,
+-n/--schema for schemas,
+--include-foreign-data for data on foreign servers and
+--exclude-table-data for table data.
+To read from STDIN, use - as the
+filename.  The --filter option can be specified in
+conjunction with the above listed options for including or excluding
+objects, and can also be specified more than once for multiple filter
+files.
+   
+
+   
+The file lists one object pattern per row, with the following format:
+
+{ include | exclude } { table | schema | foreign_data | data } PATTERN
+
+   
+
+   
+The first keyword specifies whether the objects matched by the pattern
+are to be included or excluded. The second keyword specifies the type
+of object to be filtered using the pattern:
+
+ 
+  
+   table: tables, works like
+   -t/--table
+  
+ 
+ 
+  
+   schema: schemas, works like
+   -n/--schema
+  
+ 
+ 
+  
+   foreign_data: data on foreign servers, works like
+   --include-foreign-data. This keyword can only be
+   used with the include keyword.
+  
+ 
+ 
+  
+   data: table data, works like
+   --exclude-table-data. This keyword can only be
+   used with the exclude keyword.
+  
+ 
+
+   
+
+   
+Lines starting with # are considered comments and
+ignored. Comments can be placed after filter as well. Blank lines
+are also ignored. See  for how to
+perform quoting in patterns.
+   
+
+   
+Example files are listed below in the 
+section.
+   
+
+  
+ 
+
  
   --if-exists
   
@@ -1119,6 +1193,7 @@ PostgreSQL documentation
 schema (-n/--schema) and
 table (-t/--table) qualifier
 match at least one extension/schema/table in the database to be dumped.
+This also applies to filters used with --filter.
 Note that if none of the extension/schema/table qualifiers find
 matches, pg_dump will generate an error
 even without --strict-names.
@@ -1528,6 +1603,19 @@ CREATE DATABASE foo WITH TEMPLATE template0;
 
 
 $ pg_dump -t "\"MixedCaseName\"" mydb  mytab.sql
+
+
+  
+   To dump all tables with names starting with mytable, except for table
+   mytable2, specify a filter file
+   filter.txt like:
+
+include table mytable*
+exclude table mytable2
+
+
+
+$ pg_dump --filter=filter.txt mydb  db.sql
 
 
  
diff --git a/doc/src/sgml/ref/pg_dumpall.sgml b/doc/src/sgml/ref/pg_dumpall.sgml
index 5d54074e01..7671795060 100644
--- a/doc/src/sgml/ref/pg_dumpall.sgml
+++ b/doc/src/sgml/ref/pg_dumpall.sgml
@@ -122,6 +122,28 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Specify a filename from which to read patterns for databases excluded
+from dump. The patterns are interpretted according to the same rules
+as --exclude-database.
+To read from STDIN, use - as the
+filename.  The --filter option can be specified in
+conjunction with the above listed options for excluding databases,
+and can also be specified more than once for multiple filter files.
+   
+
+   
+The file lists one database pattern per row, with the following format:
+
+exclude database  PATTERN
+
+   
+  
+ 
+
  
   -g
   --globals-only
diff --git a/doc/src/sgml/ref/pg_restore.sgml b/doc/src/sgml/ref/pg_restore.sgml
index 47bd7dbda0..ffeb564c52 100644
--- a/doc/src/sgml/ref/pg_restore.sgml
+++ b/doc/src/sgml/ref/pg_restore.sgml
@@ -188,6 +188,31 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Specify a filename from which to read patterns for objects excluded
+or included from restore. The patterns are interpretted according to the
+same rules as --schema, --exclude-schema,
+--function, --index, --table
+or --trigger.
+To read from STDIN, use - as the
+filename.  The --filter option can be specified in
+conjunction with the above listed options for including or excluding
+objects, and can 

Re: proposal: possibility to read dumped table's name from file

2022-07-18 Thread Pavel Stehule
ne 17. 7. 2022 v 16:01 odesílatel Justin Pryzby 
napsal:

> Thanks for updating the patch.
>
> This failed to build on windows.
> http://cfbot.cputube.org/pavel-stehule.html
>
>
Yes, there was a significant problem with the function exit_nicely, that is
differently implemented in pg_dump and pg_dumpall.



> Some more comments inline.
>
> On Sun, Jul 17, 2022 at 08:20:47AM +0200, Pavel Stehule wrote:
> > The attached patch implements the --filter option for pg_dumpall and for
> > pg_restore too.
>
> > diff --git a/doc/src/sgml/ref/pg_dump.sgml
> b/doc/src/sgml/ref/pg_dump.sgml
> > index 5efb442b44..ba2920dbee 100644
> > --- a/doc/src/sgml/ref/pg_dump.sgml
> > +++ b/doc/src/sgml/ref/pg_dump.sgml
> > @@ -779,6 +779,80 @@ PostgreSQL documentation
> >
> >   
> >
> > + 
> > +  --filter= class="parameter">filename
> > +  
> > +   
> > +Specify a filename from which to read patterns for objects to
> include
> > +or exclude from the dump. The patterns are interpreted
> according to the
> > +same rules as the corresponding options:
> > +-t/--table for tables,
> > +-n/--schema for schemas,
> > +--include-foreign-data for data on foreign
> servers and
> > +--exclude-table-data for table data.
> > +To read from STDIN use
> - as the

STDIN comma
>

fixed


>
> > +   
> > +Lines starting with # are considered
> comments and
> > +are ignored. Comments can be placed after filter as well. Blank
> lines
>
> change "are ignored" to "ignored", I think.
>

changed


>
> > diff --git a/doc/src/sgml/ref/pg_dumpall.sgml
> b/doc/src/sgml/ref/pg_dumpall.sgml
> > index 8a081f0080..137491340c 100644
> > --- a/doc/src/sgml/ref/pg_dumpall.sgml
> > +++ b/doc/src/sgml/ref/pg_dumpall.sgml
> > @@ -122,6 +122,29 @@ PostgreSQL documentation
> >
> >   
> >
> > + 
> > +  --filter= class="parameter">filename
> > +  
> > +   
> > +Specify a filename from which to read patterns for databases
> excluded
> > +from dump. The patterns are interpretted according to the same
> rules
> > +like --exclude-database.
>
> same rules *as*
>

fixed


>
> > +To read from STDIN use
> - as the
>
> comma
>

fixed


>
> > +filename.  The --filter option can be
> specified in
> > +conjunction with the above listed options for including or
> excluding
>
> For dumpall, remove "for including or"
> change "above listed options" to "exclude-database" ?
>

fixed


>
> > diff --git a/doc/src/sgml/ref/pg_restore.sgml
> b/doc/src/sgml/ref/pg_restore.sgml
> > index 526986eadb..5f16c4a333 100644
> > --- a/doc/src/sgml/ref/pg_restore.sgml
> > +++ b/doc/src/sgml/ref/pg_restore.sgml
> > @@ -188,6 +188,31 @@ PostgreSQL documentation
> >
> >   
> >
> > + 
> > +  --filter= class="parameter">filename
> > +  
> > +   
> > +Specify a filename from which to read patterns for objects
> excluded
> > +or included from restore. The patterns are interpretted
> according to the
> > +same rules like --schema,
> --exclude-schema,
>
> s/like/as/
>

changed


>
> > +--function, --index,
> --table
> > +or --trigger.
> > +To read from STDIN use
> - as the
>
> STDIN comma
>

fixed



>
> > +/*
> > + * filter_get_keyword - read the next filter keyword from buffer
> > + *
> > + * Search for keywords (limited to containing ascii alphabetic
> characters) in
>
> remove "containing"
>

fixed


>
> > + /*
> > +  * If the object name pattern has been quoted we must take care
> parse out
> > +  * the entire quoted pattern, which may contain whitespace and can
> span
> > +  * over many lines.
>
> quoted comma
> *to parse
> remove "over"
>

fixed


>
> > + * The pattern is either simple without any  whitespace, or properly
> quoted
>
> double space
>

fixed


>
> > + * in case there is whitespace in the object name. The pattern handling
> follows
>
> s/is/may be/
>

fixed


>
> > + if (size == 7 && pg_strncasecmp(keyword,
> "include", 7) == 0)
> > + *is_include = true;
> > + else if (size == 7 && pg_strncasecmp(keyword,
> "exclude", 7) == 0)
> > + *is_include = false;
>
> Can't you write strncasecmp(keyword, "include", size) to avoid hardcoding
> "7" ?
>

I need to compare the size of the keyword with expected size, but I can use
strlen(conststr). I wrote new macro is_keyword_str to fix this issue

fixed


>
> > +
> > + if (size == 4 && pg_strncasecmp(keyword, "data",
> 4) == 0)
> > + *objtype = FILTER_OBJECT_TYPE_DATA;
> > + else if (size == 8 && pg_strncasecmp(keyword,
> "database", 8) == 0)
> > + *objtype = FILTER_OBJECT_TYPE_DATABASE;
> > + else if (size == 12 && pg_strncasecmp(keyword,
> "foreign_data", 12) == 0)
> > +

Re: proposal: possibility to read dumped table's name from file

2022-07-17 Thread Justin Pryzby
Thanks for updating the patch.

This failed to build on windows.
http://cfbot.cputube.org/pavel-stehule.html

Some more comments inline.

On Sun, Jul 17, 2022 at 08:20:47AM +0200, Pavel Stehule wrote:
> The attached patch implements the --filter option for pg_dumpall and for
> pg_restore too.

> diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
> index 5efb442b44..ba2920dbee 100644
> --- a/doc/src/sgml/ref/pg_dump.sgml
> +++ b/doc/src/sgml/ref/pg_dump.sgml
> @@ -779,6 +779,80 @@ PostgreSQL documentation
>
>   
>  
> + 
> +  --filter= class="parameter">filename
> +  
> +   
> +Specify a filename from which to read patterns for objects to include
> +or exclude from the dump. The patterns are interpreted according to 
> the
> +same rules as the corresponding options:
> +-t/--table for tables,
> +-n/--schema for schemas,
> +--include-foreign-data for data on foreign servers 
> and
> +--exclude-table-data for table data.
> +To read from STDIN use - as 
> the

STDIN comma

> +   
> +Lines starting with # are considered comments and
> +are ignored. Comments can be placed after filter as well. Blank lines

change "are ignored" to "ignored", I think.

> diff --git a/doc/src/sgml/ref/pg_dumpall.sgml 
> b/doc/src/sgml/ref/pg_dumpall.sgml
> index 8a081f0080..137491340c 100644
> --- a/doc/src/sgml/ref/pg_dumpall.sgml
> +++ b/doc/src/sgml/ref/pg_dumpall.sgml
> @@ -122,6 +122,29 @@ PostgreSQL documentation
>
>   
>  
> + 
> +  --filter= class="parameter">filename
> +  
> +   
> +Specify a filename from which to read patterns for databases excluded
> +from dump. The patterns are interpretted according to the same rules
> +like --exclude-database.

same rules *as*

> +To read from STDIN use - as 
> the

comma

> +filename.  The --filter option can be specified in
> +conjunction with the above listed options for including or excluding

For dumpall, remove "for including or"
change "above listed options" to "exclude-database" ?

> diff --git a/doc/src/sgml/ref/pg_restore.sgml 
> b/doc/src/sgml/ref/pg_restore.sgml
> index 526986eadb..5f16c4a333 100644
> --- a/doc/src/sgml/ref/pg_restore.sgml
> +++ b/doc/src/sgml/ref/pg_restore.sgml
> @@ -188,6 +188,31 @@ PostgreSQL documentation
>
>   
>  
> + 
> +  --filter= class="parameter">filename
> +  
> +   
> +Specify a filename from which to read patterns for objects excluded
> +or included from restore. The patterns are interpretted according to 
> the
> +same rules like --schema, 
> --exclude-schema,

s/like/as/

> +--function, --index, 
> --table
> +or --trigger.
> +To read from STDIN use - as 
> the

STDIN comma

> +/*
> + * filter_get_keyword - read the next filter keyword from buffer
> + *
> + * Search for keywords (limited to containing ascii alphabetic characters) in

remove "containing"

> + /*
> +  * If the object name pattern has been quoted we must take care parse 
> out
> +  * the entire quoted pattern, which may contain whitespace and can span
> +  * over many lines.

quoted comma
*to parse
remove "over"

> + * The pattern is either simple without any  whitespace, or properly quoted

double space

> + * in case there is whitespace in the object name. The pattern handling 
> follows

s/is/may be/

> + if (size == 7 && pg_strncasecmp(keyword, "include", 7) 
> == 0)
> + *is_include = true;
> + else if (size == 7 && pg_strncasecmp(keyword, 
> "exclude", 7) == 0)
> + *is_include = false;

Can't you write strncasecmp(keyword, "include", size) to avoid hardcoding "7" ?

> +
> + if (size == 4 && pg_strncasecmp(keyword, "data", 4) == 
> 0)
> + *objtype = FILTER_OBJECT_TYPE_DATA;
> + else if (size == 8 && pg_strncasecmp(keyword, 
> "database", 8) == 0)
> + *objtype = FILTER_OBJECT_TYPE_DATABASE;
> + else if (size == 12 && pg_strncasecmp(keyword, 
> "foreign_data", 12) == 0)
> + *objtype = FILTER_OBJECT_TYPE_FOREIGN_DATA;
> + else if (size == 8 && pg_strncasecmp(keyword, 
> "function", 8) == 0)
> + *objtype = FILTER_OBJECT_TYPE_FUNCTION;
> + else if (size == 5 && pg_strncasecmp(keyword, "index", 
> 5) == 0)
> + *objtype = FILTER_OBJECT_TYPE_INDEX;
> + else if (size == 6 && pg_strncasecmp(keyword, "schema", 
> 6) == 0)
> + *objtype = FILTER_OBJECT_TYPE_SCHEMA;
> + else if (size == 5 && pg_strncasecmp(keyword, "table", 
> 5) == 0)
> + *objtype = 

Re: proposal: possibility to read dumped table's name from file

2022-07-17 Thread Pavel Stehule
Hi

čt 14. 7. 2022 v 6:54 odesílatel Pavel Stehule 
napsal:

>
>
> st 13. 7. 2022 v 22:49 odesílatel Andrew Dunstan 
> napsal:
>
>>
>> On 2022-04-25 Mo 13:39, Pavel Stehule wrote:
>> > Hi
>> >
>> > fresh rebase
>> >
>> >
>>
>>
>> If we're going to do this for pg_dump's include/exclude options,
>> shouldn't we also provide an equivalent facility for pg_dumpall's
>> --exclude-database option?
>>
>>
> It has sense
>

The attached patch implements the --filter option for pg_dumpall and for
pg_restore too.

Regards

Pavel



>>
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 5efb442b44..ba2920dbee 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -779,6 +779,80 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Specify a filename from which to read patterns for objects to include
+or exclude from the dump. The patterns are interpreted according to the
+same rules as the corresponding options:
+-t/--table for tables,
+-n/--schema for schemas,
+--include-foreign-data for data on foreign servers and
+--exclude-table-data for table data.
+To read from STDIN use - as the
+filename.  The --filter option can be specified in
+conjunction with the above listed options for including or excluding
+objects, and can also be specified more than once for multiple filter
+files.
+   
+
+   
+The file lists one object pattern per row, with the following format:
+
+{ include | exclude } { table | schema | foreign_data | data } PATTERN
+
+   
+
+   
+The first keyword specifies whether the objects matched by the pattern
+are to be included or excluded. The second keyword specifies the type
+of object to be filtered using the pattern:
+
+ 
+  
+   table: tables, works like
+   -t/--table
+  
+ 
+ 
+  
+   schema: schemas, works like
+   -n/--schema
+  
+ 
+ 
+  
+   foreign_data: data on foreign servers, works like
+   --include-foreign-data. This keyword can only be
+   used with the include keyword.
+  
+ 
+ 
+  
+   data: table data, works like
+   --exclude-table-data. This keyword can only be
+   used with the exclude keyword.
+  
+ 
+
+   
+
+   
+Lines starting with # are considered comments and
+are ignored. Comments can be placed after filter as well. Blank lines
+are also ignored. See  for how to
+perform quoting in patterns.
+   
+
+   
+Example files are listed below in the 
+section.
+   
+
+  
+ 
+
  
   --if-exists
   
@@ -1119,6 +1193,7 @@ PostgreSQL documentation
 schema (-n/--schema) and
 table (-t/--table) qualifier
 match at least one extension/schema/table in the database to be dumped.
+This also applies to filters used with --filter.
 Note that if none of the extension/schema/table qualifiers find
 matches, pg_dump will generate an error
 even without --strict-names.
@@ -1528,6 +1603,19 @@ CREATE DATABASE foo WITH TEMPLATE template0;
 
 
 $ pg_dump -t "\"MixedCaseName\"" mydb  mytab.sql
+
+
+  
+   To dump all tables with names starting with mytable, except for table
+   mytable2, specify a filter file
+   filter.txt like:
+
+include table mytable*
+exclude table mytable2
+
+
+
+$ pg_dump --filter=filter.txt mydb  db.sql
 
 
  
diff --git a/doc/src/sgml/ref/pg_dumpall.sgml b/doc/src/sgml/ref/pg_dumpall.sgml
index 8a081f0080..137491340c 100644
--- a/doc/src/sgml/ref/pg_dumpall.sgml
+++ b/doc/src/sgml/ref/pg_dumpall.sgml
@@ -122,6 +122,29 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Specify a filename from which to read patterns for databases excluded
+from dump. The patterns are interpretted according to the same rules
+like --exclude-database.
+To read from STDIN use - as the
+filename.  The --filter option can be specified in
+conjunction with the above listed options for including or excluding
+objects, and can also be specified more than once for multiple filter
+files.
+   
+
+   
+The file lists one database pattern per row, with the following format:
+
+exclude database  PATTERN
+
+   
+  
+ 
+
  
   -g
   --globals-only
diff --git a/doc/src/sgml/ref/pg_restore.sgml b/doc/src/sgml/ref/pg_restore.sgml
index 526986eadb..5f16c4a333 100644
--- a/doc/src/sgml/ref/pg_restore.sgml
+++ b/doc/src/sgml/ref/pg_restore.sgml
@@ -188,6 +188,31 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Specify a 

Re: proposal: possibility to read dumped table's name from file

2022-07-13 Thread Pavel Stehule
st 13. 7. 2022 v 22:49 odesílatel Andrew Dunstan 
napsal:

>
> On 2022-04-25 Mo 13:39, Pavel Stehule wrote:
> > Hi
> >
> > fresh rebase
> >
> >
>
>
> If we're going to do this for pg_dump's include/exclude options,
> shouldn't we also provide an equivalent facility for pg_dumpall's
> --exclude-database option?
>
>
It has sense

Regards

Pavel




>
> cheers
>
>
> andrew
>
>
> --
> Andrew Dunstan
> EDB: https://www.enterprisedb.com
>
>


Re: proposal: possibility to read dumped table's name from file

2022-07-13 Thread Andrew Dunstan


On 2022-04-25 Mo 13:39, Pavel Stehule wrote:
> Hi
>
> fresh rebase
>
>


If we're going to do this for pg_dump's include/exclude options,
shouldn't we also provide an equivalent facility for pg_dumpall's
--exclude-database option?


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: proposal: possibility to read dumped table's name from file

2022-04-25 Thread Pavel Stehule
Hi

fresh rebase

Regards

Pavel
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index c946755737..3711959fa2 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -779,6 +779,80 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Specify a filename from which to read patterns for objects to include
+or exclude from the dump. The patterns are interpreted according to the
+same rules as the corresponding options:
+-t/--table for tables,
+-n/--schema for schemas,
+--include-foreign-data for data on foreign servers and
+--exclude-table-data for table data.
+To read from STDIN use - as the
+filename.  The --filter option can be specified in
+conjunction with the above listed options for including or excluding
+objects, and can also be specified more than once for multiple filter
+files.
+   
+
+   
+The file lists one object pattern per row, with the following format:
+
+{ include | exclude } { table | schema | foreign_data | data } PATTERN
+
+   
+
+   
+The first keyword specifies whether the objects matched by the pattern
+are to be included or excluded. The second keyword specifies the type
+of object to be filtered using the pattern:
+
+ 
+  
+   table: tables, works like
+   -t/--table
+  
+ 
+ 
+  
+   schema: schemas, works like
+   -n/--schema
+  
+ 
+ 
+  
+   foreign_data: data on foreign servers, works like
+   --include-foreign-data. This keyword can only be
+   used with the include keyword.
+  
+ 
+ 
+  
+   data: table data, works like
+   --exclude-table-data. This keyword can only be
+   used with the exclude keyword.
+  
+ 
+
+   
+
+   
+Lines starting with # are considered comments and
+are ignored. Comments can be placed after filter as well. Blank lines
+are also ignored. See  for how to
+perform quoting in patterns.
+   
+
+   
+Example files are listed below in the 
+section.
+   
+
+  
+ 
+
  
   --if-exists
   
@@ -1119,6 +1193,7 @@ PostgreSQL documentation
 schema (-n/--schema) and
 table (-t/--table) qualifier
 match at least one extension/schema/table in the database to be dumped.
+This also applies to filters used with --filter.
 Note that if none of the extension/schema/table qualifiers find
 matches, pg_dump will generate an error
 even without --strict-names.
@@ -1528,6 +1603,19 @@ CREATE DATABASE foo WITH TEMPLATE template0;
 
 
 $ pg_dump -t "\"MixedCaseName\"" mydb  mytab.sql
+
+
+  
+   To dump all tables with names starting with mytable, except for table
+   mytable2, specify a filter file
+   filter.txt like:
+
+include table mytable*
+exclude table mytable2
+
+
+
+$ pg_dump --filter=filter.txt mydb  db.sql
 
 
  
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 786d592e2b..bb00e4fd4b 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -55,10 +55,12 @@
 #include "catalog/pg_trigger_d.h"
 #include "catalog/pg_type_d.h"
 #include "common/connect.h"
+#include "common/string.h"
 #include "dumputils.h"
 #include "fe_utils/option_utils.h"
 #include "fe_utils/string_utils.h"
 #include "getopt_long.h"
+#include "lib/stringinfo.h"
 #include "libpq/libpq-fs.h"
 #include "parallel.h"
 #include "pg_backup_db.h"
@@ -96,6 +98,29 @@ typedef enum OidOptions
 	zeroAsNone = 4
 } OidOptions;
 
+/*
+ * State data for reading filter items from stream
+ */
+typedef struct
+{
+	FILE	   *fp;
+	const char *filename;
+	int			lineno;
+	StringInfoData linebuff;
+}			FilterStateData;
+
+/*
+ * List of objects that can be specified in filter file
+ */
+typedef enum
+{
+	FILTER_OBJECT_TYPE_NONE,
+	FILTER_OBJECT_TYPE_TABLE,
+	FILTER_OBJECT_TYPE_SCHEMA,
+	FILTER_OBJECT_TYPE_FOREIGN_DATA,
+	FILTER_OBJECT_TYPE_DATA
+}			FilterObjectType;
+
 /* global decls */
 static bool dosync = true;		/* Issue fsync() to make dump durable on disk. */
 
@@ -317,6 +342,7 @@ static void appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions,
 static char *get_synchronized_snapshot(Archive *fout);
 static void setupDumpWorker(Archive *AHX);
 static TableInfo *getRootTableInfo(const TableInfo *tbinfo);
+static void getFiltersFromFile(const char *filename, DumpOptions *dopt);
 
 
 int
@@ -389,6 +415,7 @@ main(int argc, char **argv)
 		{"enable-row-security", no_argument, _row_security, 1},
 		{"exclude-table-data", required_argument, NULL, 4},
 		{"extra-float-digits", required_argument, NULL, 8},
+		{"filter", required_argument, NULL, 12},
 		{"if-exists", 

Re: proposal: possibility to read dumped table's name from file

2021-10-27 Thread Pavel Stehule
Hi

út 5. 10. 2021 v 14:30 odesílatel Daniel Gustafsson 
napsal:

> > On 1 Oct 2021, at 15:19, Daniel Gustafsson  wrote:
>
> > I'm still not happy with the docs, I need to take another look there and
> see if
> > I make them more readable but otherwise I don't think there are any open
> issues
> > with this.
>
> Attached is a rebased version which has rewritten docs which I think are
> more
> in line with the pg_dump documentation.  I've also added tests for
> --strict-names operation, as well subjected it to pgindent and pgperltidy.
>
> Unless there are objections, I think this is pretty much ready to go in.
>

I am sending a rebased version of patch pg_dump-filteropt-20211005.patch
with fixed regress tests and fixed documentation (reported by Erik).
I found another issue - the stringinfo line used in filter_get_pattern was
released too early - the line (memory) was used later in check of unexpected
chars after pattern string. I fixed it by moving this stringinfo buffer to
fstate structure. It can be shared by all routines, and it can be safely
released at
an end of filter processing, where we are sure, so these data can be free.

Regards

Pavel




> --
> Daniel Gustafsson   https://vmware.com/
>
>
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 7682226b99..9245355686 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -789,6 +789,80 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Specify a filename from which to read patterns for objects to include
+or exclude from the dump. The patterns are interpreted according to the
+same rules as the corresponding options:
+-t/--table for tables,
+-n/--schema for schemas,
+--include-foreign-data for data on foreign servers and
+--exclude-table-data for table data.
+To read from STDIN use - as the
+filename.  The --filter option can be specified in
+conjunction with the above listed options for including or excluding
+objects, and can also be specified more than once for multiple filter
+files.
+   
+
+   
+The file lists one object pattern per row, with the following format:
+
+{ include | exclude } { table | schema | foreign_data | data } PATTERN
+
+   
+
+   
+The first keyword specifies whether the objects matched by the pattern
+are to be included or excluded. The second keyword specifies the type
+of object to be filtered using the pattern:
+
+ 
+  
+   table: tables, works like
+   -t/--table
+  
+ 
+ 
+  
+   schema: schemas, works like
+   -n/--schema
+  
+ 
+ 
+  
+   foreign_data: data on foreign servers, works like
+   --include-foreign-data. This keyword can only be
+   used with the include keyword.
+  
+ 
+ 
+  
+   data: table data, works like
+   --exclude-table-data. This keyword can only be
+   used with the exclude keyword.
+  
+ 
+
+   
+
+   
+Lines starting with # are considered comments and
+are ignored. Comments can be placed after filter as well. Blank lines
+are also ignored. See  for how to
+perform quoting in patterns.
+   
+
+   
+Example files are listed below in the 
+section.
+   
+
+  
+ 
+
  
   --if-exists
   
@@ -1122,6 +1196,7 @@ PostgreSQL documentation
 schema (-n/--schema) and
 table (-t/--table) qualifier
 match at least one extension/schema/table in the database to be dumped.
+This also applies to filters used with --filter.
 Note that if none of the extension/schema/table qualifiers find
 matches, pg_dump will generate an error
 even without --strict-names.
@@ -1531,6 +1606,19 @@ CREATE DATABASE foo WITH TEMPLATE template0;
 
 
 $ pg_dump -t "\"MixedCaseName\"" mydb  mytab.sql
+
+
+  
+   To dump all tables with names starting with mytable, except for table
+   mytable2, specify a filter file
+   filter.txt like:
+
+include table mytable*
+exclude table mytable2
+
+
+
+$ pg_dump --filter=filter.txt mydb  db.sql
 
 
  
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index d1842edde0..f4f84bdce0 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -55,10 +55,12 @@
 #include "catalog/pg_trigger_d.h"
 #include "catalog/pg_type_d.h"
 #include "common/connect.h"
+#include "common/string.h"
 #include "dumputils.h"
 #include "fe_utils/option_utils.h"
 #include "fe_utils/string_utils.h"
 #include "getopt_long.h"
+#include "lib/stringinfo.h"
 #include "libpq/libpq-fs.h"
 #include "parallel.h"
 #include "pg_backup_db.h"
@@ -90,6 +92,29 @@ typedef enum OidOptions
 	

Re: proposal: possibility to read dumped table's name from file

2021-10-05 Thread Erik Rijkers

Op 05-10-2021 om 14:30 schreef Daniel Gustafsson:



Unless there are objections, I think this is pretty much ready to go in.


Agreed.  One typo:

'This keyword can only be with the exclude keyword.'should be
'This keyword can only be used with the exclude keyword.'


thanks,

Erik Rijkers






--
Daniel Gustafsson   https://vmware.com/






Re: proposal: possibility to read dumped table's name from file

2021-10-05 Thread Pavel Stehule
út 5. 10. 2021 v 14:30 odesílatel Daniel Gustafsson 
napsal:

> > On 1 Oct 2021, at 15:19, Daniel Gustafsson  wrote:
>
> > I'm still not happy with the docs, I need to take another look there and
> see if
> > I make them more readable but otherwise I don't think there are any open
> issues
> > with this.
>
> Attached is a rebased version which has rewritten docs which I think are
> more
> in line with the pg_dump documentation.  I've also added tests for
> --strict-names operation, as well subjected it to pgindent and pgperltidy.
>
> Unless there are objections, I think this is pretty much ready to go in.
>

great, thank you

Pavel


> --
> Daniel Gustafsson   https://vmware.com/
>
>


Re: proposal: possibility to read dumped table's name from file

2021-10-05 Thread Daniel Gustafsson
> On 1 Oct 2021, at 15:19, Daniel Gustafsson  wrote:

> I'm still not happy with the docs, I need to take another look there and see 
> if
> I make them more readable but otherwise I don't think there are any open 
> issues
> with this.

Attached is a rebased version which has rewritten docs which I think are more
in line with the pg_dump documentation.  I've also added tests for
--strict-names operation, as well subjected it to pgindent and pgperltidy.

Unless there are objections, I think this is pretty much ready to go in.

--
Daniel Gustafsson   https://vmware.com/



pg_dump-filteropt-20211005.patch
Description: Binary data


Re: proposal: possibility to read dumped table's name from file

2021-10-04 Thread Daniel Gustafsson
> On 2 Oct 2021, at 08:18, Erik Rijkers  wrote:

> So the issue is not as serious as it seemed.  

This is also not related to this patch in any way, or am I missing a point
here? This can just as well be achieved without this patch.

> The complaint remaining is only that this could somehow be documented better.

The pg_dump documentation today have a large highlighted note about this:

"When --include-foreign-data is specified, pg_dump does not check that the
foreign table is writable.  Therefore, there is no guarantee that the
results of a foreign table dump can be successfully restored."

This was extensively discussed [0] when this went in, is there additional
documentation you'd like to see for this?

--
Daniel Gustafsson   https://vmware.com/

[0] 
https://postgr.es/m/lejpr01mb0185483c0079d2f651b16231e7...@lejpr01mb0185.deuprd01.prod.outlook.de





Re: proposal: possibility to read dumped table's name from file

2021-10-02 Thread Erik Rijkers

On 10/1/21 6:19 PM, Erik Rijkers wrote:

On 10/1/21 3:19 PM, Daniel Gustafsson wrote:


As has been discussed upthread, this format strikes a compromise wrt 
simplicity
and doesn't preclude adding a more structured config file in the 
future should




If you try to dump/restore a foreign file from a file_fdw server, the 
restore step will complain and thus leave the returnvalue nonzero. The 
foreign table will be there, with complete 'data'.


A complete runnable exampe is a lot of work; I hope the below bits of 
input and output makes the problem clear.  Main thing: the pg_restore 
contains 2 ERROR lines like:


pg_restore: error: COPY failed for table "ireise1": ERROR:  cannot 
insert into foreign table "ireise1"


Further testing makes clear that the file_fdw-addressing line
  include foreign_data goethe
was the culprit: it causes a COPY which of course fails in a readonly 
wrapper like file_fdw. Without that line it works (because I run the 
restore on the same machine so the underlying file_fdw .txt files are 
there for testdb2 too)


So the issue is not as serious as it seemed.  The complaint remaining is 
only that this could somehow be documented better.


I attach a running example (careful, it deletes stuff)  of the original 
ERROR-producing bash (remove the 'include foreign_data' line from the 
input file to run it without error).


thanks,

Erik Rijkers





dump_restore_foreign_table.sh
Description: application/shellscript


Re: proposal: possibility to read dumped table's name from file

2021-10-01 Thread Erik Rijkers

On 10/1/21 3:19 PM, Daniel Gustafsson wrote:


As has been discussed upthread, this format strikes a compromise wrt simplicity
and doesn't preclude adding a more structured config file in the future should
we want that.  I think this takes care of most comments and opinions made in
this thread.

--
Daniel Gustafsson   https://vmware.com/



Hi,

If you try to dump/restore a foreign file from a file_fdw server, the 
restore step will complain and thus leave the returnvalue nonzero. The 
foreign table will be there, with complete 'data'.


A complete runnable exampe is a lot of work; I hope the below bits of 
input and output makes the problem clear.  Main thing: the pg_restore 
contains 2 ERROR lines like:


pg_restore: error: COPY failed for table "ireise1": ERROR:  cannot 
insert into foreign table "ireise1"




--
From the test bash:

echo "
include table table0   # ok   public
include table test.table1  #
include foreign_data goethe  # foreign server 'goethe' (file_fdw)
include table gutenberg.ireise1  # foreign table
include table gutenberg.ireise2  # foreign table
" > inputfile1.txt

pg_dump --create -Fc -c -p $port -d $db1 -f dump1 --filter=inputfile1.txt
echo

# prepare for restore
server_name=goethe
echo "create schema if not exists test;"  | psql -qaXd $db2
echo "create schema if not exists gutenberg;" | psql -qaXd $db2
echo "create server if not exists $server_name foreign data wrapper 
file_fdw " \

  | psql -qaXd $db2

echo "-- pg_restore --if-exists -cvd $db2 dump1 "
 pg_restore --if-exists -cvd $db2 dump1
rc=$?
echo "-- rc [$rc]"  -
echo

--

from the output:

-- pg_dump --create -Fc -c -p 6969 -d testdb1 -f dump1 
--filter=inputfile1.txt



-- pg_restore --if-exists -cvd testdb2 dump1
pg_restore: connecting to database for restore
pg_restore: dropping TABLE table1
pg_restore: dropping TABLE table0
pg_restore: dropping FOREIGN TABLE ireise2
pg_restore: dropping FOREIGN TABLE ireise1
pg_restore: creating FOREIGN TABLE "gutenberg.ireise1"
pg_restore: creating COMMENT "gutenberg.FOREIGN TABLE ireise1"
pg_restore: creating FOREIGN TABLE "gutenberg.ireise2"
pg_restore: creating COMMENT "gutenberg.FOREIGN TABLE ireise2"
pg_restore: creating TABLE "public.table0"
pg_restore: creating TABLE "test.table1"
pg_restore: processing data for table "gutenberg.ireise1"
pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 5570; 0 23625 TABLE DATA ireise1 aardvark
pg_restore: error: COPY failed for table "ireise1": ERROR:  cannot 
insert into foreign table "ireise1"

pg_restore: processing data for table "gutenberg.ireise2"
pg_restore: from TOC entry 5571; 0 23628 TABLE DATA ireise2 aardvark
pg_restore: error: COPY failed for table "ireise2": ERROR:  cannot 
insert into foreign table "ireise2"

pg_restore: processing data for table "public.table0"
pg_restore: processing data for table "test.table1"
pg_restore: warning: errors ignored on restore: 2
-- rc [1]

-


A second, separate practical hickup is that schema's are not restored 
from the dumped $schema.$table includes -- but this can be worked 
around; for my inputfile1.txt I had to run separately (as seen above, 
before running the pg_restore):


create schema if not exists test;
create schema if not exists gutenberg;
create server if not exists goethe foreign data wrapper file_fdw;

A bit annoying but still maybe all right.


Thanks,

Erik Rijkers





Re: proposal: possibility to read dumped table's name from file

2021-10-01 Thread Pavel Stehule
pá 1. 10. 2021 v 15:19 odesílatel Daniel Gustafsson 
napsal:

> I took another pass over this today and touched up the documentation (docs
> and
> code) as well as tweaked the code a bit here and there to both make it fit
> the
> pg_dump style better and to clean up a few small things.  I've also added
> a set
> of additional tests to cover more of the functionality.
>
> I'm still not happy with the docs, I need to take another look there and
> see if
> I make them more readable but otherwise I don't think there are any open
> issues
> with this.
>
> As has been discussed upthread, this format strikes a compromise wrt
> simplicity
> and doesn't preclude adding a more structured config file in the future
> should
> we want that.  I think this takes care of most comments and opinions made
> in
> this thread.
>

It looks well.

Thank you

Pavel


> --
> Daniel Gustafsson   https://vmware.com/
>
>


Re: proposal: possibility to read dumped table's name from file

2021-10-01 Thread Daniel Gustafsson
I took another pass over this today and touched up the documentation (docs and
code) as well as tweaked the code a bit here and there to both make it fit the
pg_dump style better and to clean up a few small things.  I've also added a set
of additional tests to cover more of the functionality.

I'm still not happy with the docs, I need to take another look there and see if
I make them more readable but otherwise I don't think there are any open issues
with this.

As has been discussed upthread, this format strikes a compromise wrt simplicity
and doesn't preclude adding a more structured config file in the future should
we want that.  I think this takes care of most comments and opinions made in
this thread.

--
Daniel Gustafsson   https://vmware.com/



pg_dump-filteropt-20211001.patch
Description: Binary data


Re: proposal: possibility to read dumped table's name from file

2021-09-23 Thread Pavel Stehule
Hi


> The above comments are fixed in the attached, as well as a pass over the
> docs
> and extended tests to actually test matching a foreign server.  What do
> think
> about this version?  I'm still not convinced that there aren't more quoting
> bugs in the parser, but I've left that intact for now.
>

This patch is based on the version that you sent 21.9. Just I modified
string comparison in keyword detection. If we don't allow support
abbreviations of keywords (and I dislike it), then the check of size is
necessary. Any other is without change.

Regards

Pavel




>
> --
> Daniel Gustafsson   https://vmware.com/
>
>
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 7682226b99..6e9a56f68c 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -789,6 +789,67 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+  Filtering rules for which objects to dump are read from the specified
+  file.  Specify - to read from
+STDIN.  The file has the following format:
+
+{ include | exclude } { table | schema | foreign_data | data } PATTERN
+
+   
+
+   
+The first keyword specifies whether the object is to be included or
+excluded, and the second keyword specifies the type of object to be
+filtered:
+
+ 
+   table: table
+ 
+ 
+   schema: schema
+ 
+ 
+   foreign_data: foreign server
+ 
+ 
+   data: table data
+ 
+
+   
+
+   
+Example:
+
+include table mytable*
+exclude table mytable2
+
+With the this filter file, the dump would include all tables with 
+name starting by mytable, but exclude table
+mytable2.
+   
+
+   
+Lines starting with # are considered comments and
+are ignored. Comments can be placed after filter as well. Blank lines
+are also ignored.
+   
+
+   
+The --filter option works just like the other
+options to include or exclude tables (-t or
+--table), schemas (-n or
+--schema), table data (--exclude-table-data),
+or foreign tables (--include-foreign-data).
+It isn't possible to exclude a specific foreign table or
+to include a specific table's data.
+   
+  
+ 
+
  
   --if-exists
   
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index a485fb2d07..ec979b9f6f 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -55,10 +55,12 @@
 #include "catalog/pg_trigger_d.h"
 #include "catalog/pg_type_d.h"
 #include "common/connect.h"
+#include "common/string.h"
 #include "dumputils.h"
 #include "fe_utils/option_utils.h"
 #include "fe_utils/string_utils.h"
 #include "getopt_long.h"
+#include "lib/stringinfo.h"
 #include "libpq/libpq-fs.h"
 #include "parallel.h"
 #include "pg_backup_db.h"
@@ -90,6 +92,28 @@ typedef enum OidOptions
 	zeroAsNone = 4
 } OidOptions;
 
+/*
+ * State data for reading filter items from stream.
+ */
+typedef struct
+{
+	FILE	   *fp;
+	char	   *filename;
+	int			lineno;
+} FilterStateData;
+
+/*
+ * List of objects that can be specified in filter file
+ */
+typedef enum
+{
+	FILTER_OBJECT_TYPE_NONE,
+	FILTER_OBJECT_TYPE_TABLE,
+	FILTER_OBJECT_TYPE_SCHEMA,
+	FILTER_OBJECT_TYPE_FOREIGN_DATA,
+	FILTER_OBJECT_TYPE_DATA
+} FilterObjectType;
+
 /* global decls */
 static bool dosync = true;		/* Issue fsync() to make dump durable on disk. */
 
@@ -308,6 +332,7 @@ static void appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions,
 static char *get_synchronized_snapshot(Archive *fout);
 static void setupDumpWorker(Archive *AHX);
 static TableInfo *getRootTableInfo(const TableInfo *tbinfo);
+static void read_filters_from_file(char *filename, DumpOptions *dopt);
 
 
 int
@@ -380,6 +405,7 @@ main(int argc, char **argv)
 		{"enable-row-security", no_argument, _row_security, 1},
 		{"exclude-table-data", required_argument, NULL, 4},
 		{"extra-float-digits", required_argument, NULL, 8},
+		{"filter", required_argument, NULL, 12},
 		{"if-exists", no_argument, _exists, 1},
 		{"inserts", no_argument, NULL, 9},
 		{"lock-wait-timeout", required_argument, NULL, 2},
@@ -613,6 +639,10 @@ main(int argc, char **argv)
 		  optarg);
 break;
 
+			case 12:			/* filter implementation */
+read_filters_from_file(optarg, );
+break;
+
 			default:
 fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 exit_nicely(1);
@@ -1038,6 +1068,8 @@ help(const char *progname)
 			 "   access to)\n"));
 	printf(_("  --exclude-table-data=PATTERN do NOT dump data for the specified table(s)\n"));
 	printf(_("  --extra-float-digits=NUM override default setting for extra_float_digits\n"));
+	printf(_("  --filter=FILENAMEdump objects and data based on the filter expressions\n"
+			 "

Re: proposal: possibility to read dumped table's name from file

2021-09-21 Thread Pavel Stehule
Hi


> The above comments are fixed in the attached, as well as a pass over the
> docs
> and extended tests to actually test matching a foreign server.  What do
> think
> about this version?  I'm still not convinced that there aren't more quoting
> bugs in the parser, but I've left that intact for now.
>

The problematic points are double quotes and new line char. Any other is
just in  sequence of bytes.

I have just one note to your patch. When you use pg_strncasecmp, then you
have to check the size too


char   *xxx = "incl";
int xxx_size = 4;

elog(NOTICE, "%d",
  pg_strncasecmp(xxx, "include", xxx_size) == 0);

result is NOTICE:  1

"incl" is not keyword "include"

Regards

Pavel



>
> --
> Daniel Gustafsson   https://vmware.com/
>
>


Re: proposal: possibility to read dumped table's name from file

2021-09-21 Thread Tomas Vondra
On 9/21/21 3:28 PM, Alvaro Herrera wrote:
> I definitely agree that we should have two files, one for config and
> another one for filter, since their purposes are orthogonal and their
> formats are likely different; trying to cram the filter specification in
> the config file seems unfriendly because it'd force users to write the
> filter in whatever alien grammar used for the config file.  Also, this
> would make it easier to use a single config file with a bunch of
> different filter files.
> 

+1, that is pretty much excatly what I argued for not too long ago.

> On 2021-Sep-21, Daniel Gustafsson wrote:
> 
>> I'm not convinced that we can/should change or remove a commandline parameter
>> in a coming version when there might be scripts expecting it to work in a
>> specific way.  Having a --filter as well as a --config where the configfile 
>> can
>> refer to the filterfile also passed via --filter sounds like problem waiting 
>> to
>> happen, so I think we need to settle how we want to interact with this file
>> before anything goes in.
> 
> I think both the filter and the hypothetical config file are going to
> interact (be redundant) with almost all already existing switches, and
> there's no need to talk about removing anything (e.g., nobody would
> argue for the removal of "-t" even though that's redundant with the
> filter file).
> 
> I see no problem with the config file specifying a filter file.
> 
> AFAICS if the config file specifies a filter and the user also specifies
> a filter in the command line, we have two easy options: raise an error
> about the redundant option, or have the command line option supersede
> the one in the config file.  The latter strikes me as the more useful
> behavior, and it's in line with what other tools do in similar cases, so
> that's what I propose doing.
> 
> (There might be less easy options too, such as somehow combining the two
> filters, but offhand I don't see any reason why this is real-world
> useful, so I don't propose doing that.)
> 

Well, I think we already have to do decisions like that, because you can
do e.g. this:

pg_dump -T t -t t

So we already do combine the switches, and we do this:

When both -t and -T are given, the behavior is to dump just the
tables that match at least one -t switch but no -T switches. If -T
appears without -t, then tables matching -T are excluded from what
is otherwise a normal dump.

That seems fairly reasonable, and I don't see why not to use the same
logic for combining patterns no matter where we got them (filter file,
command-line option, etc.).

Just combine everything, and then check if there's any "exclude" rule.
If yes, we're done - exclude. If not, check if there's "include" rule.
If not, still exclude. Otherwise include.

Seems reasonable and consistent to me, and I don't see why not to allow
multiple --filter parameters.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: proposal: possibility to read dumped table's name from file

2021-09-21 Thread Alvaro Herrera
I definitely agree that we should have two files, one for config and
another one for filter, since their purposes are orthogonal and their
formats are likely different; trying to cram the filter specification in
the config file seems unfriendly because it'd force users to write the
filter in whatever alien grammar used for the config file.  Also, this
would make it easier to use a single config file with a bunch of
different filter files.

On 2021-Sep-21, Daniel Gustafsson wrote:

> I'm not convinced that we can/should change or remove a commandline parameter
> in a coming version when there might be scripts expecting it to work in a
> specific way.  Having a --filter as well as a --config where the configfile 
> can
> refer to the filterfile also passed via --filter sounds like problem waiting 
> to
> happen, so I think we need to settle how we want to interact with this file
> before anything goes in.

I think both the filter and the hypothetical config file are going to
interact (be redundant) with almost all already existing switches, and
there's no need to talk about removing anything (e.g., nobody would
argue for the removal of "-t" even though that's redundant with the
filter file).

I see no problem with the config file specifying a filter file.

AFAICS if the config file specifies a filter and the user also specifies
a filter in the command line, we have two easy options: raise an error
about the redundant option, or have the command line option supersede
the one in the config file.  The latter strikes me as the more useful
behavior, and it's in line with what other tools do in similar cases, so
that's what I propose doing.

(There might be less easy options too, such as somehow combining the two
filters, but offhand I don't see any reason why this is real-world
useful, so I don't propose doing that.)

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
"How amazing is that? I call it a night and come back to find that a bug has
been identified and patched while I sleep."(Robert Davidson)
   http://archives.postgresql.org/pgsql-sql/2006-03/msg00378.php




Re: proposal: possibility to read dumped table's name from file

2021-09-21 Thread Pavel Stehule
út 21. 9. 2021 v 14:37 odesílatel Daniel Gustafsson 
napsal:

> > On 21 Sep 2021, at 08:50, Pavel Stehule  wrote:
> > po 20. 9. 2021 v 14:10 odesílatel Daniel Gustafsson  > napsal:
>
> > +   printf(_("  --filter=FILENAMEdump objects and data
> based on the filter expressions\n"
> > +"   from the filter
> file\n"));
> > Before we settle on --filter I think we need to conclude whether this
> file is
> > intended to be included from a config file, or used on it's own.  If we
> gow tih
> > the former then we might not want a separate option for just --filter.
> >
> > I prefer to separate two files. Although there is some intersection, I
> think it is good to have two simple separate files for two really different
> tasks.
> > It does filtering, and it should be controlled by option "--filter".
> When the implementation will be changed, then this option can be changed
> too.
> > Filtering is just a pg_dump related feature. Revision of client
> application configuration is a much more generic task, and if we mix it to
> one, we can be
> > in a trap. It can be hard to find one good format for large script
> generated content, and possibly hand written structured content. For
> practical
> > reasons it can be good to have two files too. Filters and configurations
> can have different life cycles.
>
> I'm not convinced that we can/should change or remove a commandline
> parameter
> in a coming version when there might be scripts expecting it to work in a
> specific way.  Having a --filter as well as a --config where the
> configfile can
> refer to the filterfile also passed via --filter sounds like problem
> waiting to
> happen, so I think we need to settle how we want to interact with this file
> before anything goes in.
>
> Any thoughts from those in the thread who have had strong opinions on
> config
> files etc?
>
> > +if (filter_is_keyword(keyword, size, "include"))
> > I would prefer if this function call was replaced by just the
> pg_strcasecmp()
> > call in filter_is_keyword() and the strlen optimization there removed.
> The is
> > not a hot-path, we can afford the string comparison in case of errors.
> Having
> > the string comparison done inline here will improve readability saving
> the
> > reading from jumping to another function to see what it does.
> >
> > I agree that this is not a hot-path, just I don't feel well if I need to
> make a zero end string just for comparison pg_strcasecmp. Current design
> reduces malloc/free cycles. It is used in more places, when Postgres parses
> strings - SQL parser, plpgsql parser. I am not sure about the benefits and
> costs - pg_strcasecmp can be more readable, but for any keyword I have to
> call pstrdup and pfree. Is it necessary? My opinion in this part is not too
> strong - it is a minor issue, maybe I have a little bit different feelings
> about benefits and costs in this specific case, and if you really think the
> benefits of rewriting are higher, I'll do it
>
> Sorry, I typoed my response.  What I meant was to move the pg_strncasecmp
> call
> inline and not do the strlen check, to save readers from jumping around.
> So
> basically end up with the below in read_filter_item():
>
> +   /* Now we expect sequence of two keywords */
> +   if (pg_strncasecmp(keyword, "include", size) == 0)
> +   *is_include = true;
>
>
I don't think so it is safe (strict). Only pg_strncasecmp(..)  is true for
keywords "includex", "includedsss", ... You should to compare the size

Regards

Pavel


>
>


Re: proposal: possibility to read dumped table's name from file

2021-09-21 Thread Daniel Gustafsson
> On 21 Sep 2021, at 08:50, Pavel Stehule  wrote:
> po 20. 9. 2021 v 14:10 odesílatel Daniel Gustafsson  > napsal:

> +   printf(_("  --filter=FILENAMEdump objects and data based 
> on the filter expressions\n"
> +"   from the filter 
> file\n"));
> Before we settle on --filter I think we need to conclude whether this file is
> intended to be included from a config file, or used on it's own.  If we gow 
> tih
> the former then we might not want a separate option for just --filter.
> 
> I prefer to separate two files. Although there is some intersection, I think 
> it is good to have two simple separate files for two really different tasks.
> It does filtering, and it should be controlled by option "--filter". When the 
> implementation will be changed, then this option can be changed too. 
> Filtering is just a pg_dump related feature. Revision of client application 
> configuration is a much more generic task, and if we mix it to one, we can be
> in a trap. It can be hard to find one good format for large script generated 
> content, and possibly hand written structured content. For practical
> reasons it can be good to have two files too. Filters and configurations can 
> have different life cycles.

I'm not convinced that we can/should change or remove a commandline parameter
in a coming version when there might be scripts expecting it to work in a
specific way.  Having a --filter as well as a --config where the configfile can
refer to the filterfile also passed via --filter sounds like problem waiting to
happen, so I think we need to settle how we want to interact with this file
before anything goes in.

Any thoughts from those in the thread who have had strong opinions on config
files etc?

> +if (filter_is_keyword(keyword, size, "include"))
> I would prefer if this function call was replaced by just the pg_strcasecmp()
> call in filter_is_keyword() and the strlen optimization there removed.  The is
> not a hot-path, we can afford the string comparison in case of errors.  Having
> the string comparison done inline here will improve readability saving the
> reading from jumping to another function to see what it does.
> 
> I agree that this is not a hot-path, just I don't feel well if I need to make 
> a zero end string just for comparison pg_strcasecmp. Current design reduces 
> malloc/free cycles. It is used in more places, when Postgres parses strings - 
> SQL parser, plpgsql parser. I am not sure about the benefits and costs - 
> pg_strcasecmp can be more readable, but for any keyword I have to call 
> pstrdup and pfree. Is it necessary? My opinion in this part is not too strong 
> - it is a minor issue, maybe I have a little bit different feelings about 
> benefits and costs in this specific case, and if you really think the 
> benefits of rewriting are higher, I'll do it

Sorry, I typoed my response.  What I meant was to move the pg_strncasecmp call
inline and not do the strlen check, to save readers from jumping around.  So
basically end up with the below in read_filter_item():

+   /* Now we expect sequence of two keywords */
+   if (pg_strncasecmp(keyword, "include", size) == 0)
+   *is_include = true;

> +initStringInfo();
> Why is this using a StringInfo rather than a PQExpBuffer as the rest of 
> pg_dump
> does?
> 
> The StringInfo is used because I use the pg_get_line_buf function, and this 
> function uses this API.

Ah, of course.

A few other comments from another pass over this:

+   exit_nicely(-1);
Why -1?  pg_dump (and all other binaries) exits with 1 on IMO even more serious
errors so I think this should use 1 as well.


+   if (!pg_get_line_buf(fstate->fp, line))
+   {
+   if (ferror(fstate->fp))
+   fatal("could not read from file \"%s\": %m", 
fstate->filename);
+
+   exit_invalid_filter_format(fstate,"unexpected end of file");
+   }
In the ferror() case this codepath isn't running fclose() on the file pointer
(unless stdin) which we do elsewhere, so this should use pg_log_error and
exit_nicely instead.


+   pg_log_fatal("could not read from file \"%s\": %m", fstate->filename);
Based on how other errors are treated in pg_dump I think this should be
downgraded to a pg_log_error.

The above comments are fixed in the attached, as well as a pass over the docs
and extended tests to actually test matching a foreign server.  What do think
about this version?  I'm still not convinced that there aren't more quoting
bugs in the parser, but I've left that intact for now.

--
Daniel Gustafsson   https://vmware.com/



0001-Implement-filter-for-pg_dump.patch
Description: Binary data


Re: proposal: possibility to read dumped table's name from file

2021-09-21 Thread Pavel Stehule
po 20. 9. 2021 v 14:10 odesílatel Daniel Gustafsson 
napsal:

> > Will do a closer review on the patch shortly.
>
> Had a read through, and tested, the latest posted version today:
>
> +Read objects filters from the specified file. Specify "-" to read from
> +stdin. Lines of this file must have the following format:
> I think this should be - and STDIN
> to
> match the rest of the docs.
>
>
> +   
> +With the following filter file, the dump would include table
> +mytable1 and data from foreign tables of
> +some_foreign_server foreign server, but
> exclude data
> +from table mytable2.
> +
> +include table mytable1
> +include foreign_data some_foreign_server
> +exclude table mytable2
> +
> +   
> This example is highlighting the issue I've previously raised with the
> UX/doc
> of this feature.  The "exclude table mytable2" is totally pointless in the
> above since the exact match of "mytable1" will remove all other objects.
> What
> we should be doing instead is use the pattern matching aspect along the
> lines
> of the below:
>
> include table mytable*
> exclude table mytable2
>
> +The --filter option works just like the other
> +options to include or exclude tables, schemas, table data, or
> foreign
> This should refer to the actual options by name to make it clear which we
> are
> talking about.
>

fixed


>
> +   printf(_("  --filter=FILENAMEdump objects and data
> based on the filter expressions\n"
> +"   from the filter
> file\n"));
> Before we settle on --filter I think we need to conclude whether this file
> is
> intended to be included from a config file, or used on it's own.  If we
> gow tih
> the former then we might not want a separate option for just --filter.
>

I prefer to separate two files. Although there is some intersection, I
think it is good to have two simple separate files for two really different
tasks.
It does filtering, and it should be controlled by option "--filter". When
the implementation will be changed, then this option can be changed too.
Filtering is just a pg_dump related feature. Revision of client application
configuration is a much more generic task, and if we mix it to one, we can
be
in a trap. It can be hard to find one good format for large script
generated content, and possibly hand written structured content. For
practical
reasons it can be good to have two files too. Filters and configurations
can have different life cycles.


>
> +if (filter_is_keyword(keyword, size, "include"))
> I would prefer if this function call was replaced by just the
> pg_strcasecmp()
> call in filter_is_keyword() and the strlen optimization there removed.
> The is
> not a hot-path, we can afford the string comparison in case of errors.
> Having
> the string comparison done inline here will improve readability saving the
> reading from jumping to another function to see what it does.
>

I agree that this is not a hot-path, just I don't feel well if I need to
make a zero end string just for comparison pg_strcasecmp. Current design
reduces malloc/free cycles. It is used in more places, when Postgres parses
strings - SQL parser, plpgsql parser. I am not sure about the benefits and
costs - pg_strcasecmp can be more readable, but for any keyword I have to
call pstrdup and pfree. Is it necessary? My opinion in this part is not too
strong - it is a minor issue, maybe I have a little bit different feelings
about benefits and costs in this specific case, and if you really think the
benefits of rewriting are higher, I'll do it.


>
> +initStringInfo();
> Why is this using a StringInfo rather than a PQExpBuffer as the rest of
> pg_dump
> does?
>

The StringInfo is used because I use the pg_get_line_buf function, and this
function uses this API.


>
> +typedef struct
> I think these should be at the top of the file with the other typedefs.
>
>
done



>
> When testing strange object names, I was unable to express this name in
> the filter file:
>
> $ ./bin/psql
> psql (15devel)
> Type "help" for help.
>
> danielg=# create table "
> danielg"# t
> danielg"# t
> danielg"# " (a integer);
> CREATE TABLE
> danielg=# select relname from pg_class order by oid desc limit 1;
>  relname
> -
> +
>  t  +
>  t  +
>
> (1 row)
>
>
Good catch - I had badly placed pg_strip_crlf function, fixed and regress
tests enhanced

Please check assigned patch

Regards

Pavel





> --
> Daniel Gustafsson   https://vmware.com/
>
>
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 7682226b99..d692f4230e 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -789,6 +789,55 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Read objects filters from the specified file. Specify - to read from
+STDIN. Lines of this file must have the 

Re: proposal: possibility to read dumped table's name from file

2021-09-20 Thread Daniel Gustafsson
> Will do a closer review on the patch shortly.

Had a read through, and tested, the latest posted version today:

+Read objects filters from the specified file. Specify "-" to read from
+stdin. Lines of this file must have the following format:
I think this should be - and STDIN to
match the rest of the docs.


+   
+With the following filter file, the dump would include table
+mytable1 and data from foreign tables of
+some_foreign_server foreign server, but exclude data
+from table mytable2.
+
+include table mytable1
+include foreign_data some_foreign_server
+exclude table mytable2
+
+   
This example is highlighting the issue I've previously raised with the UX/doc
of this feature.  The "exclude table mytable2" is totally pointless in the
above since the exact match of "mytable1" will remove all other objects.  What
we should be doing instead is use the pattern matching aspect along the lines
of the below:

include table mytable*
exclude table mytable2

+The --filter option works just like the other
+options to include or exclude tables, schemas, table data, or foreign
This should refer to the actual options by name to make it clear which we are
talking about.


+   printf(_("  --filter=FILENAMEdump objects and data based on 
the filter expressions\n"
+"   from the filter 
file\n"));
Before we settle on --filter I think we need to conclude whether this file is
intended to be included from a config file, or used on it's own.  If we gow tih
the former then we might not want a separate option for just --filter.


+if (filter_is_keyword(keyword, size, "include"))
I would prefer if this function call was replaced by just the pg_strcasecmp()
call in filter_is_keyword() and the strlen optimization there removed.  The is
not a hot-path, we can afford the string comparison in case of errors.  Having
the string comparison done inline here will improve readability saving the
reading from jumping to another function to see what it does.


+initStringInfo();
Why is this using a StringInfo rather than a PQExpBuffer as the rest of pg_dump
does?


+typedef struct
I think these should be at the top of the file with the other typedefs.


When testing strange object names, I was unable to express this name in the 
filter file:

$ ./bin/psql
psql (15devel)
Type "help" for help.

danielg=# create table "
danielg"# t
danielg"# t
danielg"# " (a integer);
CREATE TABLE
danielg=# select relname from pg_class order by oid desc limit 1;
 relname
-
+
 t  +
 t  +

(1 row)

--
Daniel Gustafsson   https://vmware.com/





Re: proposal: possibility to read dumped table's name from file

2021-09-17 Thread Pavel Stehule
> >
> > Initially not, but now, when I am thinking about it, I don't think so
> Bison helps. The syntax of the filter file is nicely linear. Now, the code
> of the parser is a little bit larger than minimalistic, but it is due to
> nicer error's messages. The raw implementation in Bison raised just "syntax
> error" and positions. I did code refactoring, and now the scanning, parsing
> and processing are divided into separated routines. Parsing related code
> has 90 lines. In this case, I don't think using a parser grammar file can
> carry any benefit. grammar is more readable, sure, but we need to include
> bison, we need to handle errors, and if we want to raise more helpful
> errors than just "syntax error", then the code will be longer.
>
> I'm not so concerned by code size, but rather parsing of quotations etc and
> being able to reason about it's correctness.  IMHO that's easier done by
> reading a defined grammar than parsing a handwritten parser.
>
>
In this case the complex part is not a parser, but the scanner is complex
and writing this in flex is not too easy. I wrote so the grammar file can
be more readable, but the usual error from Bison is "syntax error" and
position, so it does not win from the user perspective. When a parser is
not linear, then a generated parser can help a lot, but using it at this
moment is premature.


> Will do a closer review on the patch shortly.
>
> --
> Daniel Gustafsson   https://vmware.com/
>
>


  1   2   3   >