Re: pg_dump --where option

2020-09-30 Thread Michael Paquier
On Mon, Sep 14, 2020 at 05:00:19PM +0200, Daniel Gustafsson wrote:
> I'm not sure I follow. Surely tests can be added for this functionality?

We should have tests for that.  I can see that this has not been
answered in two weeks, so this has been marked as returned with
feedback in the CF app.
--
Michael




Re: pg_dump --where option

2020-09-14 Thread Daniel Gustafsson
> On 14 Sep 2020, at 12:04, Surafel Temesgen wrote:
> On Fri, Jul 31, 2020 at 1:38 AM Daniel Gustafsson wrote:
> 
> >  $ pg_dump -d cary --where="test1:a3 = ( select max(aa1) from test2 )" > testdump2
> >  $ pg_dump: error: processing of table "public.test1" failed
> > 
> > both test1 and test2 exist in the database and the same subquery works 
> > under psql.
> This is because pg_dump uses schema-qualified object names. I added
> documentation about using schema-qualified names in subqueries.

Documenting something is well and good, but isn't allowing arbitrary SQL
copy-pasted into the query (which isn't checked for schema qualification)
opening up for some of the ill-effects of CVE-2018-1058?

> I didn't add tests because single quotes and double quotes are also
> metacharacters for prove.

I'm not sure I follow. Surely tests can be added for this functionality?
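As far as I can tell, quoting does not need to go through a shell at all: the
pg_dump TAP tests pass each command as an array reference of arguments, which
IPC::Run executes without a shell, so quotes should reach pg_dump verbatim. A
small POSIX C sketch of the same idea follows. The helper
echo_arg_without_shell is made up for illustration, and it assumes a printf
utility is available on PATH.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <spawn.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/*
 * Hypothetical helper: run "printf %s ARG" via posix_spawnp, passing ARG as
 * one argv element (no shell parses it), and capture the child's stdout.
 * Returns the number of bytes captured, or -1 on any failure.
 */
static ssize_t
echo_arg_without_shell(const char *arg, char *out, size_t outsize)
{
	/* argv goes to the child as-is: quotes are ordinary characters here */
	char	   *argv[] = {"printf", "%s", (char *) arg, NULL};
	posix_spawn_file_actions_t fa;
	int			fds[2];
	int			status;
	pid_t		pid;
	size_t		total = 0;
	ssize_t		n;

	assert(arg != NULL && out != NULL);
	if (outsize == 0 || pipe(fds) != 0)
		return -1;

	/* in the child: stdout becomes the pipe's write end */
	posix_spawn_file_actions_init(&fa);
	posix_spawn_file_actions_adddup2(&fa, fds[1], STDOUT_FILENO);
	posix_spawn_file_actions_addclose(&fa, fds[0]);
	posix_spawn_file_actions_addclose(&fa, fds[1]);

	if (posix_spawnp(&pid, "printf", &fa, NULL, argv, environ) != 0)
	{
		posix_spawn_file_actions_destroy(&fa);
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	posix_spawn_file_actions_destroy(&fa);
	close(fds[1]);				/* the parent only reads */

	while (total < outsize - 1 &&
		   (n = read(fds[0], out + total, outsize - 1 - total)) > 0)
		total += (size_t) n;
	out[total] = '\0';
	close(fds[0]);

	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) ||
		WEXITSTATUS(status) != 0)
		return -1;
	return (ssize_t) total;
}
```

Passing a --where style value that contains both single and double quotes
should give back exactly the same string.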


How should one invoke this on a table name with a multibyte character that
requires quoting, like --table='"x"' (where x would be an mb char)?  Reading
the original thread and trying the syntax from there, it's also not clear how
table names with colons should be handled.  I know they're not common, but if
they're not supported then the tradeoff should be documented.
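One possible rule, sketched only to make the question concrete (the helper
split_where_option is hypothetical and not taken from the patch): split the
argument at the first colon that is not inside a double-quoted identifier. A
table name containing a colon or a multibyte character would then be written
in double quotes, while :: casts in the filter keep working because only the
first unquoted colon counts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Copy len bytes starting at start into a new NUL-terminated string. */
static char *
copy_span(const char *start, size_t len)
{
	char	   *s = malloc(len + 1);

	if (s != NULL)
	{
		memcpy(s, start, len);
		s[len] = '\0';
	}
	return s;
}

/*
 * Hypothetical: split "TABLE:FILTER" at the first colon outside double
 * quotes.  On success, *table and *filter are malloc'd copies the caller
 * must free.  Returns false for a missing separator or an empty part.
 */
static bool
split_where_option(const char *arg, char **table, char **filter)
{
	bool		in_quotes = false;
	const char *p;

	assert(arg != NULL && table != NULL && filter != NULL);
	*table = *filter = NULL;

	for (p = arg; *p != '\0'; p++)
	{
		if (*p == '"')
			in_quotes = !in_quotes;	/* an escaped "" toggles twice */
		else if (*p == ':' && !in_quotes)
			break;
	}

	if (*p != ':' || p == arg || p[1] == '\0')
		return false;

	*table = copy_span(arg, (size_t) (p - arg));
	*filter = copy_span(p + 1, strlen(p + 1));
	if (*table == NULL || *filter == NULL)
	{
		free(*table);
		free(*filter);
		*table = *filter = NULL;
		return false;
	}
	return true;
}
```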

A nearby thread [0] is adding functionality to read options from an input file
because the command line can be too short.  Consumers of this might not run
into the issues mentioned there, but it doesn't seem far-fetched that someone
who does would also want to add a small WHERE clause.  Maybe these patches
should join forces?

cheers ./daniel

[0] CAFj8pRB10wvW0CC9Xq=1XDs=zcqxer3cblcnza+qix4cuh-...@mail.gmail.com



Re: pg_dump --where option

2020-09-14 Thread Surafel Temesgen
On Fri, Jul 31, 2020 at 1:38 AM Daniel Gustafsson  wrote:

>
> >  $ pg_dump -d cary --where="test1:a3 = ( select max(aa1) from test2 )" > testdump2
> >  $ pg_dump: error: processing of table "public.test1" failed
> >
> > both test1 and test2 exist in the database and the same subquery works
> > under psql.
> >
>

This is because pg_dump uses schema-qualified object names. I added
documentation about using schema-qualified names in subqueries.
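To illustrate why the subquery case fails: pg_dump sets an empty search_path
on its connection (part of the CVE-2018-1058 hardening). Assuming the filter
text is passed into the data query unchanged, an unqualified test2 inside the
subquery cannot be resolved, even though pg_dump itself refers to the target
table by its qualified name. A simplified sketch of the statement
construction; build_filtered_copy is a hypothetical helper, and real pg_dump
emits an explicit column list instead of *.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch: the target table arrives already schema-qualified,
 * but the user's filter is copied verbatim.  With search_path empty, every
 * table named inside the filter must be qualified by the user as well.
 * Returns snprintf's result (the length it wanted to write).
 */
static int
build_filtered_copy(char *buf, size_t bufsize,
					const char *qualified_table, const char *filter)
{
	assert(buf != NULL && qualified_table != NULL && filter != NULL);
	return snprintf(buf, bufsize,
					"COPY (SELECT * FROM %s WHERE %s) TO stdout;",
					qualified_table, filter);
}
```

With "from test2" in the filter the resulting statement would fail under an
empty search_path; with "from public.test2" it resolves.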




> > I also noticed that the regression tests for pg_dump are failing due to
> > the patch. I think it is worth looking into the failure messages and also
> > adding some test cases for the new "where" clause to ensure that it covers
> > as many use cases as possible.
>
>
I fixed the regression test failure in the attached patch.

I didn't add tests because single quotes and double quotes are also
metacharacters for prove.

regards

Surafel
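For readers skimming the C changes: the patch registers --where in pg_dump's
getopt_long option table with a required argument (value 12) and appends each
occurrence to a list, which is what makes the option repeatable. A
stand-alone sketch of that pattern; collect_where_args, its fixed-size output
array and the check for a TABLE: prefix are illustrative assumptions (pg_dump
itself appends to a SimpleStringList).

```c
#include <assert.h>
#include <getopt.h>
#include <string.h>

/*
 * Collect every --where value from argv.  Returns the number collected,
 * or -1 on an unknown option, a value without "TABLE:", or overflow.
 */
static int
collect_where_args(int argc, char **argv, const char **out, int maxout)
{
	static const struct option long_options[] = {
		{"where", required_argument, NULL, 12},
		{NULL, 0, NULL, 0}
	};
	int			c;
	int			n = 0;

	assert(argv != NULL && out != NULL);
	while ((c = getopt_long(argc, argv, "", long_options, NULL)) != -1)
	{
		switch (c)
		{
			case 12:			/* table data WHERE clause */
				if (n == maxout || strchr(optarg, ':') == NULL)
					return -1;
				out[n++] = optarg;
				break;
			default:
				return -1;
		}
	}
	return n;
}
```

Both the --where=VALUE and the --where VALUE spellings end up in the list.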
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 0b2e2de87b..7dc3041247 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
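@@ -1104,6 +1104,24 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--where=<replaceable class="parameter">table</replaceable>:<replaceable class="parameter">filter_clause</replaceable></option></term>
+      <listitem>
+       <para>
+        When dumping data for <replaceable class="parameter">table</replaceable>, only include rows
+        that meet the <replaceable class="parameter">filter_clause</replaceable> condition.
+        If <replaceable class="parameter">filter_clause</replaceable> contains a subquery, schema-qualify the tables it references;
+        otherwise the dump fails, because <application>pg_dump</application> runs with an empty <varname>search_path</varname>.
+        This option is useful when you want to dump only a subset of a particular table.
+        <option>--where</option> can be given more than once to provide different filters
+        for multiple tables. Note that if multiple options refer to the same table,
+        only the first <replaceable class="parameter">filter_clause</replaceable> will be applied. If necessary, use quotes in your shell to
+        provide an argument that contains spaces, for example
+        <literal>--where=mytable:"created_at >= '2018-01-01' AND test = 'f'"</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-?</option></term>
       <term><option>--help</option></term>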
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index d3ca54e4dc..418684e272 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -121,6 +121,8 @@ static SimpleStringList tabledata_exclude_patterns = {NULL, NULL};
 static SimpleOidList tabledata_exclude_oids = {NULL, NULL};
 static SimpleStringList foreign_servers_include_patterns = {NULL, NULL};
 static SimpleOidList foreign_servers_include_oids = {NULL, NULL};
+static SimpleStringList tabledata_where_patterns = {NULL, NULL};
+static SimpleOidList tabledata_where_oids = {NULL, NULL};
 
 static const CatalogId nilCatalogId = {0, 0};
 
@@ -156,7 +158,8 @@ static void expand_foreign_server_name_patterns(Archive *fout,
 static void expand_table_name_patterns(Archive *fout,
 	   SimpleStringList *patterns,
 	   SimpleOidList *oids,
-	   bool strict_names);
+	   bool strict_names,
+	   bool match_data);
 static NamespaceInfo *findNamespace(Archive *fout, Oid nsoid);
 static void dumpTableData(Archive *fout, TableDataInfo *tdinfo);
 static void refreshMatViewData(Archive *fout, TableDataInfo *tdinfo);
@@ -387,6 +390,7 @@ main(int argc, char **argv)
 		{"on-conflict-do-nothing", no_argument, &dopt.do_nothing, 1},
 		{"rows-per-insert", required_argument, NULL, 10},
 		{"include-foreign-data", required_argument, NULL, 11},
+		{"where", required_argument, NULL, 12},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -604,6 +608,10 @@ main(int argc, char **argv)
 										  optarg);
 				break;
 
+			case 12:			/* table data WHERE clause */
+				simple_string_list_append(&tabledata_where_patterns, optarg);
+				break;
+
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 				exit_nicely(1);
@@ -806,17 +814,26 @@ main(int argc, char **argv)
 	{
 		expand_table_name_patterns(fout, &table_include_patterns,
 								   &table_include_oids,
-								   strict_names);
+								   strict_names, false);
 		if (table_include_oids.head == NULL)
 			fatal("no matching tables were found");
 	}
+
+	if (tabledata_where_patterns.head != NULL)
+	{
+		expand_table_name_patterns(fout, &tabledata_where_patterns,
+								   &tabledata_where_oids,
+								   true, true);
+		if (tabledata_where_oids.head == NULL)
+			fatal("no matching table was found");
+	}
 	expand_table_name_patterns(fout, &table_exclude_patterns,
 							   &table_exclude_oids,
-							   false);
+							   false, false);
 
 	expand_table_name_patterns(fout, &tabledata_exclude_patterns,
 							   &tabledata_exclude_oids,
-							   false);
+							   false, false);
 
 	expand_foreign_server_name_patterns(fout, &foreign_servers_include_patterns,
 										&foreign_servers_include_oids);
@@ -1047,6 +1064,7 @@ help(const char *progname)
 	printf(_("  --use-set-session-authorization\n"
 			 "   use SET SESSION AUTHORIZATION commands instead of\n"
 			 "   ALTER OWNER commands to set ownership\n"));
+	printf(_("  --where=TABLE:WHERE_CLAUSE   only dump selected rows for the given table\n"));
 
 	

Re: pg_dump --where option

2020-07-30 Thread Daniel Gustafsson
> On 10 Jul 2020, at 02:03, Cary Huang  wrote:
> 
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, failed
> Implements feature:       tested, failed
> Spec compliant:           tested, failed
> Documentation:            tested, failed
> 
> Hi
> 
> I had a look at the patch and it applies cleanly to the postgres master branch.
> I tried a quick test of the new "where clause" functionality, and for the
> most part it does the job as described; I'm sure some people will find
> this feature useful for their database dump needs. However, I tried the
> feature with a case where the where clause contains a subquery, and it
> seems to fail to dump the data. I ran pg_dump like this:
> 
>  $ pg_dump -d cary --where="test1:a3 = ( select max(aa1) from test2 )" > testdump2
>  $ pg_dump: error: processing of table "public.test1" failed
> 
> Both test1 and test2 exist in the database, and the same subquery works
> under psql.
> 
> I also noticed that the regression tests for pg_dump are failing due to the
> patch. I think it is worth looking into the failure messages and also adding
> some test cases for the new "where" clause to ensure that it covers as many
> use cases as possible.

As this is being reviewed but time is running out in this CF, I'm moving it
to the next CF.  The entry will be moved to Waiting for Author based on the
above review.

cheers ./daniel



Re: pg_dump --where option

2020-07-09 Thread Cary Huang
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:       tested, failed
Spec compliant:           tested, failed
Documentation:            tested, failed

Hi

I had a look at the patch and it applies cleanly to the postgres master branch.
I tried a quick test of the new "where clause" functionality, and for the
most part it does the job as described; I'm sure some people will find this
feature useful for their database dump needs. However, I tried the feature with
a case where the where clause contains a subquery, and it seems to fail to
dump the data. I ran pg_dump like this:

  $ pg_dump -d cary --where="test1:a3 = ( select max(aa1) from test2 )" > testdump2
  $ pg_dump: error: processing of table "public.test1" failed

Both test1 and test2 exist in the database, and the same subquery works under
psql.

I also noticed that the regression tests for pg_dump are failing due to the
patch. I think it is worth looking into the failure messages and also adding
some test cases for the new "where" clause to ensure that it covers as many use
cases as possible.

thank you
Best regards

Cary Huang
-
HighGo Software Inc. (Canada)
cary.hu...@highgo.ca
www.highgo.ca

pg_dump --where option

2020-06-15 Thread Surafel Temesgen
Internally, pg_dump already has the capability to filter the table data it
dumps with a filter clause, but it has no user interface for it. The patch
here [1] adds such an interface, but it has at least two issues. First, the
error message for an incorrectly specified WHERE clause is hard to debug and
looks out of place for pg_dump. Second, if pattern matching returns multiple
tables, it applies the same filter clause to all of them, which seems
undesirable to me, because we mostly don't want to apply the same WHERE clause
to multiple tables. The attached patch contains a fix for both issues.

[1]
https://www.postgresql.org/message-id/flat/CAGiT_HNav5B=OfCdfyFoqTa+oe5W1vG=pxktetcxxg4kcut...@mail.gmail.com
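A minimal sketch of the semantics described in this message, assuming a
--where pattern that matches zero or several tables is treated as an error and
that, when several options name the same table, the first filter wins.
WhereMatch, TableFilter and assign_filters are hypothetical; the patch's own
match_data handling inside expand_table_name_patterns is not shown in this
excerpt.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result of matching one --where pattern against the catalog. */
typedef struct
{
	const char *filter;			/* WHERE clause given by the user */
	const char *matches[4];		/* matched tables, NULL after the last one */
} WhereMatch;

/* Hypothetical per-table slot: the filter that will actually be applied. */
typedef struct
{
	const char *table;
	const char *filter;
} TableFilter;

/*
 * Returns the number of TableFilter entries written, or -1 if a pattern did
 * not match exactly one table or out is too small.
 */
static int
assign_filters(const WhereMatch *wm, size_t nwm, TableFilter *out, size_t maxout)
{
	size_t		nout = 0;

	assert(wm != NULL || nwm == 0);
	for (size_t i = 0; i < nwm; i++)
	{
		const char *table = wm[i].matches[0];
		size_t		j;

		/* never spread one clause over several tables */
		if (table == NULL || wm[i].matches[1] != NULL)
			return -1;

		for (j = 0; j < nout; j++)
			if (strcmp(out[j].table, table) == 0)
				break;
		if (j < nout)
			continue;			/* table already has a filter: first wins */
		if (nout == maxout)
			return -1;
		out[nout].table = table;
		out[nout].filter = wm[i].filter;
		nout++;
	}
	return (int) nout;
}
```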


regards

Surafel
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 2f0807e912..1c43eaa9de 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
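@@ -1103,6 +1103,21 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--where=<replaceable class="parameter">table</replaceable>:<replaceable class="parameter">filter_clause</replaceable></option></term>
+      <listitem>
+       <para>
+        When dumping data for <replaceable class="parameter">table</replaceable>, only include rows
+        that meet the <replaceable class="parameter">filter_clause</replaceable> condition.
+        This option is useful when you want to dump only a subset of a particular table.
+        <option>--where</option> can be given more than once to provide different filters for multiple tables.
+        Note that if multiple options refer to the same table, only the first <replaceable class="parameter">filter_clause</replaceable> will be applied.
+        If necessary, use quotes in your shell to provide an argument that contains spaces, for example
+        <literal>--where=mytable:"created_at >= '2018-01-01' AND test = 'f'"</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-?</option></term>
       <term><option>--help</option></term>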
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 89d598f856..566469cdb7 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -121,6 +121,8 @@ static SimpleStringList tabledata_exclude_patterns = {NULL, NULL};
 static SimpleOidList tabledata_exclude_oids = {NULL, NULL};
 static SimpleStringList foreign_servers_include_patterns = {NULL, NULL};
 static SimpleOidList foreign_servers_include_oids = {NULL, NULL};
+static SimpleStringList tabledata_where_patterns = {NULL, NULL};
+static SimpleOidList tabledata_where_oids = {NULL, NULL};
 
 static const CatalogId nilCatalogId = {0, 0};
 
@@ -156,7 +158,8 @@ static void expand_foreign_server_name_patterns(Archive *fout,
 static void expand_table_name_patterns(Archive *fout,
 	   SimpleStringList *patterns,
 	   SimpleOidList *oids,
-	   bool strict_names);
+	   bool strict_names,
+	   bool match_data);
 static NamespaceInfo *findNamespace(Archive *fout, Oid nsoid);
 static void dumpTableData(Archive *fout, TableDataInfo *tdinfo);
 static void refreshMatViewData(Archive *fout, TableDataInfo *tdinfo);
@@ -386,6 +389,7 @@ main(int argc, char **argv)
 		{"on-conflict-do-nothing", no_argument, &dopt.do_nothing, 1},
 		{"rows-per-insert", required_argument, NULL, 10},
 		{"include-foreign-data", required_argument, NULL, 11},
+		{"where", required_argument, NULL, 12},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -603,6 +607,10 @@ main(int argc, char **argv)
 										  optarg);
 				break;
 
+			case 12:			/* table data WHERE clause */
+				simple_string_list_append(&tabledata_where_patterns, optarg);
+				break;
+
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 				exit_nicely(1);
@@ -805,17 +813,26 @@ main(int argc, char **argv)
 	{
 		expand_table_name_patterns(fout, &table_include_patterns,
 								   &table_include_oids,
-								   strict_names);
+								   strict_names, false);
 		if (table_include_oids.head == NULL)
 			fatal("no matching tables were found");
 	}
+
+	if (tabledata_where_patterns.head != NULL)
+	{
+		expand_table_name_patterns(fout, &tabledata_where_patterns,
+								   &tabledata_where_oids,
+								   true, true);
+		if (tabledata_where_oids.head == NULL)
+			fatal("no matching table was found");
+	}
 	expand_table_name_patterns(fout, &table_exclude_patterns,
 							   &table_exclude_oids,
-							   false);
+							   false, false);
 
 	expand_table_name_patterns(fout, &tabledata_exclude_patterns,
 							   &tabledata_exclude_oids,
-							   false);
+							   false, false);
 
 	expand_foreign_server_name_patterns(fout, &foreign_servers_include_patterns,
 										&foreign_servers_include_oids);
@@ -1046,6 +1063,7 @@ help(const char *progname)
 	printf(_("  --use-set-session-authorization\n"
 			 "   use SET SESSION AUTHORIZATION commands instead of\n"
 			 "   ALTER OWNER commands to set ownership\n"));
+	printf(_("  --where=TABLE:WHERE_CLAUSE   only dump selected rows for the given table\n"));
 
 	printf(_("\nConnection options:\n"));
 	printf(_("  -d, --dbname=DBNAME  database to dump\n"));
@@ -1393,16 +1411,20 @@ expand_foreign_server_name_patterns(Archive *fout,
 /*
  * Find the OIDs of all tables matching the given list of patterns,
  * and append them to the given OID list. See also