Re: [HACKERS] reload-through-the-top-parent switch the partition table
On Mon, Aug 14, 2017 at 12:40 AM, Rushabh Lathiawrote: > On Fri, Aug 11, 2017 at 10:50 PM, Robert Haas wrote: >> On Fri, Aug 11, 2017 at 5:36 AM, Rushabh Lathia >> wrote: >> > Please find attach patch with the changes. >> >> I found the way that you had the logic structured in flagInhTables() >> to be quite hard to follow, so I rewrote it in the attached version. >> This version also has a bunch of minor cosmetic changes. Barring >> objections, I'll commit it once the tree opens for v11 development. > > Thanks Robert. > > Patch changes looks good to me. Thanks. Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reload-through-the-top-parent switch the partition table
On Fri, Aug 11, 2017 at 10:50 PM, Robert Haaswrote: > On Fri, Aug 11, 2017 at 5:36 AM, Rushabh Lathia > wrote: > > Please find attach patch with the changes. > > I found the way that you had the logic structured in flagInhTables() > to be quite hard to follow, so I rewrote it in the attached version. > This version also has a bunch of minor cosmetic changes. Barring > objections, I'll commit it once the tree opens for v11 development. > > Thanks Robert. Patch changes looks good to me. regards, Rushabh Lathia www.EnterpriseDB.com
Re: [HACKERS] reload-through-the-top-parent switch the partition table
On Fri, Aug 11, 2017 at 5:36 AM, Rushabh Lathiawrote: > Please find attach patch with the changes. I found the way that you had the logic structured in flagInhTables() to be quite hard to follow, so I rewrote it in the attached version. This version also has a bunch of minor cosmetic changes. Barring objections, I'll commit it once the tree opens for v11 development. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company load-via-partition-root-rmh.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reload-through-the-top-parent switch the partition table
On Thu, Aug 10, 2017 at 8:26 PM, Robert Haaswrote: > On Thu, Aug 10, 2017 at 3:47 AM, Rushabh Lathia > wrote: > >> (1) seems like a pretty arbitrary restriction, so I don't like that > >> option. (2) would hurt performance in some use cases. Do we have an > >> option (3)? > > > > How about protecting option 2) with the load-via-partition-root > protection. > > Means > > load the parents information even dump is not set only when > > load-via-partition-root > > & ispartition. > > > > This won't hurt performance for the normal cases. > > Yes, that seems like the right approach. > > +Dump data via the top-most partitioned table (rather than > partition > +table) when dumping data for the partition table. > > I think we should phrase this a bit more clearly, something like this: > When dumping a COPY or INSERT statement for a partitioned table, > target the root of the partitioning hierarchy which contains it rather > than the partition itself. This may be useful when reloading data on > a server where rows do not always fall into the same partitions as > they did on the original server. This could happen, for example, if > the partitioning column is of type text and the two system have > different definitions of the collation used to partition the data. > > Done. > +printf(_(" --load-via-partition-rootload partition table via > the root relation\n")); > > "relation" seems odd to me here. root table? > > Done. > /* Don't bother computing anything for non-target tables, either > */ > if (!tblinfo[i].dobj.dump) > +{ > +/* > + * Load the parents information for the partition table when > + * the load-via-partition-root option is set. As we need the > + * parents information to get the partition root. > + */ > +if (dopt->load_via_partition_root && > +tblinfo[i].ispartition) > +findParentsByOid([i], inhinfo, numInherits); > continue; > +} > > Duplicating the call to findParentsByOid seems less then ideal. How > about doing something like this: > > if (tblinfo[i].dobj.dump) > { > find_parents = true; > mark_parents = true; > } > else if (dopt->load_via_partition_root && tblinfo[i].ispartition) > find_parents = true; > > if (find_parents) > findParentsByOid([i], inhinfo, numInherits); > > etc. > > Done changes to avoid duplicate call to findParentsByOid(). > The comments for this function also need some work - e.g. the function > header comment deserves some kind of update for these changes. > > Done. > +static TableInfo * > +getRootTableInfo(TableInfo *tbinfo) > +{ > +Assert(tbinfo->ispartition); > +Assert(tbinfo->numParents == 1); > +if (tbinfo->parents[0]->ispartition) > +return getRootTableInfo(tbinfo->parents[0]); > + > +return tbinfo->parents[0]; > +} > > This code should iterate, not recurse, to avoid any risk of blowing > out the stack. > > Done. Please find attach patch with the changes. Thanks, Rushabh Lathia www.EnterpriseDB.com diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml index bafa031..de8297a 100644 --- a/doc/src/sgml/ref/pg_dump.sgml +++ b/doc/src/sgml/ref/pg_dump.sgml @@ -889,6 +889,21 @@ PostgreSQL documentation + --load-via-partition-root + + +When dumping a COPY or INSERT statement for a partitioned table, +target the root of the partitioning hierarchy which contains it rather +than the partition itself. This will be useful when reloading data on +a server where rows do not always fall into the same partitions as +they did on the original server. This could happen, for example, if +the partitioning column is of type text and the two system have +different definitions of the collation used to partition the data. + + + + + --section=sectionname diff --git a/doc/src/sgml/ref/pg_dumpall.sgml b/doc/src/sgml/ref/pg_dumpall.sgml index aa944a2..dc1d3cc 100644 --- a/doc/src/sgml/ref/pg_dumpall.sgml +++ b/doc/src/sgml/ref/pg_dumpall.sgml @@ -431,6 +431,21 @@ PostgreSQL documentation + --load-via-partition-root + + +When dumping a COPY or INSERT statement for a partitioned table, +target the root of the partitioning hierarchy which contains it rather +than the partition itself. This will be useful when reloading data on +a server where rows do not always fall into the same partitions as +they did on the original server. This could happen, for example, if +the partitioning column is of type text and the two system have +different definitions of the collation used to partition the data. + + + + + --use-set-session-authorization
Re: [HACKERS] reload-through-the-top-parent switch the partition table
On Thu, Aug 10, 2017 at 3:47 AM, Rushabh Lathiawrote: >> (1) seems like a pretty arbitrary restriction, so I don't like that >> option. (2) would hurt performance in some use cases. Do we have an >> option (3)? > > How about protecting option 2) with the load-via-partition-root protection. > Means > load the parents information even dump is not set only when > load-via-partition-root > & ispartition. > > This won't hurt performance for the normal cases. Yes, that seems like the right approach. +Dump data via the top-most partitioned table (rather than partition +table) when dumping data for the partition table. I think we should phrase this a bit more clearly, something like this: When dumping a COPY or INSERT statement for a partitioned table, target the root of the partitioning hierarchy which contains it rather than the partition itself. This may be useful when reloading data on a server where rows do not always fall into the same partitions as they did on the original server. This could happen, for example, if the partitioning column is of type text and the two system have different definitions of the collation used to partition the data. +printf(_(" --load-via-partition-rootload partition table via the root relation\n")); "relation" seems odd to me here. root table? /* Don't bother computing anything for non-target tables, either */ if (!tblinfo[i].dobj.dump) +{ +/* + * Load the parents information for the partition table when + * the load-via-partition-root option is set. As we need the + * parents information to get the partition root. + */ +if (dopt->load_via_partition_root && +tblinfo[i].ispartition) +findParentsByOid([i], inhinfo, numInherits); continue; +} Duplicating the call to findParentsByOid seems less then ideal. How about doing something like this: if (tblinfo[i].dobj.dump) { find_parents = true; mark_parents = true; } else if (dopt->load_via_partition_root && tblinfo[i].ispartition) find_parents = true; if (find_parents) findParentsByOid([i], inhinfo, numInherits); etc. The comments for this function also need some work - e.g. the function header comment deserves some kind of update for these changes. +static TableInfo * +getRootTableInfo(TableInfo *tbinfo) +{ +Assert(tbinfo->ispartition); +Assert(tbinfo->numParents == 1); +if (tbinfo->parents[0]->ispartition) +return getRootTableInfo(tbinfo->parents[0]); + +return tbinfo->parents[0]; +} This code should iterate, not recurse, to avoid any risk of blowing out the stack. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reload-through-the-top-parent switch the partition table
On Wed, Aug 9, 2017 at 1:20 AM, Robert Haaswrote: > On Tue, Aug 8, 2017 at 8:48 AM, Rushabh Lathia > wrote: > > It seems like with we set the numParents and parents only for the > > dumpable objects (flagInhTables()). Current patch relies on the > numParents > > and parents to get the root partition TableInfo, but when --schema is > been > > specified - it doesn't load those information for the object which is not > > dumpable. > > > > Now one options is: > > > > 1) restrict the --load-via-partition-root to be used with > > the --schema or may be some other options - where we restrict the > > objects. > > > > Consider this, partition root is in schema 'a' and the partition table > is in > > schema 'b', if someone specify the --schema b with > > --load-via-partition-root, > > I think we should not do "INSERT INTO a.tab" to load the data (because > > user specified --schema b). > > > > 2) fix flagInhTables() to load the numParents and the parents information > > for the partition table (can be checked using ispartition), irrespective > of > > whether object is dumpable is true or not. > > (1) seems like a pretty arbitrary restriction, so I don't like that > option. (2) would hurt performance in some use cases. Do we have an > option (3)? > How about protecting option 2) with the load-via-partition-root protection. Means load the parents information even dump is not set only when load-via-partition-root & ispartition. This won't hurt performance for the normal cases. Attaching the patch. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > -- Rushabh Lathia diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml index bafa031..65f7e98 100644 --- a/doc/src/sgml/ref/pg_dump.sgml +++ b/doc/src/sgml/ref/pg_dump.sgml @@ -889,6 +889,16 @@ PostgreSQL documentation + --load-via-partition-root + + +Dump data via the top-most partitioned table (rather than partition +table) when dumping data for the partition table. + + + + + --section=sectionname diff --git a/doc/src/sgml/ref/pg_dumpall.sgml b/doc/src/sgml/ref/pg_dumpall.sgml index aa944a2..54b8e7e 100644 --- a/doc/src/sgml/ref/pg_dumpall.sgml +++ b/doc/src/sgml/ref/pg_dumpall.sgml @@ -431,6 +431,16 @@ PostgreSQL documentation + --load-via-partition-root + + +Dump data via the top-most partitioned table (rather than partition +table) when dumping data for the partition table. + + + + + --use-set-session-authorization diff --git a/src/bin/pg_dump/common.c b/src/bin/pg_dump/common.c index 47191be..65c2d7f 100644 --- a/src/bin/pg_dump/common.c +++ b/src/bin/pg_dump/common.c @@ -66,7 +66,7 @@ static int numExtensions; static ExtensionMemberId *extmembers; static int numextmembers; -static void flagInhTables(TableInfo *tbinfo, int numTables, +static void flagInhTables(Archive *fout, TableInfo *tbinfo, int numTables, InhInfo *inhinfo, int numInherits); static void flagInhAttrs(DumpOptions *dopt, TableInfo *tblinfo, int numTables); static DumpableObject **buildIndexArray(void *objArray, int numObjs, @@ -243,7 +243,7 @@ getSchemaData(Archive *fout, int *numTablesPtr) /* Link tables to parents, mark parents of target tables interesting */ if (g_verbose) write_msg(NULL, "finding inheritance relationships\n"); - flagInhTables(tblinfo, numTables, inhinfo, numInherits); + flagInhTables(fout, tblinfo, numTables, inhinfo, numInherits); if (g_verbose) write_msg(NULL, "reading column info for interesting tables\n"); @@ -304,9 +304,10 @@ getSchemaData(Archive *fout, int *numTablesPtr) * modifies tblinfo */ static void -flagInhTables(TableInfo *tblinfo, int numTables, +flagInhTables(Archive *fout, TableInfo *tblinfo, int numTables, InhInfo *inhinfo, int numInherits) { + DumpOptions *dopt = fout->dopt; int i, j; int numParents; @@ -322,7 +323,17 @@ flagInhTables(TableInfo *tblinfo, int numTables, /* Don't bother computing anything for non-target tables, either */ if (!tblinfo[i].dobj.dump) + { + /* + * Load the parents information for the partition table when + * the load-via-partition-root option is set. As we need the + * parents information to get the partition root. + */ + if (dopt->load_via_partition_root && +tblinfo[i].ispartition) +findParentsByOid([i], inhinfo, numInherits); continue; + } /* Find all the immediate parent tables */ findParentsByOid([i], inhinfo, numInherits); diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h index 144068a..ce3100f 100644 --- a/src/bin/pg_dump/pg_backup.h +++ b/src/bin/pg_dump/pg_backup.h @@ -157,6 +157,7 @@ typedef struct _dumpOptions int outputNoTablespaces; int use_setsessauth; int
Re: [HACKERS] reload-through-the-top-parent switch the partition table
On Tue, Aug 8, 2017 at 8:48 AM, Rushabh Lathiawrote: > It seems like with we set the numParents and parents only for the > dumpable objects (flagInhTables()). Current patch relies on the numParents > and parents to get the root partition TableInfo, but when --schema is been > specified - it doesn't load those information for the object which is not > dumpable. > > Now one options is: > > 1) restrict the --load-via-partition-root to be used with > the --schema or may be some other options - where we restrict the > objects. > > Consider this, partition root is in schema 'a' and the partition table is in > schema 'b', if someone specify the --schema b with > --load-via-partition-root, > I think we should not do "INSERT INTO a.tab" to load the data (because > user specified --schema b). > > 2) fix flagInhTables() to load the numParents and the parents information > for the partition table (can be checked using ispartition), irrespective of > whether object is dumpable is true or not. (1) seems like a pretty arbitrary restriction, so I don't like that option. (2) would hurt performance in some use cases. Do we have an option (3)? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reload-through-the-top-parent switch the partition table
On Tue, Aug 8, 2017 at 6:18 PM, Rushabh Lathiawrote: > Thanks Rajkumar for testing and reporting this. > > > It seems like with we set the numParents and parents only for the > dumpable objects (flagInhTables()). Current patch relies on the numParents > and parents to get the root partition TableInfo, but when --schema is been > specified - it doesn't load those information for the object which is not > dumpable. > > Now one options is: > > 1) restrict the --load-via-partition-root to be used with > the --schema or may be some other options - where we restrict the > objects. > > Consider this, partition root is in schema 'a' and the partition table is in > schema 'b', if someone specify the --schema b with > --load-via-partition-root, > I think we should not do "INSERT INTO a.tab" to load the data (because > user specified --schema b). > +1, this looks cleaner to me. > 2) fix flagInhTables() to load the numParents and the parents information > for the partition table (can be checked using ispartition), irrespective of > whether object is dumpable is true or not. > > May be something like: > > @@ -322,7 +322,11 @@ flagInhTables(TableInfo *tblinfo, int numTables, > > /* Don't bother computing anything for non-target tables, either */ > if (!tblinfo[i].dobj.dump) > + { > + if (tblinfo[i].ispartition) > + findParentsByOid([i], inhinfo, numInherits); > continue; > + } > > I am still looking into this, meanwhile any inputs are welcome. > See the note given in the pg_dump document[1] is : "When -n is specified, pg_dump makes no attempt to dump any other database objects that the selected schema(s) might depend upon. Therefore, there is no guarantee that the results of a specific-schema dump can be successfully restored by themselves into a clean database." If we want to follow the same trend then we could simply dump all partition(ed) belong to a specified schema. In the Rajkumar’s example, we need to dump partitions belong to schema 'a' (i.e t1_p1 and t1_p2_p1), and target root for the insert query will be the same table or the parent belong to the same schema. Regards, Amul [1] https://www.postgresql.org/docs/devel/static/app-pgdump.html -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reload-through-the-top-parent switch the partition table
Thanks Rajkumar for testing and reporting this. It seems like with we set the numParents and parents only for the dumpable objects (flagInhTables()). Current patch relies on the numParents and parents to get the root partition TableInfo, but when --schema is been specified - it doesn't load those information for the object which is not dumpable. Now one options is: 1) restrict the --load-via-partition-root to be used with the --schema or may be some other options - where we restrict the objects. Consider this, partition root is in schema 'a' and the partition table is in schema 'b', if someone specify the --schema b with --load-via-partition-root, I think we should not do "INSERT INTO a.tab" to load the data (because user specified --schema b). 2) fix flagInhTables() to load the numParents and the parents information for the partition table (can be checked using ispartition), irrespective of whether object is dumpable is true or not. May be something like: @@ -322,7 +322,11 @@ flagInhTables(TableInfo *tblinfo, int numTables, /* Don't bother computing anything for non-target tables, either */ if (!tblinfo[i].dobj.dump) + { + if (tblinfo[i].ispartition) + findParentsByOid([i], inhinfo, numInherits); continue; + } I am still looking into this, meanwhile any inputs are welcome. On Tue, Aug 8, 2017 at 4:10 PM, Rajkumar Raghuwanshi < rajkumar.raghuwan...@enterprisedb.com> wrote: > Hi Rushabh, > > While testing latest v2 patch, I got a crash when using > --load-via-partition-root with --schema options. Below are steps to > reproduce. > > --create below test data > create schema a; > create schema b; > create schema c; > > create table t1 (a int,b text) partition by list(a); > create table a.t1_p1 partition of t1 FOR VALUES in (1,2,3,4) partition by > list(a); > create table b.t1_p1_p1 partition of a.t1_p1 FOR VALUES in (1,2); > create table c.t1_p1_p2 partition of a.t1_p1 FOR VALUES in (3,4); > create table b.t1_p2 partition of t1 FOR VALUES in (5,6,7,8) partition by > list(a); > create table a.t1_p2_p1 partition of b.t1_p2 FOR VALUES in (5,6); > create table t1_p2_p2 partition of b.t1_p2 FOR VALUES in (7,8); > > insert into t1 values (8,'t1'); > insert into a.t1_p1 values (2,'a.t1_p1'); > insert into b.t1_p1_p1 values (1,'b.t1_p1_p1'); > insert into c.t1_p1_p2 values (3,'c.t1_p1_p2'); > insert into b.t1_p2 values (6,'b.t1_p2'); > insert into a.t1_p2_p1 values (5,'a.t1_p2_p1'); > insert into t1_p2_p2 values (7,'t1_p2_p1'); > insert into t1 values (4 ,'t1'); > > --trying to take pg_dump > [edb@localhost bin]$ ./pg_dump -d postgres --schema=a -f d1.dump -Fp > [edb@localhost bin]$ ./pg_dump -d postgres --load-via-partition-root -f > d2.dump -Fp > [edb@localhost bin]$ ./pg_dump -d postgres --load-via-partition-root > --schema=a -f d3.dump -Fp > pg_dump: pg_dump.c:2063: getRootTableInfo: Assertion `tbinfo->numParents > == 1' failed. > Aborted (core dumped) > > > > Thanks & Regards, > Rajkumar Raghuwanshi > QMG, EnterpriseDB Corporation > > On Fri, Aug 4, 2017 at 3:01 PM, Rushabh Lathia> wrote: > >> >> Here is an update patch, now renamed the switch to >> --load-via-partition-root >> and also added the documentation for the new switch into pg_dump as well >> as pg_dumpall. >> >> >> On Fri, Aug 4, 2017 at 7:13 AM, Amit Langote < >> langote_amit...@lab.ntt.co.jp> wrote: >> >>> On 2017/08/04 1:08, David G. Johnston wrote: >>> > On Thu, Aug 3, 2017 at 8:53 AM, Tom Lane wrote: >>> > >>> >> Robert Haas writes: >>> >>> So maybe --load-via-partition-root if nobody likes my previous >>> >>> suggestion of --partition-data-via-root ? >>> >> >>> >> WFM. >>> >> >>> > >>> > +1 >>> >>> +1. >>> >>> Thanks, >>> Amit >>> >>> >> >> >> Thanks, >> Rushabh Lathia >> www.EnterpriseDB.com >> >> >> -- >> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-hackers >> >> > -- Rushabh Lathia
Re: [HACKERS] reload-through-the-top-parent switch the partition table
Hi Rushabh, While testing latest v2 patch, I got a crash when using --load-via-partition-root with --schema options. Below are steps to reproduce. --create below test data create schema a; create schema b; create schema c; create table t1 (a int,b text) partition by list(a); create table a.t1_p1 partition of t1 FOR VALUES in (1,2,3,4) partition by list(a); create table b.t1_p1_p1 partition of a.t1_p1 FOR VALUES in (1,2); create table c.t1_p1_p2 partition of a.t1_p1 FOR VALUES in (3,4); create table b.t1_p2 partition of t1 FOR VALUES in (5,6,7,8) partition by list(a); create table a.t1_p2_p1 partition of b.t1_p2 FOR VALUES in (5,6); create table t1_p2_p2 partition of b.t1_p2 FOR VALUES in (7,8); insert into t1 values (8,'t1'); insert into a.t1_p1 values (2,'a.t1_p1'); insert into b.t1_p1_p1 values (1,'b.t1_p1_p1'); insert into c.t1_p1_p2 values (3,'c.t1_p1_p2'); insert into b.t1_p2 values (6,'b.t1_p2'); insert into a.t1_p2_p1 values (5,'a.t1_p2_p1'); insert into t1_p2_p2 values (7,'t1_p2_p1'); insert into t1 values (4 ,'t1'); --trying to take pg_dump [edb@localhost bin]$ ./pg_dump -d postgres --schema=a -f d1.dump -Fp [edb@localhost bin]$ ./pg_dump -d postgres --load-via-partition-root -f d2.dump -Fp [edb@localhost bin]$ ./pg_dump -d postgres --load-via-partition-root --schema=a -f d3.dump -Fp pg_dump: pg_dump.c:2063: getRootTableInfo: Assertion `tbinfo->numParents == 1' failed. Aborted (core dumped) Thanks & Regards, Rajkumar Raghuwanshi QMG, EnterpriseDB Corporation On Fri, Aug 4, 2017 at 3:01 PM, Rushabh Lathiawrote: > > Here is an update patch, now renamed the switch to > --load-via-partition-root > and also added the documentation for the new switch into pg_dump as well > as pg_dumpall. > > > On Fri, Aug 4, 2017 at 7:13 AM, Amit Langote < > langote_amit...@lab.ntt.co.jp> wrote: > >> On 2017/08/04 1:08, David G. Johnston wrote: >> > On Thu, Aug 3, 2017 at 8:53 AM, Tom Lane wrote: >> > >> >> Robert Haas writes: >> >>> So maybe --load-via-partition-root if nobody likes my previous >> >>> suggestion of --partition-data-via-root ? >> >> >> >> WFM. >> >> >> > >> > +1 >> >> +1. >> >> Thanks, >> Amit >> >> > > > Thanks, > Rushabh Lathia > www.EnterpriseDB.com > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > >
Re: [HACKERS] reload-through-the-top-parent switch the partition table
Here is an update patch, now renamed the switch to --load-via-partition-root and also added the documentation for the new switch into pg_dump as well as pg_dumpall. On Fri, Aug 4, 2017 at 7:13 AM, Amit Langotewrote: > On 2017/08/04 1:08, David G. Johnston wrote: > > On Thu, Aug 3, 2017 at 8:53 AM, Tom Lane wrote: > > > >> Robert Haas writes: > >>> So maybe --load-via-partition-root if nobody likes my previous > >>> suggestion of --partition-data-via-root ? > >> > >> WFM. > >> > > > > +1 > > +1. > > Thanks, > Amit > > Thanks, Rushabh Lathia www.EnterpriseDB.com diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml index bafa031..65f7e98 100644 --- a/doc/src/sgml/ref/pg_dump.sgml +++ b/doc/src/sgml/ref/pg_dump.sgml @@ -889,6 +889,16 @@ PostgreSQL documentation + --load-via-partition-root + + +Dump data via the top-most partitioned table (rather than partition +table) when dumping data for the partition table. + + + + + --section=sectionname diff --git a/doc/src/sgml/ref/pg_dumpall.sgml b/doc/src/sgml/ref/pg_dumpall.sgml index aa944a2..54b8e7e 100644 --- a/doc/src/sgml/ref/pg_dumpall.sgml +++ b/doc/src/sgml/ref/pg_dumpall.sgml @@ -431,6 +431,16 @@ PostgreSQL documentation + --load-via-partition-root + + +Dump data via the top-most partitioned table (rather than partition +table) when dumping data for the partition table. + + + + + --use-set-session-authorization diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h index 144068a..ce3100f 100644 --- a/src/bin/pg_dump/pg_backup.h +++ b/src/bin/pg_dump/pg_backup.h @@ -157,6 +157,7 @@ typedef struct _dumpOptions int outputNoTablespaces; int use_setsessauth; int enable_row_security; + int load_via_partition_root; /* default, if no "inclusion" switches appear, is to dump everything */ bool include_everything; diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index 393b9e2..27e23ca 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -269,6 +269,7 @@ static void appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions, const char *prefix, Archive *fout); static char *get_synchronized_snapshot(Archive *fout); static void setupDumpWorker(Archive *AHX); +static TableInfo *getRootTableInfo(TableInfo *tbinfo); int @@ -345,6 +346,7 @@ main(int argc, char **argv) {"lock-wait-timeout", required_argument, NULL, 2}, {"no-tablespaces", no_argument, , 1}, {"quote-all-identifiers", no_argument, _all_identifiers, 1}, + {"load-via-partition-root", no_argument, _via_partition_root, 1}, {"role", required_argument, NULL, 3}, {"section", required_argument, NULL, 5}, {"serializable-deferrable", no_argument, _deferrable, 1}, @@ -959,6 +961,7 @@ help(const char *progname) printf(_(" --no-tablespaces do not dump tablespace assignments\n")); printf(_(" --no-unlogged-table-data do not dump unlogged table data\n")); printf(_(" --quote-all-identifiers quote all identifiers, even if not key words\n")); + printf(_(" --load-via-partition-rootload partition table via the root relation\n")); printf(_(" --section=SECTIONdump named section (pre-data, data, or post-data)\n")); printf(_(" --serializable-deferrablewait until the dump can run without anomalies\n")); printf(_(" --snapshot=SNAPSHOT use given snapshot for the dump\n")); @@ -1902,8 +1905,32 @@ dumpTableData_insert(Archive *fout, void *dcontext) if (insertStmt == NULL) { insertStmt = createPQExpBuffer(); + +/* + * When load-via-partition-root is set, get the root relation name + * for the partition table, so that we can reload data through + * the root table. + */ +if (dopt->load_via_partition_root && tbinfo->ispartition) +{ + TableInfo *parentTbinfo; + + parentTbinfo = getRootTableInfo(tbinfo); + + /* + * When we loading data through the root, we will qualify + * the table name. This is needed because earlier + * search_path will be set for the partition table. + */ + classname = (char *) fmtQualifiedId(fout->remoteVersion, + parentTbinfo->dobj.namespace->dobj.name, + parentTbinfo->dobj.name); +} +else + classname = fmtId(tbinfo->dobj.name); + appendPQExpBuffer(insertStmt, "INSERT INTO %s ", - fmtId(classname)); + classname); /* corner case for zero-column table */ if (nfields == 0) @@ -2025,6 +2052,20 @@ dumpTableData_insert(Archive *fout, void *dcontext) return 1; } +/* + * getRootTableInfo: + * get the root TableInfo for the given partition table. + */ +static TableInfo *
Re: [HACKERS] reload-through-the-top-parent switch the partition table
On 2017/08/04 1:08, David G. Johnston wrote: > On Thu, Aug 3, 2017 at 8:53 AM, Tom Lanewrote: > >> Robert Haas writes: >>> So maybe --load-via-partition-root if nobody likes my previous >>> suggestion of --partition-data-via-root ? >> >> WFM. >> > > +1 +1. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reload-through-the-top-parent switch the partition table
On Thu, Aug 3, 2017 at 8:53 AM, Tom Lanewrote: > Robert Haas writes: > > So maybe --load-via-partition-root if nobody likes my previous > > suggestion of --partition-data-via-root ? > > WFM. > +1 David J.
Re: [HACKERS] reload-through-the-top-parent switch the partition table
Robert Haaswrites: > So maybe --load-via-partition-root if nobody likes my previous > suggestion of --partition-data-via-root ? WFM. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reload-through-the-top-parent switch the partition table
On Wed, Aug 2, 2017 at 4:08 PM, Peter Eisentrautwrote: > On 8/2/17 13:58, Tom Lane wrote: >> I notice that the option list already includes some references to >> "insert", so maybe "--insert-via-partition-root"? Although you could >> argue that that's confusing when we're using COPY. > > "load" could be more general. But I'm also OK with "restore". "load" seems better than "restore" to me, both because it's shorter and because it sounds less like pg_dump will be doing the job of pg_restore. So maybe --load-via-partition-root if nobody likes my previous suggestion of --partition-data-via-root ? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reload-through-the-top-parent switch the partition table
On Thu, Aug 3, 2017 at 9:01 AM, Rushabh Lathiawrote: > --unpartition-partitioned-table is very confusing. +1. > I liked the previous option which is --use-partitioned-table > [partition-name, ...]. > The Only problem with --use-partitioned-table is, a user needs to specify > the > partition-name in the options list. Imagine if someone having 100's of > partitions then specifying those name is pg_dump option is a pain. Yeah, that won't work. > Rather than that: > > --use-partitioned-table [partitioned_name, ...] # if > names are omitted it defaults to all the partitioned tables. > > Here user need to specify the root relation name in the option - and any > partition table have that as a ROOT, will load the data through > top-parent-relation. We could do that, but I'm not sure it's a good idea to use getopt_long() with optional options. Sometimes that creates confusion -- is pg_dump --use-partitioned-table salad an attempt to dump the salad database with the --use-partitioned-table option, or an attempt to apply --use-partitioned-table only to partitions whose parent is the salad table? getopt_long() has an answer, but some people may guess incorrectly about what it is. I would be more inclined to make this a global option than something that modifies the behavior for certain tables; the only per-table flags we have right now are just to include/exclude individual tables. You could make --inserts or --no-unlogged-table-data apply to some but not all tables, but we didn't; why start here? I don't like the specific name --use-partitioned-table much either. Use it for what? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reload-through-the-top-parent switch the partition table
On Thursday, August 3, 2017, Rushabh Lathiawrote: > > > --use-partitioned-table [partitioned_name, ...] # if > names are omitted it defaults to all the partitioned tables. > > Here user need to specify the root relation name in the option - and any > partition table have that as a ROOT, will load the data through > top-parent-relation. > > This is indeed what I intended to convey. Only specify "root" names. David J.
Re: [HACKERS] reload-through-the-top-parent switch the partition table
On Thu, Aug 3, 2017 at 3:39 PM, Ashutosh Bapat < ashutosh.ba...@enterprisedb.com> wrote: > On Wed, Aug 2, 2017 at 11:47 PM, David G. Johnston >wrote: > > On Wed, Aug 2, 2017 at 10:58 AM, Tom Lane wrote: > >> > >> Robert Haas writes: > >> > On Wed, Aug 2, 2017 at 1:08 PM, Tom Lane wrote: > >> >> --restore-via-partition-root ? > >> > >> > I worry someone will think that pg_dump is now restoring stuff, but it > >> > isn't. > >> > >> Well, the point is that the commands it emits will cause the eventual > >> restore to go through the root. Anyway, I think trying to avoid using > >> a verb altogether is going to result in a very stilted option name. > >> > >> I notice that the option list already includes some references to > >> "insert", so maybe "--insert-via-partition-root"? Although you could > >> argue that that's confusing when we're using COPY. > > > > > > --use-partitioned-table [partition-name, ...] # if names are omitted it > > defaults to all partitioned tables > > I like this idea since it allows using this feature for selected > tables e.g. hash tables. Otherwise, users will be forced to use this > option even when there is only a single hash partitioned table and > many other list/range partitioned tables. > > +1. > What we are trying to do here is dump the data in a partitioned table > as if it's not partitioned. Combine that with --data-only dumps, and > one could use it to load partitioned data into unpartitioned or > differently partitioned table. So, how about naming the option as > > --unpartition-partitioned-table [partitioned-table-name, ] # if > names are omitted it defaults to all the partitioned tables. > > --unpartition-partitioned-table is very confusing. I liked the previous option which is --use-partitioned-table [partition-name, ...]. The Only problem with --use-partitioned-table is, a user needs to specify the partition-name in the options list. Imagine if someone having 100's of partitions then specifying those name is pg_dump option is a pain. Rather than that: --use-partitioned-table [partitioned_name, ...] # if names are omitted it defaults to all the partitioned tables. Here user need to specify the root relation name in the option - and any partition table have that as a ROOT, will load the data through top-parent-relation. That really says what dump is really doing without focusing on how the > data will be used like restoring/inserting/copying etc. > > -- > Best Wishes, > Ashutosh Bapat > EnterpriseDB Corporation > The Postgres Database Company > -- Rushabh Lathia
Re: [HACKERS] reload-through-the-top-parent switch the partition table
On Wed, Aug 2, 2017 at 11:47 PM, David G. Johnstonwrote: > On Wed, Aug 2, 2017 at 10:58 AM, Tom Lane wrote: >> >> Robert Haas writes: >> > On Wed, Aug 2, 2017 at 1:08 PM, Tom Lane wrote: >> >> --restore-via-partition-root ? >> >> > I worry someone will think that pg_dump is now restoring stuff, but it >> > isn't. >> >> Well, the point is that the commands it emits will cause the eventual >> restore to go through the root. Anyway, I think trying to avoid using >> a verb altogether is going to result in a very stilted option name. >> >> I notice that the option list already includes some references to >> "insert", so maybe "--insert-via-partition-root"? Although you could >> argue that that's confusing when we're using COPY. > > > --use-partitioned-table [partition-name, ...] # if names are omitted it > defaults to all partitioned tables I like this idea since it allows using this feature for selected tables e.g. hash tables. Otherwise, users will be forced to use this option even when there is only a single hash partitioned table and many other list/range partitioned tables. What we are trying to do here is dump the data in a partitioned table as if it's not partitioned. Combine that with --data-only dumps, and one could use it to load partitioned data into unpartitioned or differently partitioned table. So, how about naming the option as --unpartition-partitioned-table [partitioned-table-name, ] # if names are omitted it defaults to all the partitioned tables. That really says what dump is really doing without focusing on how the data will be used like restoring/inserting/copying etc. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reload-through-the-top-parent switch the partition table
On 8/2/17 13:58, Tom Lane wrote: > I notice that the option list already includes some references to > "insert", so maybe "--insert-via-partition-root"? Although you could > argue that that's confusing when we're using COPY. "load" could be more general. But I'm also OK with "restore". -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reload-through-the-top-parent switch the partition table
On Wed, Aug 2, 2017 at 1:58 PM, Tom Lanewrote: > Robert Haas writes: >> On Wed, Aug 2, 2017 at 1:08 PM, Tom Lane wrote: >>> --restore-via-partition-root ? > >> I worry someone will think that pg_dump is now restoring stuff, but it isn't. > > Well, the point is that the commands it emits will cause the eventual > restore to go through the root. Anyway, I think trying to avoid using > a verb altogether is going to result in a very stilted option name. > > I notice that the option list already includes some references to > "insert", so maybe "--insert-via-partition-root"? Although you could > argue that that's confusing when we're using COPY. Yeah, that's definitely confusing. I realize that my verbless version is a little odd, but there are numerous precedents for it: --inserts, --column-inserts, --if-exists, --strict-names, ... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reload-through-the-top-parent switch the partition table
On Wed, Aug 2, 2017 at 10:58 AM, Tom Lanewrote: > Robert Haas writes: > > On Wed, Aug 2, 2017 at 1:08 PM, Tom Lane wrote: > >> --restore-via-partition-root ? > > > I worry someone will think that pg_dump is now restoring stuff, but it > isn't. > > Well, the point is that the commands it emits will cause the eventual > restore to go through the root. Anyway, I think trying to avoid using > a verb altogether is going to result in a very stilted option name. > > I notice that the option list already includes some references to > "insert", so maybe "--insert-via-partition-root"? Although you could > argue that that's confusing when we're using COPY. --use-partitioned-table [partition-name, ...] # if names are omitted it defaults to all partitioned tables I don't know that we need to use "root" in the argument name to communicate "the top-most if partitioned tables are nested". We have the docs to describe exactly what it does. "Partitioned Table" is what we are calling the main routing table in the docs. "Use" seems adequate. FWIW my first thought was "--insert-via-partition-root" and I didn't mind that "COPY" commands would be affected implicitly. David J.
Re: [HACKERS] reload-through-the-top-parent switch the partition table
Robert Haaswrites: > On Wed, Aug 2, 2017 at 1:08 PM, Tom Lane wrote: >> --restore-via-partition-root ? > I worry someone will think that pg_dump is now restoring stuff, but it isn't. Well, the point is that the commands it emits will cause the eventual restore to go through the root. Anyway, I think trying to avoid using a verb altogether is going to result in a very stilted option name. I notice that the option list already includes some references to "insert", so maybe "--insert-via-partition-root"? Although you could argue that that's confusing when we're using COPY. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reload-through-the-top-parent switch the partition table
On Wed, Aug 2, 2017 at 1:08 PM, Tom Lanewrote: > Robert Haas writes: >> The patch itself looks just fine on a quick glance, modulo the lack of >> documentation, but I think we need to bikeshed the name of the flag. >> --reload-through-root is clear as daylight to me, but I'm not sure >> users will agree. The lack of the word "partition" is perhaps a >> significant flaw, and pg_dump doesn't really reload anything; it just >> dumps. > >> The best thing I can come up with after brief thought is >> --partition-data-via-root, but maybe somebody else has a better idea? > > --restore-via-partition-root ? I worry someone will think that pg_dump is now restoring stuff, but it isn't. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reload-through-the-top-parent switch the partition table
Robert Haaswrites: > The patch itself looks just fine on a quick glance, modulo the lack of > documentation, but I think we need to bikeshed the name of the flag. > --reload-through-root is clear as daylight to me, but I'm not sure > users will agree. The lack of the word "partition" is perhaps a > significant flaw, and pg_dump doesn't really reload anything; it just > dumps. > The best thing I can come up with after brief thought is > --partition-data-via-root, but maybe somebody else has a better idea? --restore-via-partition-root ? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reload-through-the-top-parent switch the partition table
On Wed, Aug 2, 2017 at 1:01 AM, Rushabh Lathiawrote: > Looking at the dbObjectTypePriority comments that seems like data > restoration > will *absolutely always* follow all CREATE TABLE commands. Hmm. I wasn't very convinced by those comments, but Tom's commit a1ef01fe163b304760088e3e30eb22036910a495 convinces me that it has to work that way. So I think we are OK on that score. The patch itself looks just fine on a quick glance, modulo the lack of documentation, but I think we need to bikeshed the name of the flag. --reload-through-root is clear as daylight to me, but I'm not sure users will agree. The lack of the word "partition" is perhaps a significant flaw, and pg_dump doesn't really reload anything; it just dumps. The best thing I can come up with after brief thought is --partition-data-via-root, but maybe somebody else has a better idea? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reload-through-the-top-parent switch the partition table
On Wed, Aug 2, 2017 at 3:55 AM, Robert Haaswrote: > On Tue, Aug 1, 2017 at 5:34 AM, Rushabh Lathia > wrote: > > My colleague Robert and I had doubt about the order in of TABLE > > and TABLE_DATA. We thought earlier that reload-thought-root might > > might not solve the purpose which has been discussed in the above > > mentioned thread. But later looking into code I realize the sort order > > for DO_TABLE and DO_TABLE_DATA are different, so we don't need > > to worry about that issue. > > Hmm. Does that mean that table data restoration will *absolutely > always* follow all CREATE TABLE commands, or is that something that's > *normally* true but potentially false if dependency sorting switches > things around? > > Looking at the dbObjectTypePriority comments that seems like data restoration will *absolutely always* follow all CREATE TABLE commands. -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > Thanks, Rushabh Lathia www.EnterpriseDB.com
Re: [HACKERS] reload-through-the-top-parent switch the partition table
On Tue, Aug 1, 2017 at 5:34 AM, Rushabh Lathiawrote: > My colleague Robert and I had doubt about the order in of TABLE > and TABLE_DATA. We thought earlier that reload-thought-root might > might not solve the purpose which has been discussed in the above > mentioned thread. But later looking into code I realize the sort order > for DO_TABLE and DO_TABLE_DATA are different, so we don't need > to worry about that issue. Hmm. Does that mean that table data restoration will *absolutely always* follow all CREATE TABLE commands, or is that something that's *normally* true but potentially false if dependency sorting switches things around? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers