Hi David

Thanks for reviewing.

Please hold back reviewing this v12 patch until I have verified that
it passes CfBot and until I have finished my testing with a 17TB table.

I would appreciate pointers or help on turning the data correctness
checks into real tests in the TAP tests.
For now I have put them as DO $$ ... $$ blocks in the parallel test .pl
and I check manually that the table data checksums match.
If we add them, we should add them to other places in the pg_dump tests
as well.  Currently we only test that dump and restore do not fail, not
that the restored data is correct.
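
What I check manually is essentially the query below (a minimal sketch;
tplain is one of the tables the test creates, and the idea would be to
run the same query against the source and the restored database and
compare the two result rows):

SELECT 'tplain' AS tablename,
       count(*) AS rowcount,
       sum(hashtext(t::text)) AS tablehash
  FROM tplain AS t;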

On Fri, Jan 23, 2026 at 3:15 AM David Rowley <[email protected]> wrote:
>
> On Fri, 23 Jan 2026 at 06:05, Hannu Krosing <[email protected]> wrote:
> >
> > Fixing all the warnings
>
> I think overall this needs significantly more care and precision than
> what you've given it so far. For example, you have:
>
> +    if(dopt->max_table_segment_pages != InvalidBlockNumber)
> +        appendPQExpBufferStr(query,
> "pg_relation_size(c.oid)/current_setting('block_size')::int AS
> relpages, ");
> +    else
> +        appendPQExpBufferStr(query, "c.relpages, ");
>
> Note that pg_class.relpages is "int". Later the code in master does:
>
> tblinfo[i].relpages = atoi(PQgetvalue(res, i, i_relpages));

I have now fixed the underlying issue by changing the data type of
TableInfo.relpages to BlockNumber, and also changed the way we fetch it
by

1. converting it to unsigned int ( c.relpages::oid ) in the query
2. reading it from the result using strtoul()

(Technically just using strtoul() should have been enough, as it
already wraps negative values around to unsigned, but doing the
conversion in the query seems cleaner.)

This allowed removing the casts to (BlockNumber) everywhere .relpages
was used.
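
To illustrate (a quick psql sketch with made-up numbers, relying on the
int4/oid casts being binary-coercible): a ~17TiB heap is about
2,281,701,376 8kB pages, which the signed pg_class.relpages column
shows as a negative integer, but the ::oid cast recovers the unsigned
value:

SELECT (-2013265920)::int      AS relpages_as_stored,  -- what atoi() used to parse
       (-2013265920)::int::oid AS relpages_as_oid;     -- 2281701376, now read with strtoul()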

Functionally the value was only ever used for ordering, and even that
only loosely, which explains why patch v10 did not break anything.

I also changed the data type of TocEntry.dataLength from pgoff_t to
uint64.  The current code clearly overflows when off_t is 32 bits and
the sum of relpages from the heap and the TOAST table is larger than
fits in it (a ~17TiB heap alone is already about 2.28 billion 8kB
pages, above INT_MAX).

> If you look in vacuum.c, you'll see "pgcform->relpages = (int32)
> num_pages;" that the value stored in relpages will be negative when
> the table is >= 16TB (assuming 8k pages). Your pg_relation_size
> expression is not going to produce an INT. It'll produce a BIGINT, per
> "select pg_typeof(pg_relation_size('pg_class') /
> current_setting('block_size')::int);". So the atoi() can receive a
> string of digits representing an integer larger than INT_MAX in this
> case. Looking at [1], I see:

As said above, this should be fixed now by using the correct type in
the struct and strtoul().
To be sure, I have now created a 17TB table and am running some tests
on it as well.
I will report back here when that is done.

> "7.22.1 Numeric conversion functions 1 The functions atof, atoi, atol,
> and atoll need not affect the value of the integer expression errno on
> an error. If the value of the result cannot be represented, *the
> behavior is undefined.*"
>
> And testing locally, I see that my Microsoft compiler will just return
> INT_MAX on overflow, whereas I see gcc does nothing to prevent
> overflows and just continues to multiply by 10 regardless of what
> overflows occur, which I think would just make the code work by
> accident.

As .relpages was only ever used for ordering parallel copies, it still
works, just not optimally.

The old code has a similar overflow/wraparound when off_t is a 32-bit
int and the sum of relpages from the heap and the TOAST table is above
INT_MAX.

I have removed the code that partially worked around this for the case
where one of them was > 0x7fffffff (i.e. negative) by pinning
dataLength to INT_MAX.

> Aside from that, nothing in the documentation mentions that this is
> for "heap" tables only. That should be mentioned as it'll just result
> in people posting questions about why it's not working for some other
> table access method. There's also not much care for white space.
> You've introduced a bunch of whitespace changes unrelated to code
> changes you've made, plus there's not much regard for following
> project standard. For example, you commonly do "if(" and don't
> consistently follow the bracing rules, e.g:
>
> + for(chkptr = optarg; *chkptr != '\0'; chkptr++)
> +     if(*chkptr == '-')

I assumed that it was the classical "single statement -- no braces" case.

Do we have a writeup of our coding standards somewhere?

This specific case is now rewritten using while(), so it should be OK.

> Things like the following help convey the level of care that's gone into this:
>
> +/*
> + * option_parse_int
> + *
> + * Parse integer value for an option.  If the parsing is successful, returns
> + * true and stores the result in *result if that's given; if parsing fails,
> + * returns false.
> + */
> +bool
> +option_parse_uint32(const char *optarg, const char *optname,
>
> i.e zero effort gone in to modify the comments after pasting them from
> option_parse_int().
>
> Another example:
>
> + pg_log_error("%s musst be in range %lu..%lu",
>
> Also, I have no comprehension of why you'd use uint64 for the valid
> range when the function is for processing uint32 types in:

I picked up the uint64 there from the referenced long unsigned usage in
pg_resetwal, after I managed to get pg_log_warning to print -1 for a %u
format and did not want to spend time debugging why that happens.

I have now made all the arguments uint32.

> +bool
> +option_parse_uint32(const char *optarg, const char *optname,
> + uint64 min_range, uint64 max_range,
> + uint32 *result)
>
> In its current state, it's quite hard to take this patch seriously.
> Please spend longer self-reviewing it before posting. You could
> temporarily hard-code something for testing which makes at least 1
> table appear to be larger than 16TB and ensure your code works. What
> you have is visually broken and depends on whatever the atoi
> implementation opts to do in the overflow case. These are all things
> diligent commiters will be testing and it's sad to see how little
> effort you're putting into this. How do you expect this community to
> scale with this quality level of patch submissions? You've been around
> long enough and should know and do better.  Are you just expecting the
> committer to fix these things for you? That work does not get done via
> magic wand. Being on v10 already, I'd have expected the patch to be
> far beyond proof of concept grade. If you're withholding investing
> time on this until you see more community buy-in, then I'd suggest you
> write that and withhold further revisions until you're happy with the
> level of buy-in.

> I'm also still not liking your de-normalised TableInfo representation
> for "is_segment".
> IMO, InvalidBlockNumber should be used to represent
> open bounded ranges, and if there's no chunking, then startPage and
> endPage will both be InvalidBlockNumber.

That's what I ended up doing.

I switched to using startPage = InvalidBlockNumber to indicate that no
chunking is in effect.

This is safe because when chunking is in use I can always set the lower
bound, and I set both chunk boundary pages whenever possible.

Only for the last chunk is endPage left as InvalidBlockNumber, to mark
an open-ended range.
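
For example (a sketch of what v12 generates; the table name and page
counts are made up), dumping a 2500-page heap table with
--max-table-segment-pages=1000 produces three TABLE DATA entries whose
COPY statements look roughly like:

COPY (SELECT * FROM public.big_table WHERE ctid <= '(999,32000)') TO stdout;
COPY (SELECT * FROM public.big_table WHERE ctid BETWEEN '(1000,1)' AND '(1999,32000)') TO stdout;
COPY (SELECT * FROM public.big_table WHERE ctid >= '(2000,1)') TO stdout;

The first chunk starts at page 0 so the lower bound is omitted, and the
last chunk is left open-ended because its endPage is InvalidBlockNumber.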

> IMO, what you have now
> needlessly allows invalid states where is_segment == true and
> startPage, endPage are not set correctly. If you want to keep the code
> simple, hide the complexity in a macro or an inline function. There's
> just no performance reason to materialise the more complex condition
> into a dedicated boolean flag.
From 2b76c8242fe9b1b294b5ad443612fe583e51792a Mon Sep 17 00:00:00 2001
From: Hannu Krosing <[email protected]>
Date: Tue, 27 Jan 2026 22:51:03 +0100
Subject: [PATCH v12] * changed flag name to max-table-segment-pages * added
 check for amname = "heap" * added simple chunked dump and restore test *
 changed the data type of TableInfo.relpages to BlockNumber   * select it
 using relpages::oid to get an unsigned int out   * read it in from the query
 result using strtoul()   * removed a bunch of casts from .relpages to
 (BlockNumber) * changed the data type of TocEntry.dataLength to uint64
 the current pgoff_t certainly had an overflow in the 32-bit case when heap
 relpages + toast relpages > INT_MAX * switched to using
 pg_relation_size(c.oid)/current_setting('block_size')::int when
 --max-table-segment-pages is set * added documentation * added
 option_parse_uint32(...) to be used for the full range of page numbers

* TESTS: added a WARNING with row count and table data hash to the source and chunked-restore databases
---
 doc/src/sgml/ref/pg_dump.sgml             |  24 +++
 src/bin/pg_dump/pg_backup.h               |   2 +
 src/bin/pg_dump/pg_backup_archiver.c      |   2 +
 src/bin/pg_dump/pg_backup_archiver.h      |   2 +-
 src/bin/pg_dump/pg_dump.c                 | 169 +++++++++++++++++-----
 src/bin/pg_dump/pg_dump.h                 |  22 ++-
 src/bin/pg_dump/t/004_pg_dump_parallel.pl |  52 +++++++
 src/fe_utils/option_utils.c               |  55 +++++++
 src/include/fe_utils/option_utils.h       |   3 +
 9 files changed, 289 insertions(+), 42 deletions(-)

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 688e23c0e90..1811c67d141 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -1088,6 +1088,30 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--max-table-segment-pages=<replaceable class="parameter">npages</replaceable></option></term>
+      <listitem>
+       <para>
+        Dump table data in segments based on the number of pages in the main relation.
+        If the number of data pages in the relation is more than <replaceable class="parameter">npages</replaceable>,
+        the data is split into segments based on that number of pages.
+        Individual segments can be dumped in parallel.
+       </para>
+
+       <note>
+        <para>
+         The option <option>--max-table-segment-pages</option> applies only to pages
+         in the main heap; if the table has a large TOASTed part, this has to be
+         taken into account when deciding on the number of pages to use.
+         In the extreme case a single 8kB heap page can hold ~200 TOAST pointers,
+         each corresponding to 1GB of data.  If that data is also incompressible,
+         a single-page segment can dump as a 200GB file.
+        </para>
+       </note>
+
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>--no-comments</option></term>
       <listitem>
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index d9041dad720..b63ae05d895 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -27,6 +27,7 @@
 #include "common/file_utils.h"
 #include "fe_utils/simple_list.h"
 #include "libpq-fe.h"
+#include "storage/block.h"
 
 
 typedef enum trivalue
@@ -178,6 +179,7 @@ typedef struct _dumpOptions
 	bool		aclsSkip;
 	const char *lockWaitTimeout;
 	int			dump_inserts;	/* 0 = COPY, otherwise rows per INSERT */
+	BlockNumber	max_table_segment_pages; /* chunk when relpages is above this */
 
 	/* flags for various command-line long options */
 	int			disable_dollar_quoting;
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 4a63f7392ae..ed1913d66bc 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -44,6 +44,7 @@
 #include "pg_backup_archiver.h"
 #include "pg_backup_db.h"
 #include "pg_backup_utils.h"
+#include "storage/block.h"
 
 #define TEXT_DUMP_HEADER "--\n-- PostgreSQL database dump\n--\n\n"
 #define TEXT_DUMPALL_HEADER "--\n-- PostgreSQL database cluster dump\n--\n\n"
@@ -154,6 +155,7 @@ InitDumpOptions(DumpOptions *opts)
 	opts->dumpSchema = true;
 	opts->dumpData = true;
 	opts->dumpStatistics = false;
+	opts->max_table_segment_pages = InvalidBlockNumber;
 }
 
 /*
diff --git a/src/bin/pg_dump/pg_backup_archiver.h b/src/bin/pg_dump/pg_backup_archiver.h
index 325b53fc9bd..b6a9f16a122 100644
--- a/src/bin/pg_dump/pg_backup_archiver.h
+++ b/src/bin/pg_dump/pg_backup_archiver.h
@@ -377,7 +377,7 @@ struct _tocEntry
 	size_t		defnLen;		/* length of dumped definition */
 
 	/* working state while dumping/restoring */
-	pgoff_t		dataLength;		/* item's data size; 0 if none or unknown */
+	uint64		dataLength;		/* item's data size; 0 if none or unknown */
 	int			reqs;			/* do we need schema and/or data of object
 								 * (REQ_* bit mask) */
 	bool		created;		/* set for DATA member if TABLE was created */
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 687dc98e46d..0badb245b55 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -539,6 +539,7 @@ main(int argc, char **argv)
 		{"exclude-extension", required_argument, NULL, 17},
 		{"sequence-data", no_argument, &dopt.sequence_data, 1},
 		{"restrict-key", required_argument, NULL, 25},
+		{"max-table-segment-pages", required_argument, NULL, 26},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -803,6 +804,13 @@ main(int argc, char **argv)
 				dopt.restrict_key = pg_strdup(optarg);
 				break;
 
+			case 26:
+				if (!option_parse_uint32(optarg, "--max-table-segment-pages", 1, MaxBlockNumber,
+									  &dopt.max_table_segment_pages))
+					exit_nicely(1);
+				pg_log_warning("CHUNKING: set dopt.max_table_segment_pages to [%u]", dopt.max_table_segment_pages);
+				break;
+
 			default:
 				/* getopt_long already emitted a complaint */
 				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -1372,6 +1380,9 @@ help(const char *progname)
 	printf(_("  --extra-float-digits=NUM     override default setting for extra_float_digits\n"));
 	printf(_("  --filter=FILENAME            include or exclude objects and data from dump\n"
 			 "                               based on expressions in FILENAME\n"));
+	printf(_("  --max-table-segment-pages=NUMPAGES\n"
+			 "                               number of main-table pages above which table data is\n"
+			 "                               copied out in chunks; also determines the chunk size\n"));
 	printf(_("  --if-exists                  use IF EXISTS when dropping objects\n"));
 	printf(_("  --include-foreign-data=PATTERN\n"
 			 "                               include data of foreign tables on foreign\n"
@@ -2412,7 +2423,7 @@ dumpTableData_copy(Archive *fout, const void *dcontext)
 	 * a filter condition was specified.  For other cases a simple COPY
 	 * suffices.
 	 */
-	if (tdinfo->filtercond || tbinfo->relkind == RELKIND_FOREIGN_TABLE)
+	if (tdinfo->filtercond || is_segment(tdinfo) || tbinfo->relkind == RELKIND_FOREIGN_TABLE)
 	{
 		/* Temporary allows to access to foreign tables to dump data */
 		if (tbinfo->relkind == RELKIND_FOREIGN_TABLE)
@@ -2428,9 +2439,23 @@ dumpTableData_copy(Archive *fout, const void *dcontext)
 		else
 			appendPQExpBufferStr(q, "* ");
 
-		appendPQExpBuffer(q, "FROM %s %s) TO stdout;",
+		appendPQExpBuffer(q, "FROM %s %s",
 						  fmtQualifiedDumpable(tbinfo),
 						  tdinfo->filtercond ? tdinfo->filtercond : "");
+		if (is_segment(tdinfo))
+		{
+			appendPQExpBufferStr(q, tdinfo->filtercond ? " AND " : " WHERE ");
+			if (tdinfo->startPage == 0)
+				appendPQExpBuffer(q, "ctid <= '(%u,32000)'", tdinfo->endPage);
+			else if (tdinfo->endPage != InvalidBlockNumber)
+				appendPQExpBuffer(q, "ctid BETWEEN '(%u,1)' AND '(%u,32000)'",
+								  tdinfo->startPage, tdinfo->endPage);
+			else
+				appendPQExpBuffer(q, "ctid >= '(%u,1)'", tdinfo->startPage);
+			pg_log_warning("CHUNKING: pages [%u:%u]", tdinfo->startPage, tdinfo->endPage);
+		}
+
+		appendPQExpBufferStr(q, ") TO stdout;");
 	}
 	else
 	{
@@ -2438,6 +2463,9 @@ dumpTableData_copy(Archive *fout, const void *dcontext)
 						  fmtQualifiedDumpable(tbinfo),
 						  column_list);
 	}
+
+	pg_log_warning("CHUNKING: data query: %s", q->data);
+	
 	res = ExecuteSqlQuery(fout, q->data, PGRES_COPY_OUT);
 	PQclear(res);
 	destroyPQExpBuffer(clistBuf);
@@ -2933,42 +2961,95 @@ dumpTableData(Archive *fout, const TableDataInfo *tdinfo)
 	{
 		TocEntry   *te;
 
-		te = ArchiveEntry(fout, tdinfo->dobj.catId, tdinfo->dobj.dumpId,
-						  ARCHIVE_OPTS(.tag = tbinfo->dobj.name,
-									   .namespace = tbinfo->dobj.namespace->dobj.name,
-									   .owner = tbinfo->rolname,
-									   .description = "TABLE DATA",
-									   .section = SECTION_DATA,
-									   .createStmt = tdDefn,
-									   .copyStmt = copyStmt,
-									   .deps = &(tbinfo->dobj.dumpId),
-									   .nDeps = 1,
-									   .dumpFn = dumpFn,
-									   .dumpArg = tdinfo));
-
-		/*
-		 * Set the TocEntry's dataLength in case we are doing a parallel dump
-		 * and want to order dump jobs by table size.  We choose to measure
-		 * dataLength in table pages (including TOAST pages) during dump, so
-		 * no scaling is needed.
-		 *
-		 * However, relpages is declared as "integer" in pg_class, and hence
-		 * also in TableInfo, but it's really BlockNumber a/k/a unsigned int.
-		 * Cast so that we get the right interpretation of table sizes
-		 * exceeding INT_MAX pages.
+		/* Data chunking works off relpages, which is computed exactly using
+		 * pg_relation_size() when --max-table-segment-pages is set.
+		 *
+		 * We also don't chunk if the table access method is not "heap".
+		 * TODO: we may add chunking for other access methods later, maybe
+		 * based on primary key ranges.
+		 */
-		te->dataLength = (BlockNumber) tbinfo->relpages;
-		te->dataLength += (BlockNumber) tbinfo->toastpages;
+		if (tbinfo->relpages <= dopt->max_table_segment_pages || 
+			strcmp(tbinfo->amname, "heap") != 0)
+		{
+			te = ArchiveEntry(fout, tdinfo->dobj.catId, tdinfo->dobj.dumpId,
+							ARCHIVE_OPTS(.tag = tbinfo->dobj.name,
+										.namespace = tbinfo->dobj.namespace->dobj.name,
+										.owner = tbinfo->rolname,
+										.description = "TABLE DATA",
+										.section = SECTION_DATA,
+										.createStmt = tdDefn,
+										.copyStmt = copyStmt,
+										.deps = &(tbinfo->dobj.dumpId),
+										.nDeps = 1,
+										.dumpFn = dumpFn,
+										.dumpArg = tdinfo));
 
-		/*
-		 * If pgoff_t is only 32 bits wide, the above refinement is useless,
-		 * and instead we'd better worry about integer overflow.  Clamp to
-		 * INT_MAX if the correct result exceeds that.
-		 */
-		if (sizeof(te->dataLength) == 4 &&
-			(tbinfo->relpages < 0 || tbinfo->toastpages < 0 ||
-			 te->dataLength < 0))
-			te->dataLength = INT_MAX;
+			/*
+			 * Set the TocEntry's dataLength in case we are doing a parallel dump
+			 * and want to order dump jobs by table size.  We choose to measure
+			 * dataLength in table pages (including TOAST pages) during dump, so
+			 * no scaling is needed.
+			 *
+			 * While pg_class.relpages is declared as "integer", it really stores
+			 * a BlockNumber, a/k/a unsigned int, so we convert it back and store
+			 * it as BlockNumber in TableInfo.
+			 * And dataLength is uint64, so it does not overflow even for
+			 * 2 x UINT32_MAX.
+			 */
+			te->dataLength = tbinfo->relpages;
+			te->dataLength += tbinfo->toastpages;
+		}
+		else
+		{
+			uint64 current_chunk_start = 0;
+			PQExpBuffer chunk_desc = createPQExpBuffer();
+			
+			pg_log_warning("CHUNKING: toc for chunked relpages [%u]", tbinfo->relpages);
+
+			/* current_chunk_start is uint64 to avoid wraparound past UINT32_MAX */
+			while (current_chunk_start < tbinfo->relpages)
+			{
+				TableDataInfo *chunk_tdinfo = (TableDataInfo *) pg_malloc(sizeof(TableDataInfo));
+
+				memcpy(chunk_tdinfo, tdinfo, sizeof(TableDataInfo));
+				AssignDumpId(&chunk_tdinfo->dobj);
+				//addObjectDependency(&chunk_tdinfo->dobj, tbinfo->dobj.dumpId); /* do we need this here */
+//				chunk_tdinfo->is_segment = true;
+				chunk_tdinfo->startPage = (BlockNumber) current_chunk_start;
+				chunk_tdinfo->endPage = chunk_tdinfo->startPage + dopt->max_table_segment_pages - 1;
+
+				pg_log_warning("CHUNKING: toc for pages [%u:%u]",chunk_tdinfo->startPage, chunk_tdinfo->endPage);
+				
+				current_chunk_start += dopt->max_table_segment_pages;
+				if (current_chunk_start >= tbinfo->relpages)
+					chunk_tdinfo->endPage = InvalidBlockNumber; /* last chunk is for "all the rest" */
+
+				printfPQExpBuffer(chunk_desc, "TABLE DATA (pages %u:%u)", chunk_tdinfo->startPage, chunk_tdinfo->endPage);
+
+				te = ArchiveEntry(fout, chunk_tdinfo->dobj.catId, chunk_tdinfo->dobj.dumpId,
+							ARCHIVE_OPTS(.tag = tbinfo->dobj.name,
+										.namespace = tbinfo->dobj.namespace->dobj.name,
+										.owner = tbinfo->rolname,
+										.description = chunk_desc->data,
+										.section = SECTION_DATA,
+										.createStmt = tdDefn,
+										.copyStmt = copyStmt,
+										.deps = &(tbinfo->dobj.dumpId),
+										.nDeps = 1,
+										.dumpFn = dumpFn,
+										.dumpArg = chunk_tdinfo));
+
+				if (chunk_tdinfo->endPage == InvalidBlockNumber)
+					te->dataLength = tbinfo->relpages - chunk_tdinfo->startPage;
+				else
+					te->dataLength = dopt->max_table_segment_pages;
+				/* assume TOAST pages distribute evenly among chunks */
+				if (tbinfo->relpages)
+					te->dataLength += te->dataLength * tbinfo->toastpages / tbinfo->relpages;
+			}
+
+			destroyPQExpBuffer(chunk_desc);
+		}
 	}
 
 	destroyPQExpBuffer(copyBuf);
@@ -3092,6 +3173,8 @@ makeTableDataInfo(DumpOptions *dopt, TableInfo *tbinfo)
 	tdinfo->dobj.namespace = tbinfo->dobj.namespace;
 	tdinfo->tdtable = tbinfo;
 	tdinfo->filtercond = NULL;	/* might get set later */
+	tdinfo->startPage = InvalidBlockNumber; /* we use this as indication that no chunking is needed */
+	tdinfo->endPage = InvalidBlockNumber;
 	addObjectDependency(&tdinfo->dobj, tbinfo->dobj.dumpId);
 
 	/* A TableDataInfo contains data, of course */
@@ -7254,8 +7337,16 @@ getTables(Archive *fout, int *numTables)
 						 "c.relnamespace, c.relkind, c.reltype, "
 						 "c.relowner, "
 						 "c.relchecks, "
-						 "c.relhasindex, c.relhasrules, c.relpages, "
-						 "c.reltuples, c.relallvisible, ");
+						 "c.relhasindex, c.relhasrules, ");
+
+	/* fetch current relation size if chunking is requested */
+	if(dopt->max_table_segment_pages != InvalidBlockNumber)
+	if (dopt->max_table_segment_pages != InvalidBlockNumber)
+	else
+		/* pg_class.relpages stores BlockNumber (uint32) in an int field, convert to oid to get unsigned int out */
+		appendPQExpBufferStr(query, "c.relpages::oid, ");
+
+	appendPQExpBufferStr(query, "c.reltuples, c.relallvisible, ");
 
 	if (fout->remoteVersion >= 180000)
 		appendPQExpBufferStr(query, "c.relallfrozen, ");
@@ -7495,7 +7586,7 @@ getTables(Archive *fout, int *numTables)
 		tblinfo[i].ncheck = atoi(PQgetvalue(res, i, i_relchecks));
 		tblinfo[i].hasindex = (strcmp(PQgetvalue(res, i, i_relhasindex), "t") == 0);
 		tblinfo[i].hasrules = (strcmp(PQgetvalue(res, i, i_relhasrules), "t") == 0);
-		tblinfo[i].relpages = atoi(PQgetvalue(res, i, i_relpages));
+		tblinfo[i].relpages = strtoul(PQgetvalue(res, i, i_relpages), NULL, 10);
 		if (PQgetisnull(res, i, i_toastpages))
 			tblinfo[i].toastpages = 0;
 		else
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 4c4b14e5fc7..be71661ac41 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -16,6 +16,7 @@
 
 #include "pg_backup.h"
 #include "catalog/pg_publication_d.h"
+#include "storage/block.h"
 
 
 #define oidcmp(x,y) ( ((x) < (y) ? -1 : ((x) > (y)) ?  1 : 0) )
@@ -335,7 +336,11 @@ typedef struct _tableInfo
 	Oid			owning_tab;		/* OID of table owning sequence */
 	int			owning_col;		/* attr # of column owning sequence */
 	bool		is_identity_sequence;
-	int32		relpages;		/* table's size in pages (from pg_class) */
+	BlockNumber	relpages;		/* table's size in pages (from pg_class),
+								 * converted to unsigned integer; when
+								 * --max-table-segment-pages is set it is
+								 * computed from pg_relation_size() instead
+								 */
 	int			toastpages;		/* toast table's size in pages, if any */
 
 	bool		interesting;	/* true if need to collect more data */
@@ -413,8 +418,21 @@ typedef struct _tableDataInfo
 	DumpableObject dobj;
 	TableInfo  *tdtable;		/* link to table to dump */
 	char	   *filtercond;		/* WHERE condition to limit rows dumped */
+	/* startPage and endPage to support segmented dump */
+	BlockNumber	startPage;		/* first page in segment; as we always know
+								 * the lowest segment page number, we use
+								 * InvalidBlockNumber here to mark the
+								 * no-segmenting case.  When it is 0 (first
+								 * segment) we omit it from the range query */
+	BlockNumber	endPage;		/* last page in segment for page-range dump;
+								 * startPage+max_table_segment_pages-1 for
+								 * most segments, but InvalidBlockNumber for
+								 * the last one to indicate an open range
+								 */
 } TableDataInfo;
 
+#define is_segment(tdiptr) (((tdiptr)->startPage) != InvalidBlockNumber)
+
 typedef struct _indxInfo
 {
 	DumpableObject dobj;
@@ -448,7 +466,7 @@ typedef struct _indexAttachInfo
 typedef struct _relStatsInfo
 {
 	DumpableObject dobj;
-	int32		relpages;
+	BlockNumber	relpages;
 	char	   *reltuples;
 	int32		relallvisible;
 	int32		relallfrozen;
diff --git a/src/bin/pg_dump/t/004_pg_dump_parallel.pl b/src/bin/pg_dump/t/004_pg_dump_parallel.pl
index 738f34b1c1b..88af25d2889 100644
--- a/src/bin/pg_dump/t/004_pg_dump_parallel.pl
+++ b/src/bin/pg_dump/t/004_pg_dump_parallel.pl
@@ -11,6 +11,7 @@ use Test::More;
 my $dbname1 = 'regression_src';
 my $dbname2 = 'regression_dest1';
 my $dbname3 = 'regression_dest2';
+my $dbname4 = 'regression_dest3';
 
 my $node = PostgreSQL::Test::Cluster->new('main');
 $node->init;
@@ -21,6 +22,7 @@ my $backupdir = $node->backup_dir;
 $node->run_log([ 'createdb', $dbname1 ]);
 $node->run_log([ 'createdb', $dbname2 ]);
 $node->run_log([ 'createdb', $dbname3 ]);
+$node->run_log([ 'createdb', $dbname4 ]);
 
 $node->safe_psql(
 	$dbname1,
@@ -44,6 +46,18 @@ create table tht_p1 partition of tht for values with (modulus 3, remainder 0);
 create table tht_p2 partition of tht for values with (modulus 3, remainder 1);
 create table tht_p3 partition of tht for values with (modulus 3, remainder 2);
 insert into tht select (x%10)::text::digit, x from generate_series(1,1000) x;
+
+-- raise warning so I can check in .log if data was correct
+DO \$\$
+DECLARE
+    thash_rec RECORD;
+BEGIN
+    SELECT 'tplain', count(*), sum(hashtext(t::text)) as tablehash 
+	  INTO thash_rec
+	  FROM tplain AS t;
+    RAISE WARNING 'thash: %', thash_rec;
+END;
+\$\$;
 	});
 
 $node->command_ok(
@@ -87,4 +101,42 @@ $node->command_ok(
 	],
 	'parallel restore as inserts');
 
+$node->command_ok(
+	[
+		'pg_dump',
+		'--format' => 'directory',
+		'--max-table-segment-pages' => 2,
+		'--no-sync',
+		'--jobs' => 2,
+		'--file' => "$backupdir/dump3",
+		$node->connstr($dbname1),
+	],
+	'parallel dump with chunks of two heap pages');
+
+$node->command_ok(
+	[
+		'pg_restore', '--verbose',
+		'--dbname' => $node->connstr($dbname4),
+		'--jobs' => 3,
+		"$backupdir/dump3",
+	],
+	'parallel restore with chunks of two heap pages');
+
+$node->safe_psql(
+	$dbname4,
+	qq{
+
+-- raise warning so I can check in .log if data was correct
+DO \$\$
+DECLARE
+    thash_rec RECORD;
+BEGIN
+    SELECT 'tplain', count(*), sum(hashtext(t::text)) as tablehash 
+	  INTO thash_rec
+	  FROM tplain AS t;
+    RAISE WARNING 'thash after parallel chunked restore: %', thash_rec;
+END;
+\$\$;
+	});
+
 done_testing();
diff --git a/src/fe_utils/option_utils.c b/src/fe_utils/option_utils.c
index cc483ae176c..aff1fbd31a3 100644
--- a/src/fe_utils/option_utils.c
+++ b/src/fe_utils/option_utils.c
@@ -83,6 +83,61 @@ option_parse_int(const char *optarg, const char *optname,
 	return true;
 }
 
+/*
+ * option_parse_uint32
+ *
+ * Parse unsigned integer value for an option.  If the parsing is successful,
+ * returns true and stores the result in *result if that's given;
+ * if parsing fails, returns false.
+ */
+bool
+option_parse_uint32(const char *optarg, const char *optname,
+				 uint32 min_range, uint32 max_range,
+				 uint32 *result)
+{
+	char	   *endptr;
+	unsigned long val;
+
+	/* Fail if there is a minus sign at the start of the value */
+	while (isspace((unsigned char) *optarg))
+		optarg++;
+	if (*optarg == '-')
+	{
+		pg_log_error("value \"%s\" for option %s cannot be negative",
+					 optarg, optname);
+		return false;
+	}
+
+	errno = 0;
+	val = strtoul(optarg, &endptr, 10);
+
+	/*
+	 * Skip any trailing whitespace; if anything but whitespace remains before
+	 * the terminating character, fail.
+	 */
+	while (*endptr != '\0' && isspace((unsigned char) *endptr))
+		endptr++;
+
+	if (*endptr != '\0')
+	{
+		pg_log_error("invalid value \"%s\" for option %s",
+					 optarg, optname);
+		return false;
+	}
+
+	/* As min_range and max_range are uint32, the range check also catches
+	 * the case where the unsigned long val is outside the 32-bit range */
+	if (errno == ERANGE || val < min_range || val > max_range)
+	{
+		pg_log_error("%s must be in range %u..%u", optname, min_range, max_range);
+		return false;
+	}
+
+	if (result)
+		*result = (uint32) val;
+	return true;
+}
+
 /*
  * Provide strictly harmonized handling of the --sync-method option.
  */
diff --git a/src/include/fe_utils/option_utils.h b/src/include/fe_utils/option_utils.h
index 0db6e3b6e91..c74cd1fb595 100644
--- a/src/include/fe_utils/option_utils.h
+++ b/src/include/fe_utils/option_utils.h
@@ -22,6 +22,9 @@ extern void handle_help_version_opts(int argc, char *argv[],
 extern bool option_parse_int(const char *optarg, const char *optname,
 							 int min_range, int max_range,
 							 int *result);
+extern bool option_parse_uint32(const char *optarg, const char *optname,
+							 uint32 min_range, uint32 max_range,
+							 uint32 *result);
 extern bool parse_sync_method(const char *optarg,
 							  DataDirSyncMethod *sync_method);
 
-- 
2.43.0
