As per the  discussion on the thread, here is the patch which

a) Make checksum for manifest file optional.
b) Allow user to choose a particular algorithm.

Currently with the WIP patch SHA256 and CRC checksum algorithm
supported.  Patch also changed the manifest file format to append
the used algorithm name before the checksum, this way it will be
easy to validator to know which algorithm to used.

Ex:
./db/bin/pg_basebackup -D bksha/ --manifest-with-checksums=SHA256

$ cat bksha/backup_manifest  | more
PostgreSQL-Backup-Manifest-Version 1
File backup_label 226 2019-12-04 17:46:46 GMT
SHA256:7cf53d1b9facca908678ab70d93a9e7460cd35cedf7891de948dcf858f8a281a
File pg_xact/0000 8192 2019-12-04 17:46:46 GMT
SHA256:8d2b6cb1dc1a6e8cee763b52d75e73571fddce06eb573861d44082c7d8c03c26

./db/bin/pg_basebackup -D bkcrc/ --manifest-with-checksums=CRC
PostgreSQL-Backup-Manifest-Version 1
File backup_label 226 2019-12-04 17:58:40 GMT CRC:343138313931333134
File pg_xact/0000 8192 2019-12-04 17:46:46 GMT CRC:363538343433333133

Pending TODOs:
- Documentation update
- Code cleanup
- Testing.

I will further continue to work on the patch and meanwhile feel free to
provide
thoughts/inputs.

Thanks,


On Mon, Nov 25, 2019 at 11:13 PM Robert Haas <robertmh...@gmail.com> wrote:

> On Fri, Nov 22, 2019 at 5:15 PM Tels <nospam-pg-ab...@bloodgate.com>
> wrote:
> > It is related to the number of states...
>
> Thanks for this explanation. See my reply to David where I also
> discuss this point.
>
> > However, if you choose a hash, please do not go below SHA-256. Both MD5
> > and SHA-1 already had collision attacks, and these only got to be bound
> > to be worse.
> >
> >    https://www.mscs.dal.ca/~selinger/md5collision/
> >    https://shattered.io/
>
> Yikes, that second link, about SHA-1, is depressing. Now, it's not
> likely that an attacker has access to your backup repository and can
> spend 6500 years of CPU time to engineer a Trojan file there (maybe
> more, because the files are probably bigger than the PDFs they used in
> that case) and then induce you to restore and rely upon that backup.
> However, it's entirely likely that somebody is going to eventually ban
> SHA-1 as the attacks get better, which is going to be a problem for us
> whether the underlying exposures are problems or not.
>
> > It might even be a wise idea to encode the used Hash-Algorithm into the
> > manifest file, so it can be changed later. The hash length might be not
> > enough to decide which algorithm is the one used.
>
> I agree. Let's write
> SHA256:bc1c3a57369acd0d2183a927fb2e07acbbb1c97f317bbc3b39d93ec65b754af5
> or similar rather than just the hash. That way even if the entire SHA
> family gets cracked, we can easily substitute in something else that
> hasn't been cracked yet.
>
> (It is unclear to me why anyone supposes that *any* popular hash
> function won't eventually be cracked. For a K-bit hash function, there
> are 2^K possible outputs, where K is probably in the hundreds. But
> there are 2^{2^33} possible 1GB files. So for every possible output
> value, there are 2^{2^33-K} inputs that produce that value, which is a
> very very big number. The probability that any given input produces a
> certain output is very low, but the number of possible inputs that
> produce a given output is very high; so assuming that nobody's ever
> going to figure out how to construct them seems optimistic.)
>
> > To get a feeling one can use:
> >
> >     openssl speed md5 sha1 sha256 sha512
> >
> > On my really-not-fast desktop CPU (i5-4690T CPU @ 2.50GHz) it says:
> >
> >   The 'numbers' are in 1000s of bytes per second processed.
> >    type       16 bytes     64 bytes    256 bytes   1024 bytes   8192
> > bytes  16384 bytes
> >    md5       122638.55k   277023.96k   487725.57k   630806.19k
> > 683892.74k   688553.98k
> >    sha1      127226.45k   313891.52k   632510.55k   865753.43k
> > 960995.33k   977215.19k
> >    sha256     77611.02k   173368.15k   325460.99k   412633.43k
> > 447022.92k   448020.48k
> >    sha512     51164.77k   205189.87k   361345.79k   543883.26k
> > 638372.52k   645933.74k
> >
> > Or in other words, it can hash nearly 931 MByte /s with SHA-1 and about
> > 427 MByte / s with SHA-256 (if I haven't miscalculated something). You'd
> > need a
> > pretty fast disk (aka M.2 SSD) and network (aka > 1 Gbit) to top these
> > speeds
> > and then you'd use a real CPU for your server, not some poor Intel
> > powersaving
> > surfing thingy-majingy :)
>
> I mean, how fast is in theory doesn't matter nearly as much as what
> happens when you benchmark the proposed implementation, and the
> results we have so far don't support the theory that this is so cheap
> as to be negligible.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


-- 
Rushabh Lathia
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 66aa0fc..c6bb74d 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -54,6 +54,20 @@ typedef struct
 	bool		sendtblspcmapfile;
 } basebackup_options;
 
+/* Checksum algorithm option for manifest */
+enum manifestCheckSum
+{
+	MC_NONE = 0,
+	MC_SHA256,
+	MC_CRC
+};
+
+/* checksum algorithm context */
+typedef union checksumCtx
+{
+	pg_sha256_ctx	sha256_ctx;
+	pg_crc32c		crc_ctx;
+}	ChecksumCtx;
 
 static int64 sendDir(const char *path, int basepathlen, bool sizeonly,
 					 List *tablespaces, bool sendtblspclinks,
@@ -72,7 +86,7 @@ static void SendBackupHeader(List *tablespaces);
 static void InitializeManifest(StringInfo manifest);
 static void AddFileToManifest(StringInfo manifest, const char *tsoid,
 							  const char *filename, size_t size, time_t mtime,
-							  uint8 *shabuf);
+							  ChecksumCtx *cCtx);
 static void SendBackupManifest(StringInfo manifest);
 static char *escape_field_for_manifest(const char *s);
 static void base_backup_cleanup(int code, Datum arg);
@@ -82,6 +96,9 @@ static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
 static int	compareWalFileNames(const ListCell *a, const ListCell *b);
 static void throttle(size_t increment);
 static bool is_checksummed_file(const char *fullpath, const char *filename);
+static void initilize_manifest_checksum(ChecksumCtx *cCtx);
+static void update_manifest_checksum(ChecksumCtx *cCtx, const char *buf, off_t cnt);
+static int final_manifest_checksum(ChecksumCtx *cCtx, char *checksumbuf);
 
 /* Was the backup currently in-progress initiated in recovery mode? */
 static bool backup_started_in_recovery = false;
@@ -132,8 +149,8 @@ static long long int total_checksum_failures;
 /* Do not verify checksums. */
 static bool noverify_checksums = false;
 
-/* Add file entry in to manifest with checksums. */
-static bool manifest_with_checksums = false;
+
+static enum manifestCheckSum manifest_with_checksums = MC_NONE;
 
 /*
  * The contents of these directories are removed or recreated during server
@@ -677,6 +694,7 @@ parse_basebackup_options(List *options, basebackup_options *opt)
 	bool		o_maxrate = false;
 	bool		o_tablespace_map = false;
 	bool		o_noverify_checksums = false;
+	bool		o_manifest_with_checksums = false;
 
 	MemSet(opt, 0, sizeof(*opt));
 	foreach(lopt, options)
@@ -767,11 +785,23 @@ parse_basebackup_options(List *options, basebackup_options *opt)
 		}
 		else if (strcmp(defel->defname, "manifest_with_checksums") == 0)
 		{
-			if (manifest_with_checksums)
+			char  *manifest_with_checksum_algo = NULL;
+			if (o_manifest_with_checksums)
 				ereport(ERROR,
 						(errcode(ERRCODE_SYNTAX_ERROR),
 						 errmsg("duplicate option \"%s\"", defel->defname)));
-			manifest_with_checksums = true;
+			manifest_with_checksum_algo = strVal(defel->arg);
+
+			if (pg_strcasecmp(manifest_with_checksum_algo, "SHA256") == 0)
+				manifest_with_checksums = MC_SHA256;
+			else if (pg_strcasecmp(manifest_with_checksum_algo, "CRC") == 0)
+				manifest_with_checksums = MC_CRC;
+			else if (pg_strcasecmp(manifest_with_checksum_algo, "NONE") == 0)
+				manifest_with_checksums = MC_NONE;
+			else
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("invalid manifest_with_checksums option \"%s\"", manifest_with_checksum_algo)));
 		}
 
 		else
@@ -907,14 +937,16 @@ InitializeManifest(StringInfo manifest)
 static void
 AddFileToManifest(StringInfo manifest, const char *tsoid,
 				  const char *filename, size_t size, time_t mtime,
-				  uint8 *shabuf)
+				  ChecksumCtx *cCtx)
 {
 	char	pathbuf[MAXPGPATH];
 	char   *escaped_filename;
 	static char timebuf[128];
-	static char shatextbuf[PG_SHA256_DIGEST_LENGTH * 2 + 1];
-	int		shatextlen;
+	static char checksumbuf[256];
+	char encode_checksumbuf[256];
 	struct pg_tm *tm;
+	char *checksumlabel = NULL;
+	int	   checksumbuflen;
 
 	/*
 	 * If this file is part of a tablespace, the filename passed to this
@@ -941,19 +973,32 @@ AddFileToManifest(StringInfo manifest, const char *tsoid,
 		elog(ERROR, "could not convert epoch to timestamp: %m");
 	pg_strftime(timebuf, sizeof(timebuf), "%Y-%m-%d %H:%M:%S %Z", tm);
 
-	/* Convert checksum to hexadecimal. */
-	if (manifest_with_checksums)
+	/* Generate final checksum and Convert it to hexadecimal. */
+	if (manifest_with_checksums != MC_NONE)
 	{
-		shatextlen =
-			hex_encode((char *) shabuf, PG_SHA256_DIGEST_LENGTH, shatextbuf);
-		Assert(shatextlen + 1 == sizeof(shatextbuf));
-		shatextbuf[shatextlen] = '\0';
+		checksumbuflen = final_manifest_checksum(cCtx, checksumbuf);
+		switch (manifest_with_checksums)
+		{
+			case MC_SHA256:
+				checksumlabel = "SHA256:";
+				break;
+			case MC_CRC:
+				checksumlabel = "CRC:";
+				break;
+			case MC_NONE:
+				break;
+		}
+		checksumbuflen = hex_encode(checksumbuf,
+									checksumbuflen,
+									encode_checksumbuf);
+		encode_checksumbuf[checksumbuflen] = '\0';
 	}
 
 	/* Add to manifest. */
-	appendStringInfo(manifest, "File\t%s\t%zu\t%s\t%s\n",
+	appendStringInfo(manifest, "File\t%s\t%zu\t%s\t%s%s\n",
 					 escaped_filename == NULL ? filename : escaped_filename,
-					 size, timebuf, manifest_with_checksums ? shatextbuf : "-");
+					 size, timebuf, checksumlabel ? checksumlabel : "",
+					 manifest_with_checksums != MC_NONE ? encode_checksumbuf : "-");
 
 	/* Avoid leaking memory. */
 	if (escaped_filename != NULL)
@@ -966,24 +1011,34 @@ AddFileToManifest(StringInfo manifest, const char *tsoid,
 static void
 SendBackupManifest(StringInfo manifest)
 {
-	pg_sha256_ctx	sha256_ctx;
-	uint8			shabuf[PG_SHA256_DIGEST_LENGTH];
+	char			checksumbuf[256];
 	StringInfoData	protobuf;
-	int				shastringlen;
+	int				checksumbuflen;
+	ChecksumCtx		cCtx;
+
 
 	/* Checksum the manifest. */
-	if (manifest_with_checksums)
+	if (manifest_with_checksums != MC_NONE)
 	{
-		pg_sha256_init(&sha256_ctx);
-		pg_sha256_update(&sha256_ctx, (uint8 *) manifest->data, manifest->len);
-		pg_sha256_final(&sha256_ctx, shabuf);
+		initilize_manifest_checksum(&cCtx);
+		update_manifest_checksum(&cCtx, manifest->data, manifest->len);
+		checksumbuflen = final_manifest_checksum(&cCtx, (char *) checksumbuf);
 		appendStringInfoString(manifest, "Manifest-Checksum\t");
-		shastringlen = PG_SHA256_DIGEST_LENGTH * 2;
-		enlargeStringInfo(manifest, shastringlen);
-		shastringlen = hex_encode((char *) shabuf, PG_SHA256_DIGEST_LENGTH,
+		switch (manifest_with_checksums)
+		{
+			case MC_SHA256:
+				appendStringInfoString(manifest, "SHA256:");
+				break;
+			case MC_CRC:
+				appendStringInfoString(manifest, "CRC:");
+				break;
+			case MC_NONE:
+				break;
+		}
+		enlargeStringInfo(manifest, checksumbuflen * 2);
+		checksumbuflen = hex_encode(checksumbuf, checksumbuflen,
 				manifest->data + manifest->len);
-		Assert(shastringlen == PG_SHA256_DIGEST_LENGTH * 2);
-		manifest->len += shastringlen;
+		manifest->len += checksumbuflen;
 		appendStringInfoChar(manifest, '\n');
 	}
 
@@ -1115,11 +1170,7 @@ sendFileWithContent(const char *filename, const char *content,
 	struct stat statbuf;
 	int			pad,
 				len;
-	pg_sha256_ctx	sha256_ctx;
-	uint8		shabuf[PG_SHA256_DIGEST_LENGTH];
-
-	if (manifest_with_checksums)
-		pg_sha256_init(&sha256_ctx);
+	ChecksumCtx cCtx;
 
 	len = strlen(content);
 
@@ -1153,14 +1204,14 @@ sendFileWithContent(const char *filename, const char *content,
 		pq_putmessage('d', buf, pad);
 	}
 
-	if (manifest_with_checksums)
+	if (manifest_with_checksums != MC_NONE)
 	{
-		pg_sha256_update(&sha256_ctx, (uint8 *) content, len);
-		pg_sha256_final(&sha256_ctx, shabuf);
+		initilize_manifest_checksum(&cCtx);
+		update_manifest_checksum(&cCtx, content, len);
 	}
 
 	AddFileToManifest(manifest, NULL, filename, len, statbuf.st_mtime,
-					  shabuf);
+					  &cCtx);
 }
 
 /*
@@ -1559,7 +1610,6 @@ is_checksummed_file(const char *fullpath, const char *filename)
  * Copied from pg_dump, but modified to work with libpq for sending
  */
 
-
 /*
  * Given the member, write the TAR header & send the file.
  *
@@ -1591,11 +1641,9 @@ sendFile(const char *readfilename, const char *tarfilename,
 	int			segmentno = 0;
 	char	   *segmentpath;
 	bool		verify_checksum = false;
-	pg_sha256_ctx	sha256_ctx;
-	uint8		shabuf[PG_SHA256_DIGEST_LENGTH];
+	ChecksumCtx cCtx;
 
-	if (manifest_with_checksums)
-		pg_sha256_init(&sha256_ctx);
+	initilize_manifest_checksum(&cCtx);
 
 	fp = AllocateFile(readfilename, "rb");
 	if (fp == NULL)
@@ -1766,8 +1814,7 @@ sendFile(const char *readfilename, const char *tarfilename,
 					(errmsg("base backup could not send data, aborting backup")));
 
 		/* Also feed it to the checksum machinery. */
-		if (manifest_with_checksums)
-			pg_sha256_update(&sha256_ctx, (uint8 *) buf, cnt);
+		update_manifest_checksum(&cCtx, buf, cnt);
 
 		len += cnt;
 		throttle(cnt);
@@ -1793,8 +1840,7 @@ sendFile(const char *readfilename, const char *tarfilename,
 		{
 			cnt = Min(sizeof(buf), statbuf->st_size - len);
 			pq_putmessage('d', buf, cnt);
-			if (manifest_with_checksums)
-				pg_sha256_update(&sha256_ctx, (uint8 *) buf, cnt);
+			update_manifest_checksum(&cCtx, buf, cnt);
 			len += cnt;
 			throttle(cnt);
 		}
@@ -1826,11 +1872,8 @@ sendFile(const char *readfilename, const char *tarfilename,
 	}
 
 	total_checksum_failures += checksum_failures;
-
-	if (manifest_with_checksums)
-		pg_sha256_final(&sha256_ctx, shabuf);
 	AddFileToManifest(manifest, tsoid, tarfilename, statbuf->st_size,
-					  statbuf->st_mtime, shabuf);
+					  statbuf->st_mtime, &cCtx);
 
 	return true;
 }
@@ -1966,3 +2009,63 @@ throttle(size_t increment)
 	 */
 	throttled_last = GetCurrentTimestamp();
 }
+
+/*
+ * Initilize the manifest checksum context according to the provided algorithm.
+ */
+static void
+initilize_manifest_checksum(ChecksumCtx *cCtx)
+{
+	switch (manifest_with_checksums)
+	{
+		case MC_SHA256:
+			pg_sha256_init(&cCtx->sha256_ctx);
+			break;
+		case MC_CRC:
+			INIT_CRC32C(cCtx->crc_ctx);
+			break;
+		case MC_NONE:
+			break;
+	}
+}
+
+static void
+update_manifest_checksum(ChecksumCtx *cCtx, const char *buf, off_t cnt)
+{
+	switch (manifest_with_checksums)
+	{
+		case MC_SHA256:
+			pg_sha256_update(&cCtx->sha256_ctx, (uint8 *) buf, cnt);
+			break;
+		case MC_CRC:
+			COMP_CRC32C(cCtx->crc_ctx, buf, cnt);
+			break;
+		case MC_NONE:
+			break;
+	}
+}
+
+/*
+ * Function calculate the final checksum for the provided context and returns
+ * the length of checksum.
+ */
+static int
+final_manifest_checksum(ChecksumCtx *cCtx, char *checksumbuf)
+{
+	int checksumlen = 0;
+	switch (manifest_with_checksums)
+	{
+		case MC_SHA256:
+			pg_sha256_final(&cCtx->sha256_ctx, (uint8 *)checksumbuf);
+			checksumlen = PG_SHA256_DIGEST_LENGTH;
+			break;
+		case MC_CRC:
+			FIN_CRC32C(cCtx->crc_ctx);
+			pg_ltoa(cCtx->crc_ctx, checksumbuf);
+			checksumlen = strlen(checksumbuf);
+			break;
+		case MC_NONE:
+			break;
+	}
+	return checksumlen;
+}
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index 542a3f7..409bd23 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -215,10 +215,10 @@ base_backup_opt:
 				  $$ = makeDefElem("noverify_checksums",
 								   (Node *)makeInteger(true), -1);
 				}
-			| K_MANIFEST_WITH_CHECKSUMS
+			| K_MANIFEST_WITH_CHECKSUMS SCONST
 				{
 				  $$ = makeDefElem("manifest_with_checksums",
-								   (Node *)makeInteger(true), -1);
+								   (Node *)makeString($2), -1);
 				}
 			;
 
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index f4f8ffe..f7b16ab 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -141,7 +141,7 @@ static bool temp_replication_slot = true;
 static bool create_slot = false;
 static bool no_slot = false;
 static bool verify_checksums = true;
-static bool manifest_with_checksums = false;
+static char *manifest_with_checksums = NULL;
 
 static bool success = false;
 static bool made_new_pgdata = false;
@@ -400,7 +400,7 @@ usage(void)
 	printf(_("      --no-verify-checksums\n"
 			 "                         do not verify checksums\n"));
 	printf(_("      --manifest-with-checksums\n"
-			 "                         do calculate checksums for manifest files\n"));
+			 "                         calculate checksums for manifest files using provided algorithm\n"));
 	printf(_("  -?, --help             show this help, then exit\n"));
 	printf(_("\nConnection options:\n"));
 	printf(_("  -d, --dbname=CONNSTR   connection string\n"));
@@ -1824,7 +1824,7 @@ BaseBackup(void)
 	}
 
 	basebkp =
-		psprintf("BASE_BACKUP LABEL '%s' %s %s %s %s %s %s %s %s",
+		psprintf("BASE_BACKUP LABEL '%s' %s %s %s %s %s %s %s MANIFEST_WITH_CHECKSUMS '%s'",
 				 escaped_label,
 				 showprogress ? "PROGRESS" : "",
 				 includewal == FETCH_WAL ? "WAL" : "",
@@ -1833,7 +1833,7 @@ BaseBackup(void)
 				 maxrate_clause ? maxrate_clause : "",
 				 format == 't' ? "TABLESPACE_MAP" : "",
 				 verify_checksums ? "" : "NOVERIFY_CHECKSUMS",
-				 manifest_with_checksums ? "MANIFEST_WITH_CHECKSUMS" : "");
+				 manifest_with_checksums ? manifest_with_checksums : "NONE");
 
 	if (PQsendQuery(conn, basebkp) == 0)
 	{
@@ -2166,7 +2166,7 @@ main(int argc, char **argv)
 		{"waldir", required_argument, NULL, 1},
 		{"no-slot", no_argument, NULL, 2},
 		{"no-verify-checksums", no_argument, NULL, 3},
-		{"manifest-with-checksums", no_argument, NULL, 4},
+		{"manifest-with-checksums", required_argument, NULL, 'm'},
 		{NULL, 0, NULL, 0}
 	};
 	int			c;
@@ -2194,7 +2194,7 @@ main(int argc, char **argv)
 
 	atexit(cleanup_directories_atexit);
 
-	while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP",
+	while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvPm:",
 							long_options, &option_index)) != -1)
 	{
 		switch (c)
@@ -2335,8 +2335,8 @@ main(int argc, char **argv)
 			case 3:
 				verify_checksums = false;
 				break;
-			case 4:
-				manifest_with_checksums = true;
+			case 'm':
+				manifest_with_checksums = pg_strdup(optarg);
 				break;
 			default:
 

Reply via email to