Hi there,

in the past we've repeatedly discussed the option of using a different compression algorithm (e.g. lz4), but every time the discussion died off over fears of possible patent issues (see [1], [2] and many other threads). Have we decided it's not worth the risk, making patches in this area futile?

The reason I'm asking is the multivariate statistics patch - while optimizing the planning overhead, I realized that a considerable amount of time is spent decompressing the statistics (serialized as bytea), and using an algorithm with better decompression performance (lz4 comes to mind) would help a lot. The statistics may be a few tens or hundreds of kB, and in the planner every millisecond counts.

Would a differentiated approach work? That is, either adding an initdb option allowing the user to choose an alternative compression algorithm (and thus letting them weigh the possible patent issues), or using different algorithms for different pieces of data (e.g. keeping pglz for user data and using lz4 for statistics).

The first option is quite trivial to implement - I already have an experimental patch implementing that (attached, but a bit dirty). The second option is probably more difficult (we'd have to teach the tuple toaster about multiple compression algorithms and pass that information along somehow). Also, I'm not sure it'd make the patent concerns go away ...
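
To make the second option a bit more concrete, here is a rough sketch of one way the "pass that information somehow" part could work: prefix every compressed payload with a one-byte algorithm ID and dispatch on it when decompressing. This is not what the attached patch does (the patch uses a single cluster-wide setting). All names here (AlgoId, tagged_compress, tagged_decompress) are made up for illustration, and the two codecs are toy stand-ins (plain copy and a trivial run-length encoding) rather than pglz or lz4, so that the example builds without any external library.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical algorithm IDs; the values are illustrative only. */
typedef enum { ALGO_COPY = 0, ALGO_RLE = 1 } AlgoId;

/* Toy codec 1: plain copy (stand-in for "one real algorithm"). */
static int32_t
copy_compress(const char *src, int32_t slen, char *dst, int32_t dcap)
{
	if (slen > dcap)
		return -1;
	memcpy(dst, src, slen);
	return slen;
}

static int32_t
copy_decompress(const char *src, int32_t slen, char *dst, int32_t rawsize)
{
	if (slen != rawsize)
		return -1;
	memcpy(dst, src, slen);
	return slen;
}

/* Toy codec 2: run-length encoding as (count, byte) pairs. */
static int32_t
rle_compress(const char *src, int32_t slen, char *dst, int32_t dcap)
{
	int32_t in = 0, out = 0;

	while (in < slen)
	{
		int32_t run = 1;

		while (in + run < slen && run < 255 && src[in + run] == src[in])
			run++;
		if (out + 2 > dcap)
			return -1;
		dst[out++] = (char) run;
		dst[out++] = src[in];
		in += run;
	}
	return out;
}

static int32_t
rle_decompress(const char *src, int32_t slen, char *dst, int32_t rawsize)
{
	int32_t in = 0, out = 0;

	if (slen % 2 != 0)
		return -1;
	while (in < slen)
	{
		int32_t run = (unsigned char) src[in];

		if (run == 0 || out + run > rawsize)
			return -1;
		memset(dst + out, src[in + 1], run);
		out += run;
		in += 2;
	}
	return (out == rawsize) ? out : -1;
}

/* Compress with the requested algorithm and store its ID in byte 0. */
int32_t
tagged_compress(AlgoId algo, const char *src, int32_t slen,
				char *dst, int32_t dcap)
{
	int32_t len;

	assert(src != NULL && dst != NULL);
	if (dcap < 1)
		return -1;

	switch (algo)
	{
		case ALGO_COPY:
			len = copy_compress(src, slen, dst + 1, dcap - 1);
			break;
		case ALGO_RLE:
			len = rle_compress(src, slen, dst + 1, dcap - 1);
			break;
		default:
			return -1;
	}
	if (len < 0)
		return -1;
	dst[0] = (char) algo;
	return len + 1;
}

/* Read the ID from byte 0 and dispatch to the matching decompressor. */
int32_t
tagged_decompress(const char *src, int32_t slen, char *dst, int32_t rawsize)
{
	assert(src != NULL && dst != NULL);
	if (slen < 1)
		return -1;

	switch ((unsigned char) src[0])
	{
		case ALGO_COPY:
			return copy_decompress(src + 1, slen - 1, dst, rawsize);
		case ALGO_RLE:
			return rle_decompress(src + 1, slen - 1, dst, rawsize);
		default:
			return -1;		/* unknown or unsupported algorithm */
	}
}
```

The point of the tag is that the reader never needs to know which algorithm the writer chose, so user data and statistics could use different algorithms within one cluster. The cost is one extra byte per compressed value, and every build would have to be able to decompress all algorithms that may appear in the data it reads.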

I'm a bit confused though, because I've noticed various other FOSS projects adopting lz4 over the past few years, and I have yet to find a project voicing the same concerns about patents. So either they're reckless or we're excessively paranoid.

Also, lz4 is not the only compression algorithm available - I've done a bunch of tests with lz4, lz4hc, lzo and snappy, and lzo actually performed better than lz4 (not claiming that's a universal truth). But I suppose the patent concerns are not specific to lz4 but apply to compression in general.


[1] http://www.postgresql.org/message-id/50ea7976.5060...@lab.ntt.co.jp
[2] http://www.postgresql.org/message-id/20130614230142.gc19...@awork2.anarazel.de

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 1afad8fcb509fb49c2f1a4336e3f154d46ad3d45 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <to...@pgaddict.com>
Date: Sat, 18 Apr 2015 21:26:20 +0200
Subject: [PATCH] support for additional compression algorithms (next to pglz)

This adds support for LZ4, LZO and snappy algorithms, selectable
at initdb time. After that, the algorithm is fixed.

The libraries are not compiled in by default, and need to be
selected when calling configure by adding

  --with-lz4
  --with-lzo
  --with-snappy

and the selected algorithm needs to be passed to initdb using
the "-C" option (e.g. "-C lz4").

The algorithm 'lz4hc' performs two passes, one using the default
lz4 algorithm, and the second one using the lz4hc variant (which
is slower, but hopefully runs on a much smaller amount of data).

There's also a 'none' option, disabling the compression entirely.
This is mostly experimental, to see the effect of compression.
---
 configure                               | 252 ++++++++++++++++++++++++++
 configure.in                            |  42 +++++
 src/Makefile.global.in                  |   3 +
 src/backend/access/heap/tuptoaster.c    |  10 +-
 src/backend/access/transam/xlog.c       |  37 ++++
 src/backend/access/transam/xloginsert.c |   8 +-
 src/backend/access/transam/xlogreader.c |   4 +-
 src/backend/bootstrap/bootstrap.c       |   8 +-
 src/backend/utils/misc/guc.c            |  23 +++
 src/bin/initdb/initdb.c                 |  49 ++++-
 src/common/Makefile                     |   4 +-
 src/common/compression.c                | 307 ++++++++++++++++++++++++++++++++
 src/include/access/xlog.h               |   1 +
 src/include/catalog/pg_control.h        |   3 +
 src/include/common/compression.h        |  69 +++++++
 src/include/pg_config.h.in              |   9 +
 16 files changed, 812 insertions(+), 17 deletions(-)
 create mode 100644 src/common/compression.c
 create mode 100644 src/include/common/compression.h

diff --git a/configure b/configure
index 7c0bd0c..010ae00 100755
--- a/configure
+++ b/configure
@@ -703,6 +703,9 @@ LDFLAGS_EX
 ELF_SYS
 EGREP
 GREP
+with_snappy
+with_lzo
+with_lz4
 with_zlib
 with_system_tzdata
 with_libxslt
@@ -839,6 +842,9 @@ with_libxml
 with_libxslt
 with_system_tzdata
 with_zlib
+with_lz4
+with_lzo
+with_snappy
 with_gnu_ld
 enable_largefile
 enable_float4_byval
@@ -1529,6 +1535,9 @@ Optional Packages:
   --with-system-tzdata=DIR
                           use system time zone data in DIR
   --without-zlib          do not use Zlib
+  --with-lz4              support lz4 compression
+  --with-lzo              support lzo compression
+  --with-snappy           support snappy compression
   --with-gnu-ld           assume the C compiler uses GNU ld [default=no]
 
 Some influential environment variables:
@@ -5990,6 +5999,93 @@ fi
 
 
 #
+# LZ4
+#
+
+
+
+# Check whether --with-lz4 was given.
+if test "${with_lz4+set}" = set; then :
+  withval=$with_lz4;
+  case $withval in
+    yes)
+      :
+      ;;
+    no)
+      :
+      ;;
+    *)
+      as_fn_error $? "no argument expected for --with-lz4 option" "$LINENO" 5
+      ;;
+  esac
+
+else
+  with_lz4=no
+
+fi
+
+
+
+
+#
+# LZO
+#
+
+
+
+# Check whether --with-lzo was given.
+if test "${with_lzo+set}" = set; then :
+  withval=$with_lzo;
+  case $withval in
+    yes)
+      :
+      ;;
+    no)
+      :
+      ;;
+    *)
+      as_fn_error $? "no argument expected for --with-lzo option" "$LINENO" 5
+      ;;
+  esac
+
+else
+  with_lzo=no
+
+fi
+
+
+
+
+#
+# Snappy
+#
+
+
+
+# Check whether --with-snappy was given.
+if test "${with_snappy+set}" = set; then :
+  withval=$with_snappy;
+  case $withval in
+    yes)
+      :
+      ;;
+    no)
+      :
+      ;;
+    *)
+      as_fn_error $? "no argument expected for --with-snappy option" "$LINENO" 5
+      ;;
+  esac
+
+else
+  with_snappy=no
+
+fi
+
+
+
+
+#
 # Elf
 #
 
@@ -8451,6 +8547,162 @@ fi
 
 fi
 
+if test "$with_lz4" = yes; then
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for LZ4_compress in -llz4" >&5
+$as_echo_n "checking for LZ4_compress in -llz4... " >&6; }
+if ${ac_cv_lib_lz4_LZ4_compress+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  ac_check_lib_save_LIBS=$LIBS
+LIBS="-llz4  $LIBS"
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+/* Override any GCC internal prototype to avoid an error.
+   Use char because int might match the return type of a GCC
+   builtin and then its argument prototype would still apply.  */
+#ifdef __cplusplus
+extern "C"
+#endif
+char LZ4_compress ();
+int
+main ()
+{
+return LZ4_compress ();
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  ac_cv_lib_lz4_LZ4_compress=yes
+else
+  ac_cv_lib_lz4_LZ4_compress=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+    conftest$ac_exeext conftest.$ac_ext
+LIBS=$ac_check_lib_save_LIBS
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_lz4_LZ4_compress" >&5
+$as_echo "$ac_cv_lib_lz4_LZ4_compress" >&6; }
+if test "x$ac_cv_lib_lz4_LZ4_compress" = xyes; then :
+  cat >>confdefs.h <<_ACEOF
+#define HAVE_LIBLZ4 1
+_ACEOF
+
+  LIBS="-llz4 $LIBS"
+
+else
+  as_fn_error $? "lz4 library not found
+If you have lz4 already installed, see config.log for details on the
+failure.  It is possible the compiler isn't looking in the proper directory." "$LINENO" 5
+fi
+
+fi
+
+if test "$with_lzo" = yes; then
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for lzo1x_1_compress in -llzo2" >&5
+$as_echo_n "checking for lzo1x_1_compress in -llzo2... " >&6; }
+if ${ac_cv_lib_lzo2_lzo1x_1_compress+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  ac_check_lib_save_LIBS=$LIBS
+LIBS="-llzo2  $LIBS"
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+/* Override any GCC internal prototype to avoid an error.
+   Use char because int might match the return type of a GCC
+   builtin and then its argument prototype would still apply.  */
+#ifdef __cplusplus
+extern "C"
+#endif
+char lzo1x_1_compress ();
+int
+main ()
+{
+return lzo1x_1_compress ();
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  ac_cv_lib_lzo2_lzo1x_1_compress=yes
+else
+  ac_cv_lib_lzo2_lzo1x_1_compress=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+    conftest$ac_exeext conftest.$ac_ext
+LIBS=$ac_check_lib_save_LIBS
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_lzo2_lzo1x_1_compress" >&5
+$as_echo "$ac_cv_lib_lzo2_lzo1x_1_compress" >&6; }
+if test "x$ac_cv_lib_lzo2_lzo1x_1_compress" = xyes; then :
+  cat >>confdefs.h <<_ACEOF
+#define HAVE_LIBLZO2 1
+_ACEOF
+
+  LIBS="-llzo2 $LIBS"
+
+else
+  as_fn_error $? "lzo library not found
+If you have lzo already installed, see config.log for details on the
+failure.  It is possible the compiler isn't looking in the proper directory." "$LINENO" 5
+fi
+
+fi
+
+if test "$with_snappy" = yes; then
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for snappy_compress in -lsnappy" >&5
+$as_echo_n "checking for snappy_compress in -lsnappy... " >&6; }
+if ${ac_cv_lib_snappy_snappy_compress+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  ac_check_lib_save_LIBS=$LIBS
+LIBS="-lsnappy  $LIBS"
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+/* Override any GCC internal prototype to avoid an error.
+   Use char because int might match the return type of a GCC
+   builtin and then its argument prototype would still apply.  */
+#ifdef __cplusplus
+extern "C"
+#endif
+char snappy_compress ();
+int
+main ()
+{
+return snappy_compress ();
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  ac_cv_lib_snappy_snappy_compress=yes
+else
+  ac_cv_lib_snappy_snappy_compress=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+    conftest$ac_exeext conftest.$ac_ext
+LIBS=$ac_check_lib_save_LIBS
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_snappy_snappy_compress" >&5
+$as_echo "$ac_cv_lib_snappy_snappy_compress" >&6; }
+if test "x$ac_cv_lib_snappy_snappy_compress" = xyes; then :
+  cat >>confdefs.h <<_ACEOF
+#define HAVE_LIBSNAPPY 1
+_ACEOF
+
+  LIBS="-lsnappy $LIBS"
+
+else
+  as_fn_error $? "snappy library not found
+If you have snappy already installed, see config.log for details on the
+failure.  It is possible the compiler isn't looking in the proper directory." "$LINENO" 5
+fi
+
+fi
+
 if test "$enable_spinlocks" = yes; then
 
 $as_echo "#define HAVE_SPINLOCKS 1" >>confdefs.h
diff --git a/configure.in b/configure.in
index 1cd9e1e..aa610d6 100644
--- a/configure.in
+++ b/configure.in
@@ -804,6 +804,27 @@ PGAC_ARG_BOOL(with, zlib, yes,
 AC_SUBST(with_zlib)
 
 #
+# LZ4
+#
+PGAC_ARG_BOOL(with, lz4, no,
+              [support lz4 compression])
+AC_SUBST(with_lz4)
+
+#
+# LZO
+#
+PGAC_ARG_BOOL(with, lzo, no,
+              [support lzo compression])
+AC_SUBST(with_lzo)
+
+#
+# Snappy
+#
+PGAC_ARG_BOOL(with, snappy, no,
+              [support snappy compression])
+AC_SUBST(with_snappy)
+
+#
 # Elf
 #
 
@@ -958,6 +979,27 @@ failure.  It is possible the compiler isn't looking in the proper directory.
 Use --without-zlib to disable zlib support.])])
 fi
 
+if test "$with_lz4" = yes; then
+  AC_CHECK_LIB(lz4, LZ4_compress, [],
+               [AC_MSG_ERROR([lz4 library not found
+If you have lz4 already installed, see config.log for details on the
+failure.  It is possible the compiler isn't looking in the proper directory.])])
+fi
+
+if test "$with_lzo" = yes; then
+  AC_CHECK_LIB(lzo2, lzo1x_1_compress, [],
+               [AC_MSG_ERROR([lzo library not found
+If you have lzo already installed, see config.log for details on the
+failure.  It is possible the compiler isn't looking in the proper directory.])])
+fi
+
+if test "$with_snappy" = yes; then
+  AC_CHECK_LIB(snappy, snappy_compress, [],
+               [AC_MSG_ERROR([snappy library not found
+If you have snappy already installed, see config.log for details on the
+failure.  It is possible the compiler isn't looking in the proper directory.])])
+fi
+
 if test "$enable_spinlocks" = yes; then
   AC_DEFINE(HAVE_SPINLOCKS, 1, [Define to 1 if you have spinlocks.])
 else
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index 4b06fc2..8f7badd 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -172,6 +172,9 @@ with_libxslt	= @with_libxslt@
 with_system_tzdata = @with_system_tzdata@
 with_uuid	= @with_uuid@
 with_zlib	= @with_zlib@
+with_lz4	= @with_lz4@
+with_lzo	= @with_lzo@
+with_snappy	= @with_snappy@
 enable_rpath	= @enable_rpath@
 enable_nls	= @enable_nls@
 enable_debug	= @enable_debug@
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 8464e87..2144bf7 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -35,7 +35,7 @@
 #include "access/tuptoaster.h"
 #include "access/xact.h"
 #include "catalog/catalog.h"
-#include "common/pg_lzcompress.h"
+#include "common/compression.h"
 #include "miscadmin.h"
 #include "utils/fmgroids.h"
 #include "utils/rel.h"
@@ -1273,11 +1273,11 @@ toast_compress_datum(Datum value)
 		valsize > PGLZ_strategy_default->max_input_size)
 		return PointerGetDatum(NULL);
 
-	tmp = (struct varlena *) palloc(PGLZ_MAX_OUTPUT(valsize) +
+	tmp = (struct varlena *) palloc(COMPRESSION_MAX_OUTPUT(valsize) +
 									TOAST_COMPRESS_HDRSZ);
 
 	/*
-	 * We recheck the actual size even if pglz_compress() reports success,
+	 * We recheck the actual size even if pg_compress() reports success,
 	 * because it might be satisfied with having saved as little as one byte
 	 * in the compressed data --- which could turn into a net loss once you
 	 * consider header and alignment padding.  Worst case, the compressed
@@ -1286,7 +1286,7 @@ toast_compress_datum(Datum value)
 	 * only one header byte and no padding if the value is short enough.  So
 	 * we insist on a savings of more than 2 bytes to ensure we have a gain.
 	 */
-	len = pglz_compress(VARDATA_ANY(DatumGetPointer(value)),
+	len = pg_compress(VARDATA_ANY(DatumGetPointer(value)),
 						valsize,
 						TOAST_COMPRESS_RAWDATA(tmp),
 						PGLZ_strategy_default);
@@ -2158,7 +2158,7 @@ toast_decompress_datum(struct varlena * attr)
 		palloc(TOAST_COMPRESS_RAWSIZE(attr) + VARHDRSZ);
 	SET_VARSIZE(result, TOAST_COMPRESS_RAWSIZE(attr) + VARHDRSZ);
 
-	if (pglz_decompress(TOAST_COMPRESS_RAWDATA(attr),
+	if (pg_decompress(TOAST_COMPRESS_RAWDATA(attr),
 						VARSIZE(attr) - TOAST_COMPRESS_HDRSZ,
 						VARDATA(result),
 						TOAST_COMPRESS_RAWSIZE(attr)) < 0)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 2580996..5e72632 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -38,6 +38,7 @@
 #include "catalog/catversion.h"
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
+#include "common/compression.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/bgwriter.h"
@@ -70,6 +71,7 @@
 #include "pg_trace.h"
 
 extern uint32 bootstrap_data_checksum_version;
+extern uint32 bootstrap_compression_algorithm;
 
 /* File path names (all relative to $PGDATA) */
 #define RECOVERY_COMMAND_FILE	"recovery.conf"
@@ -4368,6 +4370,9 @@ ReadControlFile(void)
 	/* Make the initdb settings visible as GUC variables, too */
 	SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
 					PGC_INTERNAL, PGC_S_OVERRIDE);
+
+	SetConfigOption("compression_algorithm", CompressionAlgorithmSelected(),
+					PGC_INTERNAL, PGC_S_OVERRIDE);
 }
 
 void
@@ -4433,6 +4438,37 @@ DataChecksumsEnabled(void)
 }
 
 /*
+ * What compression algorithm was selected for this cluster?
+ */
+char *
+CompressionAlgorithmSelected(void)
+{
+	Assert(ControlFile != NULL);
+
+	switch (ControlFile->compression_algorithm)
+	{
+		case COMPRESSION_PGLZ:
+			return "pglz";
+		case COMPRESSION_LZ4:
+			return "lz4";
+		case COMPRESSION_LZ4HC:
+			return "lz4hc";
+		case COMPRESSION_LZO:
+			return "lzo";
+		case COMPRESSION_SNAPPY:
+			return "snappy";
+		case COMPRESSION_NONE:
+			return "none";
+		default:
+			elog(WARNING, "unknown compression algorithm (%d)",
+						  ControlFile->compression_algorithm);
+	}
+
+	/* default */
+	return "pglz";
+}
+
+/*
  * Returns a fake LSN for unlogged relations.
  *
  * Each call generates an LSN that is greater than any previous value
@@ -4821,6 +4857,7 @@ BootStrapXLOG(void)
 	ControlFile->wal_log_hints = wal_log_hints;
 	ControlFile->track_commit_timestamp = track_commit_timestamp;
 	ControlFile->data_checksum_version = bootstrap_data_checksum_version;
+	ControlFile->compression_algorithm = bootstrap_compression_algorithm;
 
 	/* some additional ControlFile fields are set in WriteControlFile() */
 
diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index 618f879..f9c9d88 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -24,7 +24,7 @@
 #include "access/xlog_internal.h"
 #include "access/xloginsert.h"
 #include "catalog/pg_control.h"
-#include "common/pg_lzcompress.h"
+#include "common/compression.h"
 #include "miscadmin.h"
 #include "storage/bufmgr.h"
 #include "storage/proc.h"
@@ -32,7 +32,7 @@
 #include "pg_trace.h"
 
 /* Buffer size required to store a compressed version of backup block image */
-#define PGLZ_MAX_BLCKSZ	PGLZ_MAX_OUTPUT(BLCKSZ)
+#define PGLZ_MAX_BLCKSZ	COMPRESSION_MAX_OUTPUT(BLCKSZ)
 
 /*
  * For each block reference registered with XLogRegisterBuffer, we fill in
@@ -765,12 +765,12 @@ XLogCompressBackupBlock(char * page, uint16 hole_offset, uint16 hole_length,
 		source = page;
 
 	/*
-	 * We recheck the actual size even if pglz_compress() reports success
+	 * We recheck the actual size even if pg_compress() reports success
 	 * and see if the number of bytes saved by compression is larger than
 	 * the length of extra data needed for the compressed version of block
 	 * image.
 	 */
-	len = pglz_compress(source, orig_len, dest, PGLZ_strategy_default);
+	len = pg_compress(source, orig_len, dest, PGLZ_strategy_default);
 	if (len >= 0 &&
 		len + extra_bytes < orig_len)
 	{
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 77be1b8..8cc5e73 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -20,7 +20,7 @@
 #include "access/xlog_internal.h"
 #include "access/xlogreader.h"
 #include "catalog/pg_control.h"
-#include "common/pg_lzcompress.h"
+#include "common/compression.h"
 
 static bool allocate_recordbuf(XLogReaderState *state, uint32 reclength);
 
@@ -1302,7 +1302,7 @@ RestoreBlockImage(XLogReaderState *record, uint8 block_id, char *page)
 	if (bkpb->bimg_info & BKPIMAGE_IS_COMPRESSED)
 	{
 		/* If a backup block image is compressed, decompress it */
-		if (pglz_decompress(ptr, bkpb->bimg_len, tmp,
+		if (pg_decompress(ptr, bkpb->bimg_len, tmp,
 							BLCKSZ - bkpb->hole_length) < 0)
 		{
 			report_invalid_record(record, "invalid compressed image at %X/%X, block %d",
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index ad49964..66e5000 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -22,6 +22,7 @@
 #include "catalog/index.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_type.h"
+#include "common/compression.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
@@ -44,7 +45,7 @@
 #include "utils/tqual.h"
 
 uint32		bootstrap_data_checksum_version = 0;		/* No checksum */
-
+uint32		bootstrap_compression_algorithm = COMPRESSION_PGLZ;		/* PGLZ */
 
 #define ALLOC(t, c)		((t *) calloc((unsigned)(c), sizeof(t)))
 
@@ -214,7 +215,7 @@ AuxiliaryProcessMain(int argc, char *argv[])
 	/* If no -x argument, we are a CheckerProcess */
 	MyAuxProcType = CheckerProcess;
 
-	while ((flag = getopt(argc, argv, "B:c:d:D:Fkr:x:-:")) != -1)
+	while ((flag = getopt(argc, argv, "B:c:C:d:D:Fkr:x:-:")) != -1)
 	{
 		switch (flag)
 		{
@@ -243,6 +244,9 @@ AuxiliaryProcessMain(int argc, char *argv[])
 			case 'k':
 				bootstrap_data_checksum_version = PG_DATA_CHECKSUM_VERSION;
 				break;
+			case 'C':
+				bootstrap_compression_algorithm = atoi(optarg);
+				break;
 			case 'r':
 				strlcpy(OutputFileName, optarg, MAXPGPATH);
 				break;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index f43aff2..85517a8 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -37,6 +37,7 @@
 #include "commands/vacuum.h"
 #include "commands/variable.h"
 #include "commands/trigger.h"
+#include "common/compression.h"
 #include "funcapi.h"
 #include "libpq/auth.h"
 #include "libpq/be-fsstubs.h"
@@ -393,6 +394,18 @@ static const struct config_enum_entry row_security_options[] = {
 };
 
 /*
+ */
+static const struct config_enum_entry compression_options[] = {
+	{"pglz", COMPRESSION_PGLZ, false},
+	{"lz4", COMPRESSION_LZ4, false},
+	{"lz4hc", COMPRESSION_LZ4HC, false},
+	{"lzo", COMPRESSION_LZO, false},
+	{"snappy", COMPRESSION_SNAPPY, false},
+	{"none", COMPRESSION_NONE, false},
+	{NULL, 0, false}
+};
+
+/*
  * Options for enum values stored in other modules
  */
 extern const struct config_enum_entry wal_level_options[];
@@ -3648,6 +3661,16 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"compression_algorithm", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Compression algorithm."),
+			gettext_noop("Determines algorithm used for compression.")
+		},
+		&compression_algorithm,
+		COMPRESSION_PGLZ, compression_options,
+		NULL, NULL, NULL
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 8694920..5de0813 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -63,6 +63,7 @@
 #include "catalog/catalog.h"
 #include "common/restricted_token.h"
 #include "common/username.h"
+#include "common/compression.h"
 #include "mb/pg_wchar.h"
 #include "getaddrinfo.h"
 #include "getopt_long.h"
@@ -124,6 +125,7 @@ static bool do_sync = true;
 static bool sync_only = false;
 static bool show_setting = false;
 static bool data_checksums = false;
+static char *compression = "";
 static char *xlog_dir = "";
 
 
@@ -264,6 +266,7 @@ void		setup_data_file_paths(void);
 void		setup_locale_encoding(void);
 void		setup_signals(void);
 void		setup_text_search(void);
+void		setup_compression(void);
 void		create_data_directory(void);
 void		create_xlog_symlink(void);
 void		warn_on_mount_point(int error);
@@ -1552,9 +1555,10 @@ bootstrap_template1(void)
 	unsetenv("PGCLIENTENCODING");
 
 	snprintf(cmd, sizeof(cmd),
-			 "\"%s\" --boot -x1 %s %s %s",
+			 "\"%s\" --boot -x1 %s %s %d %s %s",
 			 backend_exec,
 			 data_checksums ? "-k" : "",
+			 "-C", compression_algorithm,
 			 boot_options, talkargs);
 
 	PG_CMD_OPEN;
@@ -3101,6 +3105,41 @@ setup_text_search(void)
 
 
 void
+setup_compression(void)
+{
+	compression_algorithm = COMPRESSION_PGLZ;
+	if (strlen(compression) != 0)
+	{
+		if (strcmp(compression, "pglz") == 0)
+			compression_algorithm = COMPRESSION_PGLZ;
+		else if (strcmp(compression, "none") == 0)
+			compression_algorithm = COMPRESSION_NONE;
+#if HAVE_LIBLZ4
+		else if (strcmp(compression, "lz4") == 0)
+			compression_algorithm = COMPRESSION_LZ4;
+		else if (strcmp(compression, "lz4hc") == 0)
+			compression_algorithm = COMPRESSION_LZ4HC;
+#endif
+#if HAVE_LIBLZO2
+		else if (strcmp(compression, "lzo") == 0)
+			compression_algorithm = COMPRESSION_LZO;
+#endif
+#if HAVE_LIBSNAPPY
+		else if (strcmp(compression, "snappy") == 0)
+			compression_algorithm = COMPRESSION_SNAPPY;
+#endif
+		else
+			printf(_("%s: warning: unknown compression algorithm \"%s\"\n"),
+				   progname, compression);
+	}
+
+	printf(_("The cluster will use compression algorithm \"%s\".\n"),
+		   compression);
+
+}
+
+
+void
 setup_signals(void)
 {
 	/* some of these are not valid on Windows */
@@ -3413,6 +3452,7 @@ main(int argc, char *argv[])
 		{"sync-only", no_argument, NULL, 'S'},
 		{"xlogdir", required_argument, NULL, 'X'},
 		{"data-checksums", no_argument, NULL, 'k'},
+		{"compression", required_argument, NULL, 'C'},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -3453,7 +3493,7 @@ main(int argc, char *argv[])
 
 	/* process command-line options */
 
-	while ((c = getopt_long(argc, argv, "dD:E:kL:nNU:WA:sST:X:", long_options, &option_index)) != -1)
+	while ((c = getopt_long(argc, argv, "dD:E:kL:nNU:WA:sST:X:C:", long_options, &option_index)) != -1)
 	{
 		switch (c)
 		{
@@ -3544,6 +3584,9 @@ main(int argc, char *argv[])
 			case 'X':
 				xlog_dir = pg_strdup(optarg);
 				break;
+			case 'C':
+				compression = pg_strdup(optarg);
+				break;
 			default:
 				/* getopt_long already emitted a complaint */
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
@@ -3617,6 +3660,8 @@ main(int argc, char *argv[])
 
 	setup_text_search();
 
+	setup_compression();
+
 	printf("\n");
 
 	if (data_checksums)
diff --git a/src/common/Makefile b/src/common/Makefile
index c47445e..313f28d 100644
--- a/src/common/Makefile
+++ b/src/common/Makefile
@@ -23,8 +23,8 @@ include $(top_builddir)/src/Makefile.global
 override CPPFLAGS := -DFRONTEND $(CPPFLAGS)
 LIBS += $(PTHREAD_LIBS)
 
-OBJS_COMMON = exec.o pg_lzcompress.o pgfnames.o psprintf.o relpath.o \
-	rmtree.o string.o username.o wait_error.o
+OBJS_COMMON = compression.o exec.o pg_lzcompress.o pgfnames.o psprintf.o \
+	relpath.o rmtree.o string.o username.o wait_error.o
 
 OBJS_FRONTEND = $(OBJS_COMMON) fe_memutils.o restricted_token.o
 
diff --git a/src/common/compression.c b/src/common/compression.c
new file mode 100644
index 0000000..94f0dec
--- /dev/null
+++ b/src/common/compression.c
@@ -0,0 +1,307 @@
+#include "postgres.h"
+#include "common/compression.h"
+
+#if HAVE_LIBLZ4
+#include "lz4.h"
+#include "lz4hc.h"
+#endif
+
+#if HAVE_LIBLZO2
+#include <lzo/lzoconf.h>
+#include <lzo/lzo1x.h>
+#endif
+
+#if HAVE_LIBSNAPPY
+#include "snappy-c.h"
+#endif
+
+/* by default use the original pglz compression algorithm */
+int compression_algorithm = COMPRESSION_PGLZ;
+
+
+#if HAVE_LIBLZ4
+static int32
+lz4_compress(const char *source, int32 slen, char *dest,
+			  const PGLZ_Strategy *strategy);
+static int32
+lz4_decompress(const char *source, int32 slen, char *dest,
+				int32 rawsize);
+static int32
+lz4hc_compress(const char *source, int32 slen, char *dest,
+			  const PGLZ_Strategy *strategy);
+static int32
+lz4hc_decompress(const char *source, int32 slen, char *dest,
+				int32 rawsize);
+#endif
+
+#if HAVE_LIBLZO2
+static int32
+lzo_compress(const char *source, int32 slen, char *dest,
+			  const PGLZ_Strategy *strategy);
+static int32
+lzo_decompress(const char *source, int32 slen, char *dest,
+				int32 rawsize);
+#endif
+
+#if HAVE_LIBSNAPPY
+static int32
+snappy_compress_pg(const char *source, int32 slen, char *dest,
+			  const PGLZ_Strategy *strategy);
+static int32
+snappy_decompress_pg(const char *source, int32 slen, char *dest,
+				int32 rawsize);
+#endif
+
+int32
+pg_compress(const char *source, int32 slen, char *dest,
+			  const PGLZ_Strategy *strategy)
+{
+	/*
+	 * Our fallback strategy is the default.
+	 */
+	if (strategy == NULL)
+		strategy = PGLZ_strategy_default;
+
+	/*
+	 * Logic shared by all the algorithms.
+	 */
+	if (strategy->match_size_good <= 0 ||
+		slen < strategy->min_input_size ||
+		slen > strategy->max_input_size)
+		return -1;
+
+	if (compression_algorithm == COMPRESSION_PGLZ)
+		return pglz_compress(source, slen, dest, strategy);
+
+	else if (compression_algorithm == COMPRESSION_NONE)
+		/* return (-1) which means 'incompressible' */
+		return -1;
+
+#if HAVE_LIBLZ4
+	else if (compression_algorithm == COMPRESSION_LZ4)
+		return lz4_compress(source, slen, dest, strategy);
+
+	else if (compression_algorithm == COMPRESSION_LZ4HC)
+		return lz4hc_compress(source, slen, dest, strategy);
+#endif
+
+#if HAVE_LIBLZO2
+	else if (compression_algorithm == COMPRESSION_LZO)
+		return lzo_compress(source, slen, dest, strategy);
+#endif
+
+#if HAVE_LIBSNAPPY
+	else if (compression_algorithm == COMPRESSION_SNAPPY)
+		return snappy_compress_pg(source, slen, dest, strategy);
+#endif
+
+	return -1;
+}
+
+int32
+pg_decompress(const char *source, int32 slen, char *dest,
+				int32 rawsize)
+{
+	if (compression_algorithm == COMPRESSION_PGLZ)
+		return pglz_decompress(source, slen, dest, rawsize);
+
+	else if (compression_algorithm == COMPRESSION_NONE)
+		return -1;
+
+#if HAVE_LIBLZ4
+	else if (compression_algorithm == COMPRESSION_LZ4)
+		return lz4_decompress(source, slen, dest, rawsize);
+
+	else if (compression_algorithm == COMPRESSION_LZ4HC)
+		return lz4hc_decompress(source, slen, dest, rawsize);
+#endif
+
+#if HAVE_LIBLZO2
+	else if (compression_algorithm == COMPRESSION_LZO)
+		return lzo_decompress(source, slen, dest, rawsize);
+#endif
+
+#if HAVE_LIBSNAPPY
+	else if (compression_algorithm == COMPRESSION_SNAPPY)
+		return snappy_decompress_pg(source, slen, dest, rawsize);
+#endif
+
+	return -1;
+}
+
+#if HAVE_LIBLZ4
+static int32
+lz4_compress(const char *source, int32 slen, char *dest,
+			  const PGLZ_Strategy *strategy)
+{
+	int32 ret;
+	int32 result_max;
+	int32 need_rate = strategy->min_comp_rate;
+
+	if (need_rate < 0)
+		need_rate = 0;
+	else if (need_rate > 99)
+		need_rate = 99;
+
+	result_max = (slen / 100) * (100 - need_rate);
+
+	ret = LZ4_compress_limitedOutput(source, dest, slen, result_max);
+
+	/* LZ4 uses 0 to signal error, we expect -1 in that case. */
+	if (ret == 0)
+		return -1;
+
+	return ret;
+}
+
+static int32
+lz4_decompress(const char *source, int32 slen, char *dest,
+				int32 rawsize)
+{
+	int32 ret = LZ4_decompress_safe (source, dest, slen, rawsize);
+
+	if (ret < 0)
+		return -1;
+
+	return ret;
+}
+
+/* first a pass of regular LZ4, then LZ4HC (mostly just an experiment) */
+
+static int32
+lz4hc_compress(const char *source, int32 slen, char *dest,
+			  const PGLZ_Strategy *strategy)
+{
+	int32 ret;
+	int32 result_max;
+	int32 need_rate = strategy->min_comp_rate;
+
+	char * tmp = palloc(slen);
+
+	if (need_rate < 0)
+		need_rate = 0;
+	else if (need_rate > 99)
+		need_rate = 99;
+
+	result_max = (slen / 100) * (100 - need_rate);
+
+	ret = LZ4_compress_limitedOutput(source, tmp, slen, result_max);
+
+	/* LZ4 uses 0 to signal error, we expect -1 in that case. */
+	if (ret == 0)
+	{
+		pfree(tmp);
+		return -1;
+	}
+
+	/*
+	 * this has the unfortunate consequence that if the first pass
+	 * is very efficient, but the second one does not reduce the size
+	 * sufficiently (and just gets us over result_max) we don't
+	 * compress the data at all
+	 */
+
+	ret = LZ4_compressHC_limitedOutput(tmp, dest, ret, result_max);
+
+	if (ret == 0)
+		ret = -1;
+
+	pfree(tmp);
+
+	return ret;
+}
+
+static int32
+lz4hc_decompress(const char *source, int32 slen, char *dest,
+				int32 rawsize)
+{
+	int32 ret;
+	char * tmp = palloc(rawsize);
+
+	ret = LZ4_decompress_safe (source, tmp, slen, rawsize);
+
+	if (ret < 0)
+	{
+		pfree(tmp);
+		return -1;
+	}
+
+	ret = LZ4_decompress_safe (tmp, dest, ret, rawsize);
+
+	if (ret < 0)
+		ret = -1;
+
+	pfree(tmp);
+	return ret;
+}
+#endif
+
+
+#if HAVE_LIBLZO2
+static int32
+lzo_compress(const char *source, int32 slen, char *dest,
+			  const PGLZ_Strategy *strategy)
+{
+	int r;
+	lzo_voidp wrkmem;
+	lzo_uint destlen;
+
+	if (lzo_init() != LZO_E_OK)
+		return -1;
+
+	/* LZO work memory */
+	wrkmem = (lzo_voidp) palloc(LZO1X_1_MEM_COMPRESS);
+
+	r = lzo1x_1_compress((lzo_bytep)source, (lzo_uint)slen,
+						 (lzo_bytep)dest, &destlen, wrkmem);
+	pfree(wrkmem);
+
+	if (r != LZO_E_OK)
+		return -1;
+
+	return destlen;
+}
+
+static int32
+lzo_decompress(const char *source, int32 slen, char *dest,
+				int32 rawsize)
+{
+	int r;
+	lzo_uint destlen = rawsize;	/* in: size of dest, out: bytes written */
+
+	r = lzo1x_decompress_safe((lzo_bytep)source, (lzo_uint)slen,
+						 (lzo_bytep)dest, &destlen, NULL);
+
+	if (r != LZO_E_OK)
+		return -1;
+
+	return destlen;
+}
+#endif
+
+#if HAVE_LIBSNAPPY
+static int32
+snappy_compress_pg(const char *source, int32 slen, char *dest,
+			  const PGLZ_Strategy *strategy)
+{
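+	size_t dlen = snappy_max_compressed_length(slen);	/* in: assumed dest capacity */
+
+	if (snappy_compress(source, slen, dest, &dlen) != SNAPPY_OK)
+		return -1;
+
+	/* XXX dlen is a size_t and gets narrowed to int32 here */
+	return dlen;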
+}
+
+static int32
+snappy_decompress_pg(const char *source, int32 slen, char *dest,
+				int32 rawsize)
+{
+	size_t destlen = rawsize;	/* in: size of dest, out: bytes written */
+
+	if (snappy_uncompress(source, slen, dest, &destlen) != SNAPPY_OK)
+		return -1;
+
+	return destlen;
+}
+#endif
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 2b1f423..5fa9d95 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -228,6 +228,7 @@ extern char *XLogFileNameP(TimeLineID tli, XLogSegNo segno);
 extern void UpdateControlFile(void);
 extern uint64 GetSystemIdentifier(void);
 extern bool DataChecksumsEnabled(void);
+extern char *CompressionAlgorithmSelected(void);
 extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
 extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 2e4c381..3370832 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -223,6 +223,9 @@ typedef struct ControlFileData
 	/* Are data pages protected by checksums? Zero if no checksum version */
 	uint32		data_checksum_version;
 
+	/* What compression algorithm was selected? */
+	uint32		compression_algorithm;
+
 	/* CRC of all above ... MUST BE LAST! */
 	pg_crc32c	crc;
 } ControlFileData;
diff --git a/src/include/common/compression.h b/src/include/common/compression.h
new file mode 100644
index 0000000..ae75417
--- /dev/null
+++ b/src/include/common/compression.h
@@ -0,0 +1,69 @@
+/* ----------
+ * compression.h -
+ *
+ *	Definitions for the selectable compression algorithms
+ *
+ * src/include/common/compression.h
+ * ----------
+ */
+
+#ifndef FRONTEND
+#include "postgres.h"
+#else
+#include "postgres_fe.h"
+#endif
+
+#include "common/pg_lzcompress.h"
+
+#ifndef _PG_COMPRESSION_H_
+#define _PG_COMPRESSION_H_
+
+#define COMPRESSION_PGLZ	1
+#define COMPRESSION_LZ4		2
+#define COMPRESSION_LZ4HC	3
+#define COMPRESSION_LZO		4
+#define COMPRESSION_SNAPPY	5
+#define COMPRESSION_NONE	6
+
+extern int compression_algorithm;
+
+extern int32 pg_compress(const char *source, int32 slen, char *dest,
+			  const PGLZ_Strategy *strategy);
+
+extern int32 pg_decompress(const char *source, int32 slen, char *dest,
+			  int32 rawsize);
+
+/*
+ * Upper bound on the compressed size over all supported algorithms. This
+ * must be a compile-time constant, because xloginsert.c requires one.
+ */
+#define COMPRESSION_MAX_OUTPUT(_dlen) \
+	MAX(MAX(PGLZ_MAX_OUTPUT(_dlen), \
+	        LZ4_MAX_COMPRESSED_SIZE(_dlen)), \
+	MAX(LZO2_MAX_COMPRESSED_SIZE(_dlen), \
+	    SNAPPY_MAX_COMPRESSED_SIZE(_dlen)))
+
+#if HAVE_LIBLZ4
+#define LZ4_MAX_COMPRESSED_SIZE(_dlen) \
+	((unsigned int)(_dlen) > (unsigned int)0x7E000000 ? 0 : (_dlen) + ((_dlen)/255) + 16) /* LZ4_compressBound(_dlen) */
+#else
+#define LZ4_MAX_COMPRESSED_SIZE(_dlen)		(0)
+#endif
+
+#if HAVE_LIBLZO2
+#define LZO2_MAX_COMPRESSED_SIZE(_dlen) \
+	((_dlen) + (_dlen) / 16 + 64 + 3)
+#else
+#define LZO2_MAX_COMPRESSED_SIZE(_dlen)		(0)
+#endif
+
+#if HAVE_LIBSNAPPY
+#define SNAPPY_MAX_COMPRESSED_SIZE(_dlen) \
+	(32 + (_dlen) + (_dlen) / 6)	/* snappy_max_compressed_length(_dlen) */
+#else
+#define SNAPPY_MAX_COMPRESSED_SIZE(_dlen)	(0)
+#endif
+
+#define MAX(a,b) (((a) > (b)) ? (a) : (b))
+
+#endif   /* _PG_COMPRESSION_H_ */
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 5688f75..d5abdfa 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -294,6 +294,12 @@
 /* Define to 1 if you have the `ldap_r' library (-lldap_r). */
 #undef HAVE_LIBLDAP_R
 
+/* Define to 1 if you have the `lz4' library (-llz4). */
+#undef HAVE_LIBLZ4
+
+/* Define to 1 if you have the `lzo2' library (-llzo2). */
+#undef HAVE_LIBLZO2
+
 /* Define to 1 if you have the `m' library (-lm). */
 #undef HAVE_LIBM
 
@@ -306,6 +312,9 @@
 /* Define to 1 if you have the `selinux' library (-lselinux). */
 #undef HAVE_LIBSELINUX
 
+/* Define to 1 if you have the `snappy' library (-lsnappy). */
+#undef HAVE_LIBSNAPPY
+
 /* Define to 1 if you have the `ssl' library (-lssl). */
 #undef HAVE_LIBSSL
 
-- 
1.9.3
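
To illustrate why COMPRESSION_MAX_OUTPUT in compression.h is built purely from macros, here is a small standalone sketch (not part of the patch). It reuses the bound formulas from the header, but the *_BOUND and BOUND_MAX names, BLCKSZ_ASSUMED and scratch_capacity() are made up for this example, PGLZ_MAX_OUTPUT is assumed to be (_dlen) + 4 as in pg_lzcompress.h, and the overflow guard of the LZ4 formula is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Worst-case compressed sizes. The formulas follow the patch's
 * compression.h; the macro names are hypothetical. PGLZ_BOUND assumes
 * PGLZ_MAX_OUTPUT(_dlen) is (_dlen) + 4.
 */
#define PGLZ_BOUND(n)	((n) + 4)
#define LZ4_BOUND(n)	((n) + ((n) / 255) + 16)
#define LZO_BOUND(n)	((n) + (n) / 16 + 64 + 3)
#define SNAPPY_BOUND(n)	(32 + (n) + (n) / 6)

/* Fully parenthesized, so expression arguments expand safely. */
#define BOUND_MAX(a, b)	(((a) > (b)) ? (a) : (b))

/* Maximum over all algorithms; still an integer constant expression. */
#define ALL_BOUND(n) \
	BOUND_MAX(BOUND_MAX(PGLZ_BOUND(n), LZ4_BOUND(n)), \
			  BOUND_MAX(LZO_BOUND(n), SNAPPY_BOUND(n)))

/* Assumed block size, used only for this illustration. */
#define BLCKSZ_ASSUMED	8192

/* A constant expression is what makes a file-scope array like this legal. */
static char scratch[ALL_BOUND(BLCKSZ_ASSUMED)];

/* Returns the buffer capacity after checking it covers every algorithm. */
size_t
scratch_capacity(void)
{
	assert(sizeof(scratch) >= (size_t) PGLZ_BOUND(BLCKSZ_ASSUMED));
	assert(sizeof(scratch) >= (size_t) LZ4_BOUND(BLCKSZ_ASSUMED));
	assert(sizeof(scratch) >= (size_t) LZO_BOUND(BLCKSZ_ASSUMED));
	assert(sizeof(scratch) >= (size_t) SNAPPY_BOUND(BLCKSZ_ASSUMED));
	return sizeof(scratch);
}
```

With these formulas no single algorithm has the largest bound for every input size: the snappy bound is the largest for an 8192-byte input, while the LZO bound is the largest for a 100-byte input. That is why the macro takes the maximum over all of them instead of hardcoding one.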

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
