On 25.09.2012 18:27, Amit Kapila wrote:
If you feel it is a must to do the comparison, we can do it the same way as we identify changed columns for HOT?

Yeah. (But as discussed, I think it would be even better to just treat the old and new tuples as opaque chunks of bytes, and run them through a generic delta algorithm).

Can you please explain why you think that doing LZ compression on top of the encoding is better, as we have already reduced the amount of WAL for updates by storing only the changed column information?

a. Is it to further reduce the size of WAL?
b. Is it to store the diff WAL in some standard format?
c. Or does it give some other kind of benefit?

Potentially all of those. I don't know if it'd be better or worse, but my gut feeling is that it would be simpler, and produce even more compact WAL.

Attached is a simple patch to apply LZ compression to update WAL records. I modified the LZ compressor so that it can optionally use a separate "history" data, and the same history data must then be passed to the decompression function. That makes it work as a pretty efficient delta encoder, when you use the old tuple as the history data.
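
To make the intended usage concrete, here is a minimal sketch (my own illustration, not code taken from the patch itself; buffer sizing and error handling are simplified):

#include "postgres.h"

#include <string.h>

#include "utils/pg_lzcompress.h"

/*
 * Minimal sketch: delta-encode 'newdata' against 'olddata' and decode it
 * again.  The same history must be supplied on both sides.  See
 * PGLZ_MAX_OUTPUT for the real output-size bound; BLCKSZ is just "big
 * enough" here.
 */
static void
delta_roundtrip_example(void)
{
	const char *olddata = "0001 alice   100.00 some long filler text here";
	const char *newdata = "0001 alice   250.00 some long filler text here";
	union
	{
		PGLZ_Header hdr;
		char		buf[BLCKSZ];
	}			encoded;
	char		decoded[BLCKSZ];

	if (pglz_compress_with_history(newdata, strlen(newdata),
								   olddata, strlen(olddata),
								   &encoded.hdr, PGLZ_strategy_always))
	{
		/* Only the parts that differ from 'olddata' are stored literally. */
		pglz_decompress_with_history(&encoded.hdr, decoded,
									 olddata, strlen(olddata));
		Assert(memcmp(decoded, newdata, strlen(newdata)) == 0);
	}
	else
	{
		/* The compressor gave up (no size reduction); keep the raw data. */
	}
}

In log_heap_update(), the old tuple plays the role of 'olddata', and the compressed image goes into the WAL record instead of the raw new tuple.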

I ran some performance tests with the modified version of pgbench that you posted earlier:

Current PostgreSQL master
-------------------------

tps = 941.601924 (excluding connections establishing)
 pg_xlog_location_diff
-----------------------
             721227944

pglz_wal_update_records.patch
-----------------------------

tps = 1039.792527 (excluding connections establishing)
 pg_xlog_location_diff
-----------------------
             419395208

pglz_wal_update_records.patch, COMPRESS_ONLY
--------------------------------------------

tps = 1009.682002 (excluding connections establishing)
 pg_xlog_location_diff
-----------------------
             422505104


Amit's wal_update_changes_hot_update.patch
------------------------------------------

tps = 1092.703883 (excluding connections establishing)
 pg_xlog_location_diff
-----------------------
             436031544


The COMPRESS_ONLY result is with the attached patch compiled with COMPRESS_ONLY defined, so it just uses LZ to compress the new tuple, without taking advantage of the old tuple. The pg_xlog_location_diff value is the amount of WAL generated during the pgbench run. Attached is also the shell script I used to run these tests.

The conclusion is that there isn't very much difference among the patches. They all squeeze the WAL to about the same size, and the increase in TPS is roughly the same.

I think more performance testing is required. The modified pgbench test isn't necessarily very representative of a real-life application. The gain (or loss) of this patch is going to depend a lot on how many columns are updated, and in what ways. We need to test more scenarios, with many different database schemas.

The LZ approach has the advantage that it can exploit all kinds of similarities between the old and new tuple. For example, if you swap the values of two columns, LZ will encode that efficiently, and likewise if you insert a character in the middle of a long string. On the flip side, it's probably more expensive. Then again, with your approach you have to do a memcmp() to detect which columns have changed, and that's not free either; that comparison was not yet included in the patch version I tested.

Another consideration is that the more you compress the record, the less data you have to calculate a CRC for. CRC calculation tends to be quite expensive, so even quite aggressive compression might be a win. Yet another consideration is that the compression/encoding is done while holding a lock on the buffer; for the sake of concurrency, you want to keep the duration the lock is held as short as possible.
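
Just to illustrate what that per-column comparison involves, here's a hypothetical sketch (the names and structure are mine, not from your patch; it assumes the tuples have already been deformed into Datum/null arrays):

#include "postgres.h"

#include "access/tupdesc.h"
#include "nodes/bitmapset.h"
#include "utils/datum.h"

/*
 * Hypothetical sketch of per-column change detection.  datumIsEqual()
 * boils down to a memcmp() for pass-by-reference types, so the cost grows
 * with the number and width of the columns compared.
 */
static Bitmapset *
changed_columns(TupleDesc tupdesc,
				Datum *oldvals, bool *oldnulls,
				Datum *newvals, bool *newnulls)
{
	Bitmapset  *changed = NULL;
	int			i;

	for (i = 0; i < tupdesc->natts; i++)
	{
		Form_pg_attribute att = tupdesc->attrs[i];

		/* A column counts as changed if its null-ness or value differs. */
		if (oldnulls[i] != newnulls[i] ||
			(!oldnulls[i] &&
			 !datumIsEqual(oldvals[i], newvals[i],
						   att->attbyval, att->attlen)))
			changed = bms_add_member(changed, i + 1);
	}
	return changed;
}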

- Heikki
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 5a4591e..56b53a5 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -70,6 +70,7 @@
 #include "utils/snapmgr.h"
 #include "utils/syscache.h"
 #include "utils/tqual.h"
+#include "utils/pg_lzcompress.h"
 
 
 /* GUC variable */
@@ -85,6 +86,7 @@ static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
 					TransactionId xid, CommandId cid, int options);
 static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
 				ItemPointerData from, Buffer newbuf, HeapTuple newtup,
+				HeapTuple oldtup,
 				bool all_visible_cleared, bool new_all_visible_cleared);
 static bool HeapSatisfiesHOTUpdate(Relation relation, Bitmapset *hot_attrs,
 					   HeapTuple oldtup, HeapTuple newtup);
@@ -3195,10 +3197,12 @@ l2:
 	/* XLOG stuff */
 	if (RelationNeedsWAL(relation))
 	{
-		XLogRecPtr	recptr = log_heap_update(relation, buffer, oldtup.t_self,
-											 newbuf, heaptup,
-											 all_visible_cleared,
-											 all_visible_cleared_new);
+		XLogRecPtr	recptr;
+
+		recptr = log_heap_update(relation, buffer, oldtup.t_self,
+								 newbuf, heaptup, &oldtup,
+								 all_visible_cleared,
+								 all_visible_cleared_new);
 
 		if (newbuf != buffer)
 		{
@@ -4428,7 +4432,7 @@ log_heap_visible(RelFileNode rnode, BlockNumber block, Buffer vm_buffer,
  */
 static XLogRecPtr
 log_heap_update(Relation reln, Buffer oldbuf, ItemPointerData from,
-				Buffer newbuf, HeapTuple newtup,
+				Buffer newbuf, HeapTuple newtup, HeapTuple oldtup,
 				bool all_visible_cleared, bool new_all_visible_cleared)
 {
 	xl_heap_update xlrec;
@@ -4437,6 +4441,16 @@ log_heap_update(Relation reln, Buffer oldbuf, ItemPointerData from,
 	XLogRecPtr	recptr;
 	XLogRecData rdata[4];
 	Page		page = BufferGetPage(newbuf);
+	union
+	{
+		PGLZ_Header pglzheader;
+		char buf[BLCKSZ];
+	} buf;
+	char	   *newtupdata;
+	int			newtuplen;
+	char	   *oldtupdata;
+	int			oldtuplen;
+	bool		compressed = false;
 
 	/* Caller should not call me on a non-WAL-logged relation */
 	Assert(RelationNeedsWAL(reln));
@@ -4446,11 +4460,43 @@ log_heap_update(Relation reln, Buffer oldbuf, ItemPointerData from,
 	else
 		info = XLOG_HEAP_UPDATE;
 
+	newtupdata = ((char *) newtup->t_data) + offsetof(HeapTupleHeaderData, t_bits);
+	newtuplen = newtup->t_len - offsetof(HeapTupleHeaderData, t_bits);
+	oldtupdata = ((char *) oldtup->t_data) + offsetof(HeapTupleHeaderData, t_bits);
+	oldtuplen = oldtup->t_len - offsetof(HeapTupleHeaderData, t_bits);
+
+	if (oldbuf == newbuf && oldtup)
+	{
+		/*
+		 * enable this if you only want to compress the new tuple as is,
+		 * without taking advantage of the old tuple.
+		 */
+#ifdef COMPRESS_ONLY
+		oldtuplen = 0;
+#endif
+
+		/* Delta-encode the new tuple using the old tuple */
+		/* XXX: assert that the output buffer is large enough (PGLZ_MAX_OUTPUT) */
+		if (pglz_compress_with_history(newtupdata, newtuplen,
+									   oldtupdata, oldtuplen,
+									   (PGLZ_Header *) &buf.pglzheader, NULL))
+		{
+			compressed = true;
+			newtupdata = (char *) &buf.pglzheader;
+			newtuplen = VARSIZE(&buf.pglzheader);
+		}
+	}
+
+	xlrec.flags = 0;
 	xlrec.target.node = reln->rd_node;
 	xlrec.target.tid = from;
-	xlrec.all_visible_cleared = all_visible_cleared;
+	if (all_visible_cleared)
+		xlrec.flags |= XL_HEAP_UPDATE_ALL_VISIBLE_CLEARED;
 	xlrec.newtid = newtup->t_self;
-	xlrec.new_all_visible_cleared = new_all_visible_cleared;
+	if (new_all_visible_cleared)
+		xlrec.flags |= XL_HEAP_UPDATE_NEW_ALL_VISIBLE_CLEARED;
+	if (compressed)
+		xlrec.flags |= XL_HEAP_UPDATE_DELTA_ENCODED;
 
 	rdata[0].data = (char *) &xlrec;
 	rdata[0].len = SizeOfHeapUpdate;
@@ -4478,12 +4524,13 @@ log_heap_update(Relation reln, Buffer oldbuf, ItemPointerData from,
 	rdata[2].next = &(rdata[3]);
 
 	/* PG73FORMAT: write bitmap [+ padding] [+ oid] + data */
-	rdata[3].data = (char *) newtup->t_data + offsetof(HeapTupleHeaderData, t_bits);
-	rdata[3].len = newtup->t_len - offsetof(HeapTupleHeaderData, t_bits);
+	rdata[3].data = newtupdata;
+	rdata[3].len = newtuplen;
 	rdata[3].buffer = newbuf;
 	rdata[3].buffer_std = true;
 	rdata[3].next = NULL;
 
+
 	/* If new tuple is the single and first tuple on page... */
 	if (ItemPointerGetOffsetNumber(&(newtup->t_self)) == FirstOffsetNumber &&
 		PageGetMaxOffsetNumber(page) == FirstOffsetNumber)
@@ -5232,6 +5279,8 @@ heap_xlog_update(XLogRecPtr lsn, XLogRecord *record, bool hot_update)
 	OffsetNumber offnum;
 	ItemId		lp = NULL;
 	HeapTupleHeader htup;
+	HeapTupleHeader oldtup = NULL;
+	uint32		old_tup_len = 0;
 	struct
 	{
 		HeapTupleHeaderData hdr;
@@ -5246,7 +5295,7 @@ heap_xlog_update(XLogRecPtr lsn, XLogRecord *record, bool hot_update)
 	 * The visibility map may need to be fixed even if the heap page is
 	 * already up-to-date.
 	 */
-	if (xlrec->all_visible_cleared)
+	if (xlrec->flags & XL_HEAP_UPDATE_ALL_VISIBLE_CLEARED)
 	{
 		Relation	reln = CreateFakeRelcacheEntry(xlrec->target.node);
 		BlockNumber block = ItemPointerGetBlockNumber(&xlrec->target.tid);
@@ -5289,7 +5338,8 @@ heap_xlog_update(XLogRecPtr lsn, XLogRecord *record, bool hot_update)
 	if (PageGetMaxOffsetNumber(page) < offnum || !ItemIdIsNormal(lp))
 		elog(PANIC, "heap_update_redo: invalid lp");
 
-	htup = (HeapTupleHeader) PageGetItem(page, lp);
+	oldtup = htup = (HeapTupleHeader) PageGetItem(page, lp);
+	old_tup_len = ItemIdGetLength(lp);
 
 	htup->t_infomask &= ~(HEAP_XMAX_COMMITTED |
 						  HEAP_XMAX_INVALID |
@@ -5308,7 +5358,7 @@ heap_xlog_update(XLogRecPtr lsn, XLogRecord *record, bool hot_update)
 	/* Mark the page as a candidate for pruning */
 	PageSetPrunable(page, record->xl_xid);
 
-	if (xlrec->all_visible_cleared)
+	if (xlrec->flags & XL_HEAP_UPDATE_ALL_VISIBLE_CLEARED)
 		PageClearAllVisible(page);
 
 	/*
@@ -5330,7 +5380,7 @@ newt:;
 	 * The visibility map may need to be fixed even if the heap page is
 	 * already up-to-date.
 	 */
-	if (xlrec->new_all_visible_cleared)
+	if (xlrec->flags & XL_HEAP_UPDATE_NEW_ALL_VISIBLE_CLEARED)
 	{
 		Relation	reln = CreateFakeRelcacheEntry(xlrec->target.node);
 		BlockNumber block = ItemPointerGetBlockNumber(&xlrec->newtid);
@@ -5380,16 +5430,40 @@ newsame:;
 	hsize = SizeOfHeapUpdate + SizeOfHeapHeader;
 
 	newlen = record->xl_len - hsize;
+
 	Assert(newlen <= MaxHeapTupleSize);
 	memcpy((char *) &xlhdr,
 		   (char *) xlrec + SizeOfHeapUpdate,
 		   SizeOfHeapHeader);
 	htup = &tbuf.hdr;
 	MemSet((char *) htup, 0, sizeof(HeapTupleHeaderData));
-	/* PG73FORMAT: get bitmap [+ padding] [+ oid] + data */
-	memcpy((char *) htup + offsetof(HeapTupleHeaderData, t_bits),
-		   (char *) xlrec + hsize,
-		   newlen);
+
+	/*
+	 * If the new tuple was delta-encoded, decode it.
+	 */
+	if (xlrec->flags & XL_HEAP_UPDATE_DELTA_ENCODED)
+	{
+		PGLZ_Header *encoded_data = (PGLZ_Header *) (((char *) xlrec) + hsize);
+
+		/*
+		 * FIXME: this won't work on architectures with strict alignment,
+		 * because encoded_data might not be aligned and pglz_decompress
+		 * assumes that the PGLZ_Header is correctly aligned. XXX: also add
+		 * some sanity checks with PGLZ_RAW_SIZE here.
+		 */
+		pglz_decompress_with_history(encoded_data,
+									 ((char *) htup) + offsetof(HeapTupleHeaderData, t_bits),
+									 ((char *) oldtup) + offsetof(HeapTupleHeaderData, t_bits),
+									 old_tup_len - offsetof(HeapTupleHeaderData, t_bits));
+		newlen = encoded_data->rawsize;
+	}
+	else
+	{
+		/* PG73FORMAT: get bitmap [+ padding] [+ oid] + data */
+		memcpy((char *) htup + offsetof(HeapTupleHeaderData, t_bits),
+			   (char *) xlrec + hsize,
+			   newlen);
+	}
 	newlen += offsetof(HeapTupleHeaderData, t_bits);
 	htup->t_infomask2 = xlhdr.t_infomask2;
 	htup->t_infomask = xlhdr.t_infomask;
@@ -5404,7 +5478,7 @@ newsame:;
 	if (offnum == InvalidOffsetNumber)
 		elog(PANIC, "heap_update_redo: failed to add tuple");
 
-	if (xlrec->new_all_visible_cleared)
+	if (xlrec->flags & XL_HEAP_UPDATE_NEW_ALL_VISIBLE_CLEARED)
 		PageClearAllVisible(page);
 
 	freespace = PageGetHeapFreeSpace(page);		/* needed to update FSM below */
diff --git a/src/backend/utils/adt/pg_lzcompress.c b/src/backend/utils/adt/pg_lzcompress.c
index 466982e..eb24355 100644
--- a/src/backend/utils/adt/pg_lzcompress.c
+++ b/src/backend/utils/adt/pg_lzcompress.c
@@ -482,6 +482,20 @@ bool
 pglz_compress(const char *source, int32 slen, PGLZ_Header *dest,
 			  const PGLZ_Strategy *strategy)
 {
+	return pglz_compress_with_history(source, slen, NULL, 0, dest, strategy);
+}
+
+/*
+ * Like pglz_compress, but uses another piece of data to initialize the
+ * history table. When decompressing, you must pass the same history data
+ * to pglz_decompress_with_history(). This makes it possible to do simple
+ * delta compression.
+ */
+bool
+pglz_compress_with_history(const char *source, int32 slen,
+						   const char *history, int32 hlen,
+						   PGLZ_Header *dest, const PGLZ_Strategy *strategy)
+{
 	unsigned char *bp = ((unsigned char *) dest) + sizeof(PGLZ_Header);
 	unsigned char *bstart = bp;
 	int			hist_next = 0;
@@ -560,6 +574,24 @@ pglz_compress(const char *source, int32 slen, PGLZ_Header *dest,
 	 * hist_entries[] array; its entries are initialized as they are used.
 	 */
 	memset(hist_start, 0, sizeof(hist_start));
+	if (hlen > 0)
+	{
+		const char *hp = history;
+		const char *hend = history + hlen;
+		while (hp < hend)
+		{
+			/*
+			 * XXX: I think this doesn't handle the last few bytes of the
+			 * history correctly, or at least not in the most efficient way.
+			 * Logically, we should behave like the history and the source
+			 * strings are concatenated, but we use 'hend' here.
+			 */
+			pglz_hist_add(hist_start, hist_entries,
+						  hist_next, hist_recycle,
+						  hp, hend);
+			hp++;			/* Do not do this ++ in the line above! */
+		}
+	}
 
 	/*
 	 * Compress the source directly into the output buffer.
@@ -647,10 +679,21 @@ pglz_compress(const char *source, int32 slen, PGLZ_Header *dest,
 void
 pglz_decompress(const PGLZ_Header *source, char *dest)
 {
+	pglz_decompress_with_history(source, dest, NULL, 0);
+}
+
+void
+pglz_decompress_with_history(const PGLZ_Header *source, char *dest,
+							 const char *history, int32 hlen)
+{
 	const unsigned char *sp;
 	const unsigned char *srcend;
 	unsigned char *dp;
 	unsigned char *destend;
+	unsigned char *hend = NULL;
+
+	if (hlen > 0)
+		hend = (unsigned char *) history + hlen;
 
 	sp = ((const unsigned char *) source) + sizeof(PGLZ_Header);
 	srcend = ((const unsigned char *) source) + VARSIZE(source);
@@ -707,7 +750,20 @@ pglz_decompress(const PGLZ_Header *source, char *dest)
 				 */
 				while (len--)
 				{
-					*dp = dp[-off];
+					if (off > (dp - (unsigned char *) dest))
+					{
+						/*
+						 * this offset refers to the history passed by
+						 * the caller in a separate buffer.
+						 */
+						int hoff = off - (dp - (unsigned char *) dest);
+						Assert(hoff < hlen);
+						*dp = hend[-hoff];
+					}
+					else
+					{
+						*dp = dp[-off];
+					}
 					dp++;
 				}
 			}
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 8ec710e..5dd2809 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -142,12 +142,16 @@ typedef struct xl_heap_update
 {
 	xl_heaptid	target;			/* deleted tuple id */
 	ItemPointerData newtid;		/* new inserted tuple id */
-	bool		all_visible_cleared;	/* PD_ALL_VISIBLE was cleared */
-	bool		new_all_visible_cleared;		/* same for the page of newtid */
+	char		flags;
+
 	/* NEW TUPLE xl_heap_header AND TUPLE DATA FOLLOWS AT END OF STRUCT */
 } xl_heap_update;
 
-#define SizeOfHeapUpdate	(offsetof(xl_heap_update, new_all_visible_cleared) + sizeof(bool))
+#define XL_HEAP_UPDATE_ALL_VISIBLE_CLEARED		0x01
+#define XL_HEAP_UPDATE_NEW_ALL_VISIBLE_CLEARED	0x02
+#define XL_HEAP_UPDATE_DELTA_ENCODED			0x04
+
+#define SizeOfHeapUpdate	(offsetof(xl_heap_update, flags) + sizeof(char))
 
 /*
  * This is what we need to know about vacuum page cleanup/redirect
diff --git a/src/include/utils/pg_lzcompress.h b/src/include/utils/pg_lzcompress.h
index 4af24a3..cddc476 100644
--- a/src/include/utils/pg_lzcompress.h
+++ b/src/include/utils/pg_lzcompress.h
@@ -107,6 +107,12 @@ extern const PGLZ_Strategy *const PGLZ_strategy_always;
  */
 extern bool pglz_compress(const char *source, int32 slen, PGLZ_Header *dest,
 			  const PGLZ_Strategy *strategy);
+extern bool pglz_compress_with_history(const char *source, int32 slen,
+						   const char *history, int32 hlen,
+						   PGLZ_Header *dest,
+						   const PGLZ_Strategy *strategy);
 extern void pglz_decompress(const PGLZ_Header *source, char *dest);
+extern void pglz_decompress_with_history(const PGLZ_Header *source, char *dest,
+										 const char *history, int32 hlen);
 
 #endif   /* _PG_LZCOMPRESS_H_ */

Attachment: pgbench-xlogtest.sh
Description: Bourne shell script
