Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On 03/04/2014 01:58 PM, Amit Kapila wrote:
> On Mon, Mar 3, 2014 at 7:57 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
>> On 02/16/2014 01:51 PM, Amit Kapila wrote:
>>> On Wed, Feb 5, 2014 at 5:29 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
>>
>> Thanks. I have to agree with Robert though that using the pglz encoding
>> when we're just checking for a common prefix/suffix is a pretty crappy way
>> of going about it [1]. As the patch stands, it includes the NULL bitmap
>> when checking for a common prefix. That's probably not a good idea,
>> because it defeats the prefix detection in the common case that you update
>> a field from NULL to not-NULL or vice versa. Attached is a rewritten
>> version, which does the prefix/suffix tests directly in heapam.c, and adds
>> the prefix/suffix lengths directly as fields in the WAL record. If you
>> could take one more look at this version, to check if I've missed
>> anything.
>
> I had verified the patch and found a few minor points: ...

Fixed those.

> One question:
>
> + rdata[1].data = (char *) &xlrec;
>
> Earlier it seems to store the record header as the first segment rdata[0];
> what's the reason for changing it?

I found the code easier to read that way. The order of rdata entries used to be:

0: xl_heap_update struct
1: full-page reference to oldbuf (no data)
2: xl_heap_header_len struct for the new tuple
3-7: logical decoding stuff

The prefix/suffix fields made that order a bit awkward, IMHO. They are logically part of the header, even though they're not part of the struct (they are documented in comments inside the struct), so they ought to stay together with the xl_heap_update struct. Another option would've been to move them after the xl_heap_header_len struct.

Note that this doesn't affect the on-disk format of the WAL record, because the moved rdata entry is just a full-page reference, with no payload of its own.

> I have verified the patch by doing crash recovery for the scenarios below,
> and it worked fine:
> a. no change in old and new tuple
> b. all changed in new tuple
> c. half changed (update half of the values to NULLs) in new tuple
> d. only prefix same in new tuple
> e. only suffix same in new tuple
> f. prefix/suffix same, other column values changed in new tuple

Thanks!

> Conclusion is that the patch shows good WAL reduction and performance
> improvement for favourable cases without CPU overhead for non-favourable
> cases.

Ok, great. Committed!

I left out the regression tests. It was good to have them while developing this, but I don't think there's a lot of value in including them permanently in the regression suite. Low-level things like the alignment-sensitive test are fragile, and can easily stop testing the thing they're supposed to test, depending on the platform and future changes in the code. And the current algorithm doesn't care much about alignment anyway.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Wed, Mar 12, 2014 at 5:30 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
> Ok, great. Committed!

Awesome.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Tue, Mar 4, 2014 at 4:13 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
> Agreed. Amit, do you have the test setup at hand, can you check the
> performance of this one more time?

Are you expecting more performance numbers than I have posted? Is there anything more left for the patch which you are expecting?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On 03/03/2014 04:57 PM, Andres Freund wrote:
> On 2014-03-03 16:27:05 +0200, Heikki Linnakangas wrote:
>> Attached is a rewritten version, which does the prefix/suffix tests
>> directly in heapam.c, and adds the prefix/suffix lengths directly as
>> fields in the WAL record. If you could take one more look at this version,
>> to check if I've missed anything.
>
> Have you rerun the benchmarks?

No.

> I'd guess the CPU overhead of this version is lower than earlier versions,

That's what I would expect too.

> but seeing it tested won't be a bad idea.

Agreed. Amit, do you have the test setup at hand, can you check the performance of this one more time?

Also, I removed the GUC and table level options, on the assumption that this is cheap enough even when it's not helping that we don't need to make it configurable.

>> This ought to be tested with the new logical decoding stuff as it modified
>> the WAL update record format which the logical decoding stuff also relies
>> on, but I don't know anything about that.
>
> Hm, I think all it needs to do is disable delta encoding if need_tuple_data
> (which is dependent on wal_level=logical).

That's a pity, but we can live with it. If we did this at a higher level and checked which columns have been modified, we could include just the modified fields in the record, which should be enough for logical decoding. It might be even more useful for logical decoding to know exactly which fields were changed.

- Heikki
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On 2014-03-04 12:43:48 +0200, Heikki Linnakangas wrote:
>>> This ought to be tested with the new logical decoding stuff as it modified
>>> the WAL update record format which the logical decoding stuff also relies
>>> on, but I don't know anything about that.
>>
>> Hm, I think all it needs to do is disable delta encoding if
>> need_tuple_data (which is dependent on wal_level=logical).
>
> That's a pity, but we can live with it.

Agreed. This is hardly the first optimization that only works for some wal_levels.

> If we did this at a higher level and checked which columns have been
> modified, we could include just the modified fields in the record, which
> should be enough for logical decoding. It might be even more useful for
> logical decoding to know exactly which fields were changed.

Yea, I argued that way elsewhere in this thread. I do think we're going to need per-column info for further features in the near future. It's a bit absurd that we're computing various sets of changed columns (HOT, key, identity) plus the pre/postfix with this patchset.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Mon, Mar 3, 2014 at 7:57 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
> On 02/16/2014 01:51 PM, Amit Kapila wrote:
>> On Wed, Feb 5, 2014 at 5:29 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
>
> Thanks. I have to agree with Robert though that using the pglz encoding
> when we're just checking for a common prefix/suffix is a pretty crappy way
> of going about it [1]. As the patch stands, it includes the NULL bitmap
> when checking for a common prefix. That's probably not a good idea, because
> it defeats the prefix detection in the common case that you update a field
> from NULL to not-NULL or vice versa. Attached is a rewritten version, which
> does the prefix/suffix tests directly in heapam.c, and adds the
> prefix/suffix lengths directly as fields in the WAL record. If you could
> take one more look at this version, to check if I've missed anything.

I had verified the patch and found a few minor points:

1.
+extern bool heap_delta_encode(TupleDesc tupleDesc, HeapTuple oldtup,
+                  HeapTuple newtup, char *encdata, uint32 *enclen);
+extern void heap_delta_decode (char *encdata, uint32 enclen, HeapTuple oldtup,
+                  HeapTuple newtup);

Declarations for the above functions are not required now.

2.
+ * later, but will not cause any problem because this function is used only to
+ * identify whether EWT is required for update.
+ */
+bool
+XLogCheckBufferNeedsBackup(Buffer buffer)

Here, I think we can change the comment to avoid the word EWT (Encoded WAL tuple), as we have now changed the compression mechanism and it's not used anywhere else.

One question:

+ rdata[1].data = (char *) &xlrec;

Earlier it seems to store the record header as the first segment rdata[0]; what's the reason for changing it?

I have verified the patch by doing crash recovery for the scenarios below, and it worked fine:
a. no change in old and new tuple
b. all changed in new tuple
c. half changed (update half of the values to NULLs) in new tuple
d. only prefix same in new tuple
e. only suffix same in new tuple
f. prefix/suffix same, other column values changed in new tuple

Performance Data
----------------
Non-default settings:
autovacuum = off
checkpoint_segments = 256
checkpoint_timeout = 15min
full_page_writes = off

Unpatched

                testname                 | wal_generated |     duration
-----------------------------------------+---------------+------------------
 one short and one long field, no change |     573506704 | 9.56587505340576
 one short and one long field, no change |     575351216 | 9.97713398933411
 one short and one long field, no change |     573501848 | 9.76377606391907
 hundred tiny fields, all changed        |     364894056 | 13.3053929805756
 hundred tiny fields, all changed        |     364891536 | 13.3533811569214
 hundred tiny fields, all changed        |     364889264 | 13.3041989803314
 hundred tiny fields, half changed       |     365411920 | 14.1831648349762
 hundred tiny fields, half changed       |     365918216 | 13.6393811702728
 hundred tiny fields, half changed       |     366456552 | 13.6420011520386
 hundred tiny fields, half nulled        |     300705288 | 12.8859741687775
 hundred tiny fields, half nulled        |     301665624 | 12.6988201141357
 hundred tiny fields, half nulled        |     300700504 | 13.3536100387573
 9 short and 1 long, short changed       |     396983080 | 8.83671307563782
 9 short and 1 long, short changed       |     396987976 | 9.23769211769104
 9 short and 1 long, short changed       |     396984080 | 9.45178604125977

wal-update-prefix-suffix-5.patch

                testname                 | wal_generated |     duration
-----------------------------------------+---------------+------------------
 one short and one long field, no change |     156278832 | 6.69434094429016
 one short and one long field, no change |     156277352 | 6.70855903625488
 one short and one long field, no change |     156280040 | 6.70657396316528
 hundred tiny fields, all changed        |     364895152 | 13.6677348613739
 hundred tiny fields, all changed        |     364892256 | 12.7107839584351
 hundred tiny fields, all changed        |     364890424 | 13.7760601043701
 hundred tiny fields, half changed       |     365970360 | 13.1902158260345
 hundred tiny fields, half changed       |     364895120 | 13.5730090141296
 hundred tiny fields, half changed       |     367031168 | 13.7023210525513
 hundred tiny fields, half nulled        |     204418576 | 12.1997199058533
 hundred tiny fields, half nulled        |     204422880 | 11.4583330154419
 hundred tiny fields, half nulled        |     204417464 | 12.0228970050812
 9 short and 1 long, short changed       |     220466016 | 8.14843511581421
 9 short and 1 long, short changed       |     220471168 | 8.03712797164917
 9 short and 1 long, short changed       |     220464464 | 8.55907511711121
(15 rows)

Conclusion is that the patch shows good WAL reduction and performance improvement for favourable cases without CPU overhead for non-favourable cases.
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On 02/16/2014 01:51 PM, Amit Kapila wrote:
> On Wed, Feb 5, 2014 at 5:29 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
>> I'm pretty sure the overhead of that would be negligible, so we could
>> always enable it. There are certainly a lot of scenarios where
>> prefix/suffix detection alone wouldn't help, but so what. Attached is a
>> quick patch for that, if you want to test it.
>
> I have updated the patch to correct a few problems, addressed review
> comments by Andres, and done some optimizations to improve CPU overhead for
> the worst case.

Thanks. I have to agree with Robert though that using the pglz encoding when we're just checking for a common prefix/suffix is a pretty crappy way of going about it [1].

As the patch stands, it includes the NULL bitmap when checking for a common prefix. That's probably not a good idea, because it defeats the prefix detection in the common case that you update a field from NULL to not-NULL or vice versa.

Attached is a rewritten version, which does the prefix/suffix tests directly in heapam.c, and adds the prefix/suffix lengths directly as fields in the WAL record. If you could take one more look at this version, to check if I've missed anything.

This ought to be tested with the new logical decoding stuff as it modified the WAL update record format which the logical decoding stuff also relies on, but I don't know anything about that.
[1] http://www.postgresql.org/message-id/ca+tgmozstdqdku7dhcljchvmbrh1_yfouse+fuxesevnc4j...@mail.gmail.com

- Heikki

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index de4befa..cf23db5 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6594,10 +6594,15 @@ log_heap_update(Relation reln, Buffer oldbuf,
 	xl_heap_header_len xlhdr;
 	xl_heap_header_len xlhdr_idx;
 	uint8		info;
+	uint16		prefix_suffix[2];
+	uint16		prefixlen = 0,
+				suffixlen = 0;
 	XLogRecPtr	recptr;
-	XLogRecData rdata[7];
+	XLogRecData rdata[9];
 	Page		page = BufferGetPage(newbuf);
 	bool		need_tuple_data = RelationIsLogicallyLogged(reln);
+	int			nr;
+	Buffer		newbufref;
 
 	/* Caller should not call me on a non-WAL-logged relation */
 	Assert(RelationNeedsWAL(reln));
@@ -6607,6 +6612,54 @@ log_heap_update(Relation reln, Buffer oldbuf,
 	else
 		info = XLOG_HEAP_UPDATE;
 
+	/*
+	 * If the old and new tuple are on the same page, we only need to log
+	 * the parts of the new tuple that were changed. That saves on the amount
+	 * of WAL we need to write. Currently, we just count any unchanged bytes
+	 * in the beginning and end of the tuples. That's quick to check, and
+	 * perfectly covers the common case that only one field is updated.
+	 *
+	 * We could do this even if the old and new tuple are on different pages,
+	 * but only if we don't make a full-page image of the old page, which is
+	 * difficult to know in advance. Also, if the old tuple is corrupt for
+	 * some reason, it would allow propagating the corruption to the new
+	 * page, so it seems best to avoid. Under the general assumption that for
+	 * long runs most updates tend to create the new tuple version on the
+	 * same page, there should not be significant impact on WAL reduction or
+	 * performance.
+	 *
+	 * Skip this if we're taking a full-page image of the new page, as we
+	 * don't include the new tuple in the WAL record in that case.
+	 */
+	if (oldbuf == newbuf &&
+		(need_tuple_data || !XLogCheckBufferNeedsBackup(newbuf)))
+	{
+		char	   *oldp = (char *) oldtup->t_data + oldtup->t_data->t_hoff;
+		char	   *newp = (char *) newtup->t_data + newtup->t_data->t_hoff;
+		int			oldlen = oldtup->t_len - oldtup->t_data->t_hoff;
+		int			newlen = newtup->t_len - newtup->t_data->t_hoff;
+
+		/* Check for common prefix between old and new tuple */
+		for (prefixlen = 0; prefixlen < Min(oldlen, newlen); prefixlen++)
+		{
+			if (newp[prefixlen] != oldp[prefixlen])
+				break;
+		}
+
+		/*
+		 * Storing the length of the prefix takes 2 bytes, so we need to save
+		 * at least 3 bytes or there's no point.
+		 */
+		if (prefixlen < 3)
+			prefixlen = 0;
+
+		/* Same for suffix */
+		for (suffixlen = 0; suffixlen < Min(oldlen, newlen) - prefixlen; suffixlen++)
+		{
+			if (newp[newlen - suffixlen - 1] != oldp[oldlen - suffixlen - 1])
+				break;
+		}
+		if (suffixlen < 3)
+			suffixlen = 0;
+	}
+
 	xlrec.target.node = reln->rd_node;
 	xlrec.target.tid = oldtup->t_self;
 	xlrec.old_xmax = HeapTupleHeaderGetRawXmax(oldtup->t_data);
@@ -6619,41 +6672,119 @@ log_heap_update(Relation reln, Buffer oldbuf,
 	xlrec.newtid = newtup->t_self;
 	if (new_all_visible_cleared)
 		xlrec.flags |= XLOG_HEAP_NEW_ALL_VISIBLE_CLEARED;
+	if (prefixlen > 0)
+		xlrec.flags |= XLOG_HEAP_PREFIX_FROM_OLD;
+	if (suffixlen > 0)
+		xlrec.flags |= XLOG_HEAP_SUFFIX_FROM_OLD;
 
-	rdata[0].data = (char *) &xlrec;
-	rdata[0].len = SizeOfHeapUpdate;
-	rdata[0].buffer = InvalidBuffer;
+	/* If new tuple is the single and first tuple on page... */
+	if (ItemPointerGetOffsetNumber(&(newtup->t_self)) ==
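[Editor's note: the prefix/suffix scan in the hunk above can be pulled out into a standalone sketch for readers who want to experiment with it. The function name and the local Min macro below are illustrative, not symbols from the patch; it operates on plain byte buffers rather than heap tuples.]

```c
#include <assert.h>

#define Min(x, y) ((x) < (y) ? (x) : (y))

/*
 * Standalone sketch of the prefix/suffix detection in log_heap_update().
 * Reports how many leading and trailing bytes the old and new tuple data
 * share, applying the same "2-byte length field needs a 3+ byte win" cutoff.
 */
static void
find_common_affixes(const char *oldp, int oldlen,
					const char *newp, int newlen,
					int *prefixlen, int *suffixlen)
{
	int			pre;
	int			suf;

	/* Count matching bytes from the front */
	for (pre = 0; pre < Min(oldlen, newlen); pre++)
	{
		if (newp[pre] != oldp[pre])
			break;
	}
	/* Storing the prefix length costs 2 bytes, so require a 3+ byte win */
	if (pre < 3)
		pre = 0;

	/* Count matching bytes from the back, never overlapping the prefix */
	for (suf = 0; suf < Min(oldlen, newlen) - pre; suf++)
	{
		if (newp[newlen - suf - 1] != oldp[oldlen - suf - 1])
			break;
	}
	if (suf < 3)
		suf = 0;

	*prefixlen = pre;
	*suffixlen = suf;
}
```

With old data "aaaaXXXXbbbb" and new data "aaaaYYYYbbbb" this reports a 4-byte prefix and a 4-byte suffix, so only the middle 4 bytes of the new tuple would need to go into the WAL record.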
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On 2014-03-03 16:27:05 +0200, Heikki Linnakangas wrote:
> Thanks. I have to agree with Robert though that using the pglz encoding
> when we're just checking for a common prefix/suffix is a pretty crappy way
> of going about it [1]. As the patch stands, it includes the NULL bitmap
> when checking for a common prefix. That's probably not a good idea, because
> it defeats the prefix detection in the common case that you update a field
> from NULL to not-NULL or vice versa. Attached is a rewritten version, which
> does the prefix/suffix tests directly in heapam.c, and adds the
> prefix/suffix lengths directly as fields in the WAL record. If you could
> take one more look at this version, to check if I've missed anything.

Have you rerun the benchmarks? I'd guess the CPU overhead of this version is lower than earlier versions, but seeing it tested won't be a bad idea.

> This ought to be tested with the new logical decoding stuff as it modified
> the WAL update record format which the logical decoding stuff also relies
> on, but I don't know anything about that.

Hm, I think all it needs to do is disable delta encoding if need_tuple_data (which is dependent on wal_level=logical).

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On 2014-03-03 10:35:03 -0500, Robert Haas wrote:
> On Mon, Mar 3, 2014 at 9:57 AM, Andres Freund and...@2ndquadrant.com wrote:
>> Hm, I think all it needs to do is disable delta encoding if
>> need_tuple_data (which is dependent on wal_level=logical).
>
> Why does it need to do that? The logical decoding stuff should be able to
> reverse out the delta encoding.

Against what should it perform the delta? Unless I misunderstand how the patch works, it computes the delta against the old tuple in the heap page?

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Mon, Mar 3, 2014 at 9:57 AM, Andres Freund and...@2ndquadrant.com wrote:
> Hm, I think all it needs to do is disable delta encoding if need_tuple_data
> (which is dependent on wal_level=logical).

Why does it need to do that? The logical decoding stuff should be able to reverse out the delta encoding.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Mon, Mar 3, 2014 at 10:38 AM, Andres Freund and...@2ndquadrant.com wrote:
> On 2014-03-03 10:35:03 -0500, Robert Haas wrote:
>> On Mon, Mar 3, 2014 at 9:57 AM, Andres Freund and...@2ndquadrant.com wrote:
>>> Hm, I think all it needs to do is disable delta encoding if
>>> need_tuple_data (which is dependent on wal_level=logical).
>>
>> Why does it need to do that? The logical decoding stuff should be able to
>> reverse out the delta encoding.
>
> Against what should it perform the delta? Unless I misunderstand how the
> patch works, it computes the delta against the old tuple in the heap page?

Oh, maybe I need more caffeine.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
Hi,

Some quick review comments:

On 2014-02-13 18:14:54 +0530, Amit Kapila wrote:
> +	/*
> +	 * EWT can be generated for all new tuple versions created by Update
> +	 * operation. Currently we do it when both the old and new tuple versions
> +	 * are on same page, because during recovery if the page containing old
> +	 * tuple is corrupt, it should not cascade that corruption to other pages.
> +	 * Under the general assumption that for long runs most updates tend to
> +	 * create new tuple version on same page, there should not be significant
> +	 * impact on WAL reduction or performance.
> +	 *
> +	 * We should not generate EWT when we need to backup the whole block in
> +	 * WAL as in that case there is no saving by reduced WAL size.
> +	 */
> +
> +	if (RelationIsEnabledForWalCompression(reln) &&
> +		(oldbuf == newbuf) &&
> +		!XLogCheckBufferNeedsBackup(newbuf))
> +	{
> +		uint32		enclen;

You should note that the check for RelationIsEnabledForWalCompression() here is racy and that that's OK because the worst that can happen is a uselessly generated delta.

> 	xlrec.target.node = reln->rd_node;
> 	xlrec.target.tid = oldtup->t_self;
> 	xlrec.old_xmax = HeapTupleHeaderGetRawXmax(oldtup->t_data);
> @@ -6619,6 +6657,8 @@ log_heap_update(Relation reln, Buffer oldbuf,
> 	xlrec.newtid = newtup->t_self;
> 	if (new_all_visible_cleared)
> 		xlrec.flags |= XLOG_HEAP_NEW_ALL_VISIBLE_CLEARED;
> +	if (compressed)
> +		xlrec.flags |= XLOG_HEAP_DELTA_ENCODED;

I think this also needs to unset XLOG_HEAP_CONTAINS_NEW_TUPLE and be conditional on !need_tuple_data.

> +/*
> + * Determine whether the buffer referenced has to be backed up. Since we don't
> + * yet have the insert lock, fullPageWrites and forcePageWrites could change
> + * later, but will not cause any problem because this function is used only to
> + * identify whether EWT is required for update.
> + */
> +bool
> +XLogCheckBufferNeedsBackup(Buffer buffer)
> +{

Should note very, very boldly that this can only be used in contexts where a race is acceptable.

> diff --git a/src/backend/utils/adt/pg_rbcompress.c b/src/backend/utils/adt/pg_rbcompress.c
> new file mode 100644
> index 000..877ccd7
> --- /dev/null
> +++ b/src/backend/utils/adt/pg_rbcompress.c
> @@ -0,0 +1,355 @@
> +/* ----------
> + * pg_rbcompress.c -
> + *
> + *		This is a delta encoding scheme specific to PostgreSQL and designed
> + *		to compress similar tuples. It can be used as is or extended for
> + *		other purposes in PostgreSQL if required.
> + *
> + *		Currently, this just checks for a common prefix and/or suffix, but
> + *		the output format is similar to the LZ format used in pg_lzcompress.c.
> + *
> + * Copyright (c) 1999-2014, PostgreSQL Global Development Group
> + *
> + * src/backend/utils/adt/pg_rbcompress.c
> + * ----------
> + */

This needs significantly more explanation about the algorithm and the reasoning behind it.

> +static const PGRB_Strategy strategy_default_data = {
> +	32,			/* Data chunks less than 32 bytes are not
> +				 * compressed */
> +	INT_MAX,	/* No upper limit on what we'll try to
> +				 * compress */
> +	35,			/* Require 25% compression rate, or not worth
> +				 * it */
> +};

"compression rate" looks like a mismatch between comment and code.

> +/* ----------
> + * pgrb_out_ctrl -
> + *
> + *		Outputs the last and allocates a new control byte if needed.
> + * ----------
> + */
> +#define pgrb_out_ctrl(__ctrlp,__ctrlb,__ctrl,__buf) \
> +do { \
> +	if ((__ctrl & 0xff) == 0) \
> +	{ \
> +		*(__ctrlp) = __ctrlb; \
> +		__ctrlp = (__buf)++; \
> +		__ctrlb = 0; \
> +		__ctrl = 1; \
> +	} \
> +} while (0)
> +

Double underscore variables are reserved for the compiler.
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Sat, Feb 15, 2014 at 8:21 PM, Andres Freund and...@2ndquadrant.com wrote:
> Hi,
>
> Some quick review comments:

Thanks for the review. I shall handle/reply to the comments with the updated version, in which I am planning to fix a bug (right now preparing a test to reproduce it) in this code.

Bug: the tag can handle a maximum length of 273 bytes, but this patch is not considering it.

> I have to admit, I have serious doubts about this approach. I have a very
> hard time believing this won't cause performance regression in many common
> cases...

Actually, till now I was majorly focusing on the worst case (i.e. at the boundary of the compression ratio), thinking that most others will do good. However I shall produce data for much more common cases as well. Please let me know if you have anything specific in mind where this will not work well.

> More importantly I don't think doing the compression on this level is that
> interesting. I know Heikki argued for it, but I think extending the bitmap
> that's computed for HOT to cover all columns and doing this on a column
> level sounds much more sensible to me.

Previously we have tried to do it at column boundaries, but the main problem turned out to be in worst cases where we spend time in extracting values from tuples based on column boundaries and later find that the data is not compressible.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
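[Editor's note: the 273-byte limit Amit mentions comes from the pglz-style tag layout that pg_rbcompress borrows from pg_lzcompress.c: a match tag is normally two bytes, holding a 4-bit length nibble and a 12-bit offset. Nibble values 0..14 encode match lengths 3..17 inline; the value 15 flags a third byte that stores (length - 18). The sketch below is illustrative arithmetic, not code from the patch.]

```c
#include <assert.h>

/*
 * How many bytes a single pglz-style match tag occupies: two bytes for
 * lengths 3..17, three bytes (with a length-extension byte) for 18..273.
 */
static int
tag_bytes_for_match(int len)
{
	return (len > 17) ? 3 : 2;	/* extended tags carry an extra length byte */
}

/*
 * The longest match one tag can represent: length nibble 15 means
 * "18 plus the value of the extension byte", and that byte maxes at 255.
 */
static int
max_match_len(void)
{
	return 18 + 255;			/* = 273 */
}
```

A longer run of matching bytes therefore has to be split into multiple tags, which is presumably the case the patch was not yet handling.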
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On 2014-02-15 21:01:07 +0530, Amit Kapila wrote:
>> More importantly I don't think doing the compression on this level is that
>> interesting. I know Heikki argued for it, but I think extending the bitmap
>> that's computed for HOT to cover all columns and doing this on a column
>> level sounds much more sensible to me.
>
> Previously we have tried to do it at column boundaries, but the main
> problem turned out to be in worst cases where we spend time in extracting
> values from tuples based on column boundaries and later find that the data
> is not compressible.

I think that hugely depends on how you implement it. I think you'd need to have a loop traversing over both tuples at the same time on the level of heap_deform_tuple(). If you'd use the result to get rid of HeapSatisfiesHOTandKeyUpdate() at the same time, I am pretty sure you wouldn't see very high overhead.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Thu, Feb 13, 2014 at 10:20:46AM +0530, Amit Kapila wrote:
>> Why not *only* prefix/suffix?
>
> To represent a prefix/suffix match, we at least need a way to tell the
> offset and len of the matched bytes, and then how much is the length of the
> unmatched bytes we have copied. I agree that a simpler format could be
> devised if we just want to do prefix/suffix match, but that would require
> much more testing during recovery to ensure everything is fine. The
> advantage with the LZ format is that we don't need to bother about
> decoding; it will work without much change in the LZ decode routine.

Based on the numbers I think prefix/suffix-only needs to be explored. Consider if you just change one field of a row --- prefix/suffix would find all the matching parts. If you change the first and last fields, you get no compression at all, but your prefix/suffix test isn't going to get that either.

As I understand it, the only place prefix/suffix with LZ compression is a win over prefix/suffix-only is when you change two middle fields, and there are common fields unchanged between them. If we are looking at 11% CPU overhead for that, it isn't worth it.

--
Bruce Momjian  br...@momjian.us  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ Everyone has their own god. +
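[Editor's note: Bruce's argument can be made concrete with a toy example. The "row layout" strings below are invented purely for illustration, not real heap tuples: a change to a single middle field still leaves a long shared prefix and suffix, while changing the first and last fields leaves almost nothing for a prefix/suffix scheme to reuse — and, as he notes, LZ would not fare better on that case either.]

```c
#include <assert.h>

/*
 * Toy model: returns how many bytes of the new row a prefix/suffix-only
 * scheme could omit from WAL (shared leading bytes plus shared trailing
 * bytes, non-overlapping). No minimum-savings cutoff is applied here.
 */
static int
affix_savings(const char *oldp, const char *newp, int len)
{
	int			pre = 0;
	int			suf = 0;

	while (pre < len && oldp[pre] == newp[pre])
		pre++;
	while (suf < len - pre && oldp[len - suf - 1] == newp[len - suf - 1])
		suf++;
	return pre + suf;
}
```

Changing only the middle field of "id=7|name=bob|age=30" to "id=7|name=sue|age=30" saves 17 of 20 bytes; changing the first and last fields instead ("id=8|name=bob|age=31") saves only 3.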
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Wed, Feb 12, 2014 at 10:02:32AM +0530, Amit Kapila wrote: By issue, I assume you mean to say, which compression algorithm is best for this patch. For this patch, currently we have 2 algorithm's for which results have been posted. As far as I understand Heikki is pretty sure that the latest algorithm (compression using prefix-suffix match in old and new tuple) used for this patch is better than the other algorithm in terms of CPU gain or overhead. The performance data taken by me for the worst case for this algorithm shows there is a CPU overhead for this algorithm as well. OTOH the another algorithm (compression using old tuple as history) can be a bigger win in terms I/O reduction in more number of cases. In short, it is still not decided which algorithm to choose and whether it can be enabled by default or it is better to have table level switch to enable/disable it. So I think the decision to be taken here is about below points: 1. Are we okay with I/O reduction at the expense of CPU for *worst* cases and I/O reduction without impacting CPU (better overall tps) for *favourable* cases? 2. If we are not okay with worst case behaviour, then can we provide a table-level switch, so that it can be decided by user? 3. If none of above, then is there any other way to mitigate the worst case behaviour or shall we just reject this patch and move on. Given a choice to me, I would like to go with option-2, because I think for most cases UPDATE statement will have same data for old and new tuples except for some part of tuple (generally column's having large text data are not modified), so we will be end up mostly in favourable cases and surely for worst cases we don't want user to suffer from CPU overhead, so a table-level switch is also required. I think 99.9% of users are never going to adjust this so we had better choose something we are happy to enable for effectively everyone. In my reading, prefix/suffix seemed safe for everyone. 
We can always revisit this if we think of something better later, as WAL format changes are not a problem for pg_upgrade. I also think making it user-tunable is so hard for users to know when to adjust as to be almost not worth the user interface complexity it adds. I suggest we go with always-on prefix/suffix mode, then add some check so the worst case is avoided by just giving up on compression. As I said previously, I think compressing the page images is the next big win in this area. I think here one might argue that for some users it is not feasible to decide whether their tuples data for UPDATE is going to be similar or completely different and they are not at all ready for any risk for CPU overhead, but they would be happy to see I/O reduction in which case it is difficult to decide what should be the value of table-level switch. Here I think the only answer is nothing is free in this world, so either make sure about the application's behaviour for UPDATE statement before going to production or just don't enable this switch and be happy with the current behaviour. Again, can't set do a minimal attempt at prefix/suffix compression so there is no measurable overhead? On the other side there will be users who will be pretty certain about their usage of UPDATE statement or atleast are ready to evaluate their application if they can get such a huge gain, so it would be quite useful feature for such users. can we move move forward with the full-page compression patch? In my opinion, it is not certain that whatever compression algorithm got decided for this patch (if any) can be directly used for full-page compression, some ideas could be used or may be the algorithm could be tweaked a bit to make it usable for full-page compression. Thanks, I understand that now. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. 
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Wed, Feb 12, 2014 at 8:19 PM, Bruce Momjian br...@momjian.us wrote: On Wed, Feb 12, 2014 at 10:02:32AM +0530, Amit Kapila wrote: I think 99.9% of users are never going to adjust this, so we had better choose something we are happy to enable for effectively everyone. In my reading, prefix/suffix seemed safe for everyone. We can always revisit this if we think of something better later, as WAL format changes are not a problem for pg_upgrade. Agreed. I also think that with a user-tunable it is so hard for users to know when to adjust it that it is almost not worth the user-interface complexity it adds. I suggest we go with always-on prefix/suffix mode, then add some check so the worst case is avoided by just giving up on compression. As I said previously, I think compressing the page images is the next big win in this area. I think here one might argue that for some users it is not feasible to decide whether their tuple data for UPDATE is going to be similar or completely different, and they are not at all ready for any risk of CPU overhead, but they would be happy to see I/O reduction; in that case it is difficult to decide what the value of the table-level switch should be. Here I think the only answer is that nothing is free in this world, so either make sure about the application's behaviour for UPDATE statements before going to production, or just don't enable this switch and be happy with the current behaviour. Again, can't we do a minimal attempt at prefix/suffix compression so there is no measurable overhead? Yes, currently it is set at 25%, which means there must be at least a 25% match in the prefix and suffix before we consider the tuple for compression; that check is pretty fast and has almost no overhead. But the worst case here is the other way around, i.e. when the string has a 25% match in prefix-suffix, but after that there is no match, or at least no match in the next few bytes.
For example, consider the below two cases: Case-1 old tuple aaa new tuple bbb bbba Here there is a suffix match for 25% of the string, but after that there is no match, so we have to copy the remaining 75% of the bytes as-is, byte by byte. With somewhat longer tuples (800 bytes), the performance data taken by me shows around ~11% CPU overhead. Now, as this is a fabricated test just to see how much extra CPU the worst scenario consumes, in reality users might not see this, at least with synchronous commit on, because there is always some I/O involved at the end of a transaction (unless there is an error in between or the user rolls back the transaction, the chances of which are very low). The first thing that comes to mind after seeing the above scenario is: why not increase the minimum limit of 25%, since comparing prefix and suffix has almost negligible overhead? I have tried increasing it to 35% or more, but in that case it starts failing from the other side, e.g. for cases where there is a 34% match we still return without compressing. Here, one of the improvements which can be done is that after the prefix-suffix match, instead of doing a byte-by-byte copy as per the LZ format, we could directly copy the entire remaining part of the tuple; but that would require a format other than LZ, which is also not too difficult to do. The question is whether we really need such a change to handle the above kind of worst case. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
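The probe being discussed (scan for a shared prefix and suffix, then give up unless the match clears a minimum percentage) can be sketched roughly as follows. This is illustrative C only, with assumed names such as `prefix_suffix_probe` and the `need_rate` handling; it is not the actual heapam.c or pgrb code.

```c
#include <stddef.h>
#include <stdbool.h>

/*
 * Sketch of the prefix/suffix probe discussed above.  Returns true when
 * the combined prefix+suffix match covers at least need_rate percent of
 * the old tuple, the threshold below which compression is not attempted.
 */
static bool
prefix_suffix_probe(const char *oldt, size_t oldlen,
                    const char *newt, size_t newlen,
                    int need_rate,
                    size_t *prefixlen, size_t *suffixlen)
{
    size_t  minlen = oldlen < newlen ? oldlen : newlen;
    size_t  p = 0,
            s = 0;

    /* scan forward for the common prefix */
    while (p < minlen && oldt[p] == newt[p])
        p++;

    /* scan backward for the common suffix, never overlapping the prefix */
    while (s < minlen - p &&
           oldt[oldlen - s - 1] == newt[newlen - s - 1])
        s++;

    *prefixlen = p;
    *suffixlen = s;
    return p + s >= (oldlen * need_rate) / 100;
}
```

The scan itself is a pair of tight byte-comparison loops, which is why the thread describes the check as "pretty fast and almost no overhead"; the cost discussed above comes only after the probe succeeds.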
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Thu, Feb 13, 2014 at 1:20 AM, Amit Kapila amit.kapil...@gmail.com wrote: Here one of the improvements which can be done is that after prefix-suffix match, instead of going byte-by-byte copy as per LZ format we can directly copy all the remaining part of tuple but I think that would require us to use some different format than LZ which is also not too difficult to do, but the question is do we really need such a change to handle the above kind of worst case. Why use LZ at all? Why not *only* prefix/suffix?
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Thu, Feb 13, 2014 at 10:07 AM, Claudio Freire klaussfre...@gmail.com wrote: On Thu, Feb 13, 2014 at 1:20 AM, Amit Kapila amit.kapil...@gmail.com wrote: Here one of the improvements which can be done is that after the prefix-suffix match, instead of doing a byte-by-byte copy as per the LZ format, we could directly copy the entire remaining part of the tuple; but that would require a format other than LZ, which is also not too difficult to do. The question is whether we really need such a change to handle the above kind of worst case. Why use LZ at all? We are just using the LZ *format* to represent the compressed string. I have copied some text from pg_lzcompress.c to explain what exactly we are using: the first byte after the header tells what to do for the next 8 items. We call this the control byte. An unset bit in the control byte means that one uncompressed byte follows, which is copied from input to output. A set bit in the control byte means that a tag of 2-3 bytes follows. A tag contains the information needed to copy some bytes, which are already in the output buffer, to the current location in the output. Why not *only* prefix/suffix? To represent a prefix/suffix match, we at least need a way to record the offset and length of the matched bytes, and then the length of the unmatched bytes we have copied. I agree that a simpler format could be devised if we just want to do prefix-suffix matching, but that would require much more testing during recovery to ensure everything is fine; the advantage of the LZ format is that we don't need to bother about decoding, as it works without much change in the LZ decode routine. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
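To make the control-byte scheme concrete, here is a minimal decoder for a simplified version of that format, handling 2-byte tags only (the real pglz also has a 3-byte extended tag for long matches). The function name and exact tag layout here are illustrative, not the actual pg_lzcompress.c code, and the decoder assumes a well-formed input stream.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified LZ decoder sketch.  A clear bit in the control byte means
 * "copy one literal byte from the input"; a set bit means "copy len
 * bytes starting off bytes back in the OUTPUT buffer" (which is how a
 * match can be longer than the offset: it replays already-decoded data).
 */
static size_t
lz_decode(const uint8_t *src, size_t srclen, uint8_t *dst)
{
    size_t  di = 0,
            si = 0;

    while (si < srclen)
    {
        uint8_t ctrl = src[si++];
        int     bit;

        for (bit = 0; bit < 8 && si < srclen; bit++)
        {
            if (ctrl & 1)
            {
                /*
                 * 2-byte tag: low nibble of byte 0 is length - 3, high
                 * nibble plus byte 1 form the 12-bit back-offset.
                 */
                int len = (src[si] & 0x0f) + 3;
                int off = ((src[si] & 0xf0) << 4) | src[si + 1];

                si += 2;
                while (len--)
                {
                    dst[di] = dst[di - off];
                    di++;
                }
            }
            else
                dst[di++] = src[si++];  /* literal byte */
            ctrl >>= 1;
        }
    }
    return di;
}
```

For example, "abcabcabc" can be encoded as three literals followed by a single tag meaning "go back 3 bytes and copy 6", all governed by one control byte.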
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Tue, Feb 11, 2014 at 11:37 AM, Bruce Momjian br...@momjian.us wrote: Yes, that was my point. I thought the compression of full-page images was a huge win and that compression was pretty straightforward, except for the compression algorithm. If the compression algorithm issue is resolved, can we move forward with the full-page compression patch? Discussion of the full-page compression patch properly belongs on that thread rather than this one. However, based on what we've discovered so far here, I won't be very surprised if that patch turns out to have serious problems with CPU consumption. The evidence from this thread suggests that making even relatively lame attempts at compression is extremely costly in terms of CPU overhead. Now, the issues with straight-up compression are somewhat different than for delta compression and, in particular, it's easier to bail out of straight-up compression sooner if things aren't working out. But even with all that, I expect it to be not too difficult to find cases where some compression is achieved but with a dramatic increase in runtime on CPU-bound workloads. Which is basically the same problem this patch has. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Mon, Feb 10, 2014 at 10:02 AM, Amit Kapila amit.kapil...@gmail.com wrote: I think if we want to change the LZ format, it will be a bit more work, and verification of decoding has to be done much more strenuously. I don't think it'll be that big of a deal. And anyway, the evidence here suggests that we still need more speed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Wed, Feb 5, 2014 at 10:57:57AM -0800, Peter Geoghegan wrote: On Wed, Feb 5, 2014 at 12:50 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I think there's zero overlap. They're completely complementary features. It's not like normal WAL records have an irrelevant volume. Correct. Compressing a full-page image happens on the first update after a checkpoint, and the diff between old and new tuple is not used in that case. Uh, I really just meant that one thing that might overlap is considerations around the choice of compression algorithm. I think that there was some useful discussion of that on the other thread as well. Yes, that was my point. I thought the compression of full-page images was a huge win and that compression was pretty straightforward, except for the compression algorithm. If the compression algorithm issue is resolved, can we move forward with the full-page compression patch? -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god.
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Tue, Feb 11, 2014 at 10:07 PM, Bruce Momjian br...@momjian.us wrote: On Wed, Feb 5, 2014 at 10:57:57AM -0800, Peter Geoghegan wrote: On Wed, Feb 5, 2014 at 12:50 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I think there's zero overlap. They're completely complementary features. It's not like normal WAL records have an irrelevant volume. Correct. Compressing a full-page image happens on the first update after a checkpoint, and the diff between old and new tuple is not used in that case. Uh, I really just meant that one thing that might overlap is considerations around the choice of compression algorithm. I think that there was some useful discussion of that on the other thread as well. Yes, that was my point. I thought the compression of full-page images was a huge win and that compression was pretty straightforward, except for the compression algorithm. If the compression algorithm issue is resolved, By issue, I assume you mean which compression algorithm is best for this patch. For this patch, we currently have two algorithms for which results have been posted. As far as I understand, Heikki is pretty sure that the latest algorithm used for this patch (compression using a prefix-suffix match between the old and new tuple) is better than the other algorithm in terms of CPU gain or overhead. The performance data I took for the worst case shows there is a CPU overhead for this algorithm as well. OTOH the other algorithm (compression using the old tuple as history) can be a bigger win in terms of I/O reduction in more cases. In short, it is still not decided which algorithm to choose, and whether it can be enabled by default or whether it is better to have a table-level switch to enable/disable it. So I think the decision to be taken here is about the points below: 1. Are we okay with I/O reduction at the expense of CPU for *worst* cases and I/O reduction without impacting CPU (better overall tps) for *favourable* cases? 2.
If we are not okay with the worst-case behaviour, can we provide a table-level switch, so that it can be decided by the user? 3. If neither of the above, is there any other way to mitigate the worst-case behaviour, or shall we just reject this patch and move on? Given a choice, I would like to go with option 2, because I think for most UPDATE statements the old and new tuples will contain the same data except for some part of the tuple (generally columns holding large text data are not modified), so we will mostly end up in favourable cases; and for the worst cases we surely don't want users to suffer the CPU overhead, so a table-level switch is also required. I think here one might argue that for some users it is not feasible to decide whether their tuple data for UPDATE is going to be similar or completely different, and they are not at all ready for any risk of CPU overhead, but they would be happy to see I/O reduction; in that case it is difficult to decide what the value of the table-level switch should be. Here I think the only answer is that nothing is free in this world, so either make sure about the application's behaviour for UPDATE statements before going to production, or just don't enable this switch and be happy with the current behaviour. On the other side, there will be users who are pretty certain about their usage of UPDATE statements, or at least are ready to evaluate their application if they can get such a huge gain, so it would be quite a useful feature for such users. Can we move forward with the full-page compression patch? In my opinion, it is not certain that whatever compression algorithm gets decided for this patch (if any) can be directly used for full-page compression; some ideas could be used, or maybe the algorithm could be tweaked a bit to make it usable for full-page compression. With Regards, Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Thu, Feb 6, 2014 at 5:57 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Thu, Feb 6, 2014 at 9:13 AM, Amit Kapila amit.kapil...@gmail.com wrote: Considering the above change as correct, I have tried to see the worst-case overhead for this patch by constructing a new tuple such that after a 25% or so suffix/prefix match there is a small change in the tuple, keeping the rest of the tuple the same as the old one, and it shows overhead for this patch as well. Updated test script is attached.

Unpatched

             testname             | wal_generated |     duration
----------------------------------+---------------+------------------
 ten long fields, 8 bytes changed |     348843824 | 5.56866788864136
 ten long fields, 8 bytes changed |     348844800 | 5.84434294700623
 ten long fields, 8 bytes changed |         35050 | 5.92329406738281
(3 rows)

wal-update-prefix-suffix-encode-1.patch

             testname             | wal_generated |     duration
----------------------------------+---------------+------------------
 ten long fields, 8 bytes changed |     348845624 | 6.92243480682373
 ten long fields, 8 bytes changed |     348847000 | 8.35828399658203
 ten long fields, 8 bytes changed |     350204752 | 7.61826491355896
(3 rows)

One minor point: can we avoid emitting the prefix tag if prefixlen is 0?

+ /* output prefix as a tag */
+ pgrb_out_tag(ctrlp, ctrlb, ctrl, bp, prefixlen, hlen);

I think generating the out tag for suffix/prefix has one bug, i.e. it doesn't consider the max match length of 273 bytes (PGLZ_MAX_MATCH) which is mandatory for the LZ format. One more point about this patch: in function pgrb_delta_encode(), is it mandatory to return at the end in the below check:

if (result_size > result_max)
    return false;

I mean to say that before starting to copy literal bytes we have already ensured that the compression is greater than 25%, so maybe we can avoid this check. I have tried taking the data with this check removed and found that it reduces overhead and improves WAL reduction as well.
The data is as below (compare this with the data in the above mail for the unpatched version):

             testname             | wal_generated |     duration
----------------------------------+---------------+------------------
 ten long fields, 8 bytes changed |     300705552 | 6.51416897773743
 ten long fields, 8 bytes changed |     300703816 | 6.85267090797424
 ten long fields, 8 bytes changed |     300701840 | 7.15832996368408
(3 rows)

If we want to go with this approach, then I think apart from the above points there is no major change required (maybe some comments, function names etc. can be improved). But if we really just want to do prefix/suffix compression, this is a crappy and expensive way to do it. We needn't force everything through the pglz tag format just because we elide a common prefix or suffix. Here, are you bothered about the below code, where the patch does a byte-by-byte copy after the prefix/suffix match?

/* output bytes between prefix and suffix as literals */
dp = &source[prefixlen];
dend = &source[slen - suffixlen];
while (dp < dend)
{
    pgrb_out_literal(ctrlp, ctrlb, ctrl, bp, *dp);
    dp++;                       /* Do not do this ++ in the line above! */
}

I think if we want to change the LZ format, it will be a bit more work, and verification of decoding has to be done much more strenuously. Note - During the performance tests, I have focussed mainly on the worst case, because we already know that this idea is good for the best and average cases. However, if we decide that this is better and good to proceed with, I can take the data for the other cases as well. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Thu, Feb 6, 2014 at 9:13 AM, Amit Kapila amit.kapil...@gmail.com wrote: On Wed, Feb 5, 2014 at 8:50 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 02/05/2014 04:48 PM, Amit Kapila wrote: I have done one test where there is a large suffix match, but not large enough that it can compress more than 75% of the string; the CPU overhead with wal-update-prefix-suffix-encode-1.patch is not much, but there is no I/O reduction either. Hmm, it's supposed to compress if you save at least 25%, not 75%. Apparently I got that backwards in the patch... So if I understand the code correctly, the new check should be

if (prefixlen + suffixlen < (slen * need_rate) / 100)
    return false;

rather than

if (slen - prefixlen - suffixlen < (slen * need_rate) / 100)
    return false;

Considering the above change as correct, I have tried to see the worst-case overhead for this patch by constructing a new tuple such that after a 25% or so suffix/prefix match there is a small change in the tuple, keeping the rest of the tuple the same as the old one, and it shows overhead for this patch as well. Updated test script is attached.

Unpatched

             testname             | wal_generated |     duration
----------------------------------+---------------+------------------
 ten long fields, 8 bytes changed |     348843824 | 5.56866788864136
 ten long fields, 8 bytes changed |     348844800 | 5.84434294700623
 ten long fields, 8 bytes changed |         35050 | 5.92329406738281
(3 rows)

wal-update-prefix-suffix-encode-1.patch

             testname             | wal_generated |     duration
----------------------------------+---------------+------------------
 ten long fields, 8 bytes changed |     348845624 | 6.92243480682373
 ten long fields, 8 bytes changed |     348847000 | 8.35828399658203
 ten long fields, 8 bytes changed |     350204752 | 7.61826491355896
(3 rows)

One minor point: can we avoid emitting the prefix tag if prefixlen is 0?

+ /* output prefix as a tag */
+ pgrb_out_tag(ctrlp, ctrlb, ctrl, bp, prefixlen, hlen);

With Regards, Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com wal-update-testsuite.sh Description: Bourne shell script
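The inverted check Heikki describes is easy to see with small numbers. Below is a sketch of both predicates with hypothetical helper names; `need_rate` is the minimum percentage of the old tuple that the combined prefix/suffix match must cover for compression to proceed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Original (inverted) test: bails out when the UNMATCHED middle is
 * small, i.e. exactly when compression would pay off the most. */
static bool
buggy_threshold(int32_t slen, int32_t prefixlen, int32_t suffixlen,
                int need_rate)
{
    return !(slen - prefixlen - suffixlen < (slen * need_rate) / 100);
}

/* Corrected test: bails out when the MATCHED portion of the old tuple
 * falls below need_rate percent. */
static bool
fixed_threshold(int32_t slen, int32_t prefixlen, int32_t suffixlen,
                int need_rate)
{
    return !(prefixlen + suffixlen < (slen * need_rate) / 100);
}
```

With slen = 100, prefixlen = 90 and suffixlen = 5 (an excellent candidate), the original form refuses to compress while the corrected form accepts it; conversely, the original form accepts a tuple with almost nothing in common.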
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Wed, Feb 5, 2014 at 8:56 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Wed, Feb 5, 2014 at 5:13 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 02/05/2014 07:54 AM, Amit Kapila wrote: That's not the worst case, by far. First, note that the skipping while scanning the new tuple is only performed in the first loop. That means that as soon as you have a single match, you fall back to hashing every byte. So for the worst case, put one 4-byte field as the first column, and don't update it. Also, I suspect the runtimes in your test were dominated by I/O. When I scale down the number of rows involved so that the whole test fits in RAM, I get much bigger differences with and without the patch. You might also want to turn off full_page_writes, to make the effect clear with less data. So with this test, the overhead is very significant. With the skipping logic, another kind of worst case is that you have a lot of similarity between the old and new tuple, but you miss it because you skip. This is exactly the reason why I have not kept the skipping logic in the second pass (loop), but I think maybe it would have been better to keep it, just not as aggressive as in the first pass. I have tried merging pass-1 and pass-2 while keeping the skipping logic the same, and it has reduced the overhead to a good extent, but not completely for the new case you have added. This change is to check whether it can reduce the overhead; if we want to proceed, maybe we can limit the skip factor so that the chance of skipping some matched data is reduced.
A new version of the patch is attached with this mail.

Unpatched

           testname           | wal_generated |     duration
------------------------------+---------------+------------------
 ten long fields, all changed |     348842856 | 6.93688106536865
 ten long fields, all changed |     348843672 | 7.53063702583313
 ten long fields, all changed |     352662344 | 7.76640701293945
(3 rows)

pgrb_delta_encoding_v8.patch

             testname             | wal_generated |     duration
----------------------------------+---------------+------------------
 ten long fields, but one changed |     348848144 | 9.22694897651672
 ten long fields, but one changed |     348841376 | 9.11818099021912
 ten long fields, but one changed |     352963488 | 8.37875485420227
(3 rows)

pgrb_delta_encoding_v9.patch

             testname             | wal_generated |     duration
----------------------------------+---------------+------------------
 ten long fields, but one changed |     350166320 | 8.84561610221863
 ten long fields, but one changed |     348840728 | 8.45299792289734
 ten long fields, but one changed |     348846656 | 8.34846496582031
(3 rows)

It appears to me that it could be a good idea to merge both patches (prefix-suffix encoding + delta encoding) in such a way that if we get reasonable compression (50% or so) with prefix-suffix alone, we return without doing delta encoding, and if the compression is less than that, we do delta encoding for the rest of the tuple. The reason I think this will be good is that by doing only prefix-suffix we might miss many cases where good compression is possible. If you think it is a viable way, then I can merge both patches and check the results. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com pgrb_delta_encoding_v9.patch Description: Binary data
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On 02/04/2014 11:58 PM, Andres Freund wrote: On February 4, 2014 10:50:10 PM CET, Peter Geoghegan p...@heroku.com wrote: On Tue, Feb 4, 2014 at 11:11 AM, Andres Freund and...@2ndquadrant.com wrote: Does this feature relate to compression of WAL page images at all? No. So the obvious question is: where, if anywhere, do the two efforts (this patch, and Fujii's patch) overlap? Does Fujii have any concerns about this patch as it relates to his effort to compress FPIs? I think there's zero overlap. They're completely complementary features. It's not like normal WAL records have an irrelevant volume. Correct. Compressing a full-page image happens on the first update after a checkpoint, and the diff between old and new tuple is not used in that case. Compressing full-page images makes a difference if you're doing random updates across a large table, so that you only update each buffer 1-2 times. This patch will have no effect in that case. And when you update the same page many times between checkpoints, the full-page image is insignificant, and this patch has a big effect. - Heikki
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On 02/05/2014 07:54 AM, Amit Kapila wrote: On Tue, Feb 4, 2014 at 11:58 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, Feb 4, 2014 at 12:39 PM, Amit Kapila amit.kapil...@gmail.com wrote: Now there is approximately 1.4~5% CPU gain for the "hundred tiny fields, half nulled" case. Assuming that the logic isn't buggy, a point in need of further study, I'm starting to feel like we want to have this. And I might even be tempted to remove the table-level off switch. I have tried to stress the worst case more, as you are thinking of removing the table-level switch, and found that even if we increase the data by approx. 8 times (ten long fields, all changed, each field containing 80 bytes of data), the CPU overhead is still 5%, which clearly shows that the overhead doesn't increase much even if the length of unmatched data is increased by a much larger factor. So the data for the worst case adds more weight to your statement (remove the table-level switch); however, there is no harm in keeping the table-level option with the default as 'true', and if some users are really sure the updates in their system will have nothing in common, they can set this new option to 'false'. Below is the data for the new case "ten long fields, all changed" added in the attached script file: That's not the worst case, by far. First, note that the skipping while scanning the new tuple is only performed in the first loop. That means that as soon as you have a single match, you fall back to hashing every byte. So for the worst case, put one 4-byte field as the first column, and don't update it. Also, I suspect the runtimes in your test were dominated by I/O. When I scale down the number of rows involved so that the whole test fits in RAM, I get much bigger differences with and without the patch. You might also want to turn off full_page_writes, to make the effect clear with less data. So, I came up with the attached worst-case test, modified from your latest test suite.
unpatched:

               testname               | wal_generated |     duration
--------------------------------------+---------------+------------------
 ten long fields, all but one changed |     343385312 | 2.20806908607483
 ten long fields, all but one changed |     336263592 | 2.18997097015381
 ten long fields, all but one changed |     336264504 | 2.17843413352966
(3 rows)

pgrb_delta_encoding_v8.patch:

               testname               | wal_generated |     duration
--------------------------------------+---------------+------------------
 ten long fields, all but one changed |     338356944 | 3.33501315116882
 ten long fields, all but one changed |     344059272 | 3.37364101409912
 ten long fields, all but one changed |     336257840 | 3.36244201660156
(3 rows)

So with this test, the overhead is very significant. With the skipping logic, another kind of worst case is that you have a lot of similarity between the old and new tuple, but you miss it because you skip. For example, if you change the first few columns, but leave a large text column at the end of the tuple unchanged. - Heikki wal-update-testsuite.sh Description: Bourne shell script
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On 01/30/2014 08:53 AM, Amit Kapila wrote: On Wed, Jan 29, 2014 at 8:13 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 01/29/2014 02:21 PM, Amit Kapila wrote: The main reason to process in chunks as much as possible is to save CPU cycles. For example, if we build the hash table byte by byte, then even for the best case, where most of the tuple has a match, it will have reasonable overhead due to the formation of the hash table. Hmm. One very simple optimization we could do is to just compare the two strings byte by byte, before doing anything else, to find any common prefix they might have. Then output a tag for the common prefix, and run the normal algorithm on the rest of the strings. In many real-world tables, the 1-2 first columns are a key that never changes, so that might work pretty well in practice. Maybe it would also be worthwhile to do the same for any common suffix the tuples might have. Is it possible to do this for both prefix and suffix together? Basically, the question I have in mind is: what will be the deciding factor for switching from the hash table mechanism to string comparison mode for the suffix? Do we switch when we find a long enough match? I think you got it backwards. You don't switch from the hash table mechanism to string comparison. You do the prefix/suffix comparison *first*, and run the hash table algorithm only on the middle part, between the common prefix and suffix. Can we do this optimization after the basic version is acceptable? I would actually suggest doing that first. Perhaps even ditch the whole history table approach and do *only* the scan for prefix and suffix. That's very cheap, and already covers a large fraction of UPDATEs that real applications do. In particular, it's optimal for the case that you update only a single column, something like UPDATE foo SET bar = bar + 1. I'm pretty sure the overhead of that would be negligible, so we could always enable it.
There are certainly a lot of scenarios where prefix/suffix detection alone wouldn't help, but so what. Attached is a quick patch for that, if you want to test it.

- Heikki

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index e0b8a4e..c4ac2bd 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -1014,6 +1014,22 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
     </listitem>
    </varlistentry>

+   <varlistentry>
+    <term><literal>wal_compress_update</> (<type>boolean</>)</term>
+    <listitem>
+     <para>
+      Enables or disables the WAL tuple compression for <command>UPDATE</>
+      on this table. Default value of this option is false to maintain
+      backward compatability for the command. If true, all the update
+      operations on this table which will place the new tuple on same page
+      as it's original tuple will compress the WAL for new tuple and
+      subsequently reduce the WAL volume. It is recommended to enable
+      this option for tables where <command>UPDATE</> changes less than
+      50 percent of tuple data.
+     </para>
+    </listitem>
+   </varlistentry>
+
   </variablelist>
  </refsect2>
diff --git a/src/backend/access/common/heaptuple.c b/src/backend/access/common/heaptuple.c
index aea9d40..3bf5728 100644
--- a/src/backend/access/common/heaptuple.c
+++ b/src/backend/access/common/heaptuple.c
@@ -60,6 +60,7 @@
 #include "access/sysattr.h"
 #include "access/tuptoaster.h"
 #include "executor/tuptable.h"
+#include "utils/pg_rbcompress.h"

 /* Does att's datatype allow packing into the 1-byte-header varlena format? */
@@ -617,6 +618,44 @@ heap_copytuple_with_tuple(HeapTuple src, HeapTuple dest)
 	memcpy((char *) dest->t_data, (char *) src->t_data, src->t_len);
 }

+/*
+ * heap_delta_encode
+ *
+ *		Calculate the delta between two tuples and generate
+ *		encoded wal tuple (EWT), using pgrb. The result is stored
+ *		in *encdata.
+ */
+bool
+heap_delta_encode(TupleDesc tupleDesc, HeapTuple oldtup, HeapTuple newtup,
+				  char *encdata, uint32 *enclen)
+{
+	return pgrb_delta_encode(
+		(char *) newtup->t_data + offsetof(HeapTupleHeaderData, t_bits),
+		newtup->t_len - offsetof(HeapTupleHeaderData, t_bits),
+		(char *) oldtup->t_data + offsetof(HeapTupleHeaderData, t_bits),
+		oldtup->t_len - offsetof(HeapTupleHeaderData, t_bits),
+		encdata, enclen, NULL
+		);
+}
+
+/*
+ * heap_delta_decode
+ *
+ *		Decode a tuple using delta-encoded WAL tuple and old tuple version.
+ */
+void
+heap_delta_decode(char *encdata, uint32 enclen, HeapTuple oldtup, HeapTuple newtup)
+{
+	pgrb_delta_decode(encdata, enclen,
+		(char *) newtup->t_data + offsetof(HeapTupleHeaderData, t_bits),
+		MaxHeapTupleSize - offsetof(HeapTupleHeaderData, t_bits),
+		&newtup->t_len,
+		(char *) oldtup->t_data + offsetof(HeapTupleHeaderData, t_bits),
+		oldtup->t_len - offsetof(HeapTupleHeaderData, t_bits));
+}
+
 /*
  * heap_form_tuple
  *		construct a tuple from the
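Heikki's suggestion of peeling off the shared prefix and suffix first, and running any heavier encoding only on the middle, can be sketched as below. This is illustrative only: `delta_encode_middle` is a stand-in that copies the middle verbatim, and the two-length header is invented for the example; it is not the patch's actual WAL record format.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for a history-table encoder; copying the middle verbatim
 * keeps the sketch self-contained. */
static size_t
delta_encode_middle(const char *mid, size_t midlen, char *out)
{
    memcpy(out, mid, midlen);
    return midlen;
}

/*
 * Encode newt against oldt: record the prefix and suffix lengths, then
 * emit only the middle of the new tuple.  The receiver can rebuild the
 * new tuple from the old one plus this record.
 */
static size_t
encode_update(const char *oldt, size_t oldlen,
              const char *newt, size_t newlen, char *out)
{
    size_t  minlen = oldlen < newlen ? oldlen : newlen;
    size_t  p = 0,
            s = 0;

    while (p < minlen && oldt[p] == newt[p])
        p++;
    while (s < minlen - p && oldt[oldlen - s - 1] == newt[newlen - s - 1])
        s++;

    memcpy(out, &p, sizeof p);              /* common prefix length */
    memcpy(out + sizeof p, &s, sizeof s);   /* common suffix length */
    return 2 * sizeof(size_t) +
        delta_encode_middle(newt + p, newlen - p - s,
                            out + 2 * sizeof(size_t));
}
```

An update such as "key=1|payload" to "key=2|payload" then carries only the single changed byte plus the two lengths, which is the cheap case the discussion keeps coming back to.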
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Wed, Feb 5, 2014 at 5:29 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:

On 01/30/2014 08:53 AM, Amit Kapila wrote: Is it possible to do both prefix and suffix together? Basically, the question I have in mind is: what will be the deciding factor for switching from the hash table mechanism to string comparison mode for the suffix? Do we switch when we find a long enough match?

I think you got it backwards. You don't switch from the hash table mechanism to string comparison. You do the prefix/suffix comparison *first*, and run the hash table algorithm only on the middle part, between the common prefix and suffix.

Can we do this optimization after the basic version is acceptable?

I would actually suggest doing that first. Perhaps even ditch the whole history table approach and do *only* the scan for prefix and suffix. That's very cheap, and already covers a large fraction of UPDATEs that real applications do. In particular, it's optimal for the case that you update only a single column, something like UPDATE foo SET bar = bar + 1. I'm pretty sure the overhead of that would be negligible, so we could always enable it. There are certainly a lot of scenarios where prefix/suffix detection alone wouldn't help, but so what. Attached is a quick patch for that, if you want to test it.

I have done one test where there is a large suffix match, but not large enough that it can compress more than 75% of the string; the CPU overhead with wal-update-prefix-suffix-encode-1.patch is not much, but there is no I/O reduction either. However, for the same case there is both a significant WAL reduction and a CPU gain with pgrb_delta_encoding_v8.patch. I have updated ten long fields, all changed such that there is a large suffix match. The updated script is attached.
Unpatched

           testname           | wal_generated |     duration
------------------------------+---------------+------------------
 ten long fields, all changed |    1760986528 | 28.3700430393219
 ten long fields, all changed |    1760981320 |   28.53244805336
 ten long fields, all changed |    1764294992 | 28.6722140312195
(3 rows)

wal-update-prefix-suffix-encode-1.patch

           testname           | wal_generated |     duration
------------------------------+---------------+------------------
 ten long fields, all changed |    1760986016 | 29.4183659553528
 ten long fields, all changed |    1760981904 | 29.7636449337006
 ten long fields, all changed |    1762436104 |  29.508908033371
(3 rows)

pgrb_delta_encoding_v8.patch

           testname           | wal_generated |     duration
------------------------------+---------------+------------------
 ten long fields, all changed |     733969304 |  23.916286945343
 ten long fields, all changed |     733977040 | 23.6019561290741
 ten long fields, all changed |     737384632 | 24.2645490169525

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

wal-update-testsuite.sh
Description: Bourne shell script

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Wed, Feb 5, 2014 at 5:13 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 02/05/2014 07:54 AM, Amit Kapila wrote: On Tue, Feb 4, 2014 at 11:58 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, Feb 4, 2014 at 12:39 PM, Amit Kapila amit.kapil...@gmail.com wrote: Now there is approximately 1.4~5% CPU gain for hundred tiny fields, half nulled case Assuming that the logic isn't buggy, a point in need of further study, I'm starting to feel like we want to have this. And I might even be tempted to remove the table-level off switch. I have tried to stress on worst case more, as you are thinking to remove table-level switch and found that even if we increase the data by approx. 8 times (ten long fields, all changed, each field contains 80 byte data), the CPU overhead is still 5% which clearly shows that the overhead doesn't increase much even if the length of unmatched data is increased by much larger factor. So the data for worst case adds more weight to your statement (remove table-level switch), however there is no harm in keeping table-level option with default as 'true' and if some users are really sure the updates in their system will have nothing in common, then they can make this new option as 'false'. Below is data for the new case ten long fields, all changed added in attached script file: That's not the worst case, by far. First, note that the skipping while scanning new tuple is only performed in the first loop. That means that as soon as you have a single match, you fall back to hashing every byte. So for the worst case, put one 4-byte field as the first column, and don't update it. Also, I suspect the runtimes in your test were dominated by I/O. When I scale down the number of rows involved so that the whole test fits in RAM, I get much bigger differences with and without the patch. You might also want to turn off full_page_writes, to make the effect clear with less data. So with this test, the overhead is very significant. 
With the skipping logic, another kind of worst case is that you have a lot of similarity between the old and new tuple, but you miss it because you skip.

This is exactly the reason why I have not kept the skipping logic in the second pass (loop), but I think it might have been better to keep it there in a less aggressive form than in the first pass. The basic idea I had in mind is that if we get a match, then there is a high chance that we get matches at consecutive positions.

I think we should see this patch as an I/O reduction feature rather than as a CPU gain/overhead, because the I/O reduction in WAL has other benefits: in transfer for replication, in archiving, in recovery, basically wherever there is a disk read operation, the I/O reduction will amount to less data read, which can be beneficial in many ways. Some time back, I was reading an article on the benefits of compression in databases where the benefits shown are something like what I said above (at least that is what I understood from it). Link to that article: http://db2guys.wordpress.com/2013/08/23/compression/

Another thing is that I think it might be difficult to get negligible overhead for data which is small or non-compressible; that's why it is preferable to have table-level compression enabled with a switch. Is it viable here to find the best way to get I/O reduction for most cases, and provide a switch so that for the worst cases the user can turn it off?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On 02/05/2014 04:48 PM, Amit Kapila wrote: I have done one test where there is a large suffix match, but not large enough that it can compress more than 75% of the string; the CPU overhead with wal-update-prefix-suffix-encode-1.patch is not much, but there is no I/O reduction either.

Hmm, it's supposed to compress if you save at least 25%, not 75%. Apparently I got that backwards in the patch...

- Heikki
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Wed, Feb 5, 2014 at 8:50 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:

On 02/05/2014 04:48 PM, Amit Kapila wrote: I have done one test where there is a large suffix match, but not large enough that it can compress more than 75% of the string; the CPU overhead with wal-update-prefix-suffix-encode-1.patch is not much, but there is no I/O reduction either.

Hmm, it's supposed to compress if you save at least 25%, not 75%. Apparently I got that backwards in the patch...

Okay, I think that is right; maybe I can change that check to see the difference. But in general, isn't it going to lose compression in many more cases, for example when there is less than a 25% match in the prefix/suffix, but more than a 50% match in the middle of the string? While debugging, I noticed that it compresses less than the history table approach for general cases where the update is done internally, like for Truncate table.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Wed, Feb 5, 2014 at 12:50 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote:

I think there's zero overlap. They're completely complementary features. It's not like normal WAL records have an irrelevant volume.

Correct. Compressing a full-page image happens on the first update after a checkpoint, and the diff between the old and new tuple is not used in that case.

Uh, I really just meant that one thing that might overlap is considerations around the choice of compression algorithm. I think that there was some useful discussion of that on the other thread as well.

--
Peter Geoghegan
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Wed, Feb 5, 2014 at 6:59 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote:

Attached is a quick patch for that, if you want to test it.

But if we really just want to do prefix/suffix compression, this is a crappy and expensive way to do it. We needn't force everything through the pglz tag format just because we elide a common prefix or suffix.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Wed, Feb 5, 2014 at 8:50 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:

On 02/05/2014 04:48 PM, Amit Kapila wrote: I have done one test where there is a large suffix match, but not large enough that it can compress more than 75% of the string; the CPU overhead with wal-update-prefix-suffix-encode-1.patch is not much, but there is no I/O reduction either.

Hmm, it's supposed to compress if you save at least 25%, not 75%. Apparently I got that backwards in the patch...

So if I understand the code correctly, the new check should be

    if (prefixlen + suffixlen < (slen * need_rate) / 100)
        return false;

rather than

    if (slen - prefixlen - suffixlen < (slen * need_rate) / 100)
        return false;

Please confirm; otherwise any validation of this might not be useful.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Wed, Feb 5, 2014 at 6:43 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote:

So, I came up with the attached worst case test, modified from your latest test suite.

unpatched:

               testname               | wal_generated |     duration
--------------------------------------+---------------+------------------
 ten long fields, all but one changed |     343385312 | 2.20806908607483
 ten long fields, all but one changed |     336263592 | 2.18997097015381
 ten long fields, all but one changed |     336264504 | 2.17843413352966
(3 rows)

pgrb_delta_encoding_v8.patch:

               testname               | wal_generated |     duration
--------------------------------------+---------------+------------------
 ten long fields, all but one changed |     338356944 | 3.33501315116882
 ten long fields, all but one changed |     344059272 | 3.37364101409912
 ten long fields, all but one changed |     336257840 | 3.36244201660156
(3 rows)

So with this test, the overhead is very significant.

Yuck. Well, that sucks.

With the skipping logic, another kind of worst case is that you have a lot of similarity between the old and new tuple, but you miss it because you skip. For example, if you change the first few columns, but leave a large text column at the end of the tuple unchanged.

I suspect there's no way to have our cake and eat it, too. Most of the work that Amit has done on this patch in the last few revs is to cut back CPU overhead in the cases where the patch can't help because the tuple has been radically modified. If we're trying to get maximum compression, we need to go the other way: for example, we could just feed both the old and new tuples through pglz (or snappy, or whatever). That would allow us to take advantage not only of similarity between the old and new tuples but also internal duplication within either the old or the new tuple, but it would also cost more CPU. The concern with minimizing overhead in cases where the compression doesn't help has thus far pushed us in the opposite direction, namely passing over compression opportunities that a more aggressive algorithm could find in order to keep the overhead low.
Off-hand, I'm wondering why we shouldn't apply the same skipping algorithm that Amit is using at the beginning of the string for the rest of it as well. It might be a little too aggressive (maybe the skip distance shouldn't increase by quite as much as doubling every time, or not beyond 16/32 bytes?) but I don't see why the general principle isn't sound wherever we are in the tuple. Unfortunately, despite changing things to make a history entry only every 4th character, building the history is still pretty expensive. By the time we even begin looking at the tuple we're gonna compress, we've already spent something like half the total effort, and of course we have to go further than that before we know whether our attempt to compress is actually going anywhere. I think that's the central problem here. pglz has several safeguards to ensure that it doesn't do too much work in vain: we abort if we find nothing compressible within first_success_by bytes, or if we emit enough total output to be certain that we won't meet the need_rate threshold. Those safeguards are a lot less effective here because they can't be applied until *after* we've already paid the cost of building the history. If we could figure out some way to apply those guards, or other guards, earlier in the algorithm, we could do a better job mitigating the worst-case scenarios, but I don't have a good idea. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On 06/02/14 16:59, Robert Haas wrote:
[...]

Surely the weighting should be done according to the relative scarcity of processing power vs I/O bandwidth? I get the impression that different workloads and hardware configurations may favour conserving one of processor or I/O resources. Would it be feasible to have different logic, depending on the trade-offs identified?

Cheers,
Gavin
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Tue, Feb 4, 2014 at 12:39 PM, Amit Kapila amit.kapil...@gmail.com wrote: Now there is approximately 1.4~5% CPU gain for hundred tiny fields, half nulled case I don't want to advocate too strongly for this patch because, number one, Amit is a colleague and more importantly, number two, I can't claim to be an expert on compression. But that having been said, I think these numbers are starting to look awfully good. The only remaining regressions are in the cases where a large fraction of the tuple turns over, and they're not that big even then. The two *worst* tests now seem to be hundred tiny fields, all changed and hundred tiny fields, half changed. For the all changed case, the median unpatched time is 16.3172590732574 and the median patched time is 16.9294109344482, a 4% loss; for the half changed case, the median unpatched time is 16.5795118808746 and the median patched time is 17.0454230308533, a 3% loss. Both cases show minimal change in WAL volume. Meanwhile, in friendlier cases, like one short and one long field, no change, we're seeing big improvements. That particular case shows a speedup of 21% and a WAL reduction of 36%. That's a pretty big deal, and I think not unrepresentative of many real-world workloads. Some might well do better, having either more or longer unchanged fields. Assuming that the logic isn't buggy, a point in need of further study, I'm starting to feel like we want to have this. And I might even be tempted to remove the table-level off switch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Tue, Feb 4, 2014 at 01:28:38PM -0500, Robert Haas wrote: Meanwhile, in friendlier cases, like one short and one long field, no change, we're seeing big improvements. That particular case shows a speedup of 21% and a WAL reduction of 36%. That's a pretty big deal, and I think not unrepresentative of many real-world workloads. Some might well do better, having either more or longer unchanged fields. Assuming that the logic isn't buggy, a point in need of further study, I'm starting to feel like we want to have this. And I might even be tempted to remove the table-level off switch. Does this feature relate to compression of WAL page images at all? -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On 2014-02-04 14:09:57 -0500, Bruce Momjian wrote:
[...]
Does this feature relate to compression of WAL page images at all?

No.

Greetings,

Andres Freund

--
Andres Freund	http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Tue, Feb 4, 2014 at 08:11:18PM +0100, Andres Freund wrote:
[...]
Does this feature relate to compression of WAL page images at all?

No.

I guess it bothers me that we are working on compressing row change sets while the majority(?) of WAL is page images. I know we had a page-image compression patch that got stalled.

--
Bruce Momjian  br...@momjian.us  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ Everyone has their own god. +
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Tue, Feb 4, 2014 at 11:11 AM, Andres Freund and...@2ndquadrant.com wrote:

Does this feature relate to compression of WAL page images at all?

No.

So the obvious question is: where, if anywhere, do the two efforts (this patch, and Fujii's patch) overlap? Does Fujii have any concerns about this patch as it relates to his effort to compress FPIs?

--
Peter Geoghegan
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On February 4, 2014 10:50:10 PM CET, Peter Geoghegan p...@heroku.com wrote:

On Tue, Feb 4, 2014 at 11:11 AM, Andres Freund and...@2ndquadrant.com wrote: Does this feature relate to compression of WAL page images at all? No.

So the obvious question is: where, if anywhere, do the two efforts (this patch, and Fujii's patch) overlap? Does Fujii have any concerns about this patch as it relates to his effort to compress FPIs?

I think there's zero overlap. They're completely complementary features. It's not like normal WAL records have an irrelevant volume.

Andres

--
Please excuse brevity and formatting - I am writing this on my mobile phone.

Andres Freund	http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Tue, Feb 4, 2014 at 1:58 PM, Andres Freund and...@2ndquadrant.com wrote:

I think there's zero overlap. They're completely complementary features. It's not like normal WAL records have an irrelevant volume.

I'd have thought so too, but I would not like to assume. Like many people commenting on this thread, I don't know very much about compression.

--
Peter Geoghegan
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Tue, Feb 4, 2014 at 11:58 PM, Robert Haas robertmh...@gmail.com wrote:

On Tue, Feb 4, 2014 at 12:39 PM, Amit Kapila amit.kapil...@gmail.com wrote: Now there is approximately 1.4~5% CPU gain for the "hundred tiny fields, half nulled" case

Assuming that the logic isn't buggy, a point in need of further study, I'm starting to feel like we want to have this. And I might even be tempted to remove the table-level off switch.

I have tried to stress the worst case more, as you are thinking of removing the table-level switch, and found that even if we increase the data by approx. 8 times (ten long fields, all changed, each field containing 80 bytes of data), the CPU overhead is still 5%, which clearly shows that the overhead doesn't increase much even if the length of unmatched data is increased by a much larger factor. So the data for the worst case adds more weight to your statement (remove the table-level switch); however, there is no harm in keeping the table-level option with default 'true', and if some users are really sure the updates in their system will have nothing in common, they can set this new option to 'false'. Below is data for the new case "ten long fields, all changed" added in the attached script file:

Unpatched

           testname           | wal_generated |     duration
------------------------------+---------------+------------------
 ten long fields, all changed |    3473999520 | 45.0375978946686
 ten long fields, all changed |    3473999864 | 45.2536928653717
 ten long fields, all changed |    3474006880 | 45.1887288093567

After pgrb_delta_encoding_v8.patch
----------------------------------

           testname           | wal_generated |     duration
------------------------------+---------------+------------------
 ten long fields, all changed |    3474006456 | 47.5744359493256
 ten long fields, all changed |    3474000136 | 47.3830440044403
 ten long fields, all changed |    3474002688 | 46.9923310279846

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

wal-update-testsuite.sh
Description: Bourne shell script
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Fri, Jan 31, 2014 at 12:33 AM, Amit Kapila amit.kapil...@gmail.com wrote: On Thu, Jan 30, 2014 at 12:23 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Wed, Jan 29, 2014 at 8:13 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:

After basic verification of back-to-pglz-like-delta-encoding-1, I will take the data with both the patches and report the same.

I have corrected the problems reported in back-to-pglz-like-delta-encoding-1, removed hindex from pgrb_delta_encoding_v6, and attached are new versions of both patches.

I/O Reduction Data
------------------
Non-default settings:
autovacuum = off
checkpoint_segments = 256
checkpoint_timeout = 15min

Observations
1. With both patches, the WAL reduction is similar: ~37% for "one short and one long field, no change" and 12% for "hundred tiny fields, half nulled".
2. With pgrb_delta_encoding_v7, there is ~19% CPU reduction for the best case, "one short and one long field, no change".
3. With pgrb_delta_encoding_v7, there is approximately 8~9% overhead for cases where there is no match.
4. With pgrb_delta_encoding_v7, there is approximately 15~18% overhead for the "hundred tiny fields, half nulled" case.
5. With back-to-pglz-like-delta-encoding-2, the data is mostly similar, except for "hundred tiny fields, half nulled", where the CPU overhead is much higher.

The case ("hundred tiny fields, half nulled") where CPU overhead is visible is due to repetitive data; if we take some random or different data, it will not be there.
To verify this theory, I have added one new test which is similar to "hundred tiny fields, half nulled"; the difference is that it uses a non-repetitive string. The results are below:

Unpatched
---------
                       testname                       | wal_generated |     duration
------------------------------------------------------+---------------+------------------
 nine short and one long field, thirty percent change |     698912496 | 12.1819660663605
 nine short and one long field, thirty percent change |     698906048 | 11.9409539699554
 nine short and one long field, thirty percent change |     698910904 | 11.9367880821228

Patch pgrb_delta_encoding_v7
----------------------------
                       testname                       | wal_generated |     duration
------------------------------------------------------+---------------+------------------
 nine short and one long field, thirty percent change |     559840840 | 11.6027710437775
 nine short and one long field, thirty percent change |     559829440 | 11.8239741325378
 nine short and one long field, thirty percent change |     560141352 | 11.6789472103119

Patch back-to-pglz-like-delta-encoding-2
----------------------------------------
                       testname                       | wal_generated |     duration
------------------------------------------------------+---------------+------------------
 nine short and one long field, thirty percent change |     544391432 | 12.3666560649872
 nine short and one long field, thirty percent change |     544378616 | 11.8833730220795
 nine short and one long field, thirty percent change |     544376888 | 11.9487581253052
(3 rows)

The basic idea of the new test is that some part of the tuple is unchanged and the other part is changed; here the unchanged part contains a random string rather than a repetitive set of chars. The new test is added with the other tests in the attached file.

Observation
-----------
LZ-like delta encoding has more WAL reduction, and chunk-wise encoding has slightly better CPU usage, but overall both are almost similar. I think the main reason for the overhead is that we store the last offset of matching data at the front of the history, so during a match it has to traverse back many times to find the longest possible match; in the real world it won't be the case that most history entries contain the same hash index, so it should not have much effect.
If we want to improve CPU usage for cases like "hundred tiny fields, half nulled" (which I think is not important), forming the history table by traversing from the end rather than from the beginning can serve the purpose. I have not tried it, but I think it can certainly help.

Do you think the overall data is acceptable?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

wal-update-testsuite.sh
Description: Bourne shell script
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Thu, Jan 30, 2014 at 12:23 PM, Amit Kapila amit.kapil...@gmail.com wrote:
On Wed, Jan 29, 2014 at 8:13 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:

A few observations on patch (back-to-pglz-like-delta-encoding-1):

1.
+#define pgrb_hash_unroll(_p, hindex) \
+	hindex = hindex ^ ((_p)[0] << 8)

Shouldn't it shift by 6 rather than by 8?

2.
+		if (bp - bstart >= result_max)
+			return false;

I think for a no-match or lesser-match case it will traverse the whole tuple. Can we optimize so that if there is no match till 75%, we bail out? Ideally, I think if we don't find any match in the first 50 to 60%, we should come out.

3. pg_rbcompress.h is missing.

I am still working on patch back-to-pglz-like-delta-encoding-1 to see if it works well for all cases, but thought of sharing what I have done till now. After basic verification of back-to-pglz-like-delta-encoding-1, I will take the data with both the patches and report the same.

Apart from the above, the only other thing which I found problematic is the below code in find match:

+		while (*ip == *hp && thislen < maxlen)
+			thislen++;

It should be:

		while (*ip++ == *hp++ && thislen < maxlen)

Please confirm.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
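The corrected inner loop suggested above can be sketched as a small standalone function. This is only an illustration of the fix (both pointers must advance on every compared byte); the function name and signature are hypothetical, not the patch's actual code:

```c
#include <assert.h>

/* Sketch of the find-match inner loop after the suggested fix: the
 * input pointer and the history pointer advance together, so the
 * comparison really walks both strings instead of re-reading the
 * same two bytes forever. */
static int
pgrb_match_length(const unsigned char *ip, const unsigned char *hp, int maxlen)
{
	int			thislen = 0;

	/* advance both input and history pointers on each matched byte */
	while (thislen < maxlen && *ip++ == *hp++)
		thislen++;

	return thislen;
}
```

With the original (unfixed) form, `ip` and `hp` never move, so the loop either exits immediately or spins until `thislen` reaches `maxlen` on a single byte pair.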
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On 01/28/2014 07:01 PM, Heikki Linnakangas wrote:
On 01/27/2014 07:03 PM, Amit Kapila wrote:

I have tried to improve the algorithm in another way so that we can get the benefit of same chunks during find match (something similar to LZ). The main change is to consider chunks at a fixed boundary (4 bytes) and, after finding a match, try to find if there is a longer match than the current chunk. While finding a longer match, it still takes care that the next bigger match should be at a chunk boundary. I am not completely sure about the chunk boundary; maybe 8 or 16 can give better results.

Since you're only putting a value in the history every 4 bytes, you wouldn't need to calculate the hash in a rolling fashion. You could just take the next four bytes, calculate the hash, put it in the history table. Then the next four bytes, calculate the hash, and so on. Might save some cycles when building the history table...

On a closer look, you're putting a chunk in the history table only every four bytes, but you're *also* checking the history table for a match only every four bytes. That completely destroys the shift-resistance of the algorithm. For example, if the new tuple is an exact copy of the old tuple, except for one additional byte in the beginning, the algorithm would fail to recognize that. It would be good to add a test case like that in the test suite.

You can skip bytes when building the history table, or when finding matches, but not both. Or you could skip N bytes, and then check for matches for the next four bytes, then skip again and so forth, as long as you always check four consecutive bytes (because the hash function is calculated from four bytes).

I couldn't resist the challenge, and started hacking this. First, some observations from your latest patch (pgrb_delta_encoding_v5.patch):

1. There are a lot of comments and code that refer to chunks, which seem obsolete. For example, the ck_size field in PGRB_HistEntry is always set to a constant, 4, except maybe at the very end of the history string.
The algorithm has nothing to do with Rabin-Karp anymore.

2. The 'hindex' field in PGRB_HistEntry is unused. Also, ck_start_pos is redundant with the index of the entry in the array, because the array is filled in order. That leaves just the 'next' field, and that can be represented as an int16 rather than a pointer. So, we really only need a simple int16 array as the history entries array.

3. You're not gaining much by calculating the hash in a rolling fashion. A single rolling step requires two multiplications and two sums, plus shifting the variables around. Calculating the hash from scratch requires three multiplications and three sums.

4. Given that we're not doing the Rabin-Karp variable-length chunk thing, we could use a cheaper hash function to begin with. Like, the one used in pglz. The multiply-by-prime method probably gives fewer collisions than pglz's shift/xor method, but I don't think that matters as much as computation speed. No-one has mentioned or measured the effect of collisions in this thread; that either means it's a non-issue, or that no-one has realized yet how big a problem it is. I'm guessing that it's not a problem, and if it is, it's mitigated by only trying to find matches every N bytes; collisions would make finding matches slower, and that's exactly what skipping helps with.

After addressing the above, we're pretty much back to the pglz approach. I kept the change to only find matches every four bytes; that does make some difference. And I like having this new encoding code in a separate file, not mingled with the pglz stuff; it's sufficiently different that that's better.

I haven't done all that much testing with this, so take it with a grain of salt. I don't know if this is better or worse than the other patches that have been floated around, but I thought I might as well share it.
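The simplified history layout described in observation 2 above (a hash bucket array plus a plain int16 "next" array, where an entry's array index is its position in the source string) could be sketched roughly like this. All names, sizes, and the hash function are illustrative assumptions, not the patch's actual code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HIST_SIZE	4096		/* hash bucket count (power of two); illustrative */
#define MAX_HISTORY	2048		/* one entry per source position; illustrative */

/* Bucket heads and chain links.  An entry's index in hist_next IS its
 * start position in the source string, so no start-position field and
 * no pointers are needed: a plain int16 array suffices. */
static int16_t hash_heads[HIST_SIZE];
static int16_t hist_next[MAX_HISTORY];

/* Cheap pglz-style shift/xor hash of four bytes. */
static uint32_t
hash4(const unsigned char *p)
{
	return ((p[0] << 6) ^ (p[1] << 4) ^ (p[2] << 2) ^ p[3]) & (HIST_SIZE - 1);
}

static void
build_history(const unsigned char *src, int len)
{
	int			pos;

	memset(hash_heads, 0xFF, sizeof(hash_heads));	/* set all heads to -1 */

	for (pos = 0; pos + 4 <= len && pos < MAX_HISTORY; pos++)
	{
		uint32_t	h = hash4(src + pos);

		/* Prepend: the newest (largest) offset sits at the chain head. */
		hist_next[pos] = hash_heads[h];
		hash_heads[h] = (int16_t) pos;
	}
}
```

Walking a bucket's chain via `hist_next` then visits candidate match positions newest-first, which is the traversal-order behaviour Amit discusses elsewhere in the thread.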
- Heikki

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index e0b8a4e..c4ac2bd 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -1014,6 +1014,22 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>wal_compress_update</> (<type>boolean</>)</term>
+    <listitem>
+     <para>
+      Enables or disables the WAL tuple compression for <command>UPDATE</>
+      on this table. The default value of this option is false, to maintain
+      backward compatibility for the command. If true, all the update
+      operations on this table which place the new tuple on the same page
+      as its original tuple will compress the WAL for the new tuple and
+      subsequently reduce the WAL volume. It is recommended to enable
+      this option for tables where <command>UPDATE</> changes less than
+      50 percent of tuple data.
+     </para>
+    </listitem>
+   </varlistentry>
+
  </variablelist>
 </refsect2>
diff --git
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Wed, Jan 29, 2014 at 3:41 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
On 01/28/2014 07:01 PM, Heikki Linnakangas wrote:
On 01/27/2014 07:03 PM, Amit Kapila wrote:

I have tried to improve the algorithm in another way so that we can get the benefit of same chunks during find match (something similar to LZ). The main change is to consider chunks at a fixed boundary (4 bytes) and, after finding a match, try to find if there is a longer match than the current chunk. While finding a longer match, it still takes care that the next bigger match should be at a chunk boundary. I am not completely sure about the chunk boundary; maybe 8 or 16 can give better results.

Since you're only putting a value in the history every 4 bytes, you wouldn't need to calculate the hash in a rolling fashion. You could just take the next four bytes, calculate the hash, put it in the history table. Then the next four bytes, calculate the hash, and so on. Might save some cycles when building the history table...

First of all, thanks for looking into the patch. Yes, this is right; we can save cycles by not doing the rolling hash calculation, and I was working to improve the patch along those lines. Earlier it was there because of Rabin's delta encoding, where we need to check for a special match after each byte.

On a closer look, you're putting a chunk in the history table only every four bytes, but you're *also* checking the history table for a match only every four bytes. That completely destroys the shift-resistance of the algorithm.

You are right that it will lose the shift-resistance, and even Robert has pointed this out to me; that's why he wants to maintain the property of special bytes at chunk boundaries as mentioned in Rabin encoding. The only real reason to shift to a fixed size was to improve CPU usage, and I thought most UPDATE cases will change fixed-length columns, but that might not be true.
For example, if the new tuple is an exact copy of the old tuple, except for one additional byte in the beginning, the algorithm would fail to recognize that. It would be good to add a test case like that in the test suite. You can skip bytes when building the history table, or when finding matches, but not both. Or you could skip N bytes, and then check for matches for the next four bytes, then skip again and so forth, as long as you always check four consecutive bytes (because the hash function is calculated from four bytes).

Can we do something like:

Build Phase
a. Calculate the hash and add the entry in the history table at every 4 bytes.

Match Phase
a. Calculate the hash in a rolling fashion and try to find a match at every byte.
b. When a match is found, then skip only in chunks, something like I was doing in the find match function:

+
+			/* consider only complete chunk matches. */
+			if (history_chunk_size == 0)
+				thislen += PGRB_MIN_CHUNK_SIZE;
+		}

Will this address the concern? The main reason to process in chunks as much as possible is to save CPU cycles. For example, if we build the hash table byte-by-byte, then even for the best case where most of the tuple has a match, it will have reasonable overhead due to formation of the hash table.

I couldn't resist the challenge, and started hacking this. First, some observations from your latest patch (pgrb_delta_encoding_v5.patch):

1. There are a lot of comments and code that refer to chunks, which seem obsolete. For example, the ck_size field in PGRB_HistEntry is always set to a constant, 4, except maybe at the very end of the history string. The algorithm has nothing to do with Rabin-Karp anymore.

2. The 'hindex' field in PGRB_HistEntry is unused. Also, ck_start_pos is redundant with the index of the entry in the array, because the array is filled in order. That leaves just the 'next' field, and that can be represented as an int16 rather than a pointer. So, we really only need a simple int16 array as the history entries array.

3.
You're not gaining much by calculating the hash in a rolling fashion. A single rolling step requires two multiplications and two sums, plus shifting the variables around. Calculating the hash from scratch requires three multiplications and three sums.

4. Given that we're not doing the Rabin-Karp variable-length chunk thing, we could use a cheaper hash function to begin with. Like, the one used in pglz. The multiply-by-prime method probably gives fewer collisions than pglz's shift/xor method, but I don't think that matters as much as computation speed. No-one has mentioned or measured the effect of collisions in this thread; that either means it's a non-issue, or that no-one has realized yet how big a problem it is. I'm guessing that it's not a problem, and if it is, it's mitigated by only trying to find matches every N bytes; collisions would make finding matches slower, and that's exactly what skipping helps with.

After addressing the above, we're pretty much back to the pglz approach.

Here during the match phase, I think we can avoid copying literal bytes until a match is found; that can save cycles for cases when the old and new tuple are mostly different.
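The cost comparison in observation 3 above (a rolling step: two multiplications and two sums; from scratch: three multiplications and three sums) can be made concrete with a multiply-by-prime hash over a 4-byte window. The prime and the function names are illustrative assumptions, not the patch's actual constants:

```c
#include <assert.h>
#include <stdint.h>

#define ROLL_PRIME	31u			/* illustrative prime, not the patch's */

/* Hash of four consecutive bytes, computed from scratch:
 * three multiplications, three sums. */
static uint32_t
hash4_scratch(const unsigned char *p)
{
	return ((p[0] * ROLL_PRIME + p[1]) * ROLL_PRIME + p[2]) * ROLL_PRIME + p[3];
}

/* One rolling step: drop byte 'out', add byte 'in'.
 * Two multiplications, two sums (out_factor is a compile-time
 * constant, ROLL_PRIME^3, the weight of the byte leaving the window). */
static uint32_t
hash4_roll(uint32_t h, unsigned char out, unsigned char in)
{
	const uint32_t out_factor = ROLL_PRIME * ROLL_PRIME * ROLL_PRIME;

	return (h - out * out_factor) * ROLL_PRIME + in;
}
```

Since the two variants cost nearly the same per position, only a scheme that hashes far fewer positions (e.g. every Nth byte from scratch) actually saves cycles, which is the point Heikki is making.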
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On 01/29/2014 02:21 PM, Amit Kapila wrote:
On Wed, Jan 29, 2014 at 3:41 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:

For example, if the new tuple is an exact copy of the old tuple, except for one additional byte in the beginning, the algorithm would fail to recognize that. It would be good to add a test case like that in the test suite. You can skip bytes when building the history table, or when finding matches, but not both. Or you could skip N bytes, and then check for matches for the next four bytes, then skip again and so forth, as long as you always check four consecutive bytes (because the hash function is calculated from four bytes).

Can we do something like:

Build Phase
a. Calculate the hash and add the entry in the history table at every 4 bytes.

Match Phase
a. Calculate the hash in a rolling fashion and try to find a match at every byte.

Sure, that'll work. However, I believe it's cheaper to add entries to the history table at every byte, and check for a match every 4 bytes. I think you'll get more or less the same level of compression either way, but adding to the history table is cheaper than checking for matches, and we'd rather do the cheap thing more often than the expensive thing.

b. When a match is found, then skip only in chunks, something like I was doing in the find match function:

+
+			/* consider only complete chunk matches. */
+			if (history_chunk_size == 0)
+				thislen += PGRB_MIN_CHUNK_SIZE;
+		}

Will this address the concern?

Hmm, so when checking if a match is truly a match, you compare the strings four bytes at a time rather than byte-by-byte? That might work, but I don't think that's a hot spot currently. In the profiling I did, with a "nothing matches" test case, all the cycles were spent in building the history, and finding matches. Finding out how long a match is was negligible. Of course, it might be a different story with input where the encoding helps and you have matches, but I think we were doing pretty well in those cases already.
The main reason to process in chunks as much as possible is to save CPU cycles. For example, if we build the hash table byte-by-byte, then even for the best case where most of the tuple has a match, it will have reasonable overhead due to formation of the hash table.

Hmm. One very simple optimization we could do is to just compare the two strings byte by byte, before doing anything else, to find any common prefix they might have. Then output a tag for the common prefix, and run the normal algorithm on the rest of the strings. In many real-world tables, the first 1-2 columns are a key that never changes, so that might work pretty well in practice. Maybe it would also be worthwhile to do the same for any common suffix the tuples might have. That would fail to find matches where you e.g. update the last column to have the same value as the first column, and change nothing else, but that's OK. We're not aiming for the best possible compression, just trying to avoid WAL-logging data that wasn't changed.

Here during the match phase, I think we can avoid copying literal bytes until a match is found; that can save cycles for cases when the old and new tuple are mostly different.

I think the extra if's in the loop will actually cost you more cycles than you save. You could perhaps have two copies of the main match-finding loop though. First, loop without outputting anything, until you find the first match. Then, output anything up to that point as literals. Then fall into the second loop, which outputs any non-matches byte by byte.

- Heikki
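The common prefix/suffix idea Heikki describes above (which, per the later messages in this thread, is essentially what the rewritten heapam.c version ended up doing) can be sketched as a standalone helper. Names and signature are illustrative, not the actual PostgreSQL code:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of common prefix/suffix detection between the old and new
 * tuple data: only the middle part between the shared prefix and
 * shared suffix would then need delta encoding (or WAL-logging). */
static void
find_common_affixes(const char *old_t, size_t old_len,
					const char *new_t, size_t new_len,
					size_t *prefix_len, size_t *suffix_len)
{
	size_t		minlen = old_len < new_len ? old_len : new_len;
	size_t		p = 0;
	size_t		s = 0;

	/* Common prefix: walk forward while bytes match. */
	while (p < minlen && old_t[p] == new_t[p])
		p++;

	/* Common suffix: walk backward, but never overlap the prefix. */
	while (s < minlen - p &&
		   old_t[old_len - 1 - s] == new_t[new_len - 1 - s])
		s++;

	*prefix_len = p;
	*suffix_len = s;
}
```

For a typical UPDATE that changes one field in the middle of the tuple, only the changed bytes fall between the prefix and suffix, so almost nothing needs to be WAL-logged as new data.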
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On 01/27/2014 07:03 PM, Amit Kapila wrote:

I have tried to improve the algorithm in another way so that we can get the benefit of same chunks during find match (something similar to LZ). The main change is to consider chunks at a fixed boundary (4 bytes) and, after finding a match, try to find if there is a longer match than the current chunk. While finding a longer match, it still takes care that the next bigger match should be at a chunk boundary. I am not completely sure about the chunk boundary; maybe 8 or 16 can give better results.

Since you're only putting a value in the history every 4 bytes, you wouldn't need to calculate the hash in a rolling fashion. You could just take the next four bytes, calculate the hash, put it in the history table. Then the next four bytes, calculate the hash, and so on. Might save some cycles when building the history table...

- Heikki
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
I think sorting by a string column is slower during merge join; maybe the comparison function used by the sort needs to be refined to save some cycles. It's the hot function when doing a sort.

Heikki Linnakangas hlinnakan...@vmware.com wrote:
On 01/27/2014 07:03 PM, Amit Kapila wrote:

I have tried to improve the algorithm in another way so that we can get the benefit of same chunks during find match (something similar to LZ). The main change is to consider chunks at a fixed boundary (4 bytes) and, after finding a match, try to find if there is a longer match than the current chunk. While finding a longer match, it still takes care that the next bigger match should be at a chunk boundary. I am not completely sure about the chunk boundary; maybe 8 or 16 can give better results.

Since you're only putting a value in the history every 4 bytes, you wouldn't need to calculate the hash in a rolling fashion. You could just take the next four bytes, calculate the hash, put it in the history table. Then the next four bytes, calculate the hash, and so on. Might save some cycles when building the history table...

- Heikki
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Mon, Jan 27, 2014 at 12:03 PM, Amit Kapila amit.kapil...@gmail.com wrote:

I think that's a good thing to try. Can you code it up?

I have tried to improve the algorithm in another way so that we can get the benefit of same chunks during find match (something similar to LZ). The main change is to consider chunks at a fixed boundary (4 bytes) and, after finding a match, try to find if there is a longer match than the current chunk. While finding a longer match, it still takes care that the next bigger match should be at a chunk boundary. I am not completely sure about the chunk boundary; maybe 8 or 16 can give better results. I think now we can run with this patch once on a high-end m/c.

Here are the results I got on the community PPC64 box.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

pgrb-v5 test.ods
Description: application/vnd.oasis.opendocument.spreadsheet
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Tue, Jan 21, 2014 at 2:00 AM, Amit Kapila amit.kapil...@gmail.com wrote:
On Mon, Jan 20, 2014 at 9:49 PM, Robert Haas robertmh...@gmail.com wrote:

I ran Heikki's test suite on latest master and latest master plus pgrb_delta_encoding_v4.patch on a PPC64 machine, but the results didn't look too good. The only tests where the WAL volume changed by more than half a percent were the "one short and one long field, no change" test, where it dropped by 17%, but at the expense of an increase in duration of 38%; and the "hundred tiny fields, half nulled" test, where it dropped by 2% without a change in runtime. Unfortunately, some of the tests where WAL didn't change significantly took a runtime hit - in particular, "hundred tiny fields, half changed" slowed down by 10% and "hundred tiny fields, all changed" by 8%.

I think this part of the result is positive, as with earlier approaches the dip here was 20%. Refer to the result posted at:
http://www.postgresql.org/message-id/51366323.8070...@vmware.com

I've attached the full results in OpenOffice format. Profiling the "one short and one long field, no change" test turns up the following:

51.38%  postgres  pgrb_delta_encode
23.58%  postgres  XLogInsert
 2.54%  postgres  heap_update
 1.09%  postgres  LWLockRelease
 0.90%  postgres  LWLockAcquire
 0.89%  postgres  palloc0
 0.88%  postgres  log_heap_update
 0.84%  postgres  HeapTupleSatisfiesMVCC
 0.75%  postgres  ExecModifyTable
 0.73%  postgres  hash_search_with_hash_value

Yipes. That's a lot more than I remember this costing before. And I don't understand why I'm seeing such a large time hit on this test where you actually saw a significant time *reduction*. One possibility is that you may have been running with a default checkpoint_segments value or one that's low enough to force checkpointing activity during the test. I ran with checkpoint_segments=300.
I ran with checkpoint_segments = 128, and when I ran with v4, I also see a similar WAL reduction as you are seeing, except that in my case the runtime for both is almost similar (I think in your case disk writes are fast, so the CPU overhead is more visible). I think the major difference in the above test is due to the below part of code:

pgrb_find_match()
{
..
+		/*	if (match_chunk)
+			{
+				while (*ip == *hp)
+				{
+					matchlen++;
+					ip++;
+					hp++;
+				}
+			} */
}

Basically, if we don't go for the longer match, then for a test where most data ("one short and one long field, no change") is similar, it has to do the below extra steps with no advantage:
a. copy extra tags
b. calculation for rolling hash
c. finding the match

I think here the major cost is due to 'a', but the others might also not be free. To confirm the theory, if we run the test with just the above code un-commented, there can be a significant change in both WAL reduction and runtime for this test. I have one idea to avoid the overhead of step a), which is to combine the tags: don't write the tag until we find any un-matching data. When any un-matched data is found, combine all the previously matched data and write it as one tag. This should eliminate the overhead due to step a.

I think that's a good thing to try. Can you code it up?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
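The tag-combining idea described above (delay writing a match tag until un-matching data appears, then emit one combined tag) can be sketched as a tiny encoder state machine. All types, names, and the tag-counting output are illustrative assumptions, not the patch's actual encoder:

```c
#include <assert.h>

/* Sketch of the "combine tags" idea: contiguous chunk matches are
 * accumulated into one pending match, and a single tag is emitted
 * only when a literal (un-matched) byte interrupts the run. */
typedef struct
{
	int			match_start;	/* history offset of pending match, or -1 */
	int			match_len;		/* accumulated length of pending match */
} EncoderState;

static int	ntags;				/* combined tags emitted, for illustration */

static void
flush_match(EncoderState *st)
{
	if (st->match_len > 0)
	{
		ntags++;				/* emit ONE tag for the whole accumulated run */
		st->match_len = 0;
	}
}

static void
on_match(EncoderState *st, int hist_off, int len)
{
	if (st->match_len > 0 && hist_off == st->match_start + st->match_len)
		st->match_len += len;	/* contiguous with pending match: extend it */
	else
	{
		flush_match(st);		/* discontinuous: close the previous run */
		st->match_start = hist_off;
		st->match_len = len;
	}
}

static void
on_literal(EncoderState *st, unsigned char byte)
{
	flush_match(st);			/* a literal ends the run: emit combined tag */
	/* ... copy the literal byte to the output here ... */
	(void) byte;
}
```

With per-chunk tags, three contiguous 4-byte matches would cost three tags; combined, they cost one, which is exactly the "copy extra tags" overhead (step a) being avoided.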
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Thu, Jan 16, 2014 at 12:07 AM, Amit Kapila amit.kapil...@gmail.com wrote:

Okay, got your point. Another minor thing is that in the latest patch, which I sent yesterday, I have modified it such that during formation of chunks, if there is data at the end of the string which doesn't have the special pattern and is less than the max chunk size, we also consider that as a chunk. The reason for doing this was that, let us say we have a 104-byte string which contains no special bit pattern; then it will just have one 64-byte chunk and will leave the remaining bytes, which might miss the chance of compressing that data.

I ran Heikki's test suite on latest master and latest master plus pgrb_delta_encoding_v4.patch on a PPC64 machine, but the results didn't look too good. The only tests where the WAL volume changed by more than half a percent were the "one short and one long field, no change" test, where it dropped by 17%, but at the expense of an increase in duration of 38%; and the "hundred tiny fields, half nulled" test, where it dropped by 2% without a change in runtime. Unfortunately, some of the tests where WAL didn't change significantly took a runtime hit - in particular, "hundred tiny fields, half changed" slowed down by 10% and "hundred tiny fields, all changed" by 8%. I've attached the full results in OpenOffice format.

Profiling the "one short and one long field, no change" test turns up the following:

51.38%  postgres  pgrb_delta_encode
23.58%  postgres  XLogInsert
 2.54%  postgres  heap_update
 1.09%  postgres  LWLockRelease
 0.90%  postgres  LWLockAcquire
 0.89%  postgres  palloc0
 0.88%  postgres  log_heap_update
 0.84%  postgres  HeapTupleSatisfiesMVCC
 0.75%  postgres  ExecModifyTable
 0.73%  postgres  hash_search_with_hash_value

Yipes. That's a lot more than I remember this costing before. And I don't understand why I'm seeing such a large time hit on this test where you actually saw a significant time *reduction*.
One possibility is that you may have been running with a default checkpoint_segments value or one that's low enough to force checkpointing activity during the test. I ran with checkpoint_segments=300.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

pgrb_delta_encoding_v4.ods
Description: application/vnd.oasis.opendocument.spreadsheet
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Mon, Jan 20, 2014 at 9:49 PM, Robert Haas robertmh...@gmail.com wrote:

I ran Heikki's test suite on latest master and latest master plus pgrb_delta_encoding_v4.patch on a PPC64 machine, but the results didn't look too good. The only tests where the WAL volume changed by more than half a percent were the "one short and one long field, no change" test, where it dropped by 17%, but at the expense of an increase in duration of 38%; and the "hundred tiny fields, half nulled" test, where it dropped by 2% without a change in runtime. Unfortunately, some of the tests where WAL didn't change significantly took a runtime hit - in particular, "hundred tiny fields, half changed" slowed down by 10% and "hundred tiny fields, all changed" by 8%.

I think this part of the result is positive, as with earlier approaches the dip here was 20%. Refer to the result posted at:
http://www.postgresql.org/message-id/51366323.8070...@vmware.com

I've attached the full results in OpenOffice format. Profiling the "one short and one long field, no change" test turns up the following:

51.38%  postgres  pgrb_delta_encode
23.58%  postgres  XLogInsert
 2.54%  postgres  heap_update
 1.09%  postgres  LWLockRelease
 0.90%  postgres  LWLockAcquire
 0.89%  postgres  palloc0
 0.88%  postgres  log_heap_update
 0.84%  postgres  HeapTupleSatisfiesMVCC
 0.75%  postgres  ExecModifyTable
 0.73%  postgres  hash_search_with_hash_value

Yipes. That's a lot more than I remember this costing before. And I don't understand why I'm seeing such a large time hit on this test where you actually saw a significant time *reduction*. One possibility is that you may have been running with a default checkpoint_segments value or one that's low enough to force checkpointing activity during the test. I ran with checkpoint_segments=300.
I ran with checkpoint_segments = 128, and when I ran with v4, I also see a similar WAL reduction as you are seeing, except that in my case the runtime for both is almost similar (I think in your case disk writes are fast, so the CPU overhead is more visible). I think the major difference in the above test is due to the below part of code:

pgrb_find_match()
{
..
+		/*	if (match_chunk)
+			{
+				while (*ip == *hp)
+				{
+					matchlen++;
+					ip++;
+					hp++;
+				}
+			} */
}

Basically, if we don't go for the longer match, then for a test where most data ("one short and one long field, no change") is similar, it has to do the below extra steps with no advantage:
a. copy extra tags
b. calculation for rolling hash
c. finding the match

I think here the major cost is due to 'a', but the others might also not be free. To confirm the theory, if we run the test with just the above code un-commented, there can be a significant change in both WAL reduction and runtime for this test. I have one idea to avoid the overhead of step a), which is to combine the tags: don't write the tag until we find any un-matching data. When any un-matched data is found, combine all the previously matched data and write it as one tag. This should eliminate the overhead due to step a.

Can we think of any way in which, in spite of doing longer matches, we can retain the sanctity of this approach? One way could be to check if the match after the chunk is long enough that it matches the rest of the string, but I think that can create problems in some other cases.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Mon, Nov 25, 2013 at 6:55 PM, Robert Haas robertmh...@gmail.com wrote:

But even if that doesn't pan out, I think the fallback position should not be "OK, well, if we can't get decreased I/O for free then forget it" but rather "OK, if we can't get decreased I/O for free then let's get decreased I/O in exchange for increased CPU usage."

While I haven't been following the development of this patch, I will note that on the face of it the latter seems like a trade-off I'd be quite willing to make.

--
Peter Geoghegan
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Thu, Jan 16, 2014 at 12:07 AM, Amit Kapila amit.kapil...@gmail.com wrote:
On Thu, Jan 16, 2014 at 12:49 AM, Robert Haas robertmh...@gmail.com wrote:
On Wed, Jan 15, 2014 at 7:28 AM, Amit Kapila amit.kapil...@gmail.com wrote:

Unpatched
---------
                 testname                 | wal_generated |     duration
------------------------------------------+---------------+------------------
 one short and one long field, no change  |    1054923224 | 33.101135969162

After pgrb_delta_encoding_v4
----------------------------
                 testname                 | wal_generated |     duration
------------------------------------------+---------------+------------------
 one short and one long field, no change  |     877859144 | 30.6749138832092

Temporary Changes (Revert Max Chunksize = 4 and logic of finding longer match)
------------------------------------------------------------------------------
                 testname                 | wal_generated |     duration
------------------------------------------+---------------+------------------
 one short and one long field, no change  |     677337304 | 25.4048750400543

Sure, but watch me not care. If we're interested in taking advantage of the internal compressibility of tuples, we can do a lot better than this patch. We can compress the old tuple and the new tuple. We can compress full-page images. We can compress inserted tuples. But that's not the point of this patch. The point of *this* patch is to exploit the fact that the old and new tuples are likely to be very similar, NOT to squeeze out every ounce of compression from other sources.

Okay, got your point. Another minor thing is that in the latest patch, which I sent yesterday, I have modified it such that during formation of chunks, if there is data at the end of the string which doesn't have the special pattern and is less than the max chunk size, we also consider that as a chunk. The reason for doing this was that, let us say we have a 104-byte string which contains no special bit pattern; then it will just have one 64-byte chunk and will leave the remaining bytes, which might miss the chance of compressing that data.

Yeah, that sounds right.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Wed, Jan 15, 2014 at 5:58 PM, Amit Kapila amit.kapil...@gmail.com wrote:
On Fri, Jan 10, 2014 at 9:12 PM, Robert Haas robertmh...@gmail.com wrote:

Performance Data
----------------
Non-default settings:
autovacuum = off
checkpoint_segments = 128
checkpoint_timeout = 10min

Unpatched
---------
                 testname                 | wal_generated |     duration
------------------------------------------+---------------+------------------
 one short and one long field, no change  |    1054923224 | 33.101135969162

After pgrb_delta_encoding_v4
----------------------------
                 testname                 | wal_generated |     duration
------------------------------------------+---------------+------------------
 one short and one long field, no change  |     877859144 | 30.6749138832092

Temporary Changes (Revert Max Chunksize = 4 and logic of finding longer match)
------------------------------------------------------------------------------
                 testname                 | wal_generated |     duration
------------------------------------------+---------------+------------------
 one short and one long field, no change  |     677337304 | 25.4048750400543

Sorry, a minor correction to my last mail: the last data set ("Temporary Changes") is obtained just by reverting the logic of finding a longer match in pgrb_find_match().

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Wed, Jan 15, 2014 at 7:28 AM, Amit Kapila amit.kapil...@gmail.com wrote:

Unpatched
---------
                testname                 | wal_generated |    duration
-----------------------------------------+---------------+-----------------
 one short and one long field, no change |    1054923224 | 33.101135969162

After pgrb_delta_encoding_v4
----------------------------
                testname                 | wal_generated |    duration
-----------------------------------------+---------------+-----------------
 one short and one long field, no change |     877859144 | 30.6749138832092

Temporary Changes (Revert Max Chunksize = 4 and logic of finding longer match)
------------------------------------------------------------------------------
                testname                 | wal_generated |    duration
-----------------------------------------+---------------+-----------------
 one short and one long field, no change |     677337304 | 25.4048750400543

Sure, but watch me not care. If we're interested in taking advantage of the internal compressibility of tuples, we can do a lot better than this patch. We can compress the old tuple and the new tuple. We can compress full-page images. We can compress inserted tuples. But that's not the point of this patch. The point of *this* patch is to exploit the fact that the old and new tuples are likely to be very similar, NOT to squeeze out every ounce of compression from other sources. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Thu, Jan 16, 2014 at 12:49 AM, Robert Haas robertmh...@gmail.com wrote: On Wed, Jan 15, 2014 at 7:28 AM, Amit Kapila amit.kapil...@gmail.com wrote:

Unpatched
---------
                testname                 | wal_generated |    duration
-----------------------------------------+---------------+-----------------
 one short and one long field, no change |    1054923224 | 33.101135969162

After pgrb_delta_encoding_v4
----------------------------
                testname                 | wal_generated |    duration
-----------------------------------------+---------------+-----------------
 one short and one long field, no change |     877859144 | 30.6749138832092

Temporary Changes (Revert Max Chunksize = 4 and logic of finding longer match)
------------------------------------------------------------------------------
                testname                 | wal_generated |    duration
-----------------------------------------+---------------+-----------------
 one short and one long field, no change |     677337304 | 25.4048750400543

Sure, but watch me not care. If we're interested in taking advantage of the internal compressibility of tuples, we can do a lot better than this patch. We can compress the old tuple and the new tuple. We can compress full-page images. We can compress inserted tuples. But that's not the point of this patch. The point of *this* patch is to exploit the fact that the old and new tuples are likely to be very similar, NOT to squeeze out every ounce of compression from other sources. Okay, got your point. Another minor thing: in the latest patch I sent yesterday, I modified chunk formation so that if there is data at the end of the string that doesn't contain the special pattern and is shorter than the max chunk size, we still consider it a chunk. The reason for doing this is that, for example, a 104-byte string containing no special bit pattern would otherwise produce just one 64-byte chunk and leave the remaining bytes out, which might miss the chance of compressing that data. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Tue, Jan 14, 2014 at 1:16 AM, Amit Kapila amit.kapil...@gmail.com wrote: On Tue, Jan 14, 2014 at 2:16 AM, Robert Haas robertmh...@gmail.com wrote: On Sat, Jan 11, 2014 at 1:08 AM, Amit Kapila amit.kapil...@gmail.com wrote: Yes, currently this applies to update; what I have in mind is that if in the future someone wants to use WAL compression for any other operation, like 'full_page_writes', it can be easily extended. To be honest, I have not evaluated whether such a flag or compression would make sense for full-page writes, but I think it should be possible while doing a full-page write (BkpBlock has RelFileNode) to check such a flag if it's present. Makes sense. So shall I change it to a string instead of a bool and keep the name as compress_wal or compress_wal_for_opr? No. If we add full-page-write compression in the future, that can be a separate option. But I doubt we'd want to set that at the table level anyway; there's no particular reason that would be good for some tables and bad for others (whereas in this case there is such a reason). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Sat, Jan 11, 2014 at 1:08 AM, Amit Kapila amit.kapil...@gmail.com wrote: Yes, currently this applies to update, what I have in mind is that in future if some one wants to use WAL compression for any other operation like 'full_page_writes', then it can be easily extendible. To be honest, I have not evaluated whether such a flag or compression would make sense for full page writes, but I think it should be possible while doing full page write (BkpBlock has RelFileNode) to check such a flag if it's present. Makes sense. The reason of adding the same chunk in head of list is that it uses same technique as pglz_hist_add. Now in pglz, it will not have repeat steps from c~f, as it has concept of good_match which leads to get this done in one go. Being said above, I am really not sure, how much real world data falls in above category and should we try to optimize based on above example, but yes it will save some CPU cycles in current test we are using. In the Rabin algorithm, we shouldn't try to find a longer match. The match should end at the chunk end, period. Otherwise, you lose the shift-resistant property of the algorithm. But I do think there might be a bug here, which is that, unless I'm misinterpreting something, hp is NOT the end of the chunk. After calling pgrb_hash_init(), we've looked at the first FOUR bytes of the input. If we find that we have a zero hash value at that point, shouldn't the chunk size be 4, not 1? And similarly if we find it after sucking in one more byte, shouldn't the chunk size be 5, not 2? Right now, we're deciding where the chunks should end based on the data in the chunk plus the following 3 bytes, and that seems wonky. I would expect us to include all of those bytes in the chunk. It depends on how we define chunk, basically chunk size will be based on the byte for which we consider hindex. 
The hindex for any byte is calculated from that byte and the following 3 bytes, so after calling pgrb_hash_init(), even though we have looked at 4 bytes, the hindex is still for the first byte, and that's why it considers the chunk size to be 1, not 4. Isn't this similar to how the current pglz works? It also uses the next 4 bytes to calculate the index (pglz_hist_idx) but still does byte-by-byte comparison; if we map that onto Rabin's delta encoding, the chunk size is always 1. I don't quite understand this. The point of the Rabin algorithm is to split the old tuple up into chunks and then look for those chunks in the new tuple. For example, suppose the old tuple is abcdefghijklmnopqrstuvwxyz. It might get split like this: abcdefg hijklmnopqrstuvw xyz. If any of those three chunks appear in the new tuple, then we'll use them for compression. If not, we'll just copy the literal bytes. If the chunks appear in the new tuple reordered or shifted or with stuff inserted between one chunk and the next, we'll still find them. Unless I'm confused, which is possible, what you're doing is essentially looking at the string and splitting it in those three places, but then recording the chunks as being three bytes shorter than they really are. I don't see how that can be right. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Tue, Jan 14, 2014 at 2:16 AM, Robert Haas robertmh...@gmail.com wrote: On Sat, Jan 11, 2014 at 1:08 AM, Amit Kapila amit.kapil...@gmail.com wrote: Yes, currently this applies to update; what I have in mind is that if in the future someone wants to use WAL compression for any other operation, like 'full_page_writes', it can be easily extended. To be honest, I have not evaluated whether such a flag or compression would make sense for full-page writes, but I think it should be possible while doing a full-page write (BkpBlock has RelFileNode) to check such a flag if it's present. Makes sense. So shall I change it to a string instead of a bool and keep the name as compress_wal or compress_wal_for_opr? The reason for adding the same chunk at the head of the list is that it uses the same technique as pglz_hist_add. Now pglz will not repeat steps c~f, as it has the concept of good_match, which lets this be done in one go. That being said, I am really not sure how much real-world data falls into the above category and whether we should try to optimize based on the above example, but yes, it will save some CPU cycles in the current test we are using. In the Rabin algorithm, we shouldn't try to find a longer match. The match should end at the chunk end, period. Otherwise, you lose the shift-resistant property of the algorithm. Okay, it will work well for cases when most chunks in the tuple are due to a special pattern in it, but it will lose out on CPU cycles in cases where most of the chunks are due to the maximum chunk boundary and most of the new tuple matches the old tuple. The reason is that if the algorithm has such a property of finding matches longer than chunk boundaries, it can save us from calculating the hash again and again when we try to find a match in the old tuple. However, I think it is better to go with Rabin's algorithm instead of adding optimizations based on our own assumptions, because it is difficult to predict real-world tuple data.
Isn't this similar to how the current pglz works? It also uses the next 4 bytes to calculate the index (pglz_hist_idx) but still does byte-by-byte comparison; if we map that onto Rabin's delta encoding, the chunk size is always 1. I don't quite understand this. The point of the Rabin algorithm is to split the old tuple up into chunks and then look for those chunks in the new tuple. For example, suppose the old tuple is abcdefghijklmnopqrstuvwxyz. It might get split like this: abcdefg hijklmnopqrstuvw xyz. If any of those three chunks appear in the new tuple, then we'll use them for compression. If not, we'll just copy the literal bytes. If the chunks appear in the new tuple reordered or shifted or with stuff inserted between one chunk and the next, we'll still find them. Unless I'm confused, which is possible, what you're doing is essentially looking at the string and splitting it in those three places, but then recording the chunks as being three bytes shorter than they really are. I don't see how that can be right. Spending some more time on the algorithm today, I found the bug you are pointing to, and you are right that the chunk is recorded shorter than it really is. I think it should not be difficult to address this issue without affecting most of the algorithm; let me try to handle it. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
2. Provide a new reloption to specify Wal compression for update operation on table Create table tbl(c1 char(100)) With (compress_wal = true); Alternative options: a. compress_wal can take input as operation, e.g. 'insert', 'update', b. use alternate syntax: Create table tbl(c1 char(100)) Compress Wal For Update; c. anything better? I think WITH (compress_wal = true) is pretty good. I don't understand your point about taking the operation as input, because this only applies to updates. But we could try to work update into the name of the setting somehow, so as to be less likely to conflict with future additions, like maybe wal_compress_update. I think the alternate syntax you propose is clearly worse, because it would involve adding new keywords, something we try to avoid. The only possible enhancement I can think of here is to make the setting an integer rather than a Boolean, defined as the minimum acceptable compression ratio. A setting of 0 means always compress; a setting of 100 means never compress; intermediate values define the least acceptable ratio. But to be honest, I think that's overkill; I'd be inclined to hard-code the default value of 25 in the patch and make it a #define. The only real advantage of requiring a minimum 25% compression percentage is that we can bail out on compression three-quarters of the way through the tuple if we're getting nowhere. That's fine for what it is, but the idea that users are going to see much benefit from twaddling that number seems very dubious to me. Points to consider - 1. As the current algorithm store the entry for same chunks at head of list, it will always find last but one chunk (we don't store last 4 bytes) for long matching string during match phase in encoding (pgrb_delta_encode). We can improve it either by storing same chunks at end of list instead of at head or by trying to find a good_match technique used in lz algorithm. 
Finding good_match technique can have overhead in some of the cases when there is actually no match. I don't see what the good_match thing has to do with anything in the Rabin algorithm. But I do think there might be a bug here, which is that, unless I'm misinterpreting something, hp is NOT the end of the chunk. After calling pgrb_hash_init(), we've looked at the first FOUR bytes of the input. If we find that we have a zero hash value at that point, shouldn't the chunk size be 4, not 1? And similarly if we find it after sucking in one more byte, shouldn't the chunk size be 5, not 2? Right now, we're deciding where the chunks should end based on the data in the chunk plus the following 3 bytes, and that seems wonky. I would expect us to include all of those bytes in the chunk. 2. Another optimization that we can do in pgrb_find_match(), is that currently if it doesn't find the first chunk (chunk got by hash index) matching, it continues to find the match in other chunks. I am not sure if there is any benefit to search for other chunks if first one is not matching. Well, if you took that out, I suspect it would hurt the compression ratio. Unless the CPU savings are substantial, I'd leave it alone. 3. We can move code from pg_lzcompress.c to some new file pg_rbcompress.c, if we want to move, then we need to either duplicate some common macros like pglz_out_tag or keep it common, but might be change the name. +1 for a new file. 4. Decide on min and max chunksize. (currently kept as 2 and 4 respectively). The point to consider is that if we keep bigger chunk sizes, then it can save us on CPU cycles, but less reduction in Wal, on the other side if we keep it small it can have better reduction in Wal but consume more CPU cycles. Whoa. That seems way too small. Since PGRB_PATTERN_AFTER_BITS is 4, the average length of a chunk is about 16 bytes. It makes little sense to have the maximum chunk size be 25% of the expected chunk length. 
I'd recommend making the maximum chunk length something like 4 * PGRB_CONST_NUM, and the minimum chunk length maybe something like 4. 5. Kept a GUC variable 'wal_update_compression_ratio' for test purposes; we can remove it before commit. Let's remove it now. 7. Docs need to be updated; tab completion needs some work. Tab completion can be skipped for now, but documentation is important. 8. We can extend Alter Table to set the compress option for a table. I don't understand what you have in mind here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Fri, Jan 10, 2014 at 9:12 PM, Robert Haas robertmh...@gmail.com wrote: 2. Provide a new reloption to specify Wal compression for update operation on table Create table tbl(c1 char(100)) With (compress_wal = true); Alternative options: a. compress_wal can take input as operation, e.g. 'insert', 'update', b. use alternate syntax: Create table tbl(c1 char(100)) Compress Wal For Update; c. anything better? I think WITH (compress_wal = true) is pretty good. I don't understand your point about taking the operation as input, because this only applies to updates. Yes, currently this applies to update, what I have in mind is that in future if some one wants to use WAL compression for any other operation like 'full_page_writes', then it can be easily extendible. To be honest, I have not evaluated whether such a flag or compression would make sense for full page writes, but I think it should be possible while doing full page write (BkpBlock has RelFileNode) to check such a flag if it's present. But we could try to work update into the name of the setting somehow, so as to be less likely to conflict with future additions, like maybe wal_compress_update. I think the alternate syntax you propose is clearly worse, because it would involve adding new keywords, something we try to avoid. Yes, this would be better than current, I will include it in next version of patch, unless there is any other better idea than this one. Points to consider - 1. As the current algorithm store the entry for same chunks at head of list, it will always find last but one chunk (we don't store last 4 bytes) for long matching string during match phase in encoding (pgrb_delta_encode). We can improve it either by storing same chunks at end of list instead of at head or by trying to find a good_match technique used in lz algorithm. Finding good_match technique can have overhead in some of the cases when there is actually no match. 
I don't see what the good_match thing has to do with anything in the Rabin algorithm. The case for which I mentioned it is where most of the data is repetitive and the modified tuple is almost the same. For example, original tuple: aaaa followed by a long run of b's; modified tuple: ccaa followed by the same run of b's. Now let us see what will happen as per the current algorithm.
Step 1: Form the hash table (let's consider 4-byte chunks just for the purpose of explanation):
a. First chunk after the first 4 a's (aaaa), with hash value 1024.
b. After that all the chunks will be 4 b's (bbbb) and will have the same hash value (let's call it H2), so they will be mapped to the same bucket (let us call it bucket 'B2') and hence form a list. Every new chunk with the same hash value is added to the front of the list, so B2's front element has hash value H2 and location 58 (8 bytes from the end, as we don't include the last 4 bytes in our algorithm).
Step 2: Perform encoding of the modified tuple using the hash table formed in Step 1:
a. First chunk after the first 4 bytes (ccaa), with hash value 1056.
b. Try to find a match in the hash table; no match, so proceed.
c. Next chunk is 4 b's with hash value H2; trying to find a match, it will find one in bucket B2 (at the first element).
d. The hash value matched, so it will try to match each byte; if all bytes match, it considers the chunk a matched_chunk.
e. Now that the chunk matches, we try to match consecutive bytes after the chunk (in the hope of finding a longer match); in this case it will find a match of 8 bytes and set match_len to 8.
f. It will advance the modified-tuple pointer (dp) to the byte after match_len.
g. Again, it will consider the next 4 b's as a chunk and repeat steps c~f.
So what is happening here is that steps c~f get repeated, whereas if we had added identical chunks at the end of the list in step 1b, we could have found the matching string in one go (c~f). The reason for adding the same chunk at the head of the list is that it uses the same technique as pglz_hist_add.
Now in pglz, it will not have repeat steps from c~f, as it has concept of good_match which leads to get this done in one go. Being said above, I am really not sure, how much real world data falls in above category and should we try to optimize based on above example, but yes it will save some CPU cycles in current test we are using. But I do think there might be a bug here, which is that, unless I'm misinterpreting something, hp is NOT the end of the chunk. After calling pgrb_hash_init(), we've looked at the first FOUR bytes of the input. If we find that we have a zero hash value at that point, shouldn't the
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Fri, Dec 6, 2013 at 6:41 PM, Amit Kapila amit.kapil...@gmail.com wrote: Agreed, summarization of data for LZ/Chunkwise encoding especially for non-compressible (hundred tiny fields, all changed/half changed) or less compressible data (hundred tiny fields, half nulled) w.r.t CPU usage is as below: a. For hard disk, there is an overhead of 7~16% with LZ delta encoding and there is an overhead of 5~8% with Chunk wise encoding. b. For Tempfs (which means operate on RAM as disk), there is an overhead of 19~26% with LZ delta encoding and there is an overhead of 9~18% with Chunk wise encoding. There might be some variation of data (in your last mail the overhead for chunkwise method for Tempfs was 12%), but in general the data suggests that chunk wise encoding has less overhead than LZ encoding for non-compressible data and for others it is better or equal. Now, I think we have below options for this patch: a. If the performance overhead for worst case is acceptable (we can try to reduce some more, but don't think it will be something drastic), then this can be done without any flag/option. b. Have it with table level option to enable/disable WAL compression c. Drop this patch, as for worst cases there is some performance overhead. d. Go back and work more on it, if there is any further suggestions for improvement. Based on data posted previously for both approaches (lz_delta, chunk_wise_encoding) and above options, I have improved the last version of patch by keeping chunk wise approach and provided a table level option to user. Changes in this version of patch: -- 1. Implement decoding, it is almost similar to pglz_decompress as the format to store encoded data is not changed much. 2. Provide a new reloption to specify Wal compression for update operation on table Create table tbl(c1 char(100)) With (compress_wal = true); Alternative options: a. compress_wal can take input as operation, e.g. 'insert', 'update', b. 
use alternate syntax: Create table tbl(c1 char(100)) Compress Wal For Update; c. anything better? 3. Fixed below 2 defects in encoding: a. In function pgrb_find_match(), if last byte of chunk matches, it consider whole chunk as match. b. If there is no match, it copies chunk as it is to encoded data, while copying, it is ignoring last byte. Due to defect fixes, data can vary, but I don't think there can be any major change. Points to consider - 1. As the current algorithm store the entry for same chunks at head of list, it will always find last but one chunk (we don't store last 4 bytes) for long matching string during match phase in encoding (pgrb_delta_encode). We can improve it either by storing same chunks at end of list instead of at head or by trying to find a good_match technique used in lz algorithm. Finding good_match technique can have overhead in some of the cases when there is actually no match. 2. Another optimization that we can do in pgrb_find_match(), is that currently if it doesn't find the first chunk (chunk got by hash index) matching, it continues to find the match in other chunks. I am not sure if there is any benefit to search for other chunks if first one is not matching. 3. We can move code from pg_lzcompress.c to some new file pg_rbcompress.c, if we want to move, then we need to either duplicate some common macros like pglz_out_tag or keep it common, but might be change the name. 4. Decide on min and max chunksize. (currently kept as 2 and 4 respectively). The point to consider is that if we keep bigger chunk sizes, then it can save us on CPU cycles, but less reduction in Wal, on the other side if we keep it small it can have better reduction in Wal but consume more CPU cycles. 5. kept an guc variable 'wal_update_compression_ratio', for test purpose, we can remove it before commit. 7. docs needs to be updated, tab completion needs some work. 8. We can extend Alter Table to set compress option for table. Thoughts/Suggestions? 
With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com [Attachment: pgrb_delta_encoding_v3.patch]
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Thu, Dec 12, 2013 at 12:27 AM, Amit Kapila amit.kapil...@gmail.com wrote: On Thu, Dec 12, 2013 at 3:43 AM, Peter Eisentraut pete...@gmx.net wrote: This patch fails the regression tests; see attachment. Thanks for reporting the diffs. The reason for the failures is that decoding for the tuple is still not done, as mentioned in the Notes section of the mail (http://www.postgresql.org/message-id/caa4ek1jeuby16uwrda2tabkk+rlrl3giyyqy1mvh_6cthmd...@mail.gmail.com) However, to keep the patch sane, I will do that and post an updated patch. The main point of the new approach at this stage is to get feedback on whether such an optimization is acceptable for worst-case scenarios and, if not, whether we can gate it behind a table-level or GUC option. I don't understand why lack of decoding support should cause regression tests to fail. I thought decoding was only being done during WAL replay, a case not exercised by the regression tests. A few other comments:

+#define PGRB_HKEY_PRIME 11 /* prime number used for rolling hash */
+#define PGRB_HKEY_SQUARE_PRIME 11 * 11 /* prime number used for rolling hash */
+#define PGRB_HKEY_CUBE_PRIME 11 * 11 * 11 /* prime number used for rolling hash */

11 * 11 can't accurately be described as a prime number. Nor can 11 * 11 * 11. Please adjust the comment. Also, why 11? It doesn't appear that pglz_hist_idx is changed except for whitespace; please revert that hunk. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
This patch fails the regression tests; see attachment. *** /var/lib/jenkins/jobs/postgresql_commitfest_world/workspace/src/test/regress/expected/arrays.out 2013-08-24 01:24:42.0 + --- /var/lib/jenkins/jobs/postgresql_commitfest_world/workspace/src/test/regress/results/arrays.out 2013-12-07 16:21:48.0 + *** *** 495,508 (11 rows) SELECT * FROM array_op_test WHERE i @ '{38,34,32,89}' ORDER BY seqno; ! seqno | i | t ! ---+---+ ! 40 | {34} | {AA10611,AAA1205,AAA50956,31334,A70466,81587,AAA74623} ! 74 | {32} | {1729,A22860,AA99807,A17383,AAA67062,AAA15165,AAA50956} ! 98 | {38,34,32,89} | {AA71621,8857,AAA65037,31334,AA48845} !101 | {}| {} ! (4 rows) ! SELECT * FROM array_op_test WHERE i = '{}' ORDER BY seqno; seqno | i | t ---++ --- 495,501 (11 rows) SELECT * FROM array_op_test WHERE i @ '{38,34,32,89}' ORDER BY seqno; ! ERROR: compressed data is corrupt SELECT * FROM array_op_test WHERE i = '{}' ORDER BY seqno; seqno | i | t ---++ *** *** 622,632 (0 rows) SELECT * FROM array_op_test WHERE i @ '{}' ORDER BY seqno; ! seqno | i | t ! ---++ !101 | {} | {} ! (1 row) ! SELECT * FROM array_op_test WHERE i = '{NULL}' ORDER BY seqno; seqno | i| t ---++ --- 615,621 (0 rows) SELECT * FROM array_op_test WHERE i @ '{}' ORDER BY seqno; ! ERROR: compressed data is corrupt SELECT * FROM array_op_test WHERE i = '{NULL}' ORDER BY seqno; seqno | i| t ---++ *** *** 644,654 (0 rows) SELECT * FROM array_op_test WHERE i @ '{NULL}' ORDER BY seqno; ! seqno | i | t ! ---++ !101 | {} | {} ! (1 row) ! SELECT * FROM array_op_test WHERE t @ '{72908}' ORDER BY seqno; seqno | i | t ---+---+ --- 633,639 (0 rows) SELECT * FROM array_op_test WHERE i @ '{NULL}' ORDER BY seqno; ! 
ERROR: compressed data is corrupt SELECT * FROM array_op_test WHERE t @ '{72908}' ORDER BY seqno; seqno | i | t ---+---+ == *** /var/lib/jenkins/jobs/postgresql_commitfest_world/workspace/src/test/regress/expected/polymorphism.out 2013-11-13 20:20:40.0 + --- /var/lib/jenkins/jobs/postgresql_commitfest_world/workspace/src/test/regress/results/polymorphism.out 2013-12-07 16:21:55.0 + *** *** 638,648 -- check that we can apply functions taking ANYARRAY to pg_stats select distinct array_ndims(histogram_bounds) from pg_stats where histogram_bounds is not null; ! array_ndims ! - !1 ! (1 row) ! -- such functions must protect themselves if varying element type isn't OK -- (WHERE clause here is to avoid possibly getting a collation error instead) select max(histogram_bounds) from pg_stats where tablename = 'pg_am'; --- 638,644 -- check that we can apply functions taking ANYARRAY to pg_stats select distinct array_ndims(histogram_bounds) from pg_stats where histogram_bounds is not null; ! ERROR: compressed data is corrupt -- such functions must protect themselves if varying element type isn't OK -- (WHERE clause here is to avoid possibly getting a collation error instead) select max(histogram_bounds) from pg_stats where tablename = 'pg_am'; == -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Thu, Dec 12, 2013 at 3:43 AM, Peter Eisentraut pete...@gmx.net wrote: This patch fails the regression tests; see attachment. Thanks for reporting the diffs. The reason for the failures is that decoding for the tuple is still not done, as mentioned in the Notes section of the mail (http://www.postgresql.org/message-id/caa4ek1jeuby16uwrda2tabkk+rlrl3giyyqy1mvh_6cthmd...@mail.gmail.com) However, to keep the patch sane, I will do that and post an updated patch. The main point of the new approach at this stage is to get feedback on whether such an optimization is acceptable for worst-case scenarios and, if not, whether we can gate it behind a table-level or GUC option. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Fri, Dec 6, 2013 at 3:39 PM, Haribabu kommi haribabu.ko...@huawei.com wrote: On 06 December 2013 12:29 Amit Kapila wrote: Note - a. Performance data is taken on my laptop; it needs to be tested on some better m/c. b. The attached patch is just a prototype of the chunkwise concept; the code needs to be improved, and decode handling/testing is pending. I ran the performance test on a linux machine and attached the results in the mail. I ran the performance test on the above patches, including another patch which does snappy hash instead of the normal hash in the LZ algorithm. The performance readings, and the patch with snappy hash not including new data in compression, are attached in the mail.

Thanks for taking the data. The chunkwise approach is giving good performance in most of the scenarios. Agreed; a summary of the data for LZ/chunkwise encoding, especially for non-compressible (hundred tiny fields, all changed/half changed) or less compressible (hundred tiny fields, half nulled) data w.r.t. CPU usage, is as below:

a. For hard disk, there is an overhead of 7~16% with LZ delta encoding, and an overhead of 5~8% with chunkwise encoding.
b. For Tempfs (which means operating on RAM as disk), there is an overhead of 19~26% with LZ delta encoding, and an overhead of 9~18% with chunkwise encoding.

There might be some variation in the data (in your last mail the overhead for the chunkwise method on Tempfs was 12%), but in general the data suggests that chunkwise encoding has less overhead than LZ encoding for non-compressible data, and for the others it is better or equal.

Now, I think we have the below options for this patch:
a. If the performance overhead for the worst case is acceptable (we can try to reduce it some more, but I don't think it will be anything drastic), then this can be done without any flag/option.
b. Have it with a table-level option to enable/disable WAL compression.
c. Drop this patch, as for worst cases there is some performance overhead.
d. Go back and work more on it, if there are any further suggestions for improvement.

With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On 05 December 2013 21:16 Amit Kapila wrote: Note - a. Performance data is taken on my laptop; it needs to be tested on some better m/c. b. The attached patch is just a prototype of the chunkwise concept; the code needs to be improved, and decode handling/testing is pending. I ran the performance test on a linux machine and attached the results in the mail. Regards, Hari babu.

Performance Data
Machine: CPU - 4 core, RAM - 24GB, OS - SUSE 11 SP2
Configuration: encoding - SQL_ASCII, lc_collate - C, lc_ctype - C, shared_buffers - 4GB, wal_buffers - 16MB, synchronous_commit - off, checkpoint_segments - 255, checkpoint_timeout - 25min, autovacuum - off

NORMAL DISK:

Head code:
                testname                | wal_generated |     duration
----------------------------------------+---------------+------------------
 ten tiny fields, all changed           |    1408441624 | 42.9247660636902
 hundred tiny fields, first 10 changed  |     623904568 | 19.5896420478821
 hundred tiny fields, all changed       |     626929520 | 20.4015860557556
 hundred tiny fields, half changed      |     632949424 | 20.4108989238739
 hundred tiny fields, half nulled       |     568672600 | 16.9662220478058
(5 rows)

lz snappy hash delta encoding:
                testname                | wal_generated |     duration
----------------------------------------+---------------+------------------
 ten tiny fields, all changed           |    1314962968 | 39.8615770339966
 hundred tiny fields, first 10 changed  |     452513816 | 16.8626310825348
 hundred tiny fields, all changed       |     458503728 | 18.5805869102478
 hundred tiny fields, half changed      |     463203696 | 17.7005319595337
 hundred tiny fields, half nulled       |     472732888 | 17.7074511051178
(5 rows)

rabin delta encoding:
                testname                | wal_generated |     duration
----------------------------------------+---------------+------------------
 ten tiny fields, all changed           |    1321761080 | 40.9638168811798
 hundred tiny fields, first 10 changed  |     501064768 | 18.8991298675537
 hundred tiny fields, all changed       |     631711040 | 22.2287940979004
 hundred tiny fields, half changed      |     628356072 | 21.4988770484924
 hundred tiny fields, half nulled       |     506637272 | 18.4065499305725
(5 rows)

TEMPFS:

Head code:
                testname                | wal_generated |     duration
----------------------------------------+---------------+------------------
 ten tiny fields, all changed           |    1395199584 | 23.9126200675964
 hundred tiny fields, first 10 changed  |     636436752 | 13.5280549526215
 hundred tiny fields, all changed       |     635591368 | 13.5622470378876
 hundred tiny fields, half changed      |     628001440 | 13.5341370105743
 hundred tiny fields, half nulled       |     564033176 | 12.3501629829407
(5 rows)

lz snappy hash delta encoding:
                testname                | wal_generated |     duration
----------------------------------------+---------------+------------------
 ten tiny fields, all changed           |    1314964400 | 23.2445130348206
 hundred tiny fields, first 10 changed  |     456703104 | 14.2320129871368
 hundred tiny fields, all changed       |     463224736 | 14.0170950889587
 hundred tiny fields, half changed      |     462589768 | 14.032851934433
 hundred tiny fields, half nulled       |     473361760 | 13.3050961494446
(5 rows)

rabin delta encoding:
                testname                | wal_generated |     duration
----------------------------------------+---------------+------------------
 ten tiny fields, all changed           |    1319550032 | 23.7516498565674
 hundred tiny fields, first 10 changed  |     512198256 | 15.0410940647125
 hundred tiny fields, all changed       |     638134296 | 15.143513917923
 hundred tiny fields, half changed      |     635378360 | 15.1855199337006
 hundred tiny fields, half nulled       |     508630832 | 13.4496691226959
(5 rows)
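For readers eyeballing the tables above, a tiny helper turns any pair of rows into percentage deltas; here it is applied to the "hundred tiny fields, first 10 changed" row (normal disk, head vs. lz snappy hash delta encoding):

```python
# Quick comparison helper for the rows above: percentage change of the
# patched run relative to head (negative = reduction).

def pct_change(base, new):
    return (new - base) / base * 100.0

# "hundred tiny fields, first 10 changed", normal disk:
head_wal, lz_wal = 623904568, 452513816
head_dur, lz_dur = 19.5896420478821, 16.8626310825348

wal_delta = pct_change(head_wal, lz_wal)  # about -27.5% WAL generated
dur_delta = pct_change(head_dur, lz_dur)  # about -13.9% duration
```

On this row the patch both shrinks WAL and reduces duration; the contentious cases discussed in the thread are the rows where WAL shrinks but duration grows.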
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Fri, Dec 6, 2013 at 12:10 PM, Haribabu kommi haribabu.ko...@huawei.com wrote: On 05 December 2013 21:16 Amit Kapila wrote: Note - a. Performance data is taken on my laptop; it needs to be tested on some better m/c. b. The attached patch is just a prototype of the chunkwise concept; the code needs to be improved, and decode handling/testing is pending. I ran the performance test on a linux machine and attached the results in the mail. This test doesn't make much sense for comparison, as in chunkwise delta encoding I am not doing compression using the new tuple, and the reason is that I want to check how good/bad it is as compared to the LZ approach for cases when data is non-compressible. So could you please try to take readings using the patch pgrb_delta_encoding_v1 attached in my previous mail. For LZ delta encoding (pgrb_delta_encoding_v1): in heaptuple.c there is a parameter rabin_fingerprint_comp; set it to false, compile the code, and take readings. This will do LZ compression. For chunkwise delta encoding: in heaptuple.c, set the same rabin_fingerprint_comp parameter to true, compile the code, and take readings. This will operate chunkwise. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Mon, Dec 2, 2013 at 7:40 PM, Haribabu kommi haribabu.ko...@huawei.com wrote: On 29 November 2013 03:05 Robert Haas wrote: On Wed, Nov 27, 2013 at 9:31 AM, Amit Kapila amit.kapil...@gmail.com wrote: I tried modifying the existing patch to support dynamic rollup as follows: it rolls up for every 32 bytes of mismatch between the old and new tuple, and it resets back whenever it finds a match. 1. pglz-with-micro-optimization-compress-using-newdata-5: Adds all old tuple data to the history and then checks for a match from the new tuple. After every 32 bytes of mismatch, it checks for a match only once every 2 bytes. It repeats like this until it finds a match or reaches the end of the data. 2. pglz-with-micro-optimization-compress-using-newdata_snappy_hash-1: Adds only the first byte of old tuple data to the history and then checks for a match from the new tuple. If a match is found, then the next unmatched byte from the old tuple is added to the history and the process repeats. If no match is found, then it adds the next byte of the old tuple to the history, followed by the unmatched byte from the new tuple data. In this case the performance is good, but if there are any forward references in the new data with respect to the old data then it will not compress the data. The performance data still has the same problem, that is, on fast disks (tempfs data) it is low. I am already doing a chunk-wise implementation to see if it can improve the situation; please wait and then we can decide what is the best way to proceed. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Wed, Nov 27, 2013 at 9:31 AM, Amit Kapila amit.kapil...@gmail.com wrote: Sure, but to explore (a), the scope is a bit bigger. We have the below options to explore (a): 1. try to optimize the existing algorithm used in the patch, which we have tried, but of course we can spend some more time to see if anything more can be tried out. 2. try the fingerprint technique as suggested by you above. 3. try some other standard methods like vcdiff, lz4 etc. Well, obviously, I'm hot on idea #2 and think that would be worth spending some time on. If we can optimize the algorithm used in the patch some more (option #1), that would be fine, too, but the code looks pretty tight to me, so I'm not sure how successful that's likely to be. But if you have an idea, sure. As to #3, I took a look at lz4 and snappy but neither seems to have an API for delta compression. vcdiff is a commonly-used output format for delta compression but doesn't directly address the question of what algorithm to use to find matches between the old and new input; and I suspect that when you go searching for algorithms to do that efficiently, it's going to bring you right back to #2. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Wed, Nov 27, 2013 at 12:56 AM, Amit Kapila amit.kapil...@gmail.com wrote: - There is a comment TODO: It would be nice to behave like the history and the source strings were concatenated, so that you could compress using the new data, too. If we're not already doing that, then how are we managing to compress WAL by more than one-quarter in the hundred tiny fields, all changed case? Algorithm is not doing concatenation of history and source strings; the hash table is formed just with history data, and then later if a match is not found it is added to the history, so this can be the reason for the above result. From the compressor's point of view, that's pretty much equivalent to behaving as if those strings were concatenated. The point is that there's a difference between using the old tuple's history entries to compress the new tuple and using the new tuple's own history to compress it. The former is delta-compression, which is what we're supposedly doing here. The latter is just plain compression. That doesn't *necessarily* make it a bad idea, but they are clearly two different things. The basic idea is that you use a rolling hash function to divide up the history data into chunks of a given average size. So we scan the history data, compute a rolling hash value at each point, and each time the bottom N bits are zero, we consider that to be the end of a chunk. We enter all the chunks into a hash table. The chunk size will vary, but on the average, given a reasonably well-behaved rolling hash function (the pglz one probably doesn't qualify) it'll happen every 2^N bytes, so perhaps for this purpose we'd choose N to be between 3 and 5. Then, we scan the input we want to compress and divide it into chunks in the same way. Chunks that don't exist in the history data get copied to the output, while those that do get replaced with a reference to their position in the history data.
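The chunking scheme described above can be sketched in a few lines. To be clear, this is an illustrative toy and not the pglz or PostgreSQL implementation: the shift-and-xor rolling hash, the boundary mask, and the (offset, length) reference format are all simplified assumptions, and a real Rabin fingerprint uses polynomial arithmetic rather than this hash.

```python
# Toy sketch of chunk-based delta encoding via a rolling hash.
# NOT the pglz/PostgreSQL implementation; hash and formats are simplified.

def chunk_split(data, mask=0x7):
    """Split data into variable-size chunks; a chunk ends wherever the
    bottom bits of a rolling hash are zero (average size ~ mask+1 bytes)."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) ^ b) & 0xFFFF        # toy rolling hash, not Rabin
        if (h & mask) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])        # trailing partial chunk
    return chunks

def delta_encode(history, new):
    """Emit ('ref', offset, length) for chunks of `new` found in `history`,
    and ('lit', bytes) for chunks that are not."""
    table, pos = {}, 0
    for c in chunk_split(history):
        table.setdefault(bytes(c), pos)    # remember first occurrence
        pos += len(c)
    ops = []
    for c in chunk_split(new):
        key = bytes(c)
        if key in table:
            ops.append(("ref", table[key], len(c)))
        else:
            ops.append(("lit", key))
    return ops

def delta_decode(history, ops):
    """Reassemble the new tuple from history references and literals."""
    out = bytearray()
    for op in ops:
        if op[0] == "ref":
            out += history[op[1]:op[1] + op[2]]
        else:
            out += op[1]
    return bytes(out)
```

Because identical inputs chunk identically, an unchanged tuple encodes entirely as references, and a one-byte prefix insertion only perturbs the chunks up to the next hash boundary, which is exactly the resistance to shifted data discussed in the thread.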
I think this idea looks better than the current one and it will definitely improve some of the cases, but I am not sure we can win in all cases. We have tried one similar idea (reduce the size of the hash and eventually the comparison) by adding every 10th byte to the history lookup table rather than every byte. It improved most cases but not all (the hundred tiny fields, all changed and hundred tiny fields, half changed tests were still slow). Patch and results are at link (refer approach-1): http://www.postgresql.org/message-id/001f01ce1c14$d3af0770$7b0d1650$@kap...@huawei.com What you did there will, I think, tend to miss a lot of compression opportunities. Suppose for example that the old tuple is ABCDEFGHIJKLMNOP and the new tuple is xABCDEFGHIJKLMNOP. After copying one literal byte we'll proceed to copy 9 more, missing the fact that there was a long match available after the first byte. The advantage of the fingerprinting technique is that it's supposed to be resistant to that sort of thing. Now the tough question is what are the possible options for this patch and which one to pick: a. optimize the encoding technique, so that it can improve results in most cases even if not all. b. have a table-level option or GUC to enable/disable WAL compression. c. use some heuristics to check if the chances of compression are good, and only then perform compression. 1. apply this optimization for tuple size > 128 and < 2000. 2. apply this optimization if the number of modified columns is less than 25% (some threshold number) of the total columns. I think we can get the modified columns from the target entry and use it if triggers haven't changed the tuple. I remember earlier there were concerns that this value can't be trusted completely, but I think using it as a heuristic is not a problem, even if this number is not right in some cases. d. forget about this optimization and reject the patch. I think doing options 'b' and 'c' together can make this optimization usable in the cases where it is actually useful.
I agree that we probably want to do (b), and I suspect we want both a GUC and a reloption, assuming that can be done relatively cleanly. However, I think we should explore (a) more before we explore (c). I think there's a good chance that we can reduce the CPU overhead of this enough to feel comfortable having it enabled by default. If we proceed with heuristics as in approach (c), I don't think that's the end of the world, but I think there will be more corner cases where we lose and have to fiddle things manually. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
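For what it's worth, the pre-checks in option (c) are cheap enough to sketch. The function name and parameters below are hypothetical, and the thresholds (tuple length bounds of 128/2000 bytes, 25% modified columns) are simply the numbers floated in this thread, not values from the patch:

```python
# Hypothetical gate for option (c): decide cheaply whether delta-encoding
# an update is likely to pay off, before spending CPU on it.
# Thresholds are the ones floated in the thread, used here for illustration.

def should_attempt_delta(old_len, new_len, modified_cols, total_cols,
                         min_len=128, max_len=2000, max_col_ratio=0.25):
    # Heuristic 1: only bother for mid-sized tuples; tiny tuples have
    # little WAL to save, huge ones cost too much CPU to scan.
    if not (min_len <= old_len <= max_len and min_len <= new_len <= max_len):
        return False
    # Heuristic 2: if most columns changed, the delta would be mostly
    # literals anyway, so skip compression.
    if total_cols > 0 and modified_cols / total_cols > max_col_ratio:
        return False
    return True
```

As noted above, the modified-column count may not be fully trustworthy when triggers rewrite the tuple, but for a heuristic an occasional wrong answer only costs a missed (or wasted) compression attempt, not correctness.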
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Wed, Nov 27, 2013 at 7:35 PM, Robert Haas robertmh...@gmail.com wrote: On Wed, Nov 27, 2013 at 12:56 AM, Amit Kapila amit.kapil...@gmail.com wrote: The basic idea is that you use a rolling hash function to divide up the history data into chunks of a given average size. So we scan the history data, compute a rolling hash value at each point, and each time the bottom N bits are zero, we consider that to be the end of a chunk. We enter all the chunks into a hash table. The chunk size will vary, but on the average, given a reasonably well-behaved rolling hash function (the pglz one probably doesn't qualify) it'll happen every 2^N bytes, so perhaps for this purpose we'd choose N to be between 3 and 5. Then, we scan the input we want to compress and divide it into chunks in the same way. Chunks that don't exist in the history data get copied to the output, while those that do get replaced with a reference to their position in the history data. I think this idea looks better than the current one and it will definitely improve some of the cases, but I am not sure we can win in all cases. We have tried one similar idea (reduce the size of the hash and eventually the comparison) by adding every 10th byte to the history lookup table rather than every byte. It improved most cases but not all (the hundred tiny fields, all changed and hundred tiny fields, half changed tests were still slow). Patch and results are at link (refer approach-1): http://www.postgresql.org/message-id/001f01ce1c14$d3af0770$7b0d1650$@kap...@huawei.com What you did there will, I think, tend to miss a lot of compression opportunities. Suppose for example that the old tuple is ABCDEFGHIJKLMNOP and the new tuple is xABCDEFGHIJKLMNOP. After copying one literal byte we'll proceed to copy 9 more, missing the fact that there was a long match available after the first byte.
That is right, but one idea to try that out was to see if we can reduce CPU usage at the cost of compression, but we found that it didn't completely eliminate that problem. The advantage of the fingerprinting technique is that it's supposed to be resistant to that sort of thing. Okay, one question that arises here is whether it can be better in terms of CPU usage compared to when we used a hash function for every 10th byte; if you have a feeling that it can improve the situation, I can try a prototype implementation of the same to check the results. Now the tough question is what are the possible options for this patch and which one to pick: a. optimize the encoding technique, so that it can improve results in most cases even if not all. b. have a table-level option or GUC to enable/disable WAL compression. c. use some heuristics to check if the chances of compression are good, and only then perform compression. 1. apply this optimization for tuple size > 128 and < 2000. 2. apply this optimization if the number of modified columns is less than 25% (some threshold number) of the total columns. I think we can get the modified columns from the target entry and use it if triggers haven't changed the tuple. I remember earlier there were concerns that this value can't be trusted completely, but I think using it as a heuristic is not a problem, even if this number is not right in some cases. d. forget about this optimization and reject the patch. I think doing options 'b' and 'c' together can make this optimization usable in the cases where it is actually useful. I agree that we probably want to do (b), and I suspect we want both a GUC and a reloption, assuming that can be done relatively cleanly. However, I think we should explore (a) more before we explore (c). Sure, but to explore (a), the scope is a bit bigger. We have the below options to explore (a): 1. try to optimize the existing algorithm used in the patch, which we have tried, but of course we can spend some more time to see if anything more can be tried out. 2.
try fingerprint technique as suggested by you above. 3. try some other standard methods like vcdiff, lz4 etc. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Tue, Nov 26, 2013 at 8:25 AM, Robert Haas robertmh...@gmail.com wrote: On Mon, Jul 22, 2013 at 2:31 PM, Greg Smith g...@2ndquadrant.com wrote: I spent a little time running the tests from Heikki's script under perf. On all three two short fields tests and also on the ten tiny fields, all changed test, we spend about 1% of the CPU time in pglz_delta_encode. I don't see any evidence that it's actually compressing anything at all; it appears to be falling out where we test the input length against the strategy, presumably because the default strategy (which we largely copy here) doesn't try to compress input data of less than 32 bytes. Given that this code isn't actually compressing anything in these cases, I'm a bit confused by Greg's report of substantial gains on the ten tiny fields, all changed test; how can we win if we're not compressing? I think it is mainly due to variation of test data on the Mac; if we see the results on Linux, there is not much difference. Also, the results posted by Heikki or by me at the below links don't show such inconsistency for the ten tiny fields, all changed case. http://www.postgresql.org/message-id/51366323.8070...@vmware.com http://www.postgresql.org/message-id/001f01ce1c14$d3af0770$7b0d1650$@kap...@huawei.com (Refer test_readings.txt) I studied the hundred tiny fields, half nulled test case in some detail. Some thoughts: - There is a comment TODO: It would be nice to behave like the history and the source strings were concatenated, so that you could compress using the new data, too. If we're not already doing that, then how are we managing to compress WAL by more than one-quarter in the hundred tiny fields, all changed case? Algorithm is not doing concatenation of history and source strings; the hash table is formed just with history data, and then later if a match is not found it is added to the history, so this can be the reason for the above result.
It looks to me like the patch IS doing that, and I'm not sure it's a good idea, especially because it's using pglz_hist_add_no_recycle rather than pglz_hist_add: we verify that hlen + slen < 2 * PGLZ_HISTORY_SIZE but that doesn't seem good enough. On the hundred tiny fields, half nulled test case, removing that line reduces compression somewhat but also saves on CPU cycles. - pglz_find_match() is happy to walk all the way down even a really, really long bucket chain. It has some code that reduces good_match each time through, but it fails to produce a non-zero decrement once good_match * good_drop < 100. So if we're searching an enormously deep bucket many times in a row, and there are actually no matches, we'll go flying down the whole linked list every time. I tried mandating a minimum decrease of 1 and didn't observe that it made any difference in the run time of this test case, but it still seems odd. For the record, it's not new behavior with this patch; pglz_compress() has the same issue as things stand today. I wonder if we ought to decrease the good match length by a constant rather than a percentage at each step. - pglz_delta_encode() seems to spend about 50% of its CPU time loading up the history data. It would be nice to find a way to reduce that effort. I thought about maybe only making a history entry for, say, every fourth input position rather than every one, but a quick test seems to suggest that's a big fail, probably because it's too easy to skip over the position where we would have made the right match via some short match. But maybe there's some more sophisticated strategy here that would work better. For example, see: http://en.wikipedia.org/wiki/Rabin_fingerprint The basic idea is that you use a rolling hash function to divide up the history data into chunks of a given average size. So we scan the history data, compute a rolling hash value at each point, and each time the bottom N bits are zero, we consider that to be the end of a chunk.
We enter all the chunks into a hash table. The chunk size will vary, but on the average, given a reasonably well-behaved rolling hash function (the pglz one probably doesn't qualify) it'll happen every 2^N bytes, so perhaps for this purpose we'd choose N to be between 3 and 5. Then, we scan the input we want to compress and divide it into chunks in the same way. Chunks that don't exist in the history data get copied to the output, while those that do get replaced with a reference to their position in the history data. I'm not 100% certain that something like this would be better than trying to leverage the pglz machinery, but I think it might be worth trying. One possible advantage is that you make many fewer hash-table entries, which reduces both the cost of setting up the hash table and the cost of probing it; another is that if you find a hit in the hash table, you needn't search any further: you are done; this is related to the point that, for the most part, the
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Mon, Jul 22, 2013 at 2:31 PM, Greg Smith g...@2ndquadrant.com wrote: On the Mac, the only case that seems to have a slowdown now is hundred tiny fields, half nulled. It would be nice to understand just what is going on with that one. I got some ugly results in two short fields, no change too, along with a couple of other weird results, but I think those were testing procedure issues that can be ignored. The pgbench throttle work I did recently highlights that I can't really make a Mac quiet/consistent for benchmarking very well. Note that I ran all of the Mac tests with assertions on, to try and catch platform specific bugs. The Linux ones used the default build parameters. Amit has been asking me to look at this patch for a while, so I finally did. While I agree that it would be nice to get the CPU overhead down low enough that we can turn this on by default and forget about it, I'm not convinced that it's without value even if we can't. Fundamentally, this patch trades away some CPU in exchange for decreased I/O. The testing thus far is all about whether the CPU overhead can be made trivial, which is a good question to ask, because if the answer is yes, then rather than trading something for something else, we just get something for free. Win! But even if that doesn't pan out, I think the fallback position should not be OK, well, if we can't get decreased I/O for free then forget it but rather OK, if we can't get decreased I/O for free then let's get decreased I/O in exchange for increased CPU usage. I spent a little time running the tests from Heikki's script under perf. On all three two short fields tests and also on the ten tiny fields, all changed test, we spend about 1% of the CPU time in pglz_delta_encode.
I don't see any evidence that it's actually compressing anything at all; it appears to be falling out where we test the input length against the strategy, presumably because the default strategy (which we largely copy here) doesn't try to compress input data of less than 32 bytes. Given that this code isn't actually compressing anything in these cases, I'm a bit confused by Greg's report of substantial gains on the ten tiny fields, all changed test; how can we win if we're not compressing? I studied the hundred tiny fields, half nulled test case in some detail. Some thoughts: - There is a comment TODO: It would be nice to behave like the history and the source strings were concatenated, so that you could compress using the new data, too. If we're not already doing that, then how are we managing to compress WAL by more than one-quarter in the hundred tiny fields, all changed case? It looks to me like the patch IS doing that, and I'm not sure it's a good idea, especially because it's using pglz_hist_add_no_recycle rather than pglz_hist_add: we verify that hlen + slen < 2 * PGLZ_HISTORY_SIZE but that doesn't seem good enough. On the hundred tiny fields, half nulled test case, removing that line reduces compression somewhat but also saves on CPU cycles. - pglz_find_match() is happy to walk all the way down even a really, really long bucket chain. It has some code that reduces good_match each time through, but it fails to produce a non-zero decrement once good_match * good_drop < 100. So if we're searching an enormously deep bucket many times in a row, and there are actually no matches, we'll go flying down the whole linked list every time. I tried mandating a minimum decrease of 1 and didn't observe that it made any difference in the run time of this test case, but it still seems odd. For the record, it's not new behavior with this patch; pglz_compress() has the same issue as things stand today.
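The zero-decrement behavior is easy to reproduce in isolation. A minimal sketch of the loosening loop, using made-up starting values rather than pglz's actual strategy parameters: with integer arithmetic, `good_match -= good_match * good_drop / 100` stops decrementing as soon as good_match * good_drop < 100, while a minimum decrement of 1 (or a constant step) guarantees the value keeps falling.

```python
# Sketch of pglz-style good_match loosening (values are illustrative,
# not pglz's actual strategy parameters).

def loosen(good_match, good_drop, min_dec=0):
    """Repeatedly apply the percentage decrement; return (steps taken,
    value where the decrement hit zero, or 0 if it bottomed out)."""
    steps = 0
    while True:
        dec = max(good_match * good_drop // 100, min_dec)  # integer division
        if dec == 0:
            return steps, good_match   # stalls here forever
        good_match -= dec
        steps += 1
        if good_match <= 0:
            return steps, 0
```

Starting at good_match = 32 with good_drop = 10, the percentage rule stalls at 9 (since 9 * 10 // 100 == 0), so repeated deep-bucket searches keep the same threshold; with min_dec = 1 the value reaches 0.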
I wonder if we ought to decrease the good match length by a constant rather than a percentage at each step. - pglz_delta_encode() seems to spend about 50% of its CPU time loading up the history data. It would be nice to find a way to reduce that effort. I thought about maybe only making a history entry for, say, every fourth input position rather than every one, but a quick test seems to suggest that's a big fail, probably because it's too easy to skip over the position where we would have made the right match via some short match. But maybe there's some more sophisticated strategy here that would work better. For example, see: http://en.wikipedia.org/wiki/Rabin_fingerprint The basic idea is that you use a rolling hash function to divide up the history data into chunks of a given average size. So we scan the history data, compute a rolling hash value at each point, and each time the bottom N bits are zero, we consider that to be the end of a chunk. We enter all the chunks into a hash table. The chunk size will vary, but on the average, given a reasonably well-behaved rolling hash function (the pglz one probably doesn't qualify) it'll happen every 2^N bytes, so perhaps for this purpose we'd choose N to be between 3 and 5. Then, we scan the input we want to compress
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Tuesday, July 23, 2013 12:02 AM Greg Smith wrote: The v3 patch applies perfectly here now. Attached is a spreadsheet with test results from two platforms, a Mac laptop and a Linux server. I used systems with high disk speed because that seemed like a worst case for this improvement. The actual improvement for shrinking WAL should be even better on a system with slower disks. You are absolutely right. To mimic it on our system, by configuring RAMFS for the database, it shows similar results. There are enough problems with the hundred tiny fields results that I think this is not quite ready for commit yet. More comments on that below. This seems close though, close enough that I am planning to follow up to see if the slow disk results are better. Thanks for going the extra mile to try for slower disks. Reviewing the wal-update-testsuite.sh test program, I think the only case missing that would be useful to add is ten tiny fields, one changed. I think that one is interesting to highlight because it's what benchmark programs like pgbench do very often. The GUC added by the program looks like this:

postgres=# show wal_update_compression_ratio;
 wal_update_compression_ratio
------------------------------
 25

Is it possible to add a setting here that disables the feature altogether? Yes, it can be done in the below 2 ways: 1. Provide a new configuration parameter (wal_update_compression) to turn this feature on/off. 2. At the table level, the user can be given an option to configure it. That always makes it easier to consider a commit, knowing people can roll back the change if it makes performance worse. That would make performance testing easier too. wal-update-testsuite.sh takes as long as 13 minutes; it's long enough that I'd like the easier-to-automate comparison that GUC disabling would add. If that's not practical to do given the intrusiveness of the code, it's not really necessary. I haven't looked at the change enough to be sure how hard this is.
On the Mac, the only case that seems to have a slowdown now is hundred tiny fields, half nulled. It would be nice to understand just what is going on with that one. I got some ugly results in two short fields, no change too, along with a couple of other weird results, but I think those were testing procedure issues that can be ignored. The pgbench throttle work I did recently highlights that I can't really make a Mac quiet/consistent for benchmarking very well. Note that I ran all of the Mac tests with assertions on, to try and catch platform specific bugs. The Linux ones used the default build parameters. On Linux, hundred tiny fields, half nulled was also by far the worst performing one, with a 30% increase in duration despite the 14% drop in WAL. Exactly what's going on there really needs to be investigated before this seems safe to commit. All of the hundred tiny fields cases seem pretty bad on Linux, with the rest of them running about an 11% duration increase. The main benefit of this patch is reducing WAL to improve I/O time, but I think for faster I/O systems the calculation to reduce WAL has overhead. I will check how to optimize that calculation, but I think this feature should be considered with a configuration knob, as it can improve many cases. With Regards, Amit Kapila.
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Tuesday, July 23, 2013 12:27 AM Andres Freund wrote: On 2013-07-19 10:40:01 +0530, Hari Babu wrote: On Friday, July 19, 2013 4:11 AM Greg Smith wrote: On 7/9/13 12:09 AM, Amit Kapila wrote: I think the first thing to verify is whether the results posted can be validated in some other environment setup by another person. The testcase used is posted at the below link: http://www.postgresql.org/message-id/51366323.8070...@vmware.com That seems easy enough to do here; Heikki's test script is excellent. The latest patch Hari posted on July 2 has one hunk that doesn't apply anymore now. The head code change from Heikki is correct. During the patch rebase to the latest PG LZ optimization code, the above code change was missed. Apart from the above change, some more changes were done in the patch; those are. FWIW I don't like this approach very much: * I'd be very surprised if this doesn't make WAL replay of update heavy workloads slower by at least a factor of 2. Yes, if you just consider the cost of replay, but it involves other operations as well, like, for the standby case, transfer of WAL, write of WAL, read from WAL and then apply. So among them most operations will benefit from the reduced WAL size, except apply, where you need to decode. * It makes data recovery from WAL *noticeably* harder since data corruption now is carried forwards and you need the old data to decode new data. This is one of the reasons why this optimization is done only when the new row goes on the same page. * It makes changeset extraction either more expensive or it would have to be disabled there. I think, if there is any such implication, we can probably have the option of disabling it. I think my primary issue is that philosophically/architecturally I am of the opinion that a wal record should make sense on its own without depending on heap data. And this patch loses that. Is the main worry about corruption getting propagated? With Regards, Amit Kapila.
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On 2013-07-23 18:59:11 +0530, Amit Kapila wrote: * I'd be very surprised if this doesn't make WAL replay of update-heavy workloads slower by at least a factor of 2. Yes, if you just consider the cost of replay, but it involves other operations as well, like (for the standby case) transfer of WAL, write of WAL, read from WAL, and then apply. So among them, most operations will benefit from reduced WAL size, except apply, where you need to decode. I still think it's rather unlikely that they offset those. I've seen WAL replay be a major bottleneck more than once... * It makes data recovery from WAL *noticeably* harder since data corruption now is carried forwards and you need the old data to decode new data. This is one of the reasons why this optimization is done only when the new row goes in the same page. That doesn't help all that much. It somewhat eases recovering data if full_page_writes is on, but it's really hard to stitch together all changes if the corruption occurred within a 1h-long checkpoint... * It makes changeset extraction either more expensive or it would have to be disabled there. I think, if there is any such implication, we can probably have the option to disable it. That can just be done on wal_level = logical, that's not the problem. It's certainly not without precedent that we have wal_level-dependent optimizations. I think my primary issue is that philosophically/architecturally I am of the opinion that a WAL record should make sense on its own without depending on heap data. And this patch loses that. Is the main worry about corruption getting propagated? Not really. It feels wrong to me architecturally. That's subjective, I know. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Tuesday, July 23, 2013 7:06 PM Andres Freund wrote: On 2013-07-23 18:59:11 +0530, Amit Kapila wrote: * I'd be very surprised if this doesn't make WAL replay of update-heavy workloads slower by at least a factor of 2. Yes, if you just consider the cost of replay, but it involves other operations as well, like (for the standby case) transfer of WAL, write of WAL, read from WAL, and then apply. So among them, most operations will benefit from reduced WAL size, except apply, where you need to decode. I still think it's rather unlikely that they offset those. I've seen WAL replay be a major bottleneck more than once... * It makes data recovery from WAL *noticeably* harder since data corruption now is carried forwards and you need the old data to decode new data. This is one of the reasons why this optimization is done only when the new row goes in the same page. That doesn't help all that much. It somewhat eases recovering data if full_page_writes is on, but it's really hard to stitch together all changes if the corruption occurred within a 1h-long checkpoint... I think once a record is corrupted on a page, the user has to reconstruct that page; it might be difficult to reconstruct just a record. And this optimization will not carry forward any corruption from one page to another. * It makes changeset extraction either more expensive or it would have to be disabled there. I think, if there is any such implication, we can probably have the option to disable it. That can just be done on wal_level = logical, that's not the problem. It's certainly not without precedent that we have wal_level-dependent optimizations. With Regards, Amit Kapila.
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
The v3 patch applies perfectly here now. Attached is a spreadsheet with test results from two platforms, a Mac laptop and a Linux server. I used systems with high disk speed because that seemed like a worst case for this improvement. The actual improvement for shrinking WAL should be even better on a system with slower disks. There are enough problems with the hundred tiny fields results that I think this is not quite ready for commit yet. More comments on that below. This seems close though, close enough that I am planning to follow up to see if the slow disk results are better. Reviewing the wal-update-testsuite.sh test program, I think the only case missing that would be useful to add is ten tiny fields, one changed. I think that one is interesting to highlight because it's what benchmark programs like pgbench do very often. The GUC added by the program looks like this:

postgres=# show wal_update_compression_ratio;
 wal_update_compression_ratio
------------------------------
 25

Is it possible to add a setting here that disables the feature altogether? That always makes it easier to consider a commit, knowing people can roll back the change if it makes performance worse. That would make performance testing easier too. wal-update-testsuite.sh takes as long as 13 minutes; it's long enough that I'd like the easier-to-automate comparison that GUC disabling adds. If that's not practical to do given the intrusiveness of the code, it's not really necessary. I haven't looked at the change enough to be sure how hard this is. On the Mac, the only case that seems to have a slowdown now is hundred tiny fields, half nulled. It would be nice to understand just what is going on with that one. I got some ugly results in two short fields, no change too, along with a couple of other weird results, but I think those were testing procedure issues that can be ignored. The pgbench throttle work I did recently highlights that I can't really make a Mac quiet/consistent for benchmarking very well.
Note that I ran all of the Mac tests with assertions on, to try and catch platform-specific bugs. The Linux ones used the default build parameters. On Linux, hundred tiny fields, half nulled was also by far the worst performing one, with a 30% increase in duration despite the 14% drop in WAL. Exactly what's going on there really needs to be investigated before this seems safe to commit. All of the hundred tiny fields cases seem pretty bad on Linux, with the rest of them running about an 11% duration increase. This doesn't seem ready to commit for this CF, but the number of problem cases is getting pretty small now. Now that I've gotten more familiar with the test programs and the feature, I can run more performance tests on this at any time really. If updates addressing the trouble cases are ready from Amit or Hari before the next CF, send them out and I can look at them without waiting until that one starts. This is a very promising looking performance feature. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com WAL-lz-v3.xls Description: MS-Excel spreadsheet
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On 2013-07-19 10:40:01 +0530, Hari Babu wrote: On Friday, July 19, 2013 4:11 AM Greg Smith wrote: On 7/9/13 12:09 AM, Amit Kapila wrote: I think the first thing to verify is whether the results posted can be validated in some other environment setup by another person. The testcase used is posted at the below link: http://www.postgresql.org/message-id/51366323.8070...@vmware.com That seems easy enough to do here; Heikki's test script is excellent. The latest patch Hari posted on July 2 has one hunk that doesn't apply anymore now. The HEAD code change from Heikki is correct. During the patch rebase to the latest PG LZ optimization code, the above code change was missed. Apart from the above change, some more changes were made in the patch; those are: FWIW I don't like this approach very much: * I'd be very surprised if this doesn't make WAL replay of update-heavy workloads slower by at least a factor of 2. * It makes data recovery from WAL *noticeably* harder since data corruption now is carried forwards and you need the old data to decode new data. * It makes changeset extraction either more expensive or it would have to be disabled there. I think my primary issue is that philosophically/architecturally I am of the opinion that a WAL record should make sense on its own without depending on heap data. And this patch loses that. Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On 7/22/13 2:57 PM, Andres Freund wrote: * I'd be very surprised if this doesn't make WAL replay of update-heavy workloads slower by at least a factor of 2. I was thinking about what a benchmark of WAL replay would look like last year. I don't think that data is captured very well yet, and it should be. My idea was to break the benchmark into two pieces. One would take a base backup, then run a series of tests and archive the resulting WAL. I doubt you can make a useful benchmark here without a usefully populated database; that's why the base backup step is needed. The first useful result then is to measure how long commit/archiving took and the WAL volume, which is what's done by the test harness for this program. Then the resulting backup would be set up for replay. Tarring up the backup and WAL archive could even give you a repeatable test set for cases where only the replay side changes. Then the main number that's useful, total replay time, would be measured. The main thing I wanted this for wasn't for code changes; it was to benchmark configuration changes. I'd like to be able to answer questions like which I/O scheduler is best for a standby in a way that has real test data behind it. The same approach should be useful for answering your concerns about the replay performance impact of this change too, though. * It makes changeset extraction either more expensive or it would have to be disabled there. That argues that, if this is committed at all, the ability to turn it off that I was asking about would be necessary. It sounds like this *could* work like how minimal WAL archiving levels allow optimizations that are disabled at higher ones--like the COPY into a truncated/new table cheat. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On 7/9/13 12:09 AM, Amit Kapila wrote: I think the first thing to verify is whether the results posted can be validated in some other environment setup by another person. The testcase used is posted at the below link: http://www.postgresql.org/message-id/51366323.8070...@vmware.com That seems easy enough to do here; Heikki's test script is excellent. The latest patch Hari posted on July 2 has one hunk that doesn't apply anymore now. Inside src/backend/utils/adt/pg_lzcompress.c the patch tries to change this code:

- if (hent)
+ if (hentno != INVALID_ENTRY)

But that line looks like this now:

if (hent != INVALID_ENTRY_PTR)

Definitions of those:

#define INVALID_ENTRY 0
#define INVALID_ENTRY_PTR (&hist_entries[INVALID_ENTRY])

I'm not sure if different error handling may be needed here now due to the commit that changed this, or if the patch wasn't referring to the right type of error originally. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
Greg, * Greg Smith (g...@2ndquadrant.com) wrote: That seems easy enough to do here, Heikki's test script is excellent. The latest patch Hari posted on July 2 has one hunk that doesn't apply anymore now. Inside src/backend/utils/adt/pg_lzcompress.c the patch tries to change this code:

- if (hent)
+ if (hentno != INVALID_ENTRY)

hentno certainly doesn't make much sense here- it's only used at the top of the function to keep things a bit cleaner when extracting the address into hent from hist_entries:

hentno = hstart[pglz_hist_idx(input, end, mask)];
hent = &hist_entries[hentno];

Indeed, as the referenced conditional is inside the following loop:

while (hent != INVALID_ENTRY_PTR)

and, since hentno == 0 implies hent == INVALID_ENTRY_PTR, the conditional would never fail (which is what was happening prior to Heikki committing the fix for this, changing the conditional to what is below). But that line looks like this now:

if (hent != INVALID_ENTRY_PTR)

Right, this is correct- it's useful to check the new value for hent after it's been updated by:

hent = hent->next;

and see if it's possible to drop out early. I'm not sure if different error handling may be needed here now due to the commit that changed this, or if the patch wasn't referring to the right type of error originally. I've not looked at anything regarding this beyond this email, but I'm pretty confident that the change Heikki committed was the correct one. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Friday, July 19, 2013 4:11 AM Greg Smith wrote: On 7/9/13 12:09 AM, Amit Kapila wrote: I think the first thing to verify is whether the results posted can be validated in some other environment setup by another person. The testcase used is posted at the below link: http://www.postgresql.org/message-id/51366323.8070...@vmware.com That seems easy enough to do here; Heikki's test script is excellent. The latest patch Hari posted on July 2 has one hunk that doesn't apply anymore now. The HEAD code change from Heikki is correct. During the patch rebase to the latest PG LZ optimization code, the above code change was missed. Apart from the above change, some more changes were made in the patch; those are: 1. Corrected some comments in the code. 2. Added a validity check, as source and history length combined cannot be more than or equal to 8192. Thanks for the review; please find the latest patch attached in the mail. Regards, Hari babu. pglz-with-micro-optimization-compress-using-newdata-3.patch Description: Binary data
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
The only environment I have available at the moment is a virtual box. That's probably not going to be very helpful for performance testing. __ *Mike Blackwell | Technical Analyst, Distribution Services/Rollout Management | RR Donnelley* 1750 Wallace Ave | St Charles, IL 60174-3401 Office: 630.313.7818 mike.blackw...@rrd.com http://www.rrdonnelley.com http://www.rrdonnelley.com/ * mike.blackw...@rrd.com* On Mon, Jul 8, 2013 at 11:09 PM, Amit Kapila amit.kap...@huawei.com wrote: On Tuesday, July 09, 2013 2:52 AM Mike Blackwell wrote: I can't comment on further direction for the patch, but since it was marked as Needs Review in the CF app I took a quick look at it. Thanks for looking into it. Last time, Heikki found test scenarios where the original patch was not performing well. He also proposed a different approach for WAL encoding and sent the modified patch, which has a comparatively smaller negative performance impact, and asked to check whether the patch can reduce the performance impact for the scenarios mentioned by him. After that I found that with some modifications (use new tuple data for encoding) in his approach, it eliminates the negative performance impact and gives WAL reduction in more cases. I think the first thing to verify is whether the results posted can be validated in some other environment setup by another person. The testcase used is posted at the below link: http://www.postgresql.org/message-id/51366323.8070...@vmware.com It patches and compiles clean against the current Git HEAD, and 'make check' runs successfully. Does it need documentation for the GUC variable 'wal_update_compression_ratio'? This variable has been added to test the patch for different compression ratios during development testing. It was not decided to have this variable permanently as part of this patch, so currently there is no documentation for it. However, if the decision comes out to be that it needs to be part of the patch, then documentation for the same can be updated.
With Regards, Amit Kapila.
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Wednesday, July 10, 2013 6:32 AM Mike Blackwell wrote: The only environment I have available at the moment is a virtual box. That's probably not going to be very helpful for performance testing. It's okay. Anyway, thanks for doing the basic test for the patch. With Regards, Amit Kapila.
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
I can't comment on further direction for the patch, but since it was marked as Needs Review in the CF app I took a quick look at it. It patches and compiles clean against the current Git HEAD, and 'make check' runs successfully. Does it need documentation for the GUC variable 'wal_update_compression_ratio'? __ *Mike Blackwell | Technical Analyst, Distribution Services/Rollout Management | RR Donnelley* 1750 Wallace Ave | St Charles, IL 60174-3401 Office: 630.313.7818 mike.blackw...@rrd.com http://www.rrdonnelley.com http://www.rrdonnelley.com/ * mike.blackw...@rrd.com* On Tue, Jul 2, 2013 at 2:26 AM, Hari Babu haribabu.ko...@huawei.com wrote: On Friday, June 07, 2013 5:07 PM Amit Kapila wrote: On Wednesday, March 06, 2013 2:57 AM Heikki Linnakangas wrote: On 04.03.2013 06:39, Amit Kapila wrote: On Sunday, March 03, 2013 8:19 PM Craig Ringer wrote: On 02/05/2013 11:53 PM, Amit Kapila wrote: Performance data for the patch is attached with this mail. Conclusions from the readings (these are the same as my previous patch): The attached patch also just adds overhead in most cases, but the overhead is much smaller in the worst case. I think that's the right tradeoff here - we want to avoid scenarios where performance falls off the cliff. That said, if you usually just get a slowdown, we certainly can't make this the default, and if we can't turn it on by default, this probably just isn't worth it. The attached patch contains the variable-hash-size changes I posted in the "Optimizing pglz compressor" thread. But in the delta encoding function, it goes further than that, and contains some further micro-optimizations: the hash is calculated in a rolling fashion, and it uses a specialized version of the pglz_hist_add macro that knows that the input can't exceed 4096 bytes. Those changes shaved off some cycles, but you could probably do more. One idea is to only add every 10 bytes or so to the history lookup table; that would sacrifice some compressibility for speed.
If you could squeeze the pglz_delta_encode function to be cheap enough that we could enable this by default, this would be a pretty cool patch. Or at least, the overhead in the cases where you get no compression needs to be brought down, to about 2-5% at most, I think. If it can't be done easily, I feel that this probably needs to be dropped. After trying some more on optimizing pglz_delta_encode(), I found that if we use new data also in the history, then the results of compression and CPU utilization are much better. In addition to the PG LZ micro-optimization changes, the following changes were made in the modified patch: 1. The unmatched new data is also added to the history, which can be referenced later. 2. To incorporate this change in the LZ algorithm, one extra control bit is needed to indicate whether data is from the old or new tuple. The patch is rebased to use the new PG LZ algorithm optimization changes which got committed recently.

Performance Data - Head code:

                testname                 | wal_generated |     duration
-----------------------------------------+---------------+------------------
 two short fields, no change             |    1232911016 | 35.1784930229187
 two short fields, one changed           |    1240322016 | 35.0436308383942
 two short fields, both changed          |    1235318352 | 35.4989421367645
 one short and one long field, no change |    1042332336 | 23.4457180500031
 ten tiny fields, all changed            |    1395194136 | 41.9023628234863
 hundred tiny fields, first 10 changed   |     626725984 | 21.2999589443207
 hundred tiny fields, all changed        |     621899224 | 21.6676609516144
 hundred tiny fields, half changed       |     623998272 | 21.2745981216431
 hundred tiny fields, half nulled        |     557714088 | 19.5902800559998

pglz-with-micro-optimization-compress-using-newdata-2:

                testname                 | wal_generated |     duration
-----------------------------------------+---------------+------------------
 two short fields, no change             |    1232903384 | 35.0115969181061
 two short fields, one changed           |    1232906960 | 34.759307861
 two short fields, both changed          |    1232903520 | 35.7665238380432
 one short and one long field, no change |     649647992 | 19.4671010971069
 ten tiny fields, all changed            |    1314957136 | 39.9727990627289
 hundred tiny fields, first 10 changed   |     458684024 | 17.8197758197784
 hundred tiny fields, all changed        |     461028464 | 17.3083391189575
 hundred tiny fields, half changed       |     456528696 | 17.1769199371338
 hundred tiny fields, half nulled        |     480548936 | 18.81720495224

Observation
1. It yielded compression in more cases (refer to all cases of hundred tiny fields).
2.
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On 07/08/2013 02:21 PM, Mike Blackwell wrote: I can't comment on further direction for the patch, but since it was marked as Needs Review in the CF app I took a quick look at it. It patches and compiles clean against the current Git HEAD, and 'make check' runs successfully. Does it need documentation for the GUC variable 'wal_update_compression_ratio'? Yes. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Tuesday, July 09, 2013 2:52 AM Mike Blackwell wrote: I can't comment on further direction for the patch, but since it was marked as Needs Review in the CF app I took a quick look at it. Thanks for looking into it. Last time, Heikki found test scenarios where the original patch was not performing well. He also proposed a different approach for WAL encoding and sent the modified patch, which has a comparatively smaller negative performance impact, and asked to check whether the patch can reduce the performance impact for the scenarios mentioned by him. After that I found that with some modifications (use new tuple data for encoding) in his approach, it eliminates the negative performance impact and gives WAL reduction in more cases. I think the first thing to verify is whether the results posted can be validated in some other environment setup by another person. The testcase used is posted at the below link: http://www.postgresql.org/message-id/51366323.8070...@vmware.com It patches and compiles clean against the current Git HEAD, and 'make check' runs successfully. Does it need documentation for the GUC variable 'wal_update_compression_ratio'? This variable has been added to test the patch for different compression ratios during development testing. It was not decided to have this variable permanently as part of this patch, so currently there is no documentation for it. However, if the decision comes out to be that it needs to be part of the patch, then documentation for the same can be updated. With Regards, Amit Kapila.
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On 2013-03-05 23:26:59 +0200, Heikki Linnakangas wrote: On 04.03.2013 06:39, Amit Kapila wrote: The stats look fairly sane. I'm a little concerned about the apparent trend of falling TPS in the patched vs original tests for the 1-client test as record size increases, but it's only 0.0%-0.2%-0.4%, and the 0.4% case made other config changes too. Nonetheless, it might be wise to check with really big records and see if the trend continues. For bigger size (~2000) records, it goes into toast, for which we don't do this optimization. This optimization is mainly for medium size records. I've been investigating the pglz option further, and doing performance comparisons of the pglz approach and this patch. I'll begin with some numbers:

unpatched (63d283ecd0bc5078594a64dfbae29276072cdf45):

                testname                 | wal_generated |     duration
-----------------------------------------+---------------+------------------
 two short fields, no change             |    1245525360 | 9.94613695144653
 two short fields, one changed           |    1245536528 | 10.146910905838
 two short fields, both changed          |    1245523160 | 11.2332470417023
 one short and one long field, no change |    1054926504 | 5.90477800369263
 ten tiny fields, all changed            |    1411774608 | 13.4536008834839
 hundred tiny fields, all changed        |     635739680 | 7.57448387145996
 hundred tiny fields, half changed       |     636930560 | 7.56888699531555
 hundred tiny fields, half nulled        |     573751120 | 6.68991994857788

Amit's wal_update_changes_v10.patch:

                testname                 | wal_generated |     duration
-----------------------------------------+---------------+------------------
 two short fields, no change             |    1249722112 | 13.0558869838715
 two short fields, one changed           |    1246145408 | 12.9947438240051
 two short fields, both changed          |    1245951056 | 13.0262880325317
 one short and one long field, no change |     678480664 | 5.70031690597534
 ten tiny fields, all changed            |    1328873920 | 20.0167419910431
 hundred tiny fields, all changed        |     638149416 | 14.4236788749695
 hundred tiny fields, half changed       |     635560504 | 14.8770561218262
 hundred tiny fields, half nulled        |     558468352 | 16.2437210083008

pglz-with-micro-optimizations-1.patch:

                testname                 | wal_generated |     duration
-----------------------------------------+---------------+------------------
 two short fields, no change             |    1245519008 | 11.6702048778534
 two short fields, one changed           |    1245756904 | 11.3233819007874
 two short fields, both changed          |    1249711088 | 11.6836447715759
 one short and one long field, no change |     664741392 | 6.44810795783997
 ten tiny fields, all changed            |    1328085568 | 13.9679481983185
 hundred tiny fields, all changed        |     635974088 | 9.15514206886292
 hundred tiny fields, half changed       |     636309040 | 9.13769292831421
 hundred tiny fields, half nulled        |     496396448 | 8.77351498603821

In each test, a table is created with a large number of identical rows, and fillfactor=50. Then a full-table UPDATE is performed, and the UPDATE is timed. Duration is the time spent in the UPDATE (lower is better), and wal_generated is the amount of WAL generated by the updates (lower is better). The summary is that Amit's patch is a small win in terms of CPU usage, in the best case where the table has few columns, with one large column that is not updated. In all other cases it just adds overhead. In terms of WAL size, you get a big gain in the same best-case scenario. Attached is a different version of this patch, which uses the pglz algorithm to spot the similarities between the old and new tuple, instead of having explicit knowledge of where the column boundaries are. This has the advantage that it will spot similarities, and be able to compress, in more cases. For example, you can see a reduction in WAL size in the hundred tiny fields, half nulled test case above. The attached patch also just adds overhead in most cases, but the overhead is much smaller in the worst case. I think that's the right tradeoff here - we want to avoid scenarios where performance falls off the cliff. That said, if you usually just get a slowdown, we certainly can't make this the default, and if we can't turn it on by default, this probably just isn't worth it. The attached patch contains the variable-hash-size changes I posted in the "Optimizing pglz compressor" thread. But in the delta encoding function, it goes further than that, and contains some further micro-optimizations: the hash is calculated in a rolling fashion, and it uses a specialized version of the pglz_hist_add macro that knows that the input can't exceed 4096 bytes. Those changes
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Wednesday, March 06, 2013 2:57 AM Heikki Linnakangas wrote: On 04.03.2013 06:39, Amit Kapila wrote: On Sunday, March 03, 2013 8:19 PM Craig Ringer wrote: On 02/05/2013 11:53 PM, Amit Kapila wrote: Performance data for the patch is attached with this mail. Conclusions from the readings (these are the same as my previous patch): I've been investigating the pglz option further, and doing performance comparisons of the pglz approach and this patch. I'll begin with some numbers:

unpatched (63d283ecd0bc5078594a64dfbae29276072cdf45):

                testname                 | wal_generated |     duration
-----------------------------------------+---------------+------------------
 two short fields, no change             |    1245525360 | 9.94613695144653
 two short fields, one changed           |    1245536528 | 10.146910905838
 two short fields, both changed          |    1245523160 | 11.2332470417023
 one short and one long field, no change |    1054926504 | 5.90477800369263
 ten tiny fields, all changed            |    1411774608 | 13.4536008834839
 hundred tiny fields, all changed        |     635739680 | 7.57448387145996
 hundred tiny fields, half changed       |     636930560 | 7.56888699531555
 hundred tiny fields, half nulled        |     573751120 | 6.68991994857788

Amit's wal_update_changes_v10.patch:

                testname                 | wal_generated |     duration
-----------------------------------------+---------------+------------------
 two short fields, no change             |    1249722112 | 13.0558869838715
 two short fields, one changed           |    1246145408 | 12.9947438240051
 two short fields, both changed          |    1245951056 | 13.0262880325317
 one short and one long field, no change |     678480664 | 5.70031690597534
 ten tiny fields, all changed            |    1328873920 | 20.0167419910431
 hundred tiny fields, all changed        |     638149416 | 14.4236788749695
 hundred tiny fields, half changed       |     635560504 | 14.8770561218262
 hundred tiny fields, half nulled        |     558468352 | 16.2437210083008

pglz-with-micro-optimizations-1.patch:

                testname                 | wal_generated |     duration
-----------------------------------------+---------------+------------------
 two short fields, no change             |    1245519008 | 11.6702048778534
 two short fields, one changed           |    1245756904 | 11.3233819007874
 two short fields, both changed          |    1249711088 | 11.6836447715759
 one short and one long field, no change |     664741392 | 6.44810795783997
 ten tiny fields, all changed            |    1328085568 | 13.9679481983185
 hundred tiny fields, all changed        |     635974088 | 9.15514206886292
 hundred tiny fields, half changed       |     636309040 | 9.13769292831421
 hundred tiny fields, half nulled        |     496396448 | 8.77351498603821

For some of the tests, it doesn't even execute the main part of compression/encoding. The reason is that the length of the tuple is less than the strategy min length, so it returns from the below check in function pglz_delta_encode():

if (strategy->match_size_good <= 0 ||
    slen < strategy->min_input_size ||
    slen > strategy->max_input_size)
    return false;

The tests for which it doesn't execute encoding are below: two short fields, no change; two short fields, one changed; two short fields, both changed; ten tiny fields, all changed. For the above cases, the reason for the difference in timings between both approaches and the original could be that this check is done after some processing. So I think if we check the length in log_heap_update, then there should not be a timing difference for the above test scenarios. I can check that once. This optimization helps only when the tuple is of length 128~200 bytes and up to 1800 bytes (till it turns to toast); otherwise it could result in overhead without any major WAL reduction. In fact, I think in one of my initial patches there is a check: if the length of the tuple is greater than 128 bytes, then perform the optimization. I shall try to run both patches for cases when the length of the tuple is 128~200 bytes, as this optimization has benefits in those cases. In each test, a table is created with a large number of identical rows, and fillfactor=50. Then a full-table UPDATE is performed, and the UPDATE is timed. Duration is the time spent in the UPDATE (lower is better), and wal_generated is the amount of WAL generated by the updates (lower is better). The summary is that Amit's patch is a small win in terms of CPU usage, in the best case where the table has few columns, with one
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On 02/05/2013 11:53 PM, Amit Kapila wrote:
Performance data for the patch is attached with this mail. Conclusions from the readings (these are the same as for my previous patch):

1. With original pgbench there is a max 7% WAL reduction with not much performance difference.
2. With 250 record pgbench there is a max WAL reduction of 35% with not much performance difference.
3. With 500 and above record size in pgbench there is an improvement in both performance and WAL reduction. As the record size increases, there is a gain in performance and the WAL size is reduced as well.

Performance data for synchronous_commit = on is in progress; I shall post it once it is done. I am expecting it to be the same as the previous data.

Please find the performance readings for synchronous_commit = on. Each run is taken for 20 min. Conclusions from the readings with synchronous commit on mode:

1. With original pgbench there is a max 2% WAL reduction with not much performance difference.
2. With 500 record pgbench there is a max WAL reduction of 3% with not much performance difference.
3. With 1800 record size in pgbench there is both an improvement in performance (approx 3%) and a WAL reduction (44%). As the record size increases, there is a very good reduction in WAL size.

The stats look fairly sane. I'm a little concerned about the apparent trend of falling TPS in the patched vs. original tests for the 1-client test as record size increases, but it's only 0.0%-0.2%-0.4%, and the 0.4% case made other config changes too. Nonetheless, it might be wise to check with really big records and see if the trend continues.

-- 
Craig Ringer   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Sunday, March 03, 2013 8:19 PM Craig Ringer wrote:
On 02/05/2013 11:53 PM, Amit Kapila wrote:
Performance data for the patch is attached with this mail. Conclusions from the readings (these are the same as for my previous patch):

1. With original pgbench there is a max 7% WAL reduction with not much performance difference.
2. With 250 record pgbench there is a max WAL reduction of 35% with not much performance difference.
3. With 500 and above record size in pgbench there is an improvement in both performance and WAL reduction. As the record size increases, there is a gain in performance and the WAL size is reduced as well.

Performance data for synchronous_commit = on is in progress; I shall post it once it is done. I am expecting it to be the same as the previous data.

Please find the performance readings for synchronous_commit = on. Each run is taken for 20 min. Conclusions from the readings with synchronous commit on mode:

1. With original pgbench there is a max 2% WAL reduction with not much performance difference.
2. With 500 record pgbench there is a max WAL reduction of 3% with not much performance difference.
3. With 1800 record size in pgbench there is both an improvement in performance (approx 3%) and a WAL reduction (44%). As the record size increases, there is a very good reduction in WAL size.

The stats look fairly sane. I'm a little concerned about the apparent trend of falling TPS in the patched vs. original tests for the 1-client test as record size increases, but it's only 0.0%-0.2%-0.4%, and the 0.4% case made other config changes too. Nonetheless, it might be wise to check with really big records and see if the trend continues.

For bigger (~2000 byte) records, the data goes into TOAST, for which we don't do this optimization. This optimization is mainly for medium-size records.

With Regards,
Amit Kapila.
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Wednesday, January 30, 2013 8:32 PM Amit Kapila wrote:
On Tuesday, January 29, 2013 7:42 PM Amit Kapila wrote:
On Tuesday, January 29, 2013 3:53 PM Heikki Linnakangas wrote:
On 29.01.2013 11:58, Amit Kapila wrote:
Can there be another way in which the current patch code can be made better, so that we don't need to change the encoding approach? I have a feeling that the new approach might not be equally good performance-wise.

The point is that I don't want heap_delta_encode() to know the internals of pglz compression. You could probably make my patch behave more like yours by also passing an array of offsets in the new tuple to check, and only checking for matches at those offsets.

I think it makes sense, because if we have offsets of both the new and old tuple, we can internally use memcmp to compare columns and use the same algorithm for encoding. I will change the patch according to this suggestion.

I have modified the patch as per the above suggestion. Apart from passing new and old tuple offsets, I have passed the bitmap length also, as we need to copy the bitmap of the new tuple as-is into the encoded WAL tuple. Please see if such an API design is okay. I shall update the README and send the performance/WAL reduction data for the modified patch tomorrow.

The updated patch including comments and README is attached with this mail. This patch contains exactly the same design behavior as the previous one. It takes care of Heikki's API design suggestion. The performance data is similar; as it is not complete, I shall send it tomorrow.

With Regards,
Amit Kapila.

wal_update_changes_v10.patch
Description: Binary data