Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-03-11 Thread Michael Paquier
On Wed, Mar 11, 2015 at 3:57 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Wed, Mar 11, 2015 at 7:08 AM, Rahila Syed rahilasye...@gmail.com wrote:
 Hello,

I have some minor comments

 The comments have been implemented in the attached patch.

 Thanks for updating the patch! I just changed a bit and finally pushed it.
 Thanks everyone involved in this patch!

Woohoo! Thanks!
-- 
Michael




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-03-11 Thread Fujii Masao
On Wed, Mar 11, 2015 at 7:08 AM, Rahila Syed rahilasye...@gmail.com wrote:
 Hello,

I have some minor comments

 The comments have been implemented in the attached patch.

Thanks for updating the patch! I just changed a bit and finally pushed it.
Thanks everyone involved in this patch!

Regards,

-- 
Fujii Masao




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-03-10 Thread Michael Paquier
On Mon, Mar 9, 2015 at 9:08 PM, Michael Paquier wrote:
 On Mon, Mar 9, 2015 at 4:29 PM, Fujii Masao wrote:
 Thanks for updating the patch! Attached is the refactored version of the 
 patch.

Fujii-san and I had a short chat about tuning the PGLZ strategy, which
is now PGLZ_strategy_default in the patch (at least 25% compression,
etc.). In particular, min_input_size, which is currently set to 32B, is
too low; knowing that the minimum fillfactor of a relation page is 10%,
this really looks too low.

For example, using the extension attached to this email, which can
compress and decompress bytea strings and which I developed after pglz
was moved to libpgcommon (it also contains a function able to get a
relation page without its hole, feel free to use it), I am seeing that
we can gain quite a lot of space even with hard-to-compress data like
UUIDs or random float values (pages are compressed without their hole):
1) Float table:
=# create table float_tab (id float);
CREATE TABLE
=# insert into float_tab select random() from generate_series(1, 20);
INSERT 0 20
=# SELECT bytea_size(compress_data(page)) AS compress_size,
bytea_size(page) AS raw_size_no_hole FROM
get_raw_page('float_tab'::regclass, 0, false);
-[ RECORD 1 ]+
compress_size| 329
raw_size_no_hole | 744
=# SELECT bytea_size(compress_data(page)) AS compress_size,
bytea_size(page) AS raw_size_no_hole FROM
get_raw_page('float_tab'::regclass, 0, false);
-[ RECORD 1 ]+-
compress_size| 1753
raw_size_no_hole | 4344
So that's more or less 60% saved...
2) UUID table
=# SELECT bytea_size(compress_data(page)) AS compress_size,
bytea_size(page) AS raw_size_no_hole FROM
get_raw_page('uuid_tab'::regclass, 0, false);
-[ RECORD 1 ]+
compress_size| 590
raw_size_no_hole | 904
=# insert into uuid_tab select gen_random_uuid() from generate_series(1, 100);
INSERT 0 100
=# SELECT bytea_size(compress_data(page)) AS compress_size,
bytea_size(page) AS raw_size_no_hole FROM
get_raw_page('uuid_tab'::regclass, 0, false);
-[ RECORD 1 ]+-
compress_size| 3338
raw_size_no_hole | 5304
And in this case we are close to 40% saved...

At least, knowing that the header takes at least 24B on a page, what
about increasing min_input_size to something like 128B or 256B? I don't
think this is a blocker for this patch, as most relation pages are going
to have far more data than that and so will be unconditionally
compressed, but there is definitely something we could do in this area
later on; perhaps we could even improve the other parameters, like the
compression rate. So that's something to keep in mind...
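As a rough illustration, a tuned strategy could look like the sketch
below (the field layout follows PGLZ_Strategy in common/pg_lzcompress.h;
apart from min_input_size the values simply mirror
PGLZ_strategy_default, and 128B is only a hypothetical starting point,
not a tested recommendation):

#include <limits.h>
#include "common/pg_lzcompress.h"

/*
 * Hypothetical PGLZ strategy for WAL page images: identical to
 * PGLZ_strategy_default except that very small inputs are skipped.
 */
static const PGLZ_Strategy wal_page_strategy_data = {
	128,		/* min_input_size: skip inputs smaller than 128B */
	INT_MAX,	/* max_input_size */
	25,		/* min_comp_rate: require at least 25% saving */
	1024,		/* first_success_by */
	128,		/* match_size_good */
	10		/* match_size_drop */
};
static const PGLZ_Strategy *const wal_page_strategy = &wal_page_strategy_data;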
-- 
Michael


compress_test.tar.gz
Description: GNU Zip compressed data



Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-03-10 Thread Fujii Masao
On Mon, Mar 9, 2015 at 9:08 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Mon, Mar 9, 2015 at 4:29 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Mar 5, 2015 at 10:08 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 On Thu, Mar 5, 2015 at 9:14 PM, Syed, Rahila rahila.s...@nttdata.com 
 wrote:
 Please find attached  a patch. As discussed, flag to denote compression 
 and presence of hole in block image has been added in 
 XLogRecordImageHeader rather than block header.

 Thanks for updating the patch! Attached is the refactored version of the 
 patch.

 Cool. Thanks!

 I have some minor comments:

Thanks for the comments!

 +Turning this parameter on can reduce the WAL volume without
 Turning <value>on</> this parameter

That tag is not used anywhere else in config.sgml, so I'm not sure if
that's really necessary.

Regards,

-- 
Fujii Masao




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-03-10 Thread Rahila Syed
Hello,

I have some minor comments

The comments have been implemented in the attached patch.

I think that extra parenthesis should be used for the first expression
with BKPIMAGE_HAS_HOLE.
Parentheses have been added to improve code readability.

Thank you,
Rahila Syed


On Mon, Mar 9, 2015 at 5:38 PM, Michael Paquier michael.paqu...@gmail.com
wrote:

 On Mon, Mar 9, 2015 at 4:29 PM, Fujii Masao masao.fu...@gmail.com wrote:
  On Thu, Mar 5, 2015 at 10:08 PM, Michael Paquier
  michael.paqu...@gmail.com wrote:
  On Thu, Mar 5, 2015 at 9:14 PM, Syed, Rahila rahila.s...@nttdata.com
 wrote:
  Please find attached  a patch. As discussed, flag to denote
 compression and presence of hole in block image has been added in
 XLogRecordImageHeader rather than block header.
 
  Thanks for updating the patch! Attached is the refactored version of the
 patch.

 Cool. Thanks!

 I have some minor comments:

 +The default value is <literal>off</>
 Dot at the end of this sentence.

 +Turning this parameter on can reduce the WAL volume without
 Turning <value>on</> this parameter

 +but at the cost of some extra CPU time by the compression during
 +WAL logging and the decompression during WAL replay.
 Isn't a verb missing here, for something like that:
 but at the cost of some extra CPU spent on the compression during WAL
 logging and on the decompression during WAL replay.

 + * This can reduce the WAL volume, but at some extra cost of CPU time
 + * by the compression during WAL logging.
 Er, similarly some extra cost of CPU spent on the compression

 +   if (blk->bimg_info & BKPIMAGE_HAS_HOLE &&
 +       (blk->hole_offset == 0 ||
 +        blk->hole_length == 0 ||
 I think that extra parenthesis should be used for the first expression
 with BKPIMAGE_HAS_HOLE.

 +   if (blk->bimg_info & BKPIMAGE_IS_COMPRESSED &&
 +       blk->bimg_len == BLCKSZ)
 +   {
 Same here.

 +   /*
 +* cross-check that hole_offset == 0
 and hole_length == 0
 +* if the HAS_HOLE flag is set.
 +*/
 I think that you mean here that this happens when the flag is *not* set.

 +   /*
 +* If BKPIMAGE_HAS_HOLE and BKPIMAGE_IS_COMPRESSED,
 +* an XLogRecordBlockCompressHeader follows
 +*/
 Maybe a struct should be added for an XLogRecordBlockCompressHeader
 struct. And a dot at the end of the sentence should be added?

 Regards,
 --
 Michael





Support-compression-full-page-writes-in-WAL_v25.patch
Description: Binary data



Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-03-09 Thread Michael Paquier
On Mon, Mar 9, 2015 at 4:29 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Mar 5, 2015 at 10:08 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 On Thu, Mar 5, 2015 at 9:14 PM, Syed, Rahila rahila.s...@nttdata.com wrote:
 Please find attached  a patch. As discussed, flag to denote compression and 
 presence of hole in block image has been added in XLogRecordImageHeader 
 rather than block header.

 Thanks for updating the patch! Attached is the refactored version of the 
 patch.

Cool. Thanks!

I have some minor comments:

+The default value is <literal>off</>
Dot at the end of this sentence.

+Turning this parameter on can reduce the WAL volume without
Turning <value>on</> this parameter

+but at the cost of some extra CPU time by the compression during
+WAL logging and the decompression during WAL replay.
Isn't a verb missing here, for something like that:
but at the cost of some extra CPU spent on the compression during WAL
logging and on the decompression during WAL replay.

+ * This can reduce the WAL volume, but at some extra cost of CPU time
+ * by the compression during WAL logging.
Er, similarly some extra cost of CPU spent on the compression

+   if (blk->bimg_info & BKPIMAGE_HAS_HOLE &&
+       (blk->hole_offset == 0 ||
+        blk->hole_length == 0 ||
I think that extra parenthesis should be used for the first expression
with BKPIMAGE_HAS_HOLE.

+   if (blk->bimg_info & BKPIMAGE_IS_COMPRESSED &&
+       blk->bimg_len == BLCKSZ)
+   {
Same here.

+   /*
+* cross-check that hole_offset == 0
and hole_length == 0
+* if the HAS_HOLE flag is set.
+*/
I think that you mean here that this happens when the flag is *not* set.

+   /*
+* If BKPIMAGE_HAS_HOLE and BKPIMAGE_IS_COMPRESSED,
+* an XLogRecordBlockCompressHeader follows
+*/
Maybe a struct should be added for an XLogRecordBlockCompressHeader
struct. And a dot at the end of the sentence should be added?

Regards,
-- 
Michael




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-03-09 Thread Fujii Masao
On Thu, Mar 5, 2015 at 10:08 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Thu, Mar 5, 2015 at 9:14 PM, Syed, Rahila rahila.s...@nttdata.com wrote:
 Please find attached  a patch. As discussed, flag to denote compression and 
 presence of hole in block image has been added in XLogRecordImageHeader 
 rather than block header.

Thanks for updating the patch! Attached is the refactored version of the patch.

Regards,

-- 
Fujii Masao
*** a/contrib/pg_xlogdump/pg_xlogdump.c
--- b/contrib/pg_xlogdump/pg_xlogdump.c
***
*** 359,376  XLogDumpCountRecord(XLogDumpConfig *config, XLogDumpStats *stats,
  	rec_len = XLogRecGetDataLen(record) + SizeOfXLogRecord;
  
  	/*
! 	 * Calculate the amount of FPI data in the record. Each backup block
! 	 * takes up BLCKSZ bytes, minus the hole length.
  	 *
  	 * XXX: We peek into xlogreader's private decoded backup blocks for the
! 	 * hole_length. It doesn't seem worth it to add an accessor macro for
! 	 * this.
  	 */
  	fpi_len = 0;
  	for (block_id = 0; block_id <= record->max_block_id; block_id++)
  	{
  		if (XLogRecHasBlockImage(record, block_id))
! 			fpi_len += BLCKSZ - record->blocks[block_id].hole_length;
  	}
  
  	/* Update per-rmgr statistics */
--- 359,375 
  	rec_len = XLogRecGetDataLen(record) + SizeOfXLogRecord;
  
  	/*
! 	 * Calculate the amount of FPI data in the record.
  	 *
  	 * XXX: We peek into xlogreader's private decoded backup blocks for the
! 	 * bimg_len indicating the length of FPI data. It doesn't seem worth it to
! 	 * add an accessor macro for this.
  	 */
  	fpi_len = 0;
  	for (block_id = 0; block_id <= record->max_block_id; block_id++)
  	{
  		if (XLogRecHasBlockImage(record, block_id))
! 			fpi_len += record->blocks[block_id].bimg_len;
  	}
  
  	/* Update per-rmgr statistics */
***
*** 465,473  XLogDumpDisplayRecord(XLogDumpConfig *config, XLogReaderState *record)
     blk);
  			if (XLogRecHasBlockImage(record, block_id))
  			{
! 				printf(" (FPW); hole: offset: %u, length: %u\n",
! 					   record->blocks[block_id].hole_offset,
! 					   record->blocks[block_id].hole_length);
  			}
  			putchar('\n');
  		}
--- 464,485 
     blk);
  			if (XLogRecHasBlockImage(record, block_id))
  			{
! 				if (record->blocks[block_id].bimg_info &
! 					BKPIMAGE_IS_COMPRESSED)
! 				{
! 					printf(" (FPW); hole: offset: %u, length: %u, compression saved: %u\n",
! 						   record->blocks[block_id].hole_offset,
! 						   record->blocks[block_id].hole_length,
! 						   BLCKSZ -
! 						   record->blocks[block_id].hole_length -
! 						   record->blocks[block_id].bimg_len);
! 				}
! 				else
! 				{
! 					printf(" (FPW); hole: offset: %u, length: %u\n",
! 						   record->blocks[block_id].hole_offset,
! 						   record->blocks[block_id].hole_length);
! 				}
  			}
  			putchar('\n');
  		}
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
***
*** 2282,2287  include_dir 'conf.d'
--- 2282,2311 
/listitem
   /varlistentry
  
+  <varlistentry id="guc-wal-compression" xreflabel="wal_compression">
+   <term><varname>wal_compression</varname> (<type>boolean</type>)
+   <indexterm>
+    <primary><varname>wal_compression</> configuration parameter</primary>
+   </indexterm>
+   </term>
+   <listitem>
+    <para>
+     When this parameter is <literal>on</>, the <productname>PostgreSQL</>
+     server compresses a full page image written to WAL when
+     <xref linkend="guc-full-page-writes"> is on or during a base backup.
+     A compressed page image will be decompressed during WAL replay.
+     The default value is <literal>off</>
+    </para>
+
+    <para>
+     Turning this parameter on can reduce the WAL volume without
+     increasing the risk of unrecoverable data corruption,
+     but at the cost of some extra CPU time by the compression during
+     WAL logging and the decompression during WAL replay.
+    </para>
+   </listitem>
+  </varlistentry>
+
   <varlistentry id="guc-wal-buffers" xreflabel="wal_buffers">
    <term><varname>wal_buffers</varname> (<type>integer</type>)
    <indexterm>
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***
*** 89,94  char	   *XLogArchiveCommand = NULL;
--- 89,95 
  bool		EnableHotStandby = false;
  bool		fullPageWrites = true;
  bool		wal_log_hints = false;
+ bool		wal_compression = false;
  bool		log_checkpoints = false;
  int			sync_method = DEFAULT_SYNC_METHOD;
  int			wal_level = WAL_LEVEL_MINIMAL;
*** a/src/backend/access/transam/xloginsert.c
--- b/src/backend/access/transam/xloginsert.c
***
*** 24,35 
--- 24,39 
  #include access/xlog_internal.h
  #include access/xloginsert.h
  #include catalog/pg_control.h
+ #include common/pg_lzcompress.h
  #include miscadmin.h
  #include storage/bufmgr.h
  #include storage/proc.h
  #include utils/memutils.h
  #include pg_trace.h
  
+ /* Buffer size required to store a compressed version of 

Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-03-08 Thread Fujii Masao
On Mon, Feb 16, 2015 at 9:08 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2015-02-16 20:55:20 +0900, Michael Paquier wrote:
 On Mon, Feb 16, 2015 at 8:30 PM, Syed, Rahila rahila.s...@nttdata.com
 wrote:

 
  Regarding the sanity checks that have been added recently. I think that
  they are useful but I am suspecting as well that only a check on the record
  CRC is done because that's reliable enough and not doing those checks
  accelerates a bit replay. So I am thinking that we should simply replace
  them by assertions.
 
  Removing the checks makes sense as the CRC ensures correctness. Moreover, as
  an error message for an invalid record length is already present in the code,
  messages for an invalid block length can be redundant.
 
  Checks have been replaced by assertions in the attached patch.
 

 After more thinking, we may as well simply remove them, an error with CRC
 having high chances to complain before reaching this point...

 Surely not. The existing code explicitly does it like
  if (blk->has_data && blk->data_len == 0)
      report_invalid_record(state,
                            "BKPBLOCK_HAS_DATA set, but no data included at %X/%X",
                            (uint32) (state->ReadRecPtr >> 32),
                            (uint32) state->ReadRecPtr);
 these cross checks are important. And I see no reason to deviate from
 that. The CRC sum isn't foolproof - we intentionally do checks at
 several layers. And, as you can see from some other locations, we
 actually try to *not* fatally error out when hitting them at times - so
 an Assert also is wrong.

 Heikki:
  /* cross-check that the HAS_DATA flag is set iff data_length > 0 */
  if (blk->has_data && blk->data_len == 0)
      report_invalid_record(state,
                            "BKPBLOCK_HAS_DATA set, but no data included at %X/%X",
                            (uint32) (state->ReadRecPtr >> 32),
                            (uint32) state->ReadRecPtr);
  if (!blk->has_data && blk->data_len != 0)
      report_invalid_record(state,
                            "BKPBLOCK_HAS_DATA not set, but data length is %u at %X/%X",
                            (unsigned int) blk->data_len,
                            (uint32) (state->ReadRecPtr >> 32),
                            (uint32) state->ReadRecPtr);
 those look like they're missing a goto err; to me.

Yes. I pushed the fix. Thanks!
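The fix is essentially to make both cross-checks bail out through the
error path, along these lines (a sketch based on the code quoted above,
not the exact committed diff):

		/* cross-check that the HAS_DATA flag is set iff data_length > 0 */
		if (blk->has_data && blk->data_len == 0)
		{
			report_invalid_record(state,
								  "BKPBLOCK_HAS_DATA set, but no data included at %X/%X",
								  (uint32) (state->ReadRecPtr >> 32),
								  (uint32) state->ReadRecPtr);
			goto err;
		}
		if (!blk->has_data && blk->data_len != 0)
		{
			report_invalid_record(state,
								  "BKPBLOCK_HAS_DATA not set, but data length is %u at %X/%X",
								  (unsigned int) blk->data_len,
								  (uint32) (state->ReadRecPtr >> 32),
								  (uint32) state->ReadRecPtr);
			goto err;
		}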

Regards,

-- 
Fujii Masao




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-03-05 Thread Fujii Masao
On Thu, Mar 5, 2015 at 10:28 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2015-03-05 12:14:04 +, Syed, Rahila wrote:
 Please find attached  a patch. As discussed, flag to denote
 compression and presence of hole in block image has been added in
 XLogRecordImageHeader rather than block header.

 FWIW, I personally won't commit it with things done that way. I think
 it's going the wrong way, leading to a harder to interpret and less
 flexible format.  I'm not going to further protest if Fujii or Heikki
 commit it this way though.

I'm pretty sure that we can discuss the *better* WAL format even after
committing this patch.

Regards,

-- 
Fujii Masao




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-03-05 Thread Syed, Rahila

Hello,

Please find attached a patch. As discussed, the flags denoting compression and
the presence of a hole in the block image have been added to
XLogRecordImageHeader rather than to the block header.

Following are WAL numbers based on the attached test script posted by Michael
earlier in the thread.

                        WAL generated
FPW compression on      122.032 MB
FPW compression off     155.223 MB
HEAD                    155.236 MB

Compression: 21%
Number of block images generated in WAL: 63637


Thank you,
Rahila Syed


__
Disclaimer: This email and any attachments are sent in strictest confidence
for the sole use of the addressee and may contain legally privileged,
confidential, and proprietary data. If you are not the intended recipient,
please advise the sender by replying promptly to this email and then delete
and destroy this email and any attachments without any further use, copying
or forwarding.


Support-compression-of-full-page-writes-in-WAL_v23.patch
Description: Support-compression-of-full-page-writes-in-WAL_v23.patch


compress_run.bash
Description: compress_run.bash



Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-03-05 Thread Michael Paquier
On Thu, Mar 5, 2015 at 9:14 PM, Syed, Rahila rahila.s...@nttdata.com wrote:
 Please find attached  a patch. As discussed, flag to denote compression and 
 presence of hole in block image has been added in XLogRecordImageHeader 
 rather than block header.

 Following are WAL numbers based on attached  test script  posted by Michael 
 earlier in the thread.

   WAL generated
 FPW compression on   122.032 MB

 FPW compression off   155.223 MB

 HEAD  155.236 MB

 Compression : 21 %
 Number of block images generated in WAL :   63637

ISTM that we are getting a nice thing here. I tested the patch and WAL
replay is working correctly.

Some nitpicky comments...

+ * bkp_info stores flags for information about the backup block image
+ * BKPIMAGE_IS_COMPRESSED is used to identify if a given block image
is compressed.
+ * BKPIMAGE_WITH_HOLE is used to identify the presence of a hole in a
block image.
+ * If the block image has no hole, it is ensured that the raw size of
a compressed
+ * block image is equal to BLCKSZ, hence the contents of
+ * XLogRecordBlockImageCompressionInfo are not necessary.
Take care of the limit of 80 characters per line. (Perhaps you could
run pgindent on your code before sending a patch?). The first line of
this paragraph is a sentence in itself, no?

In xlogreader.c, blk->with_hole is a boolean, you could remove the ==0
and ==1 it is compared with.

+  /*
+   * Length of a block image must be less than BLCKSZ
+   * if the block has hole
+   */
if the block has a hole. (End of the sentence needs a dot.)

+   /*
+* Length of a block image must be equal to BLCKSZ
+* if the block does not have hole
+*/
if the block does not have a hole.

Regards,
-- 
Michael




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-03-05 Thread Andres Freund
On 2015-03-05 12:14:04 +, Syed, Rahila wrote:
 Please find attached  a patch. As discussed, flag to denote
 compression and presence of hole in block image has been added in
 XLogRecordImageHeader rather than block header.

FWIW, I personally won't commit it with things done that way. I think
it's going the wrong way, leading to a harder to interpret and less
flexible format.  I'm not going to further protest if Fujii or Heikki
commit it this way though.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-03-04 Thread Syed, Rahila
Hello,

Are there any other flag bits that we should or are planning to add into WAL 
header newly, except the above two? If yes and they are required by even a 
block which doesn't have an image, I will change my mind and agree to add 
something like chunk ID to a block header. 
But I guess the answer of the question is No. Since the flag bits now we are 
thinking to add are required only by a block having an image, adding them into 
a block header (instead of block image header) seems a waste of bytes in WAL. 
So I concur with Michael.
I agree.
As per my understanding, this change of the xlog format was to provide for a
future enhancement which would need flags relevant to the entire block.
But as mentioned, the flags currently being added are related to the block
image only. Hence, for this patch it makes sense to add a field to
XLogRecordImageHeader rather than to the block header.
This will also save bytes per WAL record.

Thank you,
Rahila Syed

 


__
Disclaimer: This email and any attachments are sent in strictest confidence
for the sole use of the addressee and may contain legally privileged,
confidential, and proprietary data. If you are not the intended recipient,
please advise the sender by replying promptly to this email and then delete
and destroy this email and any attachments without any further use, copying
or forwarding.



Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-03-03 Thread Syed, Rahila
Hello,

It would be good to get those problems fixed first. Could you send an updated 
patch?

Please find attached an updated patch with the WAL replay error fixed. The
patch follows the chunk ID approach to the xlog format.

Following are brief measurement numbers.

                        WAL
FPW compression on      122.032 MB
FPW compression off     155.239 MB
HEAD                    155.236 MB


Thank you,
Rahila Syed


__
Disclaimer: This email and any attachments are sent in strictest confidence
for the sole use of the addressee and may contain legally privileged,
confidential, and proprietary data. If you are not the intended recipient,
please advise the sender by replying promptly to this email and then delete
and destroy this email and any attachments without any further use, copying
or forwarding.

Support-compression-of-full-page-writes_v22.patch
Description: Support-compression-of-full-page-writes_v22.patch



Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-03-03 Thread Michael Paquier
On Wed, Mar 4, 2015 at 12:41 AM, Syed, Rahila rahila.s...@nttdata.com wrote:
 Please find attached updated patch with WAL replay error fixed. The patch 
 follows chunk ID approach of xlog format.

(Review done independently of the chunk_id stuff being good or not,
already gave my opinion on the matter).

  * readRecordBufSize is set to the new buffer size.
- *
+
The patch has some noise diffs.

You may want to change the values of BKPBLOCK_WILL_INIT and
BKPBLOCK_SAME_REL to respectively 0x01 and 0x02.

+   uint8   chunk_id = 0;
+   chunk_id |= XLR_CHUNK_BLOCK_REFERENCE;

Why not simply that:
chunk_id = XLR_CHUNK_BLOCK_REFERENCE;

+#define XLR_CHUNK_ID_DATA_SHORT   255
+#define XLR_CHUNK_ID_DATA_LONG 254
Why aren't those just using one bit as well? This seems inconsistent
with the rest.

+ if ((blk->with_hole == 0 && blk->hole_offset != 0) ||
+     (blk->with_hole == 1 && blk->hole_offset <= 0))
In xlogreader.c blk->with_hole is defined as a boolean but compared
with an integer, could you remove the ==0 and ==1 portions for
clarity?

-   goto err;
+   goto err;
}
}
-
if (remaining != datatotal)
This gathers incorrect code alignment and unnecessary diffs.

 typedef struct XLogRecordBlockHeader
 {
+   /* Chunk ID precedes */
+
uint8   id;
What prevents the declaration of chunk_id as an int8 here instead of
this comment? This is confusing.

 Following are brief measurement numbers.
   WAL
 FPW compression on   122.032 MB
 FPW compression off   155.239 MB
 HEAD   155.236 MB

What is the test run in this case? How many block images have been
generated in WAL for each case? You could gather some of those numbers
with pg_xlogdump --stat for example.
-- 
Michael




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-03-03 Thread Fujii Masao
On Tue, Mar 3, 2015 at 9:34 AM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Tue, Mar 3, 2015 at 9:24 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2015-03-03 08:59:30 +0900, Michael Paquier wrote:
 Already mentioned upthread, but I agree with Fujii-san here: adding
 information related to the state of a block image in
 XLogRecordBlockHeader makes little sense because we are not sure to
 have a block image, perhaps there is only data associated to it, and
 that we should control that exclusively in XLogRecordBlockImageHeader
 and let the block ID alone for now.

 This argument doesn't make much sense to me. The flag byte could very
 well indicate 'block reference without image following' vs 'block
 reference with data + hole following' vs 'block reference with
 compressed data following'.

 Information about the state of a block is decoupled with its
 existence, aka in the block header, we should control if:
 - record has data
 - record has a block
 And in the block image header, we control if the block is:
 - compressed or not
 - has a hole or not.

Are there any other flag bits that we should add, or are planning to add, to
the WAL header, other than the above two? If yes, and they are required even
by a block which doesn't have an image, I will change my mind and agree to
add something like a chunk ID to the block header. But I guess the answer to
that question is no. Since the flag bits we are now thinking of adding are
required only by a block having an image, adding them to the block header
(instead of the block image header) seems a waste of bytes in WAL. So I
concur with Michael.

Regards,

-- 
Fujii Masao




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-03-02 Thread Andres Freund
On 2015-03-03 08:59:30 +0900, Michael Paquier wrote:
 Already mentioned upthread, but I agree with Fujii-san here: adding
 information related to the state of a block image in
 XLogRecordBlockHeader makes little sense because we are not sure to
 have a block image, perhaps there is only data associated to it, and
 that we should control that exclusively in XLogRecordBlockImageHeader
 and let the block ID alone for now.

This argument doesn't make much sense to me. The flag byte could very
well indicate 'block reference without image following' vs 'block
reference with data + hole following' vs 'block reference with
compressed data following'.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-03-02 Thread Michael Paquier
On Tue, Mar 3, 2015 at 9:24 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2015-03-03 08:59:30 +0900, Michael Paquier wrote:
 Already mentioned upthread, but I agree with Fujii-san here: adding
 information related to the state of a block image in
 XLogRecordBlockHeader makes little sense because we are not sure to
 have a block image, perhaps there is only data associated to it, and
 that we should control that exclusively in XLogRecordBlockImageHeader
 and let the block ID alone for now.

 This argument doesn't make much sense to me. The flag byte could very
 well indicate 'block reference without image following' vs 'block
 reference with data + hole following' vs 'block reference with
 compressed data following'.

Information about the state of a block is decoupled from its existence;
i.e., in the block header we should control whether:
- the record has data
- the record has a block
And in the block image header, we control whether the block:
- is compressed or not
- has a hole or not.
Are you willing to sacrifice bytes in the block header to control whether a
block is compressed or has a hole even if the block has only data but no
image?
-- 
Michael




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-03-02 Thread Michael Paquier
On Tue, Mar 3, 2015 at 5:17 AM, Rahila Syed rahilasye...@gmail.com wrote:
 Hello,

When I test the WAL or replication related features, I usually run
make installcheck and pgbench against the master at the same time
after setting up the replication environment.
 I will conduct these tests before sending updated version.

Seems this increases the header size of WAL record even if no backup block
 image is included. Right?
 Yes, this increases the header size of WAL record by 1 byte for every block
 reference even if it has no backup block image.

Isn't it better to add the flag info about backup block image into
 XLogRecordBlockImageHeader rather than XLogRecordBlockHeader
 Yes , this will make the code extensible,readable and  will save couple of
 bytes per record.
  But the current approach is to provide a chunk ID identifying different
 xlog record fragments like main data , block references etc.
 Currently , block ID is used to identify record fragments which can be
 either XLR_BLOCK_ID_DATA_SHORT , XLR_BLOCK_ID_DATA_LONG or actual block ID.
 This can be replaced by chunk ID to separate it from block ID. Block ID can
 be used to number the block fragments whereas chunk ID can be used to
 distinguish between main data fragments and block references. Chunk ID of
 block references can contain information about presence of data, image ,
 hole and compression.
 Chunk ID for main data fragments remains as it is . This approach provides
 for readability and extensibility.

Already mentioned upthread, but I agree with Fujii-san here: adding
information related to the state of a block image to
XLogRecordBlockHeader makes little sense because we are not sure to
have a block image at all; perhaps there is only data associated with it.
We should control that exclusively in XLogRecordBlockImageHeader and
leave the block ID alone for now. Hence we'd better have one extra
int8 in XLogRecordBlockImageHeader with, for now, two flags:
- Is the block compressed or not?
- Does the block have a hole?
Perhaps this will not be considered as ugly, and it leaves plenty of
room for storing a version number for compression.
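To put the idea in code, a minimal sketch of an image header carrying one
extra flag byte could look like this (an illustration of the proposal
above, not a final layout; uint8/uint16 are the usual PostgreSQL typedefs):

/* Flag bits for bimg_info; names and values are illustrative. */
#define BKPIMAGE_HAS_HOLE		0x01	/* page image has a hole */
#define BKPIMAGE_IS_COMPRESSED	0x02	/* page image is compressed */

typedef struct XLogRecordBlockImageHeader
{
	uint16		length;			/* number of page image bytes */
	uint16		hole_offset;	/* number of bytes before the "hole" */
	uint8		bimg_info;		/* flag bits, see above */

	/*
	 * If BKPIMAGE_HAS_HOLE and BKPIMAGE_IS_COMPRESSED are both set, the
	 * uncompressed hole length would follow in a separate
	 * compression-info struct.
	 */
} XLogRecordBlockImageHeader;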
-- 
Michael




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-03-02 Thread Fujii Masao
On Fri, Feb 27, 2015 at 12:44 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Fri, Feb 27, 2015 at 8:01 AM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 On Fri, Feb 27, 2015 at 6:54 AM, Rahila Syed rahilasye...@gmail.com wrote:
Even this patch doesn't work fine. The standby emit the following
error messages.

 Yes this bug remains unsolved. I am still working on resolving this.

 Following chunk IDs have been added in the attached patch as suggested
 upthread.
 +#define XLR_CHUNK_BLOCK_REFERENCE  0x10
 +#define XLR_CHUNK_BLOCK_HAS_IMAGE  0x04
 +#define XLR_CHUNK_BLOCK_HAS_DATA   0x08

 XLR_CHUNK_BLOCK_REFERENCE denotes chunk ID of block references.
 XLR_CHUNK_BLOCK_HAS_IMAGE is a replacement of BKPBLOCK_HAS_IMAGE
 and XLR_CHUNK_BLOCK_HAS DATA a replacement of BKPBLOCK_HAS_DATA.

 Before sending a new version, be sure that this get fixed by for
 example building up a master with a standby replaying WAL, and running
 make installcheck-world or similar. If the standby does not complain
 at all, you have good chances to not have bugs. You could also build
 with WAL_DEBUG to check record consistency.

+1

When I test the WAL or replication related features, I usually run
make installcheck and pgbench against the master at the same time
after setting up the replication environment.

 typedef struct XLogRecordBlockHeader
 {
+    uint8    chunk_id;      /* xlog fragment id */
     uint8    id;            /* block reference ID */

Seems this increases the header size of WAL record even if no backup block
image is included. Right? Isn't it better to add the flag info about backup
block image into XLogRecordBlockImageHeader rather than XLogRecordBlockHeader?
Originally we borrowed one or two bits from its existing fields to minimize
the header size, but we can just add new flag field if we prefer
the extensibility and readability of the code.

Regards,

-- 
Fujii Masao




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-03-02 Thread Rahila Syed
Hello,

When I test the WAL or replication related features, I usually run
make installcheck and pgbench against the master at the same time
after setting up the replication environment.
I will conduct these tests before sending the updated version.

Seems this increases the header size of WAL record even if no backup block
image is included. Right?
Yes, this increases the header size of WAL record by 1 byte for every block
reference even if it has no backup block image.

Isn't it better to add the flag info about backup block image into
XLogRecordBlockImageHeader rather than XLogRecordBlockHeader
Yes, this will make the code extensible and readable, and will save a couple
of bytes per record.
But the current approach is to provide a chunk ID identifying the different
xlog record fragments, like main data, block references, etc.
Currently, a block ID is used to identify record fragments, which can be
either XLR_BLOCK_ID_DATA_SHORT, XLR_BLOCK_ID_DATA_LONG or an actual block ID.
This can be replaced by a chunk ID to separate it from the block ID. The block
ID can be used to number the block fragments, whereas the chunk ID can be used
to distinguish between main data fragments and block references. The chunk ID
of block references can contain information about the presence of data, image,
hole and compression.
The chunk ID for main data fragments remains as it is. This approach provides
for readability and extensibility.

Thank you,
Rahila Syed

On Mon, Mar 2, 2015 at 3:43 PM, Fujii Masao masao.fu...@gmail.com wrote:

 On Fri, Feb 27, 2015 at 12:44 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
  On Fri, Feb 27, 2015 at 8:01 AM, Michael Paquier
  michael.paqu...@gmail.com wrote:
  On Fri, Feb 27, 2015 at 6:54 AM, Rahila Syed rahilasye...@gmail.com
 wrote:
 Even this patch doesn't work fine. The standby emit the following
 error messages.
 
  Yes this bug remains unsolved. I am still working on resolving this.
 
  Following chunk IDs have been added in the attached patch as suggested
  upthread.
  +#define XLR_CHUNK_BLOCK_REFERENCE  0x10
  +#define XLR_CHUNK_BLOCK_HAS_IMAGE  0x04
  +#define XLR_CHUNK_BLOCK_HAS_DATA   0x08
 
  XLR_CHUNK_BLOCK_REFERENCE denotes chunk ID of block references.
  XLR_CHUNK_BLOCK_HAS_IMAGE is a replacement of BKPBLOCK_HAS_IMAGE
  and XLR_CHUNK_BLOCK_HAS DATA a replacement of BKPBLOCK_HAS_DATA.
 
  Before sending a new version, be sure that this get fixed by for
  example building up a master with a standby replaying WAL, and running
  make installcheck-world or similar. If the standby does not complain
  at all, you have good chances to not have bugs. You could also build
  with WAL_DEBUG to check record consistency.

 +1

 When I test the WAL or replication related features, I usually run
 make installcheck and pgbench against the master at the same time
 after setting up the replication environment.

  typedef struct XLogRecordBlockHeader
  {
 +uint8chunk_id;/* xlog fragment id */
  uint8id;/* block reference ID */

 Seems this increases the header size of WAL record even if no backup block
 image is included. Right? Isn't it better to add the flag info about backup
 block image into XLogRecordBlockImageHeader rather than
 XLogRecordBlockHeader?
 Originally we borrowed one or two bits from its existing fields to minimize
 the header size, but we can just add new flag field if we prefer
 the extensibility and readability of the code.

 Regards,

 --
 Fujii Masao



Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-02-26 Thread Fujii Masao
On Tue, Feb 24, 2015 at 6:46 PM, Syed, Rahila rahila.s...@nttdata.com wrote:
 Hello ,

I've not read this logic yet, but ISTM there is a bug in that new WAL format 
because I got the following error and the startup process could not replay 
any WAL records when I set up replication and enabled wal_compression.

LOG:  record with invalid length at 0/3B0
LOG:  record with invalid length at 0/3000518
LOG:  Invalid block length in record 0/30005A0
LOG:  Invalid block length in record 0/3000D60 ...

 Please fine attached patch which replays WAL records.

Even this patch doesn't work fine. The standby emit the following
error messages.

LOG:  invalid block_id 255 at 0/3B0
LOG:  record with invalid length at 0/30017F0
LOG:  invalid block_id 255 at 0/3001878
LOG:  record with invalid length at 0/30027D0
LOG:  record with invalid length at 0/3002E58
...

Regards,

-- 
Fujii Masao




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-02-26 Thread Michael Paquier
On Fri, Feb 27, 2015 at 6:54 AM, Rahila Syed rahilasye...@gmail.com wrote:
 Hello,

Even this patch doesn't work fine. The standby emit the following
error messages.

 Yes this bug remains unsolved. I am still working on resolving this.

 Following chunk IDs have been added in the attached patch as suggested
 upthread.
 +#define XLR_CHUNK_BLOCK_REFERENCE  0x10
 +#define XLR_CHUNK_BLOCK_HAS_IMAGE  0x04
 +#define XLR_CHUNK_BLOCK_HAS_DATA   0x08

 XLR_CHUNK_BLOCK_REFERENCE denotes chunk ID of block references.
 XLR_CHUNK_BLOCK_HAS_IMAGE is a replacement of BKPBLOCK_HAS_IMAGE
 and XLR_CHUNK_BLOCK_HAS DATA a replacement of BKPBLOCK_HAS_DATA.

Before sending a new version, be sure that this get fixed by for
example building up a master with a standby replaying WAL, and running
make installcheck-world or similar. If the standby does not complain
at all, you have good chances to not have bugs. You could also build
with WAL_DEBUG to check record consistency.
-- 
Michael




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-02-26 Thread Michael Paquier
On Fri, Feb 27, 2015 at 8:01 AM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Fri, Feb 27, 2015 at 6:54 AM, Rahila Syed rahilasye...@gmail.com wrote:
Even this patch doesn't work fine. The standby emit the following
error messages.

 Yes this bug remains unsolved. I am still working on resolving this.

 Following chunk IDs have been added in the attached patch as suggested
 upthread.
 +#define XLR_CHUNK_BLOCK_REFERENCE  0x10
 +#define XLR_CHUNK_BLOCK_HAS_IMAGE  0x04
 +#define XLR_CHUNK_BLOCK_HAS_DATA   0x08

 XLR_CHUNK_BLOCK_REFERENCE denotes chunk ID of block references.
 XLR_CHUNK_BLOCK_HAS_IMAGE is a replacement of BKPBLOCK_HAS_IMAGE
 and XLR_CHUNK_BLOCK_HAS DATA a replacement of BKPBLOCK_HAS_DATA.

 Before sending a new version, be sure that this get fixed by for
 example building up a master with a standby replaying WAL, and running
 make installcheck-world or similar. If the standby does not complain
 at all, you have good chances to not have bugs. You could also build
 with WAL_DEBUG to check record consistency.

It would be good to get those problems fixed first. Could you send an
updated patch? I'll look into it in more detail. For the time being I
am switching this patch to Waiting on Author.
-- 
Michael




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-02-24 Thread Andres Freund
On 2015-02-24 16:03:41 +0900, Michael Paquier wrote:
 Looking at this code, I think that it is really confusing to move the data
 related to the status of the backup block out of XLogRecordBlockImageHeader
 to the chunk ID itself that may *not* include a backup block at all as it
 is conditioned by the presence of BKPBLOCK_HAS_IMAGE.

What's the problem here? We could actually now easily remove
BKPBLOCK_HAS_IMAGE and replace it by a chunk id.

 the idea of having the backup block data in its dedicated header with bits
 stolen from the existing fields, perhaps by rewriting it to something like
 that:
 typedef struct XLogRecordBlockImageHeader {
 uint32 length:15,
  hole_length:15,
  is_compressed:1,
  is_hole:1;
 } XLogRecordBlockImageHeader;
 Now perhaps I am missing something and this is really ugly ;)

I think it's fantastically ugly. We'll also likely want different
compression formats and stuff in the not too far away future. This will
just end up being a pain.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-02-23 Thread Fujii Masao
On Mon, Feb 23, 2015 at 5:28 PM, Rahila Syed rahilasye...@gmail.com wrote:
 Hello,

 Attached is a patch which has following changes,

 As suggested above block ID in xlog structs has been replaced by chunk ID.
 Chunk ID is used to distinguish between different types of xlog record
 fragments.
 Like,
 XLR_CHUNK_ID_DATA_SHORT
 XLR_CHUNK_ID_DATA_LONG
 XLR_CHUNK_BKP_COMPRESSED
 XLR_CHUNK_BKP_WITH_HOLE

 In block references, block ID follows the chunk ID. Here block ID retains
 its functionality.
 This approach increases data by 1 byte for each block reference in an xlog
 record. This approach separates ID referring different fragments of xlog
 record from the actual block ID which is used to refer  block references in
 xlog record.

I've not read this logic yet, but ISTM there is a bug in that new WAL format
because I got the following error and the startup process could not replay
any WAL records when I set up replication and enabled wal_compression.

LOG:  record with invalid length at 0/3B0
LOG:  record with invalid length at 0/3000518
LOG:  Invalid block length in record 0/30005A0
LOG:  Invalid block length in record 0/3000D60
...

Regards,

-- 
Fujii Masao




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-02-23 Thread Rahila Syed
Hello,

Attached is a patch which has the following changes.

As suggested above, the block ID in the xlog structs has been replaced by a
chunk ID. The chunk ID is used to distinguish between different types of xlog
record fragments, like:
XLR_CHUNK_ID_DATA_SHORT
XLR_CHUNK_ID_DATA_LONG
XLR_CHUNK_BKP_COMPRESSED
XLR_CHUNK_BKP_WITH_HOLE

In block references, the block ID follows the chunk ID, and the block ID
retains its functionality.
This approach increases the data by 1 byte for each block reference in an xlog
record. It separates the ID referring to different fragments of the xlog
record from the actual block ID, which is used to refer to block references in
the xlog record.

Following are WAL numbers for each scenario.

                        WAL
FPW compression on      121.652 MB
FPW compression off     148.998 MB
HEAD                    148.764 MB

Compression remains nearly the same as before. There is some difference in WAL
between HEAD and HEAD+patch+compression OFF; this difference corresponds to
the 1-byte increase with each block reference of an xlog record.

Thank you,
Rahila Syed





On Wed, Feb 18, 2015 at 7:53 PM, Syed, Rahila rahila.s...@nttdata.com
wrote:

 Hello,

 I think we should change the xlog format so that the block_id (which
 currently is XLR_BLOCK_ID_DATA_SHORT/LONG or a actual block id) isn't the
 block id but something like XLR_CHUNK_ID. Which is used as is for
 XLR_CHUNK_ID_DATA_SHORT/LONG, but for backup blocks can be set to to
 XLR_CHUNK_BKP_WITH_HOLE, XLR_CHUNK_BKP_COMPRESSED,
 XLR_CHUNK_BKP_REFERENCE... The BKP blocks will then follow, storing the
 block id following the chunk id.

 Yes, that'll increase the amount of data for a backup block by 1 byte,
 but I think that's worth it. I'm pretty sure we will be happy about the
 added extensibility pretty soon.

 To clarify my understanding of the above change,

 Instead of a block id to reference different fragments of an xlog record ,
 a single byte field chunk_id  should be used.  chunk_id  will be same as
 XLR_BLOCK_ID_DATA_SHORT/LONG for main data fragments.
 But for block references, it will take store following values in order to
 store information about the backup blocks.
 #define XLR_CHUNK_BKP_COMPRESSED  0x01
 #define XLR_CHUNK_BKP_WITH_HOLE 0x02
 ...

 The new xlog format should look like follows,

 Fixed-size header (XLogRecord struct)
 Chunk_id(add a field before id field in XLogRecordBlockHeader struct)
 XLogRecordBlockHeader
 Chunk_id
  XLogRecordBlockHeader
 ...
 ...
 Chunk_id ( rename id field of the XLogRecordDataHeader struct)
 XLogRecordDataHeader[Short|Long]
  block data
  block data
  ...
  main data

 I will post a patch based on this.

 Thank you,
 Rahila Syed

 -Original Message-
 From: Andres Freund [mailto:and...@2ndquadrant.com]
 Sent: Monday, February 16, 2015 5:26 PM
 To: Syed, Rahila
 Cc: Michael Paquier; Fujii Masao; PostgreSQL mailing lists
 Subject: Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

 On 2015-02-16 11:30:20 +, Syed, Rahila wrote:
  - * As a trivial form of data compression, the XLOG code is aware that
  - * PG data pages usually contain an unused hole in the middle,
  which
   - * contains only zero bytes.  If hole_length > 0 then we have removed
  - * such a hole from the stored data (and it's not counted in the
  - * XLOG record's CRC, either).  Hence, the amount of block data
  actually
  - * present is BLCKSZ - hole_length bytes.
  + * Block images are able to do several types of compression:
  + * - When wal_compression is off, as a trivial form of compression,
  + the
  + * XLOG code is aware that PG data pages usually contain an unused
 hole
   + * in the middle, which contains only zero bytes.  If length < BLCKSZ
  + * then we have removed such a hole from the stored data (and it is
  + * not counted in the XLOG record's CRC, either).  Hence, the amount
  + * of block data actually present is length bytes.  The hole offset
  + * on page is defined using hole_offset.
  + * - When wal_compression is on, block images are compressed using a
  + * compression algorithm without their hole to improve compression
  + * process of the page. length corresponds in this case to the
  + length
  + * of the compressed block. hole_offset is the hole offset of the
  + page,
  + * and the length of the uncompressed block is defined by
  + raw_length,
  + * whose data is included in the record only when compression is
  + enabled
  + * and with_hole is set to true, see below.
  + *
  + * is_compressed is used to identify if a given block image is
  + compressed
  + * or not. Maximum page size allowed on the system being 32k, the
  + hole
  + * offset cannot be more than 15-bit long so the last free bit is
  + used to
  + * store the compression state of block image. If the maximum page
  + size
  + * allowed is increased to a value higher than that, we should
  + consider
  + * increasing this structure size as well

Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-02-18 Thread Michael Paquier
On Mon, Feb 16, 2015 at 8:55 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2015-02-16 11:30:20 +, Syed, Rahila wrote:
 - * As a trivial form of data compression, the XLOG code is aware that
 - * PG data pages usually contain an unused hole in the middle, which
 - * contains only zero bytes.  If hole_length > 0 then we have removed
 - * such a hole from the stored data (and it's not counted in the
 - * XLOG record's CRC, either).  Hence, the amount of block data actually
 - * present is BLCKSZ - hole_length bytes.
 + * Block images are able to do several types of compression:
 + * - When wal_compression is off, as a trivial form of compression, the
 + * XLOG code is aware that PG data pages usually contain an unused hole
 + * in the middle, which contains only zero bytes.  If length < BLCKSZ
 + * then we have removed such a hole from the stored data (and it is
 + * not counted in the XLOG record's CRC, either).  Hence, the amount
 + * of block data actually present is length bytes.  The hole offset
 + * on page is defined using hole_offset.
 + * - When wal_compression is on, block images are compressed using a
 + * compression algorithm without their hole to improve compression
 + * process of the page. length corresponds in this case to the length
 + * of the compressed block. hole_offset is the hole offset of the page,
 + * and the length of the uncompressed block is defined by raw_length,
 + * whose data is included in the record only when compression is enabled
 + * and with_hole is set to true, see below.
 + *
 + * is_compressed is used to identify if a given block image is compressed
 + * or not. Maximum page size allowed on the system being 32k, the hole
 + * offset cannot be more than 15-bit long so the last free bit is used to
 + * store the compression state of block image. If the maximum page size
 + * allowed is increased to a value higher than that, we should consider
 + * increasing this structure size as well, but this would increase the
 + * length of block header in WAL records with alignment.
 + *
 + * with_hole is used to identify the presence of a hole in a block image.
 + * As the length of a block cannot be more than 15-bit long, the extra bit 
 in
 + * the length field is used for this identification purpose. If the block 
 image
 + * has no hole, it is ensured that the raw size of a compressed block image 
 is
 + * equal to BLCKSZ, hence the contents of 
 XLogRecordBlockImageCompressionInfo
 + * are not necessary.
   */
  typedef struct XLogRecordBlockImageHeader
  {
 - uint16  hole_offset;    /* number of bytes before hole */
 - uint16  hole_length;    /* number of bytes in hole */
 + uint16  length:15,      /* length of block data in record */
 +         with_hole:1;    /* status of hole in the block */
 +
 + uint16  hole_offset:15, /* number of bytes before hole */
 +         is_compressed:1;/* compression status of image */
 +
 + /* Followed by the data related to compression if block is compressed */
   } XLogRecordBlockImageHeader;

 Yikes, this is ugly.

 I think we should change the xlog format so that the block_id (which
 currently is XLR_BLOCK_ID_DATA_SHORT/LONG or a actual block id) isn't
 the block id but something like XLR_CHUNK_ID. Which is used as is for
 XLR_CHUNK_ID_DATA_SHORT/LONG, but for backup blocks can be set to to
 XLR_CHUNK_BKP_WITH_HOLE, XLR_CHUNK_BKP_COMPRESSED,
 XLR_CHUNK_BKP_REFERENCE... The BKP blocks will then follow, storing the
 block id following the chunk id.
 Yes, that'll increase the amount of data for a backup block by 1 byte,
 but I think that's worth it. I'm pretty sure we will be happy about the
 added extensibility pretty soon.

Yeah, that would help readability and does not cost much compared
to BLCKSZ. Still, could you explain what kind of extensibility you have
in mind besides code readability? It is hard to make a nice picture
with only paper and pencil, and the current patch approach
has been taken to minimize the record length, particularly for users
who do not care about WAL compression.
-- 
Michael




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-02-18 Thread Syed, Rahila
Hello,

I think we should change the xlog format so that the block_id (which currently 
is XLR_BLOCK_ID_DATA_SHORT/LONG or a actual block id) isn't the block id but 
something like XLR_CHUNK_ID. Which is used as is for 
XLR_CHUNK_ID_DATA_SHORT/LONG, but for backup blocks can be set to to 
XLR_CHUNK_BKP_WITH_HOLE, XLR_CHUNK_BKP_COMPRESSED, XLR_CHUNK_BKP_REFERENCE... 
The BKP blocks will then follow, storing the block id following the chunk id.

Yes, that'll increase the amount of data for a backup block by 1 byte, but I 
think that's worth it. I'm pretty sure we will be happy about the added 
extensibility pretty soon.

To clarify my understanding of the above change,

Instead of a block id to reference different fragments of an xlog record, a
single-byte field chunk_id should be used. chunk_id will be the same as
XLR_BLOCK_ID_DATA_SHORT/LONG for main data fragments.
But for block references, it will store the following values in order to carry
information about the backup blocks.
#define XLR_CHUNK_BKP_COMPRESSED  0x01  
#define XLR_CHUNK_BKP_WITH_HOLE 0x02
...

The new xlog format should look like follows,

Fixed-size header (XLogRecord struct)
Chunk_id (add a field before the id field in the XLogRecordBlockHeader struct)
XLogRecordBlockHeader 
Chunk_id
 XLogRecordBlockHeader 
...
...
Chunk_id (rename the id field of the XLogRecordDataHeader struct)
XLogRecordDataHeader[Short|Long] 
 block data
 block data
 ...
 main data

I will post a patch based on this.
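A rough sketch of the header change described above (purely illustrative;
the chunk IDs and layout follow this proposal, not a committed format, and
uint8/uint16 are the usual PostgreSQL typedefs):

/* Proposed chunk IDs for block references; values are placeholders. */
#define XLR_CHUNK_BKP_COMPRESSED	0x01
#define XLR_CHUNK_BKP_WITH_HOLE		0x02

typedef struct XLogRecordBlockHeader
{
	uint8		chunk_id;		/* xlog fragment id, e.g. XLR_CHUNK_BKP_* */
	uint8		id;				/* block reference ID */
	uint8		fork_flags;		/* fork within the relation, and flags */
	uint16		data_length;	/* number of payload bytes following */
	/* If BKPBLOCK_HAS_IMAGE, an XLogRecordBlockImageHeader struct follows */
	/* If BKPBLOCK_SAME_REL is not set, a RelFileNode follows */
	/* BlockNumber follows */
} XLogRecordBlockHeader;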

Thank you,
Rahila Syed

-Original Message-
From: Andres Freund [mailto:and...@2ndquadrant.com] 
Sent: Monday, February 16, 2015 5:26 PM
To: Syed, Rahila
Cc: Michael Paquier; Fujii Masao; PostgreSQL mailing lists
Subject: Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

On 2015-02-16 11:30:20 +, Syed, Rahila wrote:
 - * As a trivial form of data compression, the XLOG code is aware that
 - * PG data pages usually contain an unused hole in the middle, 
 which
 - * contains only zero bytes.  If hole_length > 0 then we have removed
 - * such a hole from the stored data (and it's not counted in the
 - * XLOG record's CRC, either).  Hence, the amount of block data 
 actually
 - * present is BLCKSZ - hole_length bytes.
 + * Block images are able to do several types of compression:
 + * - When wal_compression is off, as a trivial form of compression, 
 + the
 + * XLOG code is aware that PG data pages usually contain an unused hole
 + * in the middle, which contains only zero bytes.  If length < BLCKSZ
 + * then we have removed such a hole from the stored data (and it is
 + * not counted in the XLOG record's CRC, either).  Hence, the amount
 + * of block data actually present is length bytes.  The hole offset
 + * on page is defined using hole_offset.
 + * - When wal_compression is on, block images are compressed using a
 + * compression algorithm without their hole to improve compression
 + * process of the page. length corresponds in this case to the 
 + length
 + * of the compressed block. hole_offset is the hole offset of the 
 + page,
 + * and the length of the uncompressed block is defined by 
 + raw_length,
 + * whose data is included in the record only when compression is 
 + enabled
 + * and with_hole is set to true, see below.
 + *
 + * is_compressed is used to identify if a given block image is 
 + compressed
 + * or not. Maximum page size allowed on the system being 32k, the 
 + hole
 + * offset cannot be more than 15-bit long so the last free bit is 
 + used to
 + * store the compression state of block image. If the maximum page 
 + size
 + * allowed is increased to a value higher than that, we should 
 + consider
 + * increasing this structure size as well, but this would increase 
 + the
 + * length of block header in WAL records with alignment.
 + *
 + * with_hole is used to identify the presence of a hole in a block image.
 + * As the length of a block cannot be more than 15-bit long, the 
 + extra bit in
 + * the length field is used for this identification purpose. If the 
 + block image
 + * has no hole, it is ensured that the raw size of a compressed block 
 + image is
 + * equal to BLCKSZ, hence the contents of 
 + XLogRecordBlockImageCompressionInfo
 + * are not necessary.
   */
  typedef struct XLogRecordBlockImageHeader  {
 - uint16  hole_offset;/* number of bytes before hole */
 - uint16  hole_length;/* number of bytes in hole */
 + uint16  length:15,  /* length of block data in 
 record */
 + with_hole:1;/* status of hole in the block 
 */
 +
 + uint16  hole_offset:15, /* number of bytes before hole */
 + is_compressed:1;/* compression status of image */
 +
 + /* Followed by the data related to compression if block is 
 +compressed */
  } XLogRecordBlockImageHeader;

Yikes, this is ugly.

I think we should change the xlog format so that the block_id

Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-02-16 Thread Syed, Rahila
Hello,

Thank you for reviewing and testing the patch.

+   /* leave if data cannot be compressed */
+   if (compressed_len == 0)
+   return false;
This should be < 0, pglz_compress returns -1 when compression fails.

+   if (pglz_decompress(block_image, bkpb->bkp_len, record->uncompressBuf,
+                       bkpb->bkp_uncompress_len) == 0)
Similarly, this should be < 0.

These have been corrected in the attached.

Regarding the sanity checks that have been added recently. I think that they 
are useful but I am suspecting as well that only a check on the record CRC is 
done because that's reliable enough and not doing those checks accelerates a 
bit replay. So I am thinking that we should simply replace them by assertions.
Removing the checks makes sense as the CRC ensures correctness. Moreover, as an 
error message for an invalid record length is present in the code, messages for 
an invalid block length would be redundant.
Checks have been replaced by assertions in the attached patch.

The following if condition in XLogCompressBackupBlock has been modified as follows:

Previous
/*
+  * We recheck the actual size even if pglz_compress() reports success and
+  * see if at least 2 bytes of length have been saved, as this corresponds
+  * to the additional amount of data stored in WAL record for a compressed
+  * block via raw_length when block contains hole.
+  */
+  *len = (uint16) compressed_len;
+  if (*len >= orig_len - SizeOfXLogRecordBlockImageCompressionInfo)
+      return false;
+  return true;


Current
+  if ((hole_length != 0) &&
+      (*len >= orig_len - SizeOfXLogRecordBlockImageCompressionInfo))
+      return false;
+  return true;

This is because the extra information raw_length is included only if the 
compressed block has a hole in it.

Once we get those small issues fixed, I think that it is worth having a 
committer look at this patch, presumably Fujii-san
Agreed. I will mark this patch as ready for committer.

Thank you,
Rahila Syed


Support-compression-for-full-page-writes-in-WAL_v19.patch
Description: Support-compression-for-full-page-writes-in-WAL_v19.patch



Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-02-16 Thread Andres Freund
On 2015-02-16 11:30:20 +, Syed, Rahila wrote:
 - * As a trivial form of data compression, the XLOG code is aware that
 - * PG data pages usually contain an unused hole in the middle, which
 - * contains only zero bytes.  If hole_length > 0 then we have removed
 - * such a hole from the stored data (and it's not counted in the
 - * XLOG record's CRC, either).  Hence, the amount of block data actually
 - * present is BLCKSZ - hole_length bytes.
 + * Block images are able to do several types of compression:
 + * - When wal_compression is off, as a trivial form of compression, the
 + * XLOG code is aware that PG data pages usually contain an unused hole
 + * in the middle, which contains only zero bytes.  If length < BLCKSZ
 + * then we have removed such a hole from the stored data (and it is
 + * not counted in the XLOG record's CRC, either).  Hence, the amount
 + * of block data actually present is length bytes.  The hole offset
 + * on page is defined using hole_offset.
 + * - When wal_compression is on, block images are compressed using a
 + * compression algorithm without their hole to improve compression
 + * process of the page. length corresponds in this case to the length
 + * of the compressed block. hole_offset is the hole offset of the page,
 + * and the length of the uncompressed block is defined by raw_length,
 + * whose data is included in the record only when compression is enabled
 + * and with_hole is set to true, see below.
 + *
 + * is_compressed is used to identify if a given block image is compressed
 + * or not. Maximum page size allowed on the system being 32k, the hole
 + * offset cannot be more than 15-bit long so the last free bit is used to
 + * store the compression state of block image. If the maximum page size
 + * allowed is increased to a value higher than that, we should consider
 + * increasing this structure size as well, but this would increase the
 + * length of block header in WAL records with alignment.
 + *
 + * with_hole is used to identify the presence of a hole in a block image.
 + * As the length of a block cannot be more than 15-bit long, the extra bit in
 + * the length field is used for this identification purpose. If the block 
 image
 + * has no hole, it is ensured that the raw size of a compressed block image 
 is
 + * equal to BLCKSZ, hence the contents of XLogRecordBlockImageCompressionInfo
 + * are not necessary.
   */
  typedef struct XLogRecordBlockImageHeader
  {
 - uint16  hole_offset;/* number of bytes before hole */
 - uint16  hole_length;/* number of bytes in hole */
 + uint16  length:15,  /* length of block data in 
 record */
 + with_hole:1;/* status of hole in the block 
 */
 +
 + uint16  hole_offset:15, /* number of bytes before hole */
 + is_compressed:1;/* compression status of image */
 +
 + /* Followed by the data related to compression if block is compressed */
  } XLogRecordBlockImageHeader;

Yikes, this is ugly.

I think we should change the xlog format so that the block_id (which
currently is XLR_BLOCK_ID_DATA_SHORT/LONG or an actual block id) isn't
the block id but something like XLR_CHUNK_ID. Which is used as is for
XLR_CHUNK_ID_DATA_SHORT/LONG, but for backup blocks can be set to
XLR_CHUNK_BKP_WITH_HOLE, XLR_CHUNK_BKP_COMPRESSED,
XLR_CHUNK_BKP_REFERENCE... The BKP blocks will then follow, storing the
block id following the chunk id.

Yes, that'll increase the amount of data for a backup block by 1 byte,
but I think that's worth it. I'm pretty sure we will be happy about the
added extensibility pretty soon.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-02-16 Thread Michael Paquier
On Mon, Feb 16, 2015 at 8:30 PM, Syed, Rahila rahila.s...@nttdata.com
wrote:


 Regarding the sanity checks that have been added recently. I think that
 they are useful but I am suspecting as well that only a check on the record
 CRC is done because that's reliable enough and not doing those checks
 accelerates a bit replay. So I am thinking that we should simply replace
 them by assertions.

 Removing the checks makes sense as CRC ensures correctness . Moreover ,as
 error message for invalid length of record is present in the code ,
 messages for invalid block length can be redundant.

 Checks have been replaced by assertions in the attached patch.


After more thinking, we may as well simply remove them, an error with CRC
having high chances to complain before reaching this point...



 Current

 if ((hole_length != 0) 

 +  (*len = orig_len -
 SizeOfXLogRecordBlockImageCompressionInfo))

 +  return false;

 +return true


This makes sense.

Nitpicking 1:
+Assert(!(blk->with_hole == 1 &&  blk->hole_offset <= 0));
Double-space here.

Nitpicking 2:
char * page
This should be rewritten as char *page, the * being assigned with the
variable name.
-- 
Michael


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-02-16 Thread Andres Freund
On 2015-02-16 20:55:20 +0900, Michael Paquier wrote:
 On Mon, Feb 16, 2015 at 8:30 PM, Syed, Rahila rahila.s...@nttdata.com
 wrote:
 
 
  Regarding the sanity checks that have been added recently. I think that
  they are useful but I am suspecting as well that only a check on the record
  CRC is done because that's reliable enough and not doing those checks
  accelerates a bit replay. So I am thinking that we should simply replace
  them by assertions.
 
  Removing the checks makes sense as CRC ensures correctness . Moreover ,as
  error message for invalid length of record is present in the code ,
  messages for invalid block length can be redundant.
 
  Checks have been replaced by assertions in the attached patch.
 
 
 After more thinking, we may as well simply remove them, an error with CRC
 having high chances to complain before reaching this point...

Surely not. The existing code explicitly does it like
if (blk->has_data && blk->data_len == 0)
	report_invalid_record(state,
			  "BKPBLOCK_HAS_DATA set, but no data included at %X/%X",
			  (uint32) (state->ReadRecPtr >> 32), (uint32) state->ReadRecPtr);
these cross checks are important. And I see no reason to deviate from
that. The CRC sum isn't foolproof - we intentionally do checks at
several layers. And, as you can see from some other locations, we
actually try to *not* fatally error out when hitting them at times - so
an Assert also is wrong.

Heikki:
/* cross-check that the HAS_DATA flag is set iff data_length > 0 */
if (blk->has_data && blk->data_len == 0)
	report_invalid_record(state,
			  "BKPBLOCK_HAS_DATA set, but no data included at %X/%X",
			  (uint32) (state->ReadRecPtr >> 32), (uint32) state->ReadRecPtr);
if (!blk->has_data && blk->data_len != 0)
	report_invalid_record(state,
			  "BKPBLOCK_HAS_DATA not set, but data length is %u at %X/%X",
			  (unsigned int) blk->data_len,
			  (uint32) (state->ReadRecPtr >> 32), (uint32) state->ReadRecPtr);
those look like they're missing a goto err; to me.
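
For reference, the same checks with the missing error exits added would look
roughly like this (a sketch only, following the error-handling style used
elsewhere in DecodeXLogRecord):

if (blk->has_data && blk->data_len == 0)
{
    report_invalid_record(state,
                          "BKPBLOCK_HAS_DATA set, but no data included at %X/%X",
                          (uint32) (state->ReadRecPtr >> 32), (uint32) state->ReadRecPtr);
    goto err;           /* stop decoding instead of falling through */
}
if (!blk->has_data && blk->data_len != 0)
{
    report_invalid_record(state,
                          "BKPBLOCK_HAS_DATA not set, but data length is %u at %X/%X",
                          (unsigned int) blk->data_len,
                          (uint32) (state->ReadRecPtr >> 32), (uint32) state->ReadRecPtr);
    goto err;
}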

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-02-12 Thread Syed, Rahila

Thank you for comments. Please find attached the updated patch.

This patch fails to compile:
xlogreader.c:1049:46: error: extraneous ')' after condition, expected a 
statement
blk->with_hole && blk->hole_offset <= 0))
This has been rectified.

Note as well that at least clang does not much like how the sanity checks with 
with_hole are done. You should place parentheses around the '&&' expressions. 
Also, I would rather define with_hole == 0 or with_hole == 1 explicitly in 
those checks.
The expressions are modified accordingly.

There is a typo:
s/true,see/true, see/
[nitpicky]Be as well aware of the 80-character limit per line that is normally 
respected by comment blocks.[/]

Have corrected the typos and changed the comments as mentioned. Also , 
realigned certain lines to meet the 80-char limit.

Thank you,
Rahila Syed




Support-compression-for-full-page-writes-in-WAL_v18.patch
Description: Support-compression-for-full-page-writes-in-WAL_v18.patch



Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-02-11 Thread Michael Paquier
On Wed, Feb 11, 2015 at 11:03 PM, Syed, Rahila rahila.s...@nttdata.com
wrote:

 IMO, we should add details about how this new field is used in the
 comments on top of XLogRecordBlockImageHeader, meaning that when a page
 hole is present we use the compression info structure and when there is no
 hole, we are sure that the FPW raw length is BLCKSZ meaning that the two
 bytes of the CompressionInfo stuff is unnecessary.
 This comment is included in the patch attached.

  For correctness with_hole should be set even for uncompressed pages. I
 think that we should as well use it for sanity checks in xlogreader.c when
 decoding records.
 This change is made in the attached patch. Following sanity checks have
 been added in xlogreader.c

 if (!(blk->with_hole) && blk->hole_offset != 0 || blk->with_hole &&
 blk->hole_offset <= 0))

 if (blk->with_hole && blk->bkp_len >= BLCKSZ)

 if (!(blk->with_hole) && blk->bkp_len != BLCKSZ)


Cool, thanks!

This patch fails to compile:
xlogreader.c:1049:46: error: extraneous ')' after condition, expected a
statement
blk->with_hole && blk->hole_offset <= 0))

Note as well that at least clang does not much like how the sanity checks
with with_hole are done. You should place parentheses around the '&&'
expressions. Also, I would rather define with_hole == 0 or with_hole == 1
explicitly in those checks.

There is a typo:
s/true,see/true, see/

[nitpicky]Be as well aware of the 80-character limit per line that is
normally respected by comment blocks.[/]

+ * with_hole is used to identify the presence of hole in a block.
+ * As mentioned above, length of block cannnot be more than 15-bit long.
+ * So, the free bit in the length field is used by with_hole to identify
presence of
+ * XLogRecordBlockImageCompressionInfo. If hole is not present ,the raw
size of
+ * a compressed block is equal to BLCKSZ therefore
XLogRecordBlockImageCompressionInfo
+ * for the corresponding compressed block need not be stored in header.
+ * If hole is present raw size is stored.
I would rewrite this paragraph as follows, fixing the multiple typos:
with_hole is used to identify the presence of a hole in a block image. As
the length of a block cannot be more than 15-bit long, the extra bit in the
length field is used for this identification purpose. If the block image
has no hole, it is ensured that the raw size of a compressed block image is
equal to BLCKSZ, hence the contents of XLogRecordBlockImageCompressionInfo
are not necessary.

+   /* Followed by the data related to compression if block is
compressed */
This comment needs to be updated to if block image is compressed and has a
hole.

+   lp_off and lp_len fields in ItemIdData (see include/storage/itemid.h)
and
+   XLogRecordBlockImageHeader where page hole offset and length is limited
to 15-bit
+   length (see src/include/access/xlogrecord.h).
80-character limit...

Regards
-- 
Michael


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-02-09 Thread Syed, Rahila
Hello,

A bug had been introduced in the latest versions of the patch. The order of 
parameters passed to pglz_decompress was wrong.
Please find attached patch with following correction,

Original code,
+   if (pglz_decompress(block_image, record->uncompressBuf,
+                       bkpb->bkp_len, bkpb->bkp_uncompress_len) == 0)
Correction
+   if (pglz_decompress(block_image, bkpb->bkp_len,
+                       record->uncompressBuf, bkpb->bkp_uncompress_len) == 0)


For example here you should not have a space between the function definition 
and the variable declarations:
+{
+
+int orig_len = BLCKSZ - hole_length;
This is as well incorrect in two places:
if(hole_length != 0)
There should be a space between the if and its condition in parenthesis.

Also corrected above code format mistakes.

Thank you,
Rahila Syed

-Original Message-
From: pgsql-hackers-ow...@postgresql.org 
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Syed, Rahila
Sent: Monday, February 09, 2015 6:58 PM
To: Michael Paquier; Fujii Masao
Cc: PostgreSQL mailing lists
Subject: Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

Hello,

 Do we always need extra two bytes for compressed backup block?
 ISTM that extra bytes are not necessary when the hole length is zero.
 In this case the length of the original backup block (i.e.,
 uncompressed) must be BLCKSZ, so we don't need to save the original 
 size in the extra bytes.

Yes, we would need an additional bit to identify that. We could steal it from 
length in XLogRecordBlockImageHeader.

This is implemented in the attached patch by dividing length field as follows,
uint16  length:15,  
with_hole:1; 

2 should be replaced with the macro variable indicating the size of 
extra header for compressed backup block.
Macro SizeOfXLogRecordBlockImageCompressionInfo is used instead of 2

You can refactor XLogCompressBackupBlock() and move all the above code 
to it for more simplicity
This is also implemented in the patch attached.

Thank you,
Rahila Syed


-Original Message-
From: Michael Paquier [mailto:michael.paqu...@gmail.com]
Sent: Friday, February 06, 2015 6:00 PM
To: Fujii Masao
Cc: Syed, Rahila; PostgreSQL mailing lists
Subject: Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

On Fri, Feb 6, 2015 at 3:03 PM, Fujii Masao wrote:
 Do we always need extra two bytes for compressed backup block?
 ISTM that extra bytes are not necessary when the hole length is zero.
 In this case the length of the original backup block (i.e.,
 uncompressed) must be BLCKSZ, so we don't need to save the original 
 size in the extra bytes.

Yes, we would need an additional bit to identify that. We could steal it from 
length in XLogRecordBlockImageHeader.

 Furthermore, when fpw compression is disabled and the hole length is 
 zero, we seem to be able to save one byte from the header of backup 
 block. Currently we use 4 bytes for the header, 2 bytes for the length 
 of backup block, 15 bits for the hole offset and 1 bit for the flag 
 indicating whether block is compressed or not. But in that case, the 
 length of backup block doesn't need to be stored because it must be 
 BLCKSZ. Shouldn't we optimize the header in this way? Thought?

If we do it, that's something to tackle even before this patch on HEAD, because 
you could use the 16th bit of the first 2 bytes of XLogRecordBlockImageHeader 
to do necessary sanity checks, to actually not reduce record by 1 byte, but 2 
bytes as hole-related data is not necessary. I imagine that a patch optimizing 
that wouldn't be that hard to write as well.

 +int page_len = BLCKSZ - hole_length;
 +char *scratch_buf;
 +if (hole_length != 0)
 +{
 +scratch_buf = compression_scratch;
 +memcpy(scratch_buf, page, hole_offset);
 +memcpy(scratch_buf + hole_offset,
 +   page + (hole_offset + hole_length),
 +   BLCKSZ - (hole_length + hole_offset));
 +}
 +else
 +scratch_buf = page;
 +
 +/* Perform compression of block */
 +if (XLogCompressBackupBlock(scratch_buf,
 +page_len,
 +regbuf-compressed_page,
 +compress_len))
 +{
 +/* compression is done, add record */
 +is_compressed = true;
 +}

 You can refactor XLogCompressBackupBlock() and move all the above code 
 to it for more simplicity.

Sure.
--
Michael


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-02-09 Thread Michael Paquier
On Mon, Feb 9, 2015 at 10:27 PM, Syed, Rahila wrote:
 (snip)

Thanks for showing up here! I have not tested the test the patch,
those comments are based on what I read from v17.

 Do we always need extra two bytes for compressed backup block?
 ISTM that extra bytes are not necessary when the hole length is zero.
 In this case the length of the original backup block (i.e.,
 uncompressed) must be BLCKSZ, so we don't need to save the original
 size in the extra bytes.

Yes, we would need an additional bit to identify that. We could steal it from 
length in XLogRecordBlockImageHeader.

 This is implemented in the attached patch by dividing length field as follows,
 uint16  length:15,
 with_hole:1;

IMO, we should add details about how this new field is used in the
comments on top of XLogRecordBlockImageHeader, meaning that when a
page hole is present we use the compression info structure and when
there is no hole, we are sure that the FPW raw length is BLCKSZ
meaning that the two bytes of the CompressionInfo stuff is
unnecessary.


2 should be replaced with the macro variable indicating the size of
extra header for compressed backup block.
 Macro SizeOfXLogRecordBlockImageCompressionInfo is used instead of 2

You can refactor XLogCompressBackupBlock() and move all the
above code to it for more simplicity
 This is also implemented in the patch attached.

This portion looks correct to me.

A couple of other comments:
1) Nitpicky but, code format is sometimes strange.
For example here you should not have a space between the function
definition and the variable declarations:
+{
+
+int orig_len = BLCKSZ - hole_length;
This is as well incorrect in two places:
if(hole_length != 0)
There should be a space between the if and its condition in parenthesis.
2) For correctness with_hole should be set even for uncompressed
pages. I think that we should as well use it for sanity checks in
xlogreader.c when decoding records.

Regards,
-- 
Michael




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-02-09 Thread Syed, Rahila
Hello,

 Do we always need extra two bytes for compressed backup block?
 ISTM that extra bytes are not necessary when the hole length is zero.
 In this case the length of the original backup block (i.e., 
 uncompressed) must be BLCKSZ, so we don't need to save the original 
 size in the extra bytes.

Yes, we would need an additional bit to identify that. We could steal it from 
length in XLogRecordBlockImageHeader.

This is implemented in the attached patch by dividing length field as follows,
uint16  length:15,  
with_hole:1; 

2 should be replaced with the macro variable indicating the size of 
extra header for compressed backup block.
Macro SizeOfXLogRecordBlockImageCompressionInfo is used instead of 2

You can refactor XLogCompressBackupBlock() and move all the 
above code to it for more simplicity
This is also implemented in the patch attached.

Thank you,
Rahila Syed


-Original Message-
From: Michael Paquier [mailto:michael.paqu...@gmail.com] 
Sent: Friday, February 06, 2015 6:00 PM
To: Fujii Masao
Cc: Syed, Rahila; PostgreSQL mailing lists
Subject: Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

On Fri, Feb 6, 2015 at 3:03 PM, Fujii Masao wrote:
 Do we always need extra two bytes for compressed backup block?
 ISTM that extra bytes are not necessary when the hole length is zero.
 In this case the length of the original backup block (i.e., 
 uncompressed) must be BLCKSZ, so we don't need to save the original 
 size in the extra bytes.

Yes, we would need an additional bit to identify that. We could steal it from 
length in XLogRecordBlockImageHeader.

 Furthermore, when fpw compression is disabled and the hole length is 
 zero, we seem to be able to save one byte from the header of backup 
 block. Currently we use 4 bytes for the header, 2 bytes for the length 
 of backup block, 15 bits for the hole offset and 1 bit for the flag 
 indicating whether block is compressed or not. But in that case, the 
 length of backup block doesn't need to be stored because it must be 
 BLCKSZ. Shouldn't we optimize the header in this way? Thought?

If we do it, that's something to tackle even before this patch on HEAD, because 
you could use the 16th bit of the first 2 bytes of XLogRecordBlockImageHeader 
to do necessary sanity checks, to actually not reduce record by 1 byte, but 2 
bytes as hole-related data is not necessary. I imagine that a patch optimizing 
that wouldn't be that hard to write as well.

 +int page_len = BLCKSZ - hole_length;
 +char *scratch_buf;
 +if (hole_length != 0)
 +{
 +scratch_buf = compression_scratch;
 +memcpy(scratch_buf, page, hole_offset);
 +memcpy(scratch_buf + hole_offset,
 +   page + (hole_offset + hole_length),
 +   BLCKSZ - (hole_length + hole_offset));
 +}
 +else
 +scratch_buf = page;
 +
 +/* Perform compression of block */
 +if (XLogCompressBackupBlock(scratch_buf,
 +page_len,
 +regbuf-compressed_page,
 +compress_len))
 +{
 +/* compression is done, add record */
 +is_compressed = true;
 +}

 You can refactor XLogCompressBackupBlock() and move all the above code 
 to it for more simplicity.

Sure.
--
Michael


Support-compression-for-full-page-writes-in-WAL_v17.patch
Description: Support-compression-for-full-page-writes-in-WAL_v17.patch



Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-02-08 Thread Fujii Masao
On Fri, Feb 6, 2015 at 3:42 AM, Michael Paquier
michael.paqu...@gmail.com wrote:
 Fujii Masao wrote:
 I wrote
 This is an inspiration from lz4 APIs. Wouldn't it be buggy for a
 compression algorithm to return a size of 0 bytes as compressed or
 decompressed length btw? We could as well make it return a negative
 value when a failure occurs if you feel more comfortable with it.

 I feel that's better. Attached is the updated version of the patch.
 I changed the pg_lzcompress and pg_lzdecompress so that they return -1
 when failure happens. Also I applied some cosmetic changes to the patch
 (e.g., shorten the long name of the newly-added macros).
 Barring any objection, I will commit this.

 I just had a look at your updated version, ran some sanity tests, and
 things look good from me. The new names of the macros at the top of
 tuptoaster.c are clearer as well.

Thanks for the review! Pushed!

Regards,

-- 
Fujii Masao




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-02-06 Thread Fujii Masao
On Fri, Feb 6, 2015 at 4:15 AM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Thu, Feb 5, 2015 at 11:06 PM, Syed, Rahila rahila.s...@nttdata.com wrote:
/*
+* We recheck the actual size even if pglz_compress() report success,
+* because it might be satisfied with having saved as little as one byte
+* in the compressed data.
+*/
+   *len = (uint16) compressed_len;
+   if (*len >= orig_len - 1)
+   return false;
+   return true;
+}

 As per latest code ,when compression is 'on' we introduce additional 2 bytes 
 in the header of each block image for storing raw_length of the compressed 
 block.
 In order to achieve compression while accounting for these two additional 
 bytes, we must ensure that compressed length is less than original length - 
 2.
 So , IIUC the above condition should rather be

 If (*len >= orig_len - 2)
 return false;

2 should be replaced with the macro variable indicating the size of
extra header for compressed backup block.

Do we always need extra two bytes for compressed backup block?
ISTM that extra bytes are not necessary when the hole length is zero.
In this case the length of the original backup block (i.e., uncompressed)
must be BLCKSZ, so we don't need to save the original size in
the extra bytes.

Furthermore, when fpw compression is disabled and the hole length
is zero, we seem to be able to save one byte from the header of
backup block. Currently we use 4 bytes for the header, 2 bytes for
the length of backup block, 15 bits for the hole offset and 1 bit for
the flag indicating whether block is compressed or not. But in that case,
the length of backup block doesn't need to be stored because it must
be BLCKSZ. Shouldn't we optimize the header in this way? Thought?
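
To illustrate the bit budget being discussed (a sketch of the layouts only; the
field packing is an assumption and not actual patch code):

/* Header as described above: 4 bytes for every block image */
typedef struct XLogRecordBlockImageHeader
{
    uint16      length;             /* length of block image data in record */
    uint16      hole_offset:15,     /* number of bytes before hole */
                is_compressed:1;    /* compression status of image */
} XLogRecordBlockImageHeader;

/*
 * Possible optimization: when compression is disabled and hole_length == 0,
 * the length is always BLCKSZ, so a 2-byte header carrying only hole_offset
 * and the flag bit would be sufficient.
 */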

+int page_len = BLCKSZ - hole_length;
+char *scratch_buf;
+if (hole_length != 0)
+{
+scratch_buf = compression_scratch;
+memcpy(scratch_buf, page, hole_offset);
+memcpy(scratch_buf + hole_offset,
+   page + (hole_offset + hole_length),
+   BLCKSZ - (hole_length + hole_offset));
+}
+else
+scratch_buf = page;
+
+/* Perform compression of block */
+if (XLogCompressBackupBlock(scratch_buf,
+page_len,
+regbuf-compressed_page,
+compress_len))
+{
+/* compression is done, add record */
+is_compressed = true;
+}

You can refactor XLogCompressBackupBlock() and move all the
above code to it for more simplicity.

Regards,

-- 
Fujii Masao




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-02-06 Thread Syed, Rahila

In any case, those things have been introduced by what I did in previous 
versions... And attached is a new patch.
Thank you for feedback.

   /* allocate scratch buffer used for compression of block images */
+  if (compression_scratch == NULL)
+  compression_scratch = MemoryContextAllocZero(xloginsert_cxt,
+  
 BLCKSZ);
 }
The compression patch can use the latest interface MemoryContextAllocExtended 
to proceed without compression when sufficient memory is not available for the 
scratch buffer. 
The attached patch introduces an OutOfMem flag which is set when 
MemoryContextAllocExtended returns NULL.
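
A minimal sketch of that idea, assuming the MCXT_ALLOC_NO_OOM flag of
MemoryContextAllocExtended and the variable names used in the patch:

/* Try to allocate the scratch buffer without erroring out on OOM */
if (compression_scratch == NULL)
    compression_scratch = MemoryContextAllocExtended(xloginsert_cxt,
                                                     BLCKSZ,
                                                     MCXT_ALLOC_NO_OOM | MCXT_ALLOC_ZERO);

/* Later, when assembling a record, simply skip compression if it failed */
if (wal_compression && compression_scratch != NULL)
{
    /* attempt to compress the block image */
}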

Thank you,
Rahila Syed

-Original Message-
From: Michael Paquier [mailto:michael.paqu...@gmail.com] 
Sent: Friday, February 06, 2015 12:46 AM
To: Syed, Rahila
Cc: PostgreSQL mailing lists
Subject: Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

On Thu, Feb 5, 2015 at 11:06 PM, Syed, Rahila rahila.s...@nttdata.com wrote:
/*
+* We recheck the actual size even if pglz_compress() report success,
+* because it might be satisfied with having saved as little as one byte
+* in the compressed data.
+*/
+   *len = (uint16) compressed_len;
+   if (*len >= orig_len - 1)
+   return false;
+   return true;
+}

 As per latest code ,when compression is 'on' we introduce additional 2 bytes 
 in the header of each block image for storing raw_length of the compressed 
 block.
 In order to achieve compression while accounting for these two additional 
 bytes, we must ensure that compressed length is less than original length - 2.
 So , IIUC the above condition should rather be

 If (*len >= orig_len - 2)
 return false;
 return true;
 The attached patch contains this. It also has a cosmetic change-  renaming 
 compressBuf to uncompressBuf as it is used to store uncompressed page.

Agreed on both things.

Just looking at your latest patch after some time to let it cool down, I 
noticed a couple of things.

 #define MaxSizeOfXLogRecordBlockHeader \
 (SizeOfXLogRecordBlockHeader + \
- SizeOfXLogRecordBlockImageHeader + \
+ SizeOfXLogRecordBlockImageHeader, \
+ SizeOfXLogRecordBlockImageCompressionInfo + \
There is a comma here instead of a sum sign. We should really sum up all those 
sizes to evaluate the maximum size of a block header.

+ * Permanently allocate readBuf uncompressBuf.  We do it this way,
+ * rather than just making a static array, for two reasons:
This comment is just but weird, readBuf AND uncompressBuf is more appropriate.

+ * We recheck the actual size even if pglz_compress() report success,
+ * because it might be satisfied with having saved as little as one byte
+ * in the compressed data. We add two bytes to store raw_length  with the
+ * compressed image. So for compression to be effective
compressed_len should
+ * be atleast  orig_len - 2
This comment block should be reworked, and misses a dot at its end. I rewrote 
it like that, hopefully that's clearer:
+   /*
+* We recheck the actual size even if pglz_compress() reports
success and see
+* if at least 2 bytes of length have been saved, as this
corresponds to the
+* additional amount of data stored in WAL record for a compressed block
+* via raw_length.
+*/

In any case, those things have been introduced by what I did in previous 
versions... And attached is a new patch.
--
Michael


Support-compression-for-full-page-writes-in-WAL_v17.patch
Description: Support-compression-for-full-page-writes-in-WAL_v17.patch



Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-02-06 Thread Michael Paquier
On Fri, Feb 6, 2015 at 4:30 PM, Michael Paquier wrote:
 On Fri, Feb 6, 2015 at 3:03 PM, Fujii Masao wrote:
 Do we always need extra two bytes for compressed backup block?
 ISTM that extra bytes are not necessary when the hole length is zero.
 In this case the length of the original backup block (i.e., uncompressed)
 must be BLCKSZ, so we don't need to save the original size in
 the extra bytes.

 Yes, we would need an additional bit to identify that. We could steal
 it from length in XLogRecordBlockImageHeader.

 Furthermore, when fpw compression is disabled and the hole length
 is zero, we seem to be able to save one byte from the header of
 backup block. Currently we use 4 bytes for the header, 2 bytes for
 the length of backup block, 15 bits for the hole offset and 1 bit for
 the flag indicating whether block is compressed or not. But in that case,
 the length of backup block doesn't need to be stored because it must
 be BLCKSZ. Shouldn't we optimize the header in this way? Thought?

 If we do it, that's something to tackle even before this patch on
 HEAD, because you could use the 16th bit of the first 2 bytes of
 XLogRecordBlockImageHeader to do necessary sanity checks, to actually
 not reduce record by 1 byte, but 2 bytes as hole-related data is not
 necessary. I imagine that a patch optimizing that wouldn't be that
 hard to write as well.

Actually, as Heikki pointed out to me... A block image is 8k and pages
without holes are rare, so it may not be worth sacrificing code
simplicity for a record reduction on the order of 0.1% or something like
that, and the current patch is light because it keeps things simple.
-- 
Michael




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-02-06 Thread Michael Paquier
On Fri, Feb 6, 2015 at 3:03 PM, Fujii Masao wrote:
 Do we always need extra two bytes for compressed backup block?
 ISTM that extra bytes are not necessary when the hole length is zero.
 In this case the length of the original backup block (i.e., uncompressed)
 must be BLCKSZ, so we don't need to save the original size in
 the extra bytes.

Yes, we would need an additional bit to identify that. We could steal
it from length in XLogRecordBlockImageHeader.

 Furthermore, when fpw compression is disabled and the hole length
 is zero, we seem to be able to save one byte from the header of
 backup block. Currently we use 4 bytes for the header, 2 bytes for
 the length of backup block, 15 bits for the hole offset and 1 bit for
 the flag indicating whether block is compressed or not. But in that case,
 the length of backup block doesn't need to be stored because it must
 be BLCKSZ. Shouldn't we optimize the header in this way? Thought?

If we do it, that's something to tackle even before this patch on
HEAD, because you could use the 16th bit of the first 2 bytes of
XLogRecordBlockImageHeader to do necessary sanity checks, to actually
not reduce record by 1 byte, but 2 bytes as hole-related data is not
necessary. I imagine that a patch optimizing that wouldn't be that
hard to write as well.

 +int page_len = BLCKSZ - hole_length;
 +char *scratch_buf;
 +if (hole_length != 0)
 +{
 +scratch_buf = compression_scratch;
 +memcpy(scratch_buf, page, hole_offset);
 +memcpy(scratch_buf + hole_offset,
 +   page + (hole_offset + hole_length),
 +   BLCKSZ - (hole_length + hole_offset));
 +}
 +else
 +scratch_buf = page;
 +
 +/* Perform compression of block */
 +if (XLogCompressBackupBlock(scratch_buf,
 +page_len,
 +regbuf-compressed_page,
 +compress_len))
 +{
 +/* compression is done, add record */
 +is_compressed = true;
 +}

 You can refactor XLogCompressBackupBlock() and move all the
 above code to it for more simplicity.

Sure.
-- 
Michael




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-02-06 Thread Michael Paquier
On Fri, Feb 6, 2015 at 6:35 PM, Syed, Rahila wrote:
 The compression patch can  use the latest interface 
 MemoryContextAllocExtended to proceed without compression when sufficient 
 memory is not available for
 scratch buffer.
 The attached patch introduces OutOfMem flag which is set on when 
 MemoryContextAllocExtended returns NULL .

TBH, I don't think that brings much as this allocation is done once
and the process would surely fail before reaching the first code path
doing a WAL record insertion. In any case, OutOfMem is useless; you
could simply check if compression_scratch is NULL when assembling a
record.
-- 
Michael




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-02-05 Thread Syed, Rahila
Hello,
 
/*
+* We recheck the actual size even if pglz_compress() report success,
+* because it might be satisfied with having saved as little as one byte
+* in the compressed data.
+*/
+   *len = (uint16) compressed_len;
+   if (*len >= orig_len - 1)
+   return false;
+   return true;
+}

As per latest code ,when compression is 'on' we introduce additional 2 bytes in 
the header of each block image for storing raw_length of the compressed block. 
In order to achieve compression while accounting for these two additional 
bytes, we must ensure that compressed length is less than original length - 2.
So , IIUC the above condition should rather be 

If (*len >= orig_len - 2)
return false;
return true;
 
The attached patch contains this. It also has a cosmetic change-  renaming 
compressBuf to uncompressBuf as it is used to store uncompressed page.
 
Thank you,
Rahila Syed

-Original Message-
From: pgsql-hackers-ow...@postgresql.org 
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Michael Paquier
Sent: Wednesday, January 07, 2015 9:32 AM
To: Rahila Syed
Cc: PostgreSQL mailing lists
Subject: Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

On Wed, Jan 7, 2015 at 12:51 AM, Rahila Syed rahilasyed...@gmail.com wrote:
 Following are some comments,
Thanks for the feedback.

uint16  hole_offset:15, /* number of bytes in hole */
 Typo in description of hole_offset
Fixed. That's before hole.

for (block_id = 0; block_id = record-max_block_id; block_id++)
-   {
-   if (XLogRecHasBlockImage(record, block_id))
-   fpi_len += BLCKSZ -
 record-blocks[block_id].hole_length;
-   }
+   fpi_len += record-blocks[block_id].bkp_len;

 IIUC, if condition, /if(XLogRecHasBlockImage(record, block_id))/ is 
 incorrectly removed from the above for loop.
Fixed.

typedef struct XLogRecordCompressedBlockImageHeader
 I am trying to understand the purpose behind declaration of the above 
 struct. IIUC, it is defined in order to introduce new field uint16 
 raw_length and it has been declared as a separate struct from 
 XLogRecordBlockImageHeader to not affect the size of WAL record when 
 compression is off.
 I wonder if it is ok to simply memcpy the uint16 raw_length in the 
 hdr_scratch when compression is on and not have a separate header 
 struct for it neither declare it in existing header. raw_length can be 
 a locally defined variable is XLogRecordAssemble or it can be a field 
 in registered_buffer struct like compressed_page.
 I think this can simplify the code.
 Am I missing something obvious?
You are missing nothing. I just introduced this structure for a matter of 
readability to show the two-byte difference between non-compressed and 
compressed header information. It is true that doing it my way makes the 
structures duplicated, so let's simply add the compression-related information 
as an extra structure added after XLogRecordBlockImageHeader if the block is 
compressed. I hope this addresses your concerns.

 /*
  * Fill in the remaining fields in the XLogRecordBlockImageHeader
  * struct and add new entries in the record chain.
 */

   bkpb.fork_flags |= BKPBLOCK_HAS_IMAGE;

 This code line seems to be misplaced with respect to the above comment.
 Comment indicates filling of XLogRecordBlockImageHeader fields while 
 fork_flags is a field of XLogRecordBlockHeader.
 Is it better to place the code close to following condition?
   if (needs_backup)
   {
Yes, this comment should not be here. I replaced it with the comment in HEAD.


+  *the original length of the
+ * block without its page hole being deducible from the compressed 
+ data
+ * itself.
 IIUC, this comment before XLogRecordBlockImageHeader seems to be no 
 longer valid as original length is not deducible from compressed data 
 and rather stored in header.
Aah, true. This was originally present in the header of PGLZ that has been 
removed to make it available for frontends.

Updated patches are attached.
Regards,
--
Michael


Support-compression-for-full-page-writes-in-WAL_v15.patch
Description: Support-compression-for-full-page-writes-in-WAL_v15.patch



Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-02-05 Thread Michael Paquier
On Thu, Feb 5, 2015 at 11:06 PM, Syed, Rahila rahila.s...@nttdata.com wrote:
/*
+* We recheck the actual size even if pglz_compress() report success,
+* because it might be satisfied with having saved as little as one byte
+* in the compressed data.
+*/
+   *len = (uint16) compressed_len;
+   if (*len >= orig_len - 1)
+   return false;
+   return true;
+}

 As per latest code ,when compression is 'on' we introduce additional 2 bytes 
 in the header of each block image for storing raw_length of the compressed 
 block.
 In order to achieve compression while accounting for these two additional 
 bytes, we must ensure that compressed length is less than original length - 2.
 So , IIUC the above condition should rather be

 If (*len >= orig_len - 2)
 return false;
 return true;
 The attached patch contains this. It also has a cosmetic change-  renaming 
 compressBuf to uncompressBuf as it is used to store uncompressed page.

Agreed on both things.

Just looking at your latest patch after some time to let it cool down,
I noticed a couple of things.

 #define MaxSizeOfXLogRecordBlockHeader \
 (SizeOfXLogRecordBlockHeader + \
- SizeOfXLogRecordBlockImageHeader + \
+ SizeOfXLogRecordBlockImageHeader, \
+ SizeOfXLogRecordBlockImageCompressionInfo + \
There is a comma here instead of a sum sign. We should really sum up
all those sizes to evaluate the maximum size of a block header.

+ * Permanently allocate readBuf uncompressBuf.  We do it this way,
+ * rather than just making a static array, for two reasons:
This comment is just but weird, readBuf AND uncompressBuf is more appropriate.

+ * We recheck the actual size even if pglz_compress() report success,
+ * because it might be satisfied with having saved as little as one byte
+ * in the compressed data. We add two bytes to store raw_length  with the
+ * compressed image. So for compression to be effective
compressed_len should
+ * be atleast  orig_len - 2
This comment block should be reworked, and misses a dot at its end. I
rewrote it like that, hopefully that's clearer:
+   /*
+* We recheck the actual size even if pglz_compress() reports
success and see
+* if at least 2 bytes of length have been saved, as this
corresponds to the
+* additional amount of data stored in WAL record for a compressed block
+* via raw_length.
+*/

In any case, those things have been introduced by what I did in
previous versions... And attached is a new patch.
-- 
Michael
From 2b0c54bc7c4bfb2494cf8b0394d56635b01d7c5a Mon Sep 17 00:00:00 2001
From: Michael Paquier mich...@otacoo.com
Date: Tue, 25 Nov 2014 14:24:26 +0900
Subject: [PATCH] Support compression for full-page writes in WAL

Compression is controlled with a new parameter called wal_compression.
This parameter can be changed at session level to control WAL compression.
---
 contrib/pg_xlogdump/pg_xlogdump.c |  17 ++-
 doc/src/sgml/config.sgml  |  29 +
 src/backend/access/transam/xlog.c |   1 +
 src/backend/access/transam/xloginsert.c   | 161 ++
 src/backend/access/transam/xlogreader.c   |  71 +---
 src/backend/utils/misc/guc.c  |   9 ++
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/access/xlog.h |   1 +
 src/include/access/xlogreader.h   |   7 +-
 src/include/access/xlogrecord.h   |  49 ++--
 src/include/pg_config.h.in|   4 +-
 11 files changed, 297 insertions(+), 53 deletions(-)

diff --git a/contrib/pg_xlogdump/pg_xlogdump.c b/contrib/pg_xlogdump/pg_xlogdump.c
index c1bfbc2..3ebaac6 100644
--- a/contrib/pg_xlogdump/pg_xlogdump.c
+++ b/contrib/pg_xlogdump/pg_xlogdump.c
@@ -363,14 +363,14 @@ XLogDumpCountRecord(XLogDumpConfig *config, XLogDumpStats *stats,
 	 * takes up BLCKSZ bytes, minus the hole length.
 	 *
 	 * XXX: We peek into xlogreader's private decoded backup blocks for the
-	 * hole_length. It doesn't seem worth it to add an accessor macro for
+	 * length of block. It doesn't seem worth it to add an accessor macro for
 	 * this.
 	 */
 	fpi_len = 0;
 	for (block_id = 0; block_id = record-max_block_id; block_id++)
 	{
 		if (XLogRecHasBlockImage(record, block_id))
-			fpi_len += BLCKSZ - record-blocks[block_id].hole_length;
+			fpi_len += record-blocks[block_id].bkp_len;
 	}
 
 	/* Update per-rmgr statistics */
@@ -465,9 +465,16 @@ XLogDumpDisplayRecord(XLogDumpConfig *config, XLogReaderState *record)
    blk);
 			if (XLogRecHasBlockImage(record, block_id))
 			{
-printf( (FPW); hole: offset: %u, length: %u\n,
-	   record-blocks[block_id].hole_offset,
-	   record-blocks[block_id].hole_length);
+if (record-blocks[block_id].is_compressed)
+	printf( (FPW compressed); hole offset: %u, 
+		   compressed length: %u, original length: %u\n,
+		   record-blocks[block_id].hole_offset,

Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-02-05 Thread Fujii Masao
On Tue, Jan 6, 2015 at 11:09 AM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Mon, Jan 5, 2015 at 10:29 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Sun, Dec 28, 2014 at 10:57 PM, Michael Paquier wrote:
 The patch 1 cannot be applied to the master successfully because of
 recent change.
 Yes, that's caused by ccb161b. Attached are rebased versions.

 - The real stuff comes with patch 2, that implements the removal of
 PGLZ_Header, changing the APIs of compression and decompression to pglz to
 not have anymore toast metadata, this metadata being now localized in
 tuptoaster.c. Note that this patch protects the on-disk format (tested with
 pg_upgrade from 9.4 to a patched HEAD server). Here is how the APIs of
 compression and decompression look like with this patch, simply performing
 operations from a source to a destination:
 extern int32 pglz_compress(const char *source, int32 slen, char *dest,
   const PGLZ_Strategy *strategy);
 extern int32 pglz_decompress(const char *source, char *dest,
   int32 compressed_size, int32 raw_size);
 The return value of those functions is the number of bytes written in the
 destination buffer, and 0 if operation failed.

 So it's guaranteed that 0 is never returned in success case? I'm not sure
 if that case can really happen, though.
 This is an inspiration from lz4 APIs. Wouldn't it be buggy for a
 compression algorithm to return a size of 0 bytes as compressed or
 decompressed length btw? We could as well make it return a negative
 value when a failure occurs if you feel more comfortable with it.

I feel that's better. Attached is the updated version of the patch.
I changed the pg_lzcompress and pg_lzdecompress so that they return -1
when failure happens. Also I applied some cosmetic changes to the patch
(e.g., shorten the long name of the newly-added macros).
Barring any objection, I will commit this.
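
As a small usage sketch of the reworked functions (following the prototypes
quoted above and the new -1-on-failure convention; the buffer names are made
up):

char    raw[BLCKSZ];
char    compressed[PGLZ_MAX_OUTPUT(BLCKSZ)];
char    restored[BLCKSZ];
int32   clen;

/* returns the compressed length, or -1 if the data could not be compressed */
clen = pglz_compress(raw, BLCKSZ, compressed, PGLZ_strategy_default);
if (clen < 0)
    elog(LOG, "block is not compressible, storing it raw");
else if (pglz_decompress(compressed, restored, clen, BLCKSZ) < 0)
    elog(ERROR, "decompression of block failed");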

Regards,

-- 
Fujii Masao
*** a/src/backend/access/heap/tuptoaster.c
--- b/src/backend/access/heap/tuptoaster.c
***
*** 35,43 
  #include access/tuptoaster.h
  #include access/xact.h
  #include catalog/catalog.h
  #include miscadmin.h
  #include utils/fmgroids.h
- #include utils/pg_lzcompress.h
  #include utils/rel.h
  #include utils/typcache.h
  #include utils/tqual.h
--- 35,43 
  #include access/tuptoaster.h
  #include access/xact.h
  #include catalog/catalog.h
+ #include common/pg_lzcompress.h
  #include miscadmin.h
  #include utils/fmgroids.h
  #include utils/rel.h
  #include utils/typcache.h
  #include utils/tqual.h
***
*** 45,50 
--- 45,70 
  
  #undef TOAST_DEBUG
  
+ /*
+  *	The information at the start of the compressed toast data.
+  */
+ typedef struct toast_compress_header
+ {
+ 	int32		vl_len_;	/* varlena header (do not touch directly!) */
+ 	int32		rawsize;
+ } toast_compress_header;
+ 
+ /*
+  * Utilities for manipulation of header information for compressed
+  * toast entries.
+  */
+ #define	TOAST_COMPRESS_HDRSZ		((int32) sizeof(toast_compress_header))
+ #define TOAST_COMPRESS_RAWSIZE(ptr)	(((toast_compress_header *) ptr)-rawsize)
+ #define TOAST_COMPRESS_RAWDATA(ptr) \
+ 	(((char *) ptr) + TOAST_COMPRESS_HDRSZ)
+ #define TOAST_COMPRESS_SET_RAWSIZE(ptr, len) \
+ 	(((toast_compress_header *) ptr)-rawsize = len)
+ 
  static void toast_delete_datum(Relation rel, Datum value);
  static Datum toast_save_datum(Relation rel, Datum value,
   struct varlena * oldexternal, int options);
***
*** 53,58  static bool toastid_valueid_exists(Oid toastrelid, Oid valueid);
--- 73,79 
  static struct varlena *toast_fetch_datum(struct varlena * attr);
  static struct varlena *toast_fetch_datum_slice(struct varlena * attr,
  		int32 sliceoffset, int32 length);
+ static struct varlena *toast_decompress_datum(struct varlena * attr);
  static int toast_open_indexes(Relation toastrel,
     LOCKMODE lock,
     Relation **toastidxs,
***
*** 138,148  heap_tuple_untoast_attr(struct varlena * attr)
  		/* If it's compressed, decompress it */
  		if (VARATT_IS_COMPRESSED(attr))
  		{
! 			PGLZ_Header *tmp = (PGLZ_Header *) attr;
! 
! 			attr = (struct varlena *) palloc(PGLZ_RAW_SIZE(tmp) + VARHDRSZ);
! 			SET_VARSIZE(attr, PGLZ_RAW_SIZE(tmp) + VARHDRSZ);
! 			pglz_decompress(tmp, VARDATA(attr));
  			pfree(tmp);
  		}
  	}
--- 159,166 
  		/* If it's compressed, decompress it */
  		if (VARATT_IS_COMPRESSED(attr))
  		{
! 			struct varlena *tmp = attr;
! 			attr = toast_decompress_datum(tmp);
  			pfree(tmp);
  		}
  	}
***
*** 163,173  heap_tuple_untoast_attr(struct varlena * attr)
  		/*
  		 * This is a compressed value inside of the main tuple
  		 */
! 		PGLZ_Header *tmp = (PGLZ_Header *) attr;
! 
! 		attr = (struct varlena *) palloc(PGLZ_RAW_SIZE(tmp) + VARHDRSZ);
! 		SET_VARSIZE(attr, PGLZ_RAW_SIZE(tmp) + VARHDRSZ);
! 		pglz_decompress(tmp, VARDATA(attr));
  	}
  	else if (VARATT_IS_SHORT(attr))
  	{
--- 

Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-02-05 Thread Michael Paquier
Fujii Masao wrote:
 I wrote
 This is an inspiration from lz4 APIs. Wouldn't it be buggy for a
 compression algorithm to return a size of 0 bytes as compressed or
 decompressed length btw? We could as well make it return a negative
 value when a failure occurs if you feel more comfortable with it.

 I feel that's better. Attached is the updated version of the patch.
 I changed the pg_lzcompress and pg_lzdecompress so that they return -1
 when failure happens. Also I applied some cosmetic changes to the patch
 (e.g., shorten the long name of the newly-added macros).
 Barring any objection, I will commit this.

I just had a look at your updated version, ran some sanity tests, and
things look good from me. The new names of the macros at the top of
tuptoaster.c are clearer as well.
-- 
Michael




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-01-06 Thread Michael Paquier
On Wed, Jan 7, 2015 at 12:51 AM, Rahila Syed rahilasyed...@gmail.com wrote:
 Following are some comments,
Thanks for the feedback.

uint16  hole_offset:15, /* number of bytes in hole */
 Typo in description of hole_offset
Fixed. That's before hole.

for (block_id = 0; block_id <= record->max_block_id; block_id++)
-   {
-   if (XLogRecHasBlockImage(record, block_id))
-   fpi_len += BLCKSZ -
 record->blocks[block_id].hole_length;
-   }
+   fpi_len += record->blocks[block_id].bkp_len;

 IIUC, if condition, /if(XLogRecHasBlockImage(record, block_id))/ is
 incorrectly removed from the above for loop.
Fixed.

typedef struct XLogRecordCompressedBlockImageHeader
 I am trying to understand the purpose behind declaration of the above
 struct. IIUC, it is defined in order to introduce new field uint16
 raw_length and it has been declared as a separate struct from
 XLogRecordBlockImageHeader to not affect the size of WAL record when
 compression is off.
 I wonder if it is ok to simply memcpy the uint16 raw_length in the
 hdr_scratch when compression is on
 and not have a separate header struct for it neither declare it in existing
 header. raw_length can be a locally defined variable is XLogRecordAssemble
 or it can be a field in registered_buffer struct like compressed_page.
 I think this can simplify the code.
 Am I missing something obvious?
You are missing nothing. I just introduced this structure for a matter
of readability to show the two-byte difference between non-compressed
and compressed header information. It is true that doing it my way
makes the structures duplicated, so let's simply add the
compression-related information as an extra structure added after
XLogRecordBlockImageHeader if the block is compressed. I hope this
addresses your concerns.
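
To make the layout easier to picture, here is an illustrative sketch (field and type names are simplified, not necessarily those of the attached patch): the regular block image header keeps its size, and the extra information needed by compression is appended right after it only when the image is actually compressed.

typedef struct XLogRecordBlockImageHeader
{
    uint16      bkp_len;        /* length of the stored block image */
    uint16      hole_offset;    /* number of bytes before the page hole */
} XLogRecordBlockImageHeader;

/* Follows XLogRecordBlockImageHeader only when the image is compressed */
typedef struct XLogRecordBlockCompressHeader
{
    uint16      raw_length;     /* uncompressed length of the block image */
} XLogRecordBlockCompressHeader;

With wal_compression = off a record therefore carries exactly the same header bytes as before, and the two extra bytes only show up for compressed images.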

 /*
  * Fill in the remaining fields in the XLogRecordBlockImageHeader
  * struct and add new entries in the record chain.
 */

   bkpb.fork_flags |= BKPBLOCK_HAS_IMAGE;

 This code line seems to be misplaced with respect to the above comment.
 Comment indicates filling of XLogRecordBlockImageHeader fields while
 fork_flags is a field of XLogRecordBlockHeader.
 Is it better to place the code close to following condition?
   if (needs_backup)
   {
Yes, this comment should not be here. I replaced it with the comment in HEAD.


+  *the original length of the
+ * block without its page hole being deducible from the compressed data
+ * itself.
 IIUC, this comment before XLogRecordBlockImageHeader seems to be no longer
 valid as original length is not deducible from compressed data and rather
 stored in header.
Aah, true. This was originally present in the header of PGLZ that has
been removed to make it available for frontends.

Updated patches are attached.
Regards,
-- 
Michael


20150107_fpw_compression_v14.tar.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-01-05 Thread Fujii Masao
On Sun, Dec 28, 2014 at 10:57 PM, Michael Paquier
michael.paqu...@gmail.com wrote:


 On Fri, Dec 26, 2014 at 4:16 PM, Michael Paquier michael.paqu...@gmail.com
 wrote:
 On Fri, Dec 26, 2014 at 3:24 PM, Fujii Masao masao.fu...@gmail.com
 wrote:
 pglz_compress() and pglz_decompress() still use PGLZ_Header, so the
 frontend
 which uses those functions needs to handle PGLZ_Header. But it basically
 should
 be handled via the varlena macros. That is, the frontend still seems to
 need to
 understand the varlena datatype. I think we should avoid that. Thought?
 Hm, yes it may be wiser to remove it and make the data passed to pglz
 for varlena 8 bytes shorter..

 OK, here is the result of this work, made of 3 patches.

Thanks for updating the patches!

 The first two patches move pglz stuff to src/common and make it a frontend
 utility entirely independent on varlena and its related metadata.
 - Patch 1 is a simple move of pglz to src/common, with PGLZ_Header still
 present. There is nothing amazing here, and that's the broken version that
 has been reverted in 966115c.

The patch 1 cannot be applied to the master successfully because of
recent change.

 - The real stuff comes with patch 2, that implements the removal of
 PGLZ_Header, changing the APIs of compression and decompression to pglz to
 not have anymore toast metadata, this metadata being now localized in
 tuptoaster.c. Note that this patch protects the on-disk format (tested with
 pg_upgrade from 9.4 to a patched HEAD server). Here is how the APIs of
 compression and decompression look like with this patch, simply performing
 operations from a source to a destination:
 extern int32 pglz_compress(const char *source, int32 slen, char *dest,
   const PGLZ_Strategy *strategy);
 extern int32 pglz_decompress(const char *source, char *dest,
   int32 compressed_size, int32 raw_size);
 The return value of those functions is the number of bytes written in the
 destination buffer, and 0 if operation failed.

So it's guaranteed that 0 is never returned in success case? I'm not sure
if that case can really happen, though.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-01-05 Thread Michael Paquier
On Mon, Jan 5, 2015 at 10:29 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Sun, Dec 28, 2014 at 10:57 PM, Michael Paquier wrote:
 The patch 1 cannot be applied to the master successfully because of
 recent change.
Yes, that's caused by ccb161b. Attached are rebased versions.

 - The real stuff comes with patch 2, that implements the removal of
 PGLZ_Header, changing the APIs of compression and decompression to pglz to
 not have anymore toast metadata, this metadata being now localized in
 tuptoaster.c. Note that this patch protects the on-disk format (tested with
 pg_upgrade from 9.4 to a patched HEAD server). Here is how the APIs of
 compression and decompression look like with this patch, simply performing
 operations from a source to a destination:
 extern int32 pglz_compress(const char *source, int32 slen, char *dest,
   const PGLZ_Strategy *strategy);
 extern int32 pglz_decompress(const char *source, char *dest,
   int32 compressed_size, int32 raw_size);
 The return value of those functions is the number of bytes written in the
 destination buffer, and 0 if operation failed.

 So it's guaranteed that 0 is never returned in success case? I'm not sure
 if that case can really happen, though.
This is an inspiration from lz4 APIs. Wouldn't it be buggy for a
compression algorithm to return a size of 0 bytes as compressed or
decompressed length btw? We could as well make it return a negative
value when a failure occurs if you feel more comfortable with it.
-- 
Michael


20150105_fpw_compression_v13.tar.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-28 Thread Michael Paquier
On Fri, Dec 26, 2014 at 4:16 PM, Michael Paquier michael.paqu...@gmail.com
wrote:
 On Fri, Dec 26, 2014 at 3:24 PM, Fujii Masao masao.fu...@gmail.com
wrote:
 pglz_compress() and pglz_decompress() still use PGLZ_Header, so the
frontend
 which uses those functions needs to handle PGLZ_Header. But it basically
should
 be handled via the varlena macros. That is, the frontend still seems to
need to
 understand the varlena datatype. I think we should avoid that. Thought?
 Hm, yes it may be wiser to remove it and make the data passed to pglz
 for varlena 8 bytes shorter..

OK, here is the result of this work, made of 3 patches.

The first two patches move pglz stuff to src/common and make it a frontend
utility entirely independent on varlena and its related metadata.
- Patch 1 is a simple move of pglz to src/common, with PGLZ_Header still
present. There is nothing amazing here, and that's the broken version that
has been reverted in 966115c.
- The real stuff comes with patch 2, that implements the removal of
PGLZ_Header, changing the APIs of compression and decompression to pglz to
not have anymore toast metadata, this metadata being now localized in
tuptoaster.c. Note that this patch protects the on-disk format (tested with
pg_upgrade from 9.4 to a patched HEAD server). Here is how the APIs of
compression and decompression look like with this patch, simply performing
operations from a source to a destination:
extern int32 pglz_compress(const char *source, int32 slen, char *dest,
  const PGLZ_Strategy *strategy);
extern int32 pglz_decompress(const char *source, char *dest,
  int32 compressed_size, int32 raw_size);
The return value of those functions is the number of bytes written in the
destination buffer, and 0 if operation failed. This also aims to make the
backend more pluggable. The reason why patch 2 exists (it could be
merged with patch 1) is to facilitate the review of the changes made to
pglz to make it an entirely independent facility.
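
As a quick illustration of how a caller is expected to use these two functions, here is a hedged sketch (not code from the patches; PGLZ_MAX_BLCKSZ and the zero/negative failure convention are taken from the discussion in this thread):

#include "postgres.h"
#include "common/pg_lzcompress.h"

static bool
pglz_roundtrip(const char *src, int32 len)
{
    char        compressed[PGLZ_MAX_BLCKSZ];
    char        uncompressed[BLCKSZ];
    int32       clen;

    Assert(len > 0 && len <= BLCKSZ);

    /* number of bytes written to "compressed", or a failure status */
    clen = pglz_compress(src, len, compressed, PGLZ_strategy_default);
    if (clen <= 0)
        return false;           /* incompressible: store the data raw */

    /* decompression gets both the compressed size and the expected raw size */
    if (pglz_decompress(compressed, uncompressed, clen, len) != len)
        return false;           /* corrupted compressed data */

    return memcmp(src, uncompressed, len) == 0;
}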

Patch 3 is the FPW compression, changed to fit with those changes. Note
that as PGLZ_Header used to carry the raw size of the compressed data and
no longer exists, it is necessary to store the raw length of the block
image directly in the block image header with 2 additional bytes. Those 2
bytes are used only if wal_compression is set to true thanks to a boolean
flag, so if wal_compression is disabled, the WAL record length is exactly
the same as HEAD, and there is no penalty in the default case. Similarly to
previous patches, the block image is compressed without its hole.

To finish, here are some results using the same test as here with the hack
on getrusage to get the system and user CPU diff on a single backend
execution:
http://www.postgresql.org/message-id/cab7npqsc97o-ue5paxfmukwcxe_jioyxo1m4a0pmnmyqane...@mail.gmail.com
Just as a reminder, this test generated a fixed number of FPWs on a single
backend with fsync and autovacuum disabled with several values of
fillfactor to see the effect of page holes.

  test   | ffactor | user_diff | system_diff | pg_size_pretty
-+-+---+-+
 FPW on  |  50 | 48.823907 |0.737649 | 582 MB
 FPW on  |  20 | 16.135000 |0.764682 | 229 MB
 FPW on  |  10 |  8.521099 |0.751947 | 116 MB
 FPW off |  50 | 29.722793 |1.045577 | 746 MB
 FPW off |  20 | 12.673375 |0.905422 | 293 MB
 FPW off |  10 |  6.723120 |0.779936 | 148 MB
 HEAD|  50 | 30.763136 |1.129822 | 746 MB
 HEAD|  20 | 13.340823 |0.893365 | 293 MB
 HEAD|  10 |  7.267311 |0.909057 | 148 MB
(9 rows)

Results are similar to what has been measured previously; it doesn't hurt
to check again, but roughly the CPU cost is balanced by the WAL record
size reduction. There is zero difference in WAL record length
between HEAD and this patch when wal_compression = off.

Patches, as well as the test script and the results are attached.
Regards,
-- 
Michael


results.sql
Description: Binary data


test_compress
Description: Binary data


20141228_fpw_compression_v12.tar.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-25 Thread Michael Paquier
On Wed, Dec 24, 2014 at 9:03 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 Returning only a boolean is fine for me (that's what my first patch
 did), especially if we add at some point hooks for compression and
 decompression calls.
Here is a patch rebased on current HEAD (60838df) for the core feature
with the APIs of pglz using booleans as return values.
-- 
Michael
From c8891e6086682d4e5bc197ef3047068b3a3875c5 Mon Sep 17 00:00:00 2001
From: Michael Paquier michael@otacoo.com
Date: Tue, 25 Nov 2014 14:24:26 +0900
Subject: [PATCH] Support compression for full-page writes in WAL

Compression is controlled with a new parameter called wal_compression.
This parameter can be changed at session level to control WAL compression.
---
 contrib/pg_xlogdump/pg_xlogdump.c |  20 ++--
 doc/src/sgml/config.sgml  |  29 +
 src/backend/access/transam/xlog.c |   1 +
 src/backend/access/transam/xloginsert.c   | 155 ++
 src/backend/access/transam/xlogreader.c   |  77 ++---
 src/backend/utils/misc/guc.c  |   9 ++
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/access/xlog.h |   1 +
 src/include/access/xlogreader.h   |   7 +-
 src/include/access/xlogrecord.h   |  36 --
 src/include/pg_config.h.in|   4 +-
 11 files changed, 283 insertions(+), 57 deletions(-)

diff --git a/contrib/pg_xlogdump/pg_xlogdump.c b/contrib/pg_xlogdump/pg_xlogdump.c
index 9f05e25..3ba1d8f 100644
--- a/contrib/pg_xlogdump/pg_xlogdump.c
+++ b/contrib/pg_xlogdump/pg_xlogdump.c
@@ -363,15 +363,12 @@ XLogDumpCountRecord(XLogDumpConfig *config, XLogDumpStats *stats,
 	 * takes up BLCKSZ bytes, minus the hole length.
 	 *
 	 * XXX: We peek into xlogreader's private decoded backup blocks for the
-	 * hole_length. It doesn't seem worth it to add an accessor macro for
+	 * length of block. It doesn't seem worth it to add an accessor macro for
 	 * this.
 	 */
 	fpi_len = 0;
 	for (block_id = 0; block_id <= record->max_block_id; block_id++)
-	{
-		if (XLogRecHasBlockImage(record, block_id))
-			fpi_len += BLCKSZ - record->blocks[block_id].hole_length;
-	}
+		fpi_len += record->blocks[block_id].bkp_len;
 
 	/* Update per-rmgr statistics */
 
@@ -465,9 +462,16 @@ XLogDumpDisplayRecord(XLogDumpConfig *config, XLogReaderState *record)
    blk);
 			if (XLogRecHasBlockImage(record, block_id))
 			{
-printf(" (FPW); hole: offset: %u, length: %u\n",
-	   record->blocks[block_id].hole_offset,
-	   record->blocks[block_id].hole_length);
+if (record->blocks[block_id].is_compressed)
+	printf(" (FPW compressed); hole offset: %u, "
+		   "compressed length: %u, original length: %u\n",
+		   record->blocks[block_id].hole_offset,
+		   record->blocks[block_id].bkp_len,
+		   record->blocks[block_id].bkp_uncompress_len);
+else
+	printf(" (FPW); hole offset: %u, length: %u\n",
+		   record->blocks[block_id].hole_offset,
+		   record->blocks[block_id].bkp_len);
 			}
 			putchar('\n');
 		}
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 6bcb106..acbbd20 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2282,6 +2282,35 @@ include_dir 'conf.d'
  </listitem>
 </varlistentry>
 
+ <varlistentry id="guc-wal-compression" xreflabel="wal_compression">
+  <term><varname>wal_compression</varname> (<type>boolean</type>)
+  <indexterm>
+   <primary><varname>wal_compression</> configuration parameter</primary>
+  </indexterm>
+  </term>
+  <listitem>
+   <para>
+    When this parameter is <literal>on</>, the <productname>PostgreSQL</>
+    server compresses the content of full-page writes when necessary and
+    inserts in WAL records with smaller sizes, reducing the amount of
+    WAL stored on disk.
+   </para>
+
+   <para>
+    Compression has the advantage of reducing the amount of disk I/O when
+    doing WAL-logging, at the cost of some extra CPU to perform the
+    compression of a block image.  At WAL replay, compressed block images
+    need extra CPU cycles to perform the decompression of each block
+    image, but it can reduce as well replay time in I/O bounded
+    environments.
+   </para>
+
+   <para>
+    The default value is <literal>off</>.
+   </para>
+  </listitem>
+ </varlistentry>
+
 <varlistentry id="guc-wal-buffers" xreflabel="wal_buffers">
  <term><varname>wal_buffers</varname> (<type>integer</type>)
  <indexterm>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e54..d68d9e3 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -88,6 +88,7 @@ char	   *XLogArchiveCommand = NULL;
 bool		EnableHotStandby = false;
 bool		fullPageWrites = true;
 bool		wal_log_hints = false;
+bool		wal_compression = false;
 bool		log_checkpoints = false;
 int			sync_method = 

Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-25 Thread Michael Paquier
On Thu, Dec 25, 2014 at 10:10 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Wed, Dec 24, 2014 at 9:03 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 Returning only a boolean is fine for me (that's what my first patch
 did), especially if we add at some point hooks for compression and
 decompression calls.
 Here is a patch rebased on current HEAD (60838df) for the core feature
 with the APIs of pglz using booleans as return values.
After the revert of 1st patch moving pglz to src/common, I have
reworked both patches, resulting in the attached.

For pglz, the dependency to varlena has been removed to make the code
able to run independently on both frontend and backend sides. In order
to do that the APIs of pglz_compress and pglz_decompress have been
changed a bit:
- pglz_compress returns the number of bytes compressed.
- pglz_decompress takes as additional argument the compressed length
of the buffer, and returns the number of bytes decompressed instead of
a simple boolean for consistency with the compression API.
PGLZ_Header is not modified to keep the on-disk format intact.

The WAL compression patch is realigned based on those changes.
Regards,
-- 
Michael


20141226_fpw_compression_v11.tar.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-25 Thread Fujii Masao
On Fri, Dec 26, 2014 at 12:31 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Thu, Dec 25, 2014 at 10:10 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 On Wed, Dec 24, 2014 at 9:03 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 Returning only a boolean is fine for me (that's what my first patch
 did), especially if we add at some point hooks for compression and
 decompression calls.
 Here is a patch rebased on current HEAD (60838df) for the core feature
 with the APIs of pglz using booleans as return values.
 After the revert of 1st patch moving pglz to src/common, I have
 reworked both patches, resulting in the attached.

 For pglz, the dependency to varlena has been removed to make the code
 able to run independently on both frontend and backend sides. In order
 to do that the APIs of pglz_compress and pglz_decompress have been
 changed a bit:
 - pglz_compress returns the number of bytes compressed.
 - pglz_decompress takes as additional argument the compressed length
 of the buffer, and returns the number of bytes decompressed instead of
 a simple boolean for consistency with the compression API.
 PGLZ_Header is not modified to keep the on-disk format intact.

pglz_compress() and pglz_decompress() still use PGLZ_Header, so the frontend
which uses those functions needs to handle PGLZ_Header. But it basically should
be handled via the varlena macros. That is, the frontend still seems to need to
understand the varlena datatype. I think we should avoid that. Thought?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-25 Thread Michael Paquier
On Fri, Dec 26, 2014 at 3:24 PM, Fujii Masao masao.fu...@gmail.com wrote:
 pglz_compress() and pglz_decompress() still use PGLZ_Header, so the frontend
 which uses those functions needs to handle PGLZ_Header. But it basically 
 should
 be handled via the varlena macros. That is, the frontend still seems to need 
 to
 understand the varlena datatype. I think we should avoid that. Thought?
Hm, yes it may be wiser to remove it and make the data passed to pglz
for varlena 8 bytes shorter..
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-24 Thread Fujii Masao
On Fri, Dec 19, 2014 at 12:19 AM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Thu, Dec 18, 2014 at 5:27 PM, Fujii Masao masao.fu...@gmail.com wrote:
 Thanks!
 Thanks for your input.

 +else
 +memcpy(compression_scratch, page, page_len);

 I don't think the block image needs to be copied to scratch buffer here.
 We can try to compress the page directly.
 Check.

 +#include utils/pg_lzcompress.h
  #include utils/memutils.h

 pg_lzcompress.h should be after meutils.h.
 Oops.

 +/* Scratch buffer used to store block image to-be-compressed */
 +static char compression_scratch[PGLZ_MAX_BLCKSZ];

 Isn't it better to allocate the memory for compression_scratch in
 InitXLogInsert()
 like hdr_scratch?
 Because the OS would not touch it if wal_compression is never used,
 but now that you mention it, it may be better to get that in the
 context of xlog_insert..

 +uncompressed_page = (char *) palloc(PGLZ_RAW_SIZE(header));

 Why don't we allocate the buffer for uncompressed page only once and
  keep reusing it like XLogReaderState->readBuf? The size of uncompressed
 page is at most BLCKSZ, so we can allocate the memory for it even before
 knowing the real size of each block image.
 OK, this would save some cycles. I was trying to make process allocate
 a minimum of memory only when necessary.

  -printf(" (FPW); hole: offset: %u, length: %u\n",
  -   record->blocks[block_id].hole_offset,
  -   record->blocks[block_id].hole_length);
  +if (record->blocks[block_id].is_compressed)
  +printf(" (FPW); hole offset: %u, compressed length %u\n",
  +   record->blocks[block_id].hole_offset,
  +   record->blocks[block_id].bkp_len);
  +else
  +printf(" (FPW); hole offset: %u, length: %u\n",
  +   record->blocks[block_id].hole_offset,
  +   record->blocks[block_id].bkp_len);

 We need to consider what info about FPW we want pg_xlogdump to report.
 I'd like to calculate how much bytes FPW was compressed, from the report
 of pg_xlogdump. So I'd like to see also the both length of uncompressed FPW
 and that of compressed one in the report.
 OK, so let's add a parameter in the decoder for the uncompressed
 length. Sounds fine?

 In pg_config.h, the comment of BLCKSZ needs to be updated? Because
 the maximum size of BLCKSZ can be affected by not only itemid but also
 XLogRecordBlockImageHeader.
 Check.

  boolhas_image;
 +boolis_compressed;

 Doesn't ResetDecoder need to reset is_compressed?
 Check.

 +#wal_compression = off# enable compression of full-page writes
 Currently wal_compression compresses only FPW, so isn't it better to place
 it after full_page_writes in postgresql.conf.sample?
 Check.

  +uint16		extra_data;	/* used to store offset of bytes in
 hole, with
 + * last free bit used to check if block is
 + * compressed */
 At least to me, defining something like the following seems more easy to
 read.
  uint16		hole_offset:15,
  			is_compressed:1
 Check++.

 Updated patches addressing all those things are attached.

Thanks for updating the patch!

Firstly I'm thinking to commit the
0001-Move-pg_lzcompress.c-to-src-common.patch.

pg_lzcompress.h still exists in include/utils, but it should be moved to
include/common?

Do we really need PGLZ_Status? I'm not sure whether your categorization of
the result status of compress/decompress functions is right or not. For example,
pglz_decompress() can return PGLZ_INCOMPRESSIBLE status, but which seems
invalid logically... Maybe this needs to be revisited when we introduce other
compression algorithms and create the wrapper function for those compression
and decompression functions. Anyway making pg_lzdecompress return
the boolean value seems enough.

I updated 0001-Move-pg_lzcompress.c-to-src-common.patch accordingly.
Barring objections, I will push the attached patch firstly.

Regards,

-- 
Fujii Masao
From 92a7c2ac8a6a8ec6ae84c6713f8ad30b7958de94 Mon Sep 17 00:00:00 2001
From: Michael Paquier mich...@otacoo.com
Date: Tue, 25 Nov 2014 14:05:59 +0900
Subject: [PATCH] Move pg_lzcompress.c to src/common

Exposing compression and decompression APIs of pglz makes possible its
use by extensions and contrib modules. pglz_decompress contained a call
to elog to emit an error message in case of corrupted data. This function
is changed to return a status code to let its callers return an error
instead. Compression function is changed similarly to make the whole set
consistent.
---
 src/backend/access/heap/tuptoaster.c  |   11 +-
 src/backend/utils/adt/Makefile|4 +-
 src/backend/utils/adt/pg_lzcompress.c |  779 
 src/common/Makefile   |3 +-
 

Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-24 Thread Michael Paquier
On Wed, Dec 24, 2014 at 8:44 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Fri, Dec 19, 2014 at 12:19 AM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 Firstly I'm thinking to commit the
 0001-Move-pg_lzcompress.c-to-src-common.patch.

 pg_lzcompress.h still exists in include/utils, but it should be moved to
 include/common?
You are right. This is a remnant of first version of this patch where
pglz was added in port/ and not common/.

 Do we really need PGLZ_Status? I'm not sure whether your categorization of
 the result status of compress/decompress functions is right or not. For 
 example,
 pglz_decompress() can return PGLZ_INCOMPRESSIBLE status, but which seems
 invalid logically... Maybe this needs to be revisited when we introduce other
 compression algorithms and create the wrapper function for those compression
 and decompression functions. Anyway making pg_lzdecompress return
 the boolean value seems enough.
Returning only a boolean is fine for me (that's what my first patch
did), especially if we add at some point hooks for compression and
decompression calls.
Regards,
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-18 Thread Fujii Masao
On Thu, Dec 18, 2014 at 2:21 PM, Michael Paquier
michael.paqu...@gmail.com wrote:


 On Wed, Dec 17, 2014 at 11:33 PM, Rahila Syed rahilasye...@gmail.com
 wrote:

 I had a look at code. I have few minor points,

 Thanks!

 +   bkpb.fork_flags |= BKPBLOCK_HAS_IMAGE;
 +
 +   if (is_compressed)
 {
  -   rdt_datas_last->data = page;
  -   rdt_datas_last->len = BLCKSZ;
 +   /* compressed block information */
 +   bimg.length = compress_len;
 +   bimg.extra_data = hole_offset;
 +   bimg.extra_data |= XLR_BLCK_COMPRESSED_MASK;

 For consistency with the existing code , how about renaming the macro
 XLR_BLCK_COMPRESSED_MASK as BKPBLOCK_HAS_COMPRESSED_IMAGE on the lines of
 BKPBLOCK_HAS_IMAGE.

 OK, why not...


 +   blk->hole_offset = extra_data & ~XLR_BLCK_COMPRESSED_MASK;
 Here , I think that having the mask as BKPBLOCK_HOLE_OFFSET_MASK will be
 more indicative of the fact that lower 15 bits of extra_data field comprises
 of hole_offset value. This suggestion is also just to achieve consistency
 with the existing BKPBLOCK_FORK_MASK for fork_flags field.

 Yeah that seems clearer, let's define it as ~XLR_BLCK_COMPRESSED_MASK
 though.

 And comment typo
 +* First try to compress block, filling in the page hole with
 zeros
 +* to improve the compression of the whole. If the block is
 considered
 +* as incompressible, complete the block header information as
 if
 +* nothing happened.

 As hole is no longer being compressed, this needs to be changed.

 Fixed. As well as an additional comment block down.

 A couple of things noticed on the fly:
 - Fixed pg_xlogdump being not completely correct to report the FPW
 information
 - A couple of typos and malformed sentences fixed
 - Added an assertion to check that the hole offset value does not the bit
 used for compression status
 - Reworked docs, mentioning as well that wal_compression is off by default.
 - Removed stuff in pg_controldata and XLOG_PARAMETER_CHANGE (mentioned by
 Fujii-san)

Thanks!

+else
+memcpy(compression_scratch, page, page_len);

I don't think the block image needs to be copied to scratch buffer here.
We can try to compress the page directly.

+#include utils/pg_lzcompress.h
 #include utils/memutils.h

pg_lzcompress.h should be after meutils.h.

+/* Scratch buffer used to store block image to-be-compressed */
+static char compression_scratch[PGLZ_MAX_BLCKSZ];

Isn't it better to allocate the memory for compression_scratch in
InitXLogInsert()
like hdr_scratch?

+uncompressed_page = (char *) palloc(PGLZ_RAW_SIZE(header));

Why don't we allocate the buffer for uncompressed page only once and
keep reusing it like XLogReaderState->readBuf? The size of uncompressed
page is at most BLCKSZ, so we can allocate the memory for it even before
knowing the real size of each block image.

-printf(" (FPW); hole: offset: %u, length: %u\n",
-   record->blocks[block_id].hole_offset,
-   record->blocks[block_id].hole_length);
+if (record->blocks[block_id].is_compressed)
+printf(" (FPW); hole offset: %u, compressed length %u\n",
+   record->blocks[block_id].hole_offset,
+   record->blocks[block_id].bkp_len);
+else
+printf(" (FPW); hole offset: %u, length: %u\n",
+   record->blocks[block_id].hole_offset,
+   record->blocks[block_id].bkp_len);

We need to consider what info about FPW we want pg_xlogdump to report.
I'd like to calculate how much bytes FPW was compressed, from the report
of pg_xlogdump. So I'd like to see also the both length of uncompressed FPW
and that of compressed one in the report.

In pg_config.h, the comment of BLCKSZ needs to be updated? Because
the maximum size of BLCKSZ can be affected by not only itemid but also
XLogRecordBlockImageHeader.

 boolhas_image;
+boolis_compressed;

Doesn't ResetDecoder need to reset is_compressed?

+#wal_compression = off# enable compression of full-page writes

Currently wal_compression compresses only FPW, so isn't it better to place
it after full_page_writes in postgresql.conf.sample?

+uint16		extra_data;	/* used to store offset of bytes in
hole, with
+ * last free bit used to check if block is
+ * compressed */

At least to me, defining something like the following seems more easy to
read.

uint16		hole_offset:15,
			is_compressed:1
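
For illustration, the suggested layout would look roughly like this (a sketch only; the surrounding fields are simplified and may not match the final patch):

typedef struct XLogRecordBlockImageHeader
{
    uint16      length;             /* length of the (possibly compressed) image */
    uint16      hole_offset:15,     /* number of bytes before the page hole */
                is_compressed:1;    /* is the block image pglz-compressed? */
} XLogRecordBlockImageHeader;

The high bit of the second 16-bit word flags compression, leaving the remaining 15 bits for the hole offset.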

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-18 Thread Rahila Syed
Isn't it better to allocate the memory for compression_scratch in
InitXLogInsert()
like hdr_scratch?

I think making compression_scratch a statically allocated global variable
 is the result of  following discussion earlier,

http://www.postgresql.org/message-id/ca+tgmoaznbuwnls4bpwyqgqteeznoavy7kwdbm0a2-tbarn...@mail.gmail.com


Thank you,
Rahila Syed



On Thu, Dec 18, 2014 at 1:57 PM, Fujii Masao masao.fu...@gmail.com wrote:

 On Thu, Dec 18, 2014 at 2:21 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 
 
  On Wed, Dec 17, 2014 at 11:33 PM, Rahila Syed rahilasye...@gmail.com
  wrote:
 
  I had a look at code. I have few minor points,
 
  Thanks!
 
  +   bkpb.fork_flags |= BKPBLOCK_HAS_IMAGE;
  +
  +   if (is_compressed)
  {
   -   rdt_datas_last->data = page;
   -   rdt_datas_last->len = BLCKSZ;
  +   /* compressed block information */
  +   bimg.length = compress_len;
  +   bimg.extra_data = hole_offset;
  +   bimg.extra_data |= XLR_BLCK_COMPRESSED_MASK;
 
  For consistency with the existing code , how about renaming the macro
  XLR_BLCK_COMPRESSED_MASK as BKPBLOCK_HAS_COMPRESSED_IMAGE on the lines
 of
  BKPBLOCK_HAS_IMAGE.
 
  OK, why not...
 
 
   +   blk->hole_offset = extra_data & ~XLR_BLCK_COMPRESSED_MASK;
  Here , I think that having the mask as BKPBLOCK_HOLE_OFFSET_MASK will be
  more indicative of the fact that lower 15 bits of extra_data field
 comprises
  of hole_offset value. This suggestion is also just to achieve
 consistency
  with the existing BKPBLOCK_FORK_MASK for fork_flags field.
 
  Yeah that seems clearer, let's define it as ~XLR_BLCK_COMPRESSED_MASK
  though.
 
  And comment typo
  +* First try to compress block, filling in the page hole
 with
  zeros
  +* to improve the compression of the whole. If the block is
  considered
  +* as incompressible, complete the block header information
 as
  if
  +* nothing happened.
 
  As hole is no longer being compressed, this needs to be changed.
 
  Fixed. As well as an additional comment block down.
 
  A couple of things noticed on the fly:
  - Fixed pg_xlogdump being not completely correct to report the FPW
  information
  - A couple of typos and malformed sentences fixed
  - Added an assertion to check that the hole offset value does not the bit
  used for compression status
  - Reworked docs, mentioning as well that wal_compression is off by
 default.
  - Removed stuff in pg_controldata and XLOG_PARAMETER_CHANGE (mentioned by
  Fujii-san)

 Thanks!

 +else
 +memcpy(compression_scratch, page, page_len);

 I don't think the block image needs to be copied to scratch buffer here.
 We can try to compress the page directly.

 +#include utils/pg_lzcompress.h
  #include utils/memutils.h

 pg_lzcompress.h should be after meutils.h.

 +/* Scratch buffer used to store block image to-be-compressed */
 +static char compression_scratch[PGLZ_MAX_BLCKSZ];

 Isn't it better to allocate the memory for compression_scratch in
 InitXLogInsert()
 like hdr_scratch?

 +uncompressed_page = (char *) palloc(PGLZ_RAW_SIZE(header));

 Why don't we allocate the buffer for uncompressed page only once and
  keep reusing it like XLogReaderState->readBuf? The size of uncompressed
 page is at most BLCKSZ, so we can allocate the memory for it even before
 knowing the real size of each block image.

  -printf(" (FPW); hole: offset: %u, length: %u\n",
  -   record->blocks[block_id].hole_offset,
  -   record->blocks[block_id].hole_length);
  +if (record->blocks[block_id].is_compressed)
  +printf(" (FPW); hole offset: %u, compressed length %u\n",
  +   record->blocks[block_id].hole_offset,
  +   record->blocks[block_id].bkp_len);
  +else
  +printf(" (FPW); hole offset: %u, length: %u\n",
  +   record->blocks[block_id].hole_offset,
  +   record->blocks[block_id].bkp_len);

 We need to consider what info about FPW we want pg_xlogdump to report.
 I'd like to calculate how much bytes FPW was compressed, from the report
 of pg_xlogdump. So I'd like to see also the both length of uncompressed FPW
 and that of compressed one in the report.

 In pg_config.h, the comment of BLCKSZ needs to be updated? Because
 the maximum size of BLCKSZ can be affected by not only itemid but also
 XLogRecordBlockImageHeader.

  boolhas_image;
 +boolis_compressed;

 Doesn't ResetDecoder need to reset is_compressed?

 +#wal_compression = off# enable compression of full-page writes

 Currently wal_compression compresses only FPW, so isn't it better to place
 it after full_page_writes in postgresql.conf.sample?

 +uint16		extra_data;	/* used to store offset of 

Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-18 Thread Fujii Masao
On Thu, Dec 18, 2014 at 7:31 PM, Rahila Syed rahilasye...@gmail.com wrote:
Isn't it better to allocate the memory for compression_scratch in
InitXLogInsert()
like hdr_scratch?

 I think making compression_scratch a statically allocated global variable
 is the result of  following discussion earlier,

 http://www.postgresql.org/message-id/ca+tgmoaznbuwnls4bpwyqgqteeznoavy7kwdbm0a2-tbarn...@mail.gmail.com

/*
 * Permanently allocate readBuf.  We do it this way, rather than just
 * making a static array, for two reasons: (1) no need to waste the
 * storage in most instantiations of the backend; (2) a static char array
 * isn't guaranteed to have any particular alignment, whereas palloc()
 * will provide MAXALIGN'd storage.
 */

The above source code comment in XLogReaderAllocate() makes me think that
it's better to avoid using a static array. The point (1) seems less important in
this case because most processes need the buffer for WAL compression,
though.
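
For what it's worth, the palloc-based alternative would look roughly like this (a sketch under the assumption that InitXLogInsert() keeps a dedicated memory context, as it already does for hdr_scratch; names such as xloginsert_cxt are illustrative):

/* file-scope state in xloginsert.c */
static char *compression_scratch = NULL;

void
InitXLogInsert(void)
{
    /* ... existing creation of the xloginsert memory context ... */

    /* palloc guarantees MAXALIGN'd storage, unlike a static char array */
    if (compression_scratch == NULL)
        compression_scratch = MemoryContextAlloc(xloginsert_cxt,
                                                 PGLZ_MAX_BLCKSZ);
}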

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-18 Thread Michael Paquier
On Thu, Dec 18, 2014 at 7:31 PM, Rahila Syed rahilasye...@gmail.com wrote:
Isn't it better to allocate the memory for compression_scratch in
InitXLogInsert()
like hdr_scratch?

 I think making compression_scratch a statically allocated global variable
 is the result of  following discussion earlier,
 http://www.postgresql.org/message-id/ca+tgmoaznbuwnls4bpwyqgqteeznoavy7kwdbm0a2-tbarn...@mail.gmail.com
Yep, in this case the OS does not request this memory as long as it is
not touched, like when wal_compression is off all the time in the
backend. Robert mentioned that upthread.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-18 Thread Michael Paquier
On Thu, Dec 18, 2014 at 5:27 PM, Fujii Masao masao.fu...@gmail.com wrote:
 Thanks!
Thanks for your input.

 +else
 +memcpy(compression_scratch, page, page_len);

 I don't think the block image needs to be copied to scratch buffer here.
 We can try to compress the page directly.
Check.

 +#include utils/pg_lzcompress.h
  #include utils/memutils.h

 pg_lzcompress.h should be after meutils.h.
Oops.

 +/* Scratch buffer used to store block image to-be-compressed */
 +static char compression_scratch[PGLZ_MAX_BLCKSZ];

 Isn't it better to allocate the memory for compression_scratch in
 InitXLogInsert()
 like hdr_scratch?
Because the OS would not touch it if wal_compression is never used,
but now that you mention it, it may be better to get that in the
context of xlog_insert..

 +uncompressed_page = (char *) palloc(PGLZ_RAW_SIZE(header));

 Why don't we allocate the buffer for uncompressed page only once and
  keep reusing it like XLogReaderState->readBuf? The size of uncompressed
 page is at most BLCKSZ, so we can allocate the memory for it even before
 knowing the real size of each block image.
OK, this would save some cycles. I was trying to make process allocate
a minimum of memory only when necessary.

  -printf(" (FPW); hole: offset: %u, length: %u\n",
  -   record->blocks[block_id].hole_offset,
  -   record->blocks[block_id].hole_length);
  +if (record->blocks[block_id].is_compressed)
  +printf(" (FPW); hole offset: %u, compressed length %u\n",
  +   record->blocks[block_id].hole_offset,
  +   record->blocks[block_id].bkp_len);
  +else
  +printf(" (FPW); hole offset: %u, length: %u\n",
  +   record->blocks[block_id].hole_offset,
  +   record->blocks[block_id].bkp_len);

 We need to consider what info about FPW we want pg_xlogdump to report.
 I'd like to calculate how much bytes FPW was compressed, from the report
 of pg_xlogdump. So I'd like to see also the both length of uncompressed FPW
 and that of compressed one in the report.
OK, so let's add a parameter in the decoder for the uncompressed
length. Sounds fine?

 In pg_config.h, the comment of BLCKSZ needs to be updated? Because
 the maximum size of BLCKSZ can be affected by not only itemid but also
 XLogRecordBlockImageHeader.
Check.

  boolhas_image;
 +boolis_compressed;

 Doesn't ResetDecoder need to reset is_compressed?
Check.

 +#wal_compression = off# enable compression of full-page writes
 Currently wal_compression compresses only FPW, so isn't it better to place
 it after full_page_writes in postgresql.conf.sample?
Check.

 +uint16		extra_data;	/* used to store offset of bytes in
 hole, with
 + * last free bit used to check if block is
 + * compressed */
 At least to me, defining something like the following seems more easy to
 read.
 uint16		hole_offset:15,
 			is_compressed:1
Check++.

Updated patches addressing all those things are attached.
Regards,
-- 
Michael


20141219_fpw_compression_v9.tar.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-17 Thread Rahila Syed
Hello,

Patches as well as results are attached.

I had a look at code. I have few minor points,

+   bkpb.fork_flags |= BKPBLOCK_HAS_IMAGE;
+
+   if (is_compressed)
{
-   rdt_datas_last->data = page;
-   rdt_datas_last->len = BLCKSZ;
+   /* compressed block information */
+   bimg.length = compress_len;
+   bimg.extra_data = hole_offset;
+   bimg.extra_data |= XLR_BLCK_COMPRESSED_MASK;

For consistency with the existing code , how about renaming the macro
XLR_BLCK_COMPRESSED_MASK as BKPBLOCK_HAS_COMPRESSED_IMAGE on the lines of
BKPBLOCK_HAS_IMAGE.

+   blk->hole_offset = extra_data & ~XLR_BLCK_COMPRESSED_MASK;
Here , I think that having the mask as BKPBLOCK_HOLE_OFFSET_MASK will be
more indicative of the fact that lower 15 bits of extra_data field
comprises of hole_offset value. This suggestion is also just to achieve
consistency with the existing BKPBLOCK_FORK_MASK for fork_flags field.

And comment typo
+* First try to compress block, filling in the page hole with
zeros
+* to improve the compression of the whole. If the block is
considered
+* as incompressible, complete the block header information as
if
+* nothing happened.

As hole is no longer being compressed, this needs to be changed.

Thank you,
Rahila Syed







On Tue, Dec 16, 2014 at 10:04 PM, Michael Paquier michael.paqu...@gmail.com
 wrote:



 On Wed, Dec 17, 2014 at 12:00 AM, Michael Paquier 
 michael.paqu...@gmail.com wrote:

 Actually, the original length of the compressed block in saved in
 PGLZ_Header, so if we are fine to not check the size of the block
 decompressed when decoding WAL we can do without the hole filled with
 zeros, and use only 1 bit to see if the block is compressed or not.

 And.. After some more hacking, I have been able to come up with a patch
 that is able to compress blocks without the page hole, and that keeps the
 WAL record length the same as HEAD when compression switch is off. The
 numbers are pretty good, CPU is saved in the same proportions as previous
 patches when compression is enabled, and there is zero delta with HEAD when
 compression switch is off.

 Here are the actual numbers:
test_name   | pg_size_pretty | user_diff | system_diff
 ---++---+-
  FPW on + 2 bytes, ffactor 50  | 582 MB | 42.391894 |0.807444
  FPW on + 2 bytes, ffactor 20  | 229 MB | 14.330304 |0.729626
  FPW on + 2 bytes, ffactor 10  | 117 MB |  7.335442 |0.570996
  FPW off + 2 bytes, ffactor 50 | 746 MB | 25.330391 |1.248503
  FPW off + 2 bytes, ffactor 20 | 293 MB | 10.537475 |0.755448
  FPW off + 2 bytes, ffactor 10 | 148 MB |  5.762775 |0.763761
  FPW on + 0 bytes, ffactor 50  | 582 MB | 42.174297 |0.790596
  FPW on + 0 bytes, ffactor 20  | 229 MB | 14.424233 |0.770459
  FPW on + 0 bytes, ffactor 10  | 117 MB |  7.057195 |0.584806
  FPW off + 0 bytes, ffactor 50 | 746 MB | 25.261998 |1.054516
  FPW off + 0 bytes, ffactor 20 | 293 MB | 10.589888 |0.860207
  FPW off + 0 bytes, ffactor 10 | 148 MB |  5.827191 |0.874285
  HEAD, ffactor 50  | 746 MB | 25.181729 |1.133433
  HEAD, ffactor 20  | 293 MB |  9.962242 |0.765970
  HEAD, ffactor 10  | 148 MB |  5.693426 |0.775371
  Record, ffactor 50| 582 MB | 54.904374 |0.678204
  Record, ffactor 20| 229 MB | 19.798268 |0.807220
  Record, ffactor 10| 116 MB |  9.401877 |0.668454
 (18 rows)

 The new tests of this patch are FPW off + 0 bytes. Patches as well as
 results are attached.
 Regards,
 --
 Michael



Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-17 Thread Fujii Masao
On Wed, Dec 17, 2014 at 1:34 AM, Michael Paquier
michael.paqu...@gmail.com wrote:


 On Wed, Dec 17, 2014 at 12:00 AM, Michael Paquier
 michael.paqu...@gmail.com wrote:

 Actually, the original length of the compressed block in saved in
 PGLZ_Header, so if we are fine to not check the size of the block
 decompressed when decoding WAL we can do without the hole filled with zeros,
 and use only 1 bit to see if the block is compressed or not.

 And.. After some more hacking, I have been able to come up with a patch that
 is able to compress blocks without the page hole, and that keeps the WAL
 record length the same as HEAD when compression switch is off. The numbers
 are pretty good, CPU is saved in the same proportions as previous patches
 when compression is enabled, and there is zero delta with HEAD when
 compression switch is off.

 Here are the actual numbers:
test_name   | pg_size_pretty | user_diff | system_diff
 ---++---+-
  FPW on + 2 bytes, ffactor 50  | 582 MB | 42.391894 |0.807444
  FPW on + 2 bytes, ffactor 20  | 229 MB | 14.330304 |0.729626
  FPW on + 2 bytes, ffactor 10  | 117 MB |  7.335442 |0.570996
  FPW off + 2 bytes, ffactor 50 | 746 MB | 25.330391 |1.248503
  FPW off + 2 bytes, ffactor 20 | 293 MB | 10.537475 |0.755448
  FPW off + 2 bytes, ffactor 10 | 148 MB |  5.762775 |0.763761
  FPW on + 0 bytes, ffactor 50  | 582 MB | 42.174297 |0.790596
  FPW on + 0 bytes, ffactor 20  | 229 MB | 14.424233 |0.770459
  FPW on + 0 bytes, ffactor 10  | 117 MB |  7.057195 |0.584806
  FPW off + 0 bytes, ffactor 50 | 746 MB | 25.261998 |1.054516
  FPW off + 0 bytes, ffactor 20 | 293 MB | 10.589888 |0.860207
  FPW off + 0 bytes, ffactor 10 | 148 MB |  5.827191 |0.874285
  HEAD, ffactor 50  | 746 MB | 25.181729 |1.133433
  HEAD, ffactor 20  | 293 MB |  9.962242 |0.765970
  HEAD, ffactor 10  | 148 MB |  5.693426 |0.775371
  Record, ffactor 50| 582 MB | 54.904374 |0.678204
  Record, ffactor 20| 229 MB | 19.798268 |0.807220
  Record, ffactor 10| 116 MB |  9.401877 |0.668454
 (18 rows)

 The new tests of this patch are FPW off + 0 bytes. Patches as well as
 results are attached.

I think that neither pg_control nor xl_parameter_change need to have the info
about WAL compression because each backup block has that entry.

Will review the remaining part later.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-17 Thread Michael Paquier
On Thu, Dec 18, 2014 at 1:05 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Wed, Dec 17, 2014 at 1:34 AM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 I think that neither pg_control nor xl_parameter_change need to have the info
 about WAL compression because each backup block has that entry.

 Will review the remaining part later.
I got into wondering the utility of this part earlier this morning as
that's some remnant of when wal_compression was set as PGC_POSTMASTER.
Will remove.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-17 Thread Michael Paquier
On Wed, Dec 17, 2014 at 11:33 PM, Rahila Syed rahilasye...@gmail.com
wrote:

 I had a look at code. I have few minor points,

Thanks!

+   bkpb.fork_flags |= BKPBLOCK_HAS_IMAGE;
 +
 +   if (is_compressed)
 {
 -   rdt_datas_last->data = page;
 -   rdt_datas_last->len = BLCKSZ;
 +   /* compressed block information */
 +   bimg.length = compress_len;
 +   bimg.extra_data = hole_offset;
 +   bimg.extra_data |= XLR_BLCK_COMPRESSED_MASK;

 For consistency with the existing code , how about renaming the macro
 XLR_BLCK_COMPRESSED_MASK as BKPBLOCK_HAS_COMPRESSED_IMAGE on the lines of
 BKPBLOCK_HAS_IMAGE.

OK, why not...


 +   blk->hole_offset = extra_data & ~XLR_BLCK_COMPRESSED_MASK;
 Here , I think that having the mask as BKPBLOCK_HOLE_OFFSET_MASK will be
 more indicative of the fact that lower 15 bits of extra_data field
 comprises of hole_offset value. This suggestion is also just to achieve
 consistency with the existing BKPBLOCK_FORK_MASK for fork_flags field.

Yeah that seems clearer, let's define it as ~XLR_BLCK_COMPRESSED_MASK
though.

And comment typo
 +* First try to compress block, filling in the page hole with
 zeros
 +* to improve the compression of the whole. If the block is
 considered
 +* as incompressible, complete the block header information as
 if
 +* nothing happened.

 As hole is no longer being compressed, this needs to be changed.

Fixed. As well as an additional comment block down.

A couple of things noticed on the fly:
- Fixed pg_xlogdump being not completely correct to report the FPW
information
- A couple of typos and malformed sentences fixed
- Added an assertion to check that the hole offset value does not the bit
used for compression status
- Reworked docs, mentioning as well that wal_compression is off by default.
- Removed stuff in pg_controldata and XLOG_PARAMETER_CHANGE (mentioned by
Fujii-san)

Regards,
-- 
Michael


20141218_fpw_compression_v8.tar.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-16 Thread Michael Paquier
On Tue, Dec 16, 2014 at 8:35 AM, Michael Paquier michael.paqu...@gmail.com
wrote:
 On Tue, Dec 16, 2014 at 3:46 AM, Robert Haas robertmh...@gmail.com
wrote:
 On Sat, Dec 13, 2014 at 9:36 AM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 Something to be aware of btw is that this patch introduces an
 additional 8 bytes per block image in WAL as it contains additional
 information to control the compression. In this case this is the
 uint16 compress_len present in XLogRecordBlockImageHeader. In the case
 of the measurements done, knowing that 63638 FPWs have been written,
 there is a difference of a bit less than 500k in WAL between HEAD and
 FPW off in favor of HEAD. The gain with compression is welcome,
 still for the default there is a small price to track down if a block
 is compressed or not. This patch still takes advantage of it by not
 compressing the hole present in page and reducing CPU work a bit.

 That sounds like a pretty serious problem to me.
 OK. If that's so much a problem, I'll switch back to the version using
 1 bit in the block header to identify if a block is compressed or not.
 This way, when switch will be off the record length will be the same
 as HEAD.
And here are attached fresh patches reducing the WAL record size to what it
is in head when the compression switch is off. Looking at the logic in
xlogrecord.h, the block header stores the hole length and hole offset. I
changed that a bit to store the length of raw block, with hole or
compressed as the 1st uint16. The second uint16 is used to store the hole
offset, same as HEAD when compression switch is off. When compression is
on, a special value 0x is saved (actually only filling 1 in the 16th
bit is fine...). Note that this forces to fill in the hole with zeros and
to compress always BLCKSZ worth of data.
Those patches pass make check-world, even WAL replay on standbys.
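
For clarity, here is a hedged sketch of that zero-filled approach (illustrative only, and using the refactored pglz_compress() signature discussed elsewhere in this thread rather than the PGLZ_Header-based one):

static bool
compress_block_image(const char *page, uint16 hole_offset, uint16 hole_length,
                     char *scratch, char *dest, uint16 *compress_len)
{
    int32       len;

    /* zero-fill the hole so that a full BLCKSZ is always handed to pglz */
    memcpy(scratch, page, hole_offset);
    memset(scratch + hole_offset, 0, hole_length);
    memcpy(scratch + hole_offset + hole_length,
           page + hole_offset + hole_length,
           BLCKSZ - (hole_offset + hole_length));

    len = pglz_compress(scratch, BLCKSZ, dest, PGLZ_strategy_default);
    if (len <= 0)
        return false;           /* incompressible: keep the raw image */

    *compress_len = (uint16) len;
    return true;
}

Decompression can then rely on the fixed BLCKSZ raw size, at the price of asking pglz to chew through the zeroed hole, which is what the CPU numbers below illustrate.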

I have done as well measurements using this patch set, with the following
things that can be noticed:
- When compression switch is off, the same quantity of WAL as HEAD is
produced
- pglz is very bad at compressing page hole. I mean, really bad. Have a
look at the user CPU particularly when pages are empty and you'll
understand... Other compression algorithms would be better here. Tests are
done with various values of fillfactor, 10 means that after the update 80%
of the page is empty, at 50% the page is more or less completely full.

Here are the results, with 5 test cases:
- FPW on + 2 bytes, compression switch is on, using 2 additional bytes in
block header, resulting in WAL records longer as 8 more bytes are used per
block with lower CPU usage as page holes are not compressed by pglz.
- FPW off + 2 bytes, same as previous, with the compression switch set to off.
- FPW on + 0 bytes, compression switch to on, the same block header size as
HEAD is used, at the cost of compressing page holes filled with zeros
- FPW off + 0 bytes, compression switch set to off, same as previous
- HEAD, unpatched master (except with hack to calculate user and system CPU)
- Record, the record-level compression, with compression lower-bound set at
0.

=# select test || ', ffactor ' || ffactor, pg_size_pretty(post_update -
pre_update), user_diff, system_diff from results;
   ?column?| pg_size_pretty | user_diff | system_diff
---++---+-
 FPW on + 2 bytes, ffactor 50  | 582 MB | 42.391894 |0.807444
 FPW on + 2 bytes, ffactor 20  | 229 MB | 14.330304 |0.729626
 FPW on + 2 bytes, ffactor 10  | 117 MB |  7.335442 |0.570996
 FPW off + 2 bytes, ffactor 50 | 746 MB | 25.330391 |1.248503
 FPW off + 2 bytes, ffactor 20 | 293 MB | 10.537475 |0.755448
 FPW off + 2 bytes, ffactor 10 | 148 MB |  5.762775 |0.763761
 FPW on + 0 bytes, ffactor 50  | 585 MB | 54.115496 |0.924891
 FPW on + 0 bytes, ffactor 20  | 234 MB | 26.270404 |0.755862
 FPW on + 0 bytes, ffactor 10  | 122 MB | 19.540131 |0.800981
 FPW off + 0 bytes, ffactor 50 | 746 MB | 25.102241 |1.110677
 FPW off + 0 bytes, ffactor 20 | 293 MB |  9.889374 |0.749884
 FPW off + 0 bytes, ffactor 10 | 148 MB |  5.286767 |0.682746
 HEAD, ffactor 50  | 746 MB | 25.181729 |1.133433
 HEAD, ffactor 20  | 293 MB |  9.962242 |0.765970
 HEAD, ffactor 10  | 148 MB |  5.693426 |0.775371
 Record, ffactor 50| 582 MB | 54.904374 |0.678204
 Record, ffactor 20| 229 MB | 19.798268 |0.807220
 Record, ffactor 10| 116 MB |  9.401877 |0.668454
(18 rows)

Attached are as well the results of the measurements, and the test case
used.
Regards,
-- 
Michael


20141216_fpw_compression_v7.tar.gz
Description: GNU Zip compressed data


results.sql
Description: Binary data


test_compress
Description: Binary data

-- 

Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-16 Thread Alvaro Herrera
Michael Paquier wrote:

 And here are attached fresh patches reducing the WAL record size to what it
 is in head when the compression switch is off. Looking at the logic in
 xlogrecord.h, the block header stores the hole length and hole offset. I
 changed that a bit to store the length of raw block, with hole or
 compressed as the 1st uint16. The second uint16 is used to store the hole
 offset, same as HEAD when compression switch is off. When compression is
 on, a special value 0x is saved (actually only filling 1 in the 16th
 bit is fine...). Note that this forces to fill in the hole with zeros and
 to compress always BLCKSZ worth of data.

Why do we compress the hole?  This seems pointless, considering that we
know it's all zeroes.  Is it possible to compress the head and tail of
page separately?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-16 Thread Michael Paquier
On Tue, Dec 16, 2014 at 11:24 PM, Alvaro Herrera alvhe...@2ndquadrant.com
wrote:

 Michael Paquier wrote:

  And here are attached fresh patches reducing the WAL record size to what
 it
  is in head when the compression switch is off. Looking at the logic in
  xlogrecord.h, the block header stores the hole length and hole offset. I
  changed that a bit to store the length of raw block, with hole or
  compressed as the 1st uint16. The second uint16 is used to store the hole
  offset, same as HEAD when compression switch is off. When compression is
  on, a special value 0x is saved (actually only filling 1 in the 16th
  bit is fine...). Note that this forces to fill in the hole with zeros and
  to compress always BLCKSZ worth of data.

 Why do we compress the hole?  This seems pointless, considering that we
 know it's all zeroes.  Is it possible to compress the head and tail of
 page separately?

This would take 2 additional bytes at minimum in the block header,
resulting in 8 additional bytes in record each time a FPW shows up. IMO it
is important to check the length of things obtained when replaying WAL,
that's something the current code of HEAD does quite well.
-- 
Michael


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-16 Thread Michael Paquier
On Tue, Dec 16, 2014 at 11:30 PM, Michael Paquier michael.paqu...@gmail.com
 wrote:



 On Tue, Dec 16, 2014 at 11:24 PM, Alvaro Herrera alvhe...@2ndquadrant.com
  wrote:

 Michael Paquier wrote:

  And here are attached fresh patches reducing the WAL record size to
 what it
  is in head when the compression switch is off. Looking at the logic in
  xlogrecord.h, the block header stores the hole length and hole offset. I
  changed that a bit to store the length of raw block, with hole or
  compressed as the 1st uint16. The second uint16 is used to store the
 hole
  offset, same as HEAD when compression switch is off. When compression is
  on, a special value 0xFFFF is saved (actually only filling 1 in the 16th
  bit is fine...). Note that this forces to fill in the hole with zeros
 and
  to compress always BLCKSZ worth of data.

 Why do we compress the hole?  This seems pointless, considering that we
 know it's all zeroes.  Is it possible to compress the head and tail of
 page separately?

 This would take 2 additional bytes at minimum in the block header,
 resulting in 8 additional bytes in record each time a FPW shows up. IMO it
 is important to check the length of things obtained when replaying WAL,
 that's something the current code of HEAD does quite well.

Actually, the original length of the compressed block is saved in
PGLZ_Header, so if we are fine to not check the size of the block
decompressed when decoding WAL we can do without the hole filled with
zeros, and use only 1 bit to see if the block is compressed or not.
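To illustrate the idea, a rough sketch (the macro and helper names below are
made up for the example, this is not code from the patch):

#include <stdint.h>

#define BKPIMAGE_IS_COMPRESSED 0x8000  /* high bit of the 16-bit length field */

/* Pack the "is compressed" flag into the top bit of the stored length. */
static inline uint16_t
encode_image_len(uint16_t len, int compressed)
{
    /* len is at most BLCKSZ, so it always stays below 0x8000 */
    return (uint16_t) (len | (compressed ? BKPIMAGE_IS_COMPRESSED : 0));
}

/* Recover the flag and the real length when decoding the record. */
static inline uint16_t
decode_image_len(uint16_t stored, int *compressed)
{
    *compressed = (stored & BKPIMAGE_IS_COMPRESSED) != 0;
    return (uint16_t) (stored & ~BKPIMAGE_IS_COMPRESSED);
}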
-- 
Michael


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-16 Thread Merlin Moncure
On Mon, Dec 15, 2014 at 5:37 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Tue, Dec 16, 2014 at 5:14 AM, Merlin Moncure mmonc...@gmail.com wrote:
 OTOH, our built-in compressor as we all know is a complete dog in
 terms of CPU when stacked up against some more modern implementations.
 All that said, as long as there is a clean path to migrating to
 another compression alg should one materialize, that problem can be
 nicely decoupled from this patch as Robert pointed out.
 I am curious to see some numbers about that. Has anyone done such
 comparison measurements?

I don't, but I can make some.  There are some numbers on the web but
it's better to make some new ones because IIRC some light optimization
had gone into plgz of late.

Compressing *one* file with lz4 and a quick/n/dirty plgz i hacked out
of the source (borrowing heavily from
https://github.com/maropu/pglz_bench/blob/master/pglz_bench.cpp),  I
tested the results:

lz4 real time:  0m0.032s
pglz real time: 0m0.281s

mmoncure@mernix2 ~/src/lz4/lz4-r125 $ ls -lh test.*
-rw-r--r-- 1 mmoncure mmoncure 2.7M Dec 16 09:04 test.lz4
-rw-r--r-- 1 mmoncure mmoncure 2.5M Dec 16 09:01 test.pglz

A better test would examine all manner of different xlog files in a
fashion closer to how your patch would need to compress them but the
numbers here tell a fairly compelling story: similar compression
results for around 9x the cpu usage.  Be advised that compression alg
selection is one of those types of discussions that tends to spin off
into outer space; that's not something you have to solve today.  Just
try and make things so that they can be switched out if things
change
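For reference, the timing half of such a test fits in a tiny harness like
the sketch below; it uses the current liblz4 API and leaves out the pglz
side (which needed code hacked out of the backend), so it is only an
illustration, not the program that produced the numbers above.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <lz4.h>

int
main(int argc, char **argv)
{
    FILE       *f;
    long        insize;
    int         outsize;
    char       *in, *out;
    clock_t     start, end;

    if (argc != 2 || (f = fopen(argv[1], "rb")) == NULL)
    {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }
    fseek(f, 0, SEEK_END);
    insize = ftell(f);
    fseek(f, 0, SEEK_SET);
    in = malloc(insize);
    out = malloc(LZ4_compressBound((int) insize));
    fread(in, 1, insize, f);
    fclose(f);

    start = clock();
    outsize = LZ4_compress_default(in, out, (int) insize,
                                   LZ4_compressBound((int) insize));
    end = clock();

    printf("lz4: %ld -> %d bytes, %.3f s CPU\n", insize, outsize,
           (double) (end - start) / CLOCKS_PER_SEC);
    free(in);
    free(out);
    return 0;
}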

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-16 Thread Michael Paquier
On Wed, Dec 17, 2014 at 12:12 AM, Merlin Moncure mmonc...@gmail.com wrote:

 Compressing *one* file with lz4 and a quick/n/dirty plgz i hacked out
 of the source (borrowing heavily from
 https://github.com/maropu/pglz_bench/blob/master/pglz_bench.cpp),  I
 tested the results:

 lz4 real time:  0m0.032s
 pglz real time: 0m0.281s

 mmoncure@mernix2 ~/src/lz4/lz4-r125 $ ls -lh test.*
 -rw-r--r-- 1 mmoncure mmoncure 2.7M Dec 16 09:04 test.lz4
 -rw-r--r-- 1 mmoncure mmoncure 2.5M Dec 16 09:01 test.pglz


 A better test would examine all manner of different xlog files in a
 fashion closer to how your patch would need to compress them but the
 numbers here tell a fairly compelling story: similar compression
 results for around 9x the cpu usage.

Yes that could be better... Thanks for some real numbers that's really
informative.

Be advised that compression alg
 selection is one of those types of discussions that tends to spin off
 into outer space; that's not something you have to solve today.  Just
 try and make things so that they can be switched out if things
 change

One way to get around that would be a set of hooks to allow people to set
up the compression algorithm they want:
- One for buffer compression
- One for buffer decompression
- Perhaps one to set up the size of the buffer used for
compression/decompression scratch buffer. In the case of pglz, this needs
for example to be PGLZ_MAX_OUTPUT(buffer_size). The part actually tricky is
that as xlogreader.c is used by pg_xlogdump, we cannot directly use a hook
in it, but we may as well be able to live with a fixed maximum size like
BLCKSZ * 2 by default, this would let largely enough room for all kinds of
compression algos IMO...
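To make that concrete, the hooks could look roughly like the following
(every type and variable name below is invented for the sake of the
example, nothing here is from an actual patch):

#include <stddef.h>
#include <stdint.h>

/* Compress slen bytes from src into dst; return the compressed size,
 * or -1 to store the block uncompressed. */
typedef int32_t (*wal_compress_hook_type) (const char *src, int32_t slen,
                                           char *dst, int32_t dst_capacity);

/* Decompress slen bytes from src into dst, which holds rawsize bytes;
 * return the decompressed size, or -1 on corruption. */
typedef int32_t (*wal_decompress_hook_type) (const char *src, int32_t slen,
                                             char *dst, int32_t rawsize);

/* Worst-case scratch buffer size needed to compress blcksz bytes. */
typedef size_t (*wal_compress_bound_hook_type) (size_t blcksz);

wal_compress_hook_type        wal_compress_hook = NULL;
wal_decompress_hook_type      wal_decompress_hook = NULL;
wal_compress_bound_hook_type  wal_compress_bound_hook = NULL;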
Regards,
-- 
Michael


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-15 Thread Robert Haas
On Sat, Dec 13, 2014 at 9:36 AM, Michael Paquier
michael.paqu...@gmail.com wrote:
 Something to be aware of btw is that this patch introduces an
 additional 8 bytes per block image in WAL as it contains additional
 information to control the compression. In this case this is the
 uint16 compress_len present in XLogRecordBlockImageHeader. In the case
 of the measurements done, knowing that 63638 FPWs have been written,
 there is a difference of a bit less than 500k in WAL between HEAD and
 FPW off in favor of HEAD. The gain with compression is welcome,
 still for the default there is a small price to track down if a block
 is compressed or not. This patch still takes advantage of it by not
 compressing the hole present in page and reducing CPU work a bit.

That sounds like a pretty serious problem to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-15 Thread Merlin Moncure
On Fri, Dec 12, 2014 at 8:27 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-12-12 09:24:27 -0500, Bruce Momjian wrote:
 On Fri, Dec 12, 2014 at 03:22:24PM +0100, Andres Freund wrote:
   Well, the larger question is why wouldn't we just have the user compress
   the entire WAL file before archiving --- why have each backend do it?
   Is it the write volume we are saving?  I though this WAL compression
   gave better performance in some cases.
 
  Err. Streaming?

 Well, you can already set up SSL for compression while streaming.  In
 fact, I assume many are already using SSL for streaming as the majority
 of SSL overhead is from connection start.

 That's not really true. The overhead of SSL during streaming is
 *significant*. Both the kind of compression it does (which is far more
 expensive than pglz or lz4) and the encryption itself. In many cases
 it's prohibitively expensive - there's even a fair number of on-list
 reports about this.

(late to the party)
That may be true, but there are a number of ways to work around SSL
performance issues such as hardware acceleration (perhaps deferring
encryption to another point in the network), weakening the protocol,
or not using it at all.

OTOH, our built-in compressor as we all know is a complete dog in
terms of CPU when stacked up against some more modern implementations.
All that said, as long as there is a clean path to migrating to
another compression alg should one materialize, that problem can be
nicely decoupled from this patch as Robert pointed out.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-14 Thread Michael Paquier
Note: this patch has been moved to CF 2014-12 and I marked myself as
an author, if that's fine... I've ended up being really involved in
it.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-13 Thread Simon Riggs
On 12 December 2014 at 21:40, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Dec 12, 2014 at 1:51 PM, Simon Riggs si...@2ndquadrant.com wrote:
 What I don't understand is why we aren't working on double buffering,
 since that cost would be paid in a background process and would be
 evenly spread out across a checkpoint. Plus we'd be able to remove
 FPWs altogether, which is like 100% compression.

 The previous patch to implement that - by somebody at vmware - was an
 epic fail.  I'm not opposed to seeing somebody try again, but it's a
 tricky problem.  When the double buffer fills up, then you've got to
 finish flushing the pages whose images are stored in the buffer to
 disk before you can overwrite it, which acts like a kind of
 mini-checkpoint.  That problem might be solvable, but let's use this
 thread to discuss this patch, not some other patch that someone might
 have chosen to write but didn't.

No, I think its relevant.

WAL compression looks to me like a short term tweak, not the end game.

On that basis, we should go for simple and effective, user-settable
compression of FPWs and not spend too much Valuable Committer Time on
it.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-13 Thread Michael Paquier
On Fri, Dec 12, 2014 at 11:50 PM, Michael Paquier
michael.paqu...@gmail.com wrote:


 On Wed, Dec 10, 2014 at 11:25 PM, Bruce Momjian br...@momjian.us wrote:

 On Wed, Dec 10, 2014 at 07:40:46PM +0530, Rahila Syed wrote:
   The tests ran for around 30 mins. Manual checkpoint was run before each
  test.
 
   Compression  WAL generated  %compression  Latency-avg  CPU usage (seconds)                       TPS     Latency stddev
   on           1531.4 MB      ~35 %         7.351 ms     user diff: 562.67s system diff: 41.40s    135.96  13.759 ms
   off          2373.1 MB                    6.781 ms     user diff: 354.20s system diff: 39.67s    147.40  14.152 ms
 
  The compression obtained is quite high close to 35 %.
  CPU usage at user level when compression is on is quite noticeably high
  as
  compared to that when compression is off. But gain in terms of reduction
  of WAL
  is also high.

 I am sorry but I can't understand the above results due to wrapping.
 Are you saying compression was twice as slow?


 I got curious to see how the compression of an entire record would perform
 and how it compares for small WAL records, and here are some numbers based
 on the patch attached, this patch compresses the whole record including the
 block headers, letting only XLogRecord out of it with a flag indicating that
 the record is compressed (note that this patch contains a portion for replay
 untested, still this patch gives an idea on how much compression of the
 whole record affects user CPU in this test case). It uses a buffer of 4 *
 BLCKSZ, if the record is longer than that compression is simply given up.
 Those tests are using the hack upthread calculating user and system CPU
 using getrusage() when a backend.

 Here is the simple test case I used with 512MB of shared_buffers and small
 records, filling up a bunch of buffers, dirtying them and them compressing
 FPWs with a checkpoint.
 #!/bin/bash
 psql <<EOF
 SELECT pg_backend_pid();
 CREATE TABLE aa (a int);
 CREATE TABLE results (phase text, position pg_lsn);
 CREATE EXTENSION IF NOT EXISTS pg_prewarm;
 ALTER TABLE aa SET (FILLFACTOR = 50);
 INSERT INTO results VALUES ('pre-insert', pg_current_xlog_location());
 INSERT INTO aa VALUES (generate_series(1,7000000)); -- 484MB
 SELECT pg_size_pretty(pg_relation_size('aa'::regclass));
 SELECT pg_prewarm('aa'::regclass);
 CHECKPOINT;
 INSERT INTO results VALUES ('pre-update', pg_current_xlog_location());
 UPDATE aa SET a = 7000000 + a;
 CHECKPOINT;
 INSERT INTO results VALUES ('post-update', pg_current_xlog_location());
 SELECT * FROM results;
 EOF
Re-using this test case, I have produced more results by changing the
fillfactor of the table:
=# select test || ', ffactor ' || ffactor, pg_size_pretty(post_update
- pre_update), user_diff, system_diff from results;
   ?column?| pg_size_pretty | user_diff | system_diff
-------------------------------+----------------+-----------+-------------
 FPW on + 2 bytes, ffactor 50  | 582 MB | 42.391894 |0.807444
 FPW on + 2 bytes, ffactor 20  | 229 MB | 14.330304 |0.729626
 FPW on + 2 bytes, ffactor 10  | 117 MB |  7.335442 |0.570996
 FPW off + 2 bytes, ffactor 50 | 746 MB | 25.330391 |1.248503
 FPW off + 2 bytes, ffactor 20 | 293 MB | 10.537475 |0.755448
 FPW off + 2 bytes, ffactor 10 | 148 MB |  5.762775 |0.763761
 HEAD, ffactor 50  | 746 MB | 25.181729 |1.133433
 HEAD, ffactor 20  | 293 MB |  9.962242 |0.765970
 HEAD, ffactor 10  | 148 MB |  5.693426 |0.775371
 Record, ffactor 50| 582 MB | 54.904374 |0.678204
 Record, ffactor 20| 229 MB | 19.798268 |0.807220
 Record, ffactor 10| 116 MB |  9.401877 |0.668454
(12 rows)

The following tests are run:
- Record means the record-level compression
- HEAD is postgres at 1c5c70df
- FPW off is HEAD + patch with switch set to off
- FPW on is HEAD + patch with switch set to on
The gain in compression has a linear profile with the length of page
hole. There was visibly some noise in the tests: you can see that the
CPU of FPW off is a bit higher than HEAD.

Something to be aware of btw is that this patch introduces an
additional 8 bytes per block image in WAL as it contains additional
information to control the compression. In this case this is the
uint16 compress_len present in XLogRecordBlockImageHeader. In the case
of the measurements done, knowing that 63638 FPWs have been written,
there is a difference of a bit less than 500k in WAL between HEAD and
FPW off in favor of HEAD. The gain with compression is welcome,
still for the default there is a small price to track down if a block
is compressed or not. This patch still takes advantage of it by not
compressing the hole present in page and reducing CPU work a bit.
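In other words, the block image header ends up shaped roughly like this
(sketch only: hole_offset and hole_length are the fields already present
in HEAD, compress_len is the field added here, and the exact layout, flags
and alignment in the patch differ):

#include <stdint.h>

typedef struct XLogRecordBlockImageHeader
{
    uint16_t    hole_offset;    /* number of bytes before the page "hole" */
    uint16_t    hole_length;    /* number of bytes in the "hole" */
    uint16_t    compress_len;   /* compressed image length, 0 if uncompressed */
} XLogRecordBlockImageHeader;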

Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-13 Thread Simon Riggs
On 13 December 2014 at 14:36, Michael Paquier michael.paqu...@gmail.com wrote:

 Something to be aware of btw is that this patch introduces an
 additional 8 bytes per block image in WAL as it contains additional
 information to control the compression. In this case this is the
 uint16 compress_len present in XLogRecordBlockImageHeader.

So we add 8 bytes to all FPWs, or only for compressed FPWs?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-13 Thread Michael Paquier
On Sun, Dec 14, 2014 at 5:45 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 13 December 2014 at 14:36, Michael Paquier michael.paqu...@gmail.com 
 wrote:

 Something to be aware of btw is that this patch introduces an
 additional 8 bytes per block image in WAL as it contains additional
 information to control the compression. In this case this is the
 uint16 compress_len present in XLogRecordBlockImageHeader.

 So we add 8 bytes to all FPWs, or only for compressed FPWs?
In this case that was all. We could still use xl_info to put a flag
telling that blocks are compressed, but it feels more consistent to
have a way to identify if a block is compressed inside its own header.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-12 Thread Robert Haas
On Thu, Dec 11, 2014 at 11:34 AM, Bruce Momjian br...@momjian.us wrote:
 compression = 'on'  : 1838 secs
 = 'off' : 1701 secs

 Different is around 140 secs.

 OK, so the compression took 2x the cpu and was 8% slower.  The only
 benefit is WAL files are 35% smaller?

Compression didn't take 2x the CPU.  It increased user CPU from 354.20
s to 562.67 s over the course of the run, so it took about 60% more
CPU.

But I wouldn't be too discouraged by that.  At least AIUI, there are
quite a number of users for whom WAL volume is a serious challenge,
and they might be willing to pay that price to have less of it.  Also,
we have talked a number of times before about incorporating Snappy or
LZ4, which I'm guessing would save a fair amount of CPU -- but the
decision was made to leave that out of the first version, and just use
pg_lz, to keep the initial patch simple.  I think that was a good
decision.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-12 Thread Andres Freund
On 2014-12-12 08:27:59 -0500, Robert Haas wrote:
 On Thu, Dec 11, 2014 at 11:34 AM, Bruce Momjian br...@momjian.us wrote:
  compression = 'on'  : 1838 secs
  = 'off' : 1701 secs
 
  Different is around 140 secs.
 
  OK, so the compression took 2x the cpu and was 8% slower.  The only
  benefit is WAL files are 35% smaller?
 
 Compression didn't take 2x the CPU.  It increased user CPU from 354.20
 s to 562.67 s over the course of the run, so it took about 60% more
 CPU.
 
 But I wouldn't be too discouraged by that.  At least AIUI, there are
 quite a number of users for whom WAL volume is a serious challenge,
 and they might be willing to pay that price to have less of it.

And it might actually result in *higher* performance in a good number of
cases if the the WAL flushes are a significant part of the cost.

IIRC the test used a single process - that's probably not too
representative...
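A more representative run would use several clients and threads, for
instance something like the line below (the client/thread counts, duration
and database name are only placeholders):

pgbench -c 32 -j 8 -T 1800 -M prepared bench_db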

 Also,
 we have talked a number of times before about incorporating Snappy or
 LZ4, which I'm guessing would save a fair amount of CPU -- but the
 decision was made to leave that out of the first version, and just use
 pg_lz, to keep the initial patch simple.  I think that was a good
 decision.

Agreed.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-12 Thread Bruce Momjian
On Fri, Dec 12, 2014 at 08:27:59AM -0500, Robert Haas wrote:
 On Thu, Dec 11, 2014 at 11:34 AM, Bruce Momjian br...@momjian.us wrote:
  compression = 'on'  : 1838 secs
  = 'off' : 1701 secs
 
  Different is around 140 secs.
 
  OK, so the compression took 2x the cpu and was 8% slower.  The only
  benefit is WAL files are 35% smaller?
 
 Compression didn't take 2x the CPU.  It increased user CPU from 354.20
 s to 562.67 s over the course of the run, so it took about 60% more
 CPU.
 
 But I wouldn't be too discouraged by that.  At least AIUI, there are
 quite a number of users for whom WAL volume is a serious challenge,
 and they might be willing to pay that price to have less of it.  Also,
 we have talked a number of times before about incorporating Snappy or
 LZ4, which I'm guessing would save a fair amount of CPU -- but the
 decision was made to leave that out of the first version, and just use
 pg_lz, to keep the initial patch simple.  I think that was a good
 decision.

Well, the larger question is why wouldn't we just have the user compress
the entire WAL file before archiving --- why have each backend do it? 
Is it the write volume we are saving?  I though this WAL compression
gave better performance in some cases.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-12 Thread Andres Freund
On 2014-12-12 09:18:01 -0500, Bruce Momjian wrote:
 On Fri, Dec 12, 2014 at 08:27:59AM -0500, Robert Haas wrote:
  On Thu, Dec 11, 2014 at 11:34 AM, Bruce Momjian br...@momjian.us wrote:
   compression = 'on'  : 1838 secs
   = 'off' : 1701 secs
  
   Different is around 140 secs.
  
   OK, so the compression took 2x the cpu and was 8% slower.  The only
   benefit is WAL files are 35% smaller?
  
  Compression didn't take 2x the CPU.  It increased user CPU from 354.20
  s to 562.67 s over the course of the run, so it took about 60% more
  CPU.
  
  But I wouldn't be too discouraged by that.  At least AIUI, there are
  quite a number of users for whom WAL volume is a serious challenge,
  and they might be willing to pay that price to have less of it.  Also,
  we have talked a number of times before about incorporating Snappy or
  LZ4, which I'm guessing would save a fair amount of CPU -- but the
  decision was made to leave that out of the first version, and just use
  pg_lz, to keep the initial patch simple.  I think that was a good
  decision.
 
 Well, the larger question is why wouldn't we just have the user compress
 the entire WAL file before archiving --- why have each backend do it? 
 Is it the write volume we are saving?  I though this WAL compression
 gave better performance in some cases.

Err. Streaming?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-12 Thread Bruce Momjian
On Fri, Dec 12, 2014 at 03:22:24PM +0100, Andres Freund wrote:
 On 2014-12-12 09:18:01 -0500, Bruce Momjian wrote:
  On Fri, Dec 12, 2014 at 08:27:59AM -0500, Robert Haas wrote:
   On Thu, Dec 11, 2014 at 11:34 AM, Bruce Momjian br...@momjian.us wrote:
compression = 'on'  : 1838 secs
= 'off' : 1701 secs
   
Different is around 140 secs.
   
OK, so the compression took 2x the cpu and was 8% slower.  The only
benefit is WAL files are 35% smaller?
   
   Compression didn't take 2x the CPU.  It increased user CPU from 354.20
   s to 562.67 s over the course of the run, so it took about 60% more
   CPU.
   
   But I wouldn't be too discouraged by that.  At least AIUI, there are
   quite a number of users for whom WAL volume is a serious challenge,
   and they might be willing to pay that price to have less of it.  Also,
   we have talked a number of times before about incorporating Snappy or
   LZ4, which I'm guessing would save a fair amount of CPU -- but the
   decision was made to leave that out of the first version, and just use
   pg_lz, to keep the initial patch simple.  I think that was a good
   decision.
  
  Well, the larger question is why wouldn't we just have the user compress
  the entire WAL file before archiving --- why have each backend do it? 
  Is it the write volume we are saving?  I though this WAL compression
  gave better performance in some cases.
 
 Err. Streaming?

Well, you can already set up SSL for compression while streaming.  In
fact, I assume many are already using SSL for streaming as the majority
of SSL overhead is from connection start.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-12 Thread Andres Freund
On 2014-12-12 09:24:27 -0500, Bruce Momjian wrote:
 On Fri, Dec 12, 2014 at 03:22:24PM +0100, Andres Freund wrote:
   Well, the larger question is why wouldn't we just have the user compress
   the entire WAL file before archiving --- why have each backend do it? 
   Is it the write volume we are saving?  I though this WAL compression
   gave better performance in some cases.
  
  Err. Streaming?
 
 Well, you can already set up SSL for compression while streaming.  In
 fact, I assume many are already using SSL for streaming as the majority
 of SSL overhead is from connection start.

That's not really true. The overhead of SSL during streaming is
*significant*. Both the kind of compression it does (which is far more
expensive than pglz or lz4) and the encryption itself. In many cases
it's prohibitively expensive - there's even a fair number of on-list
reports about this.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-12 Thread Rahila Syed
Hello,

Well, the larger question is why wouldn't we just have the user compress
the entire WAL file before archiving --- why have each backend do it?
Is it the write volume we are saving?

IIUC,  the idea here is to not only save the on disk size of WAL but to
reduce the overhead of flushing WAL records to disk in servers with heavy
write operations. So yes improving the performance by saving write volume
is a part of the requirement.

Thank you,
Rahila Syed


On Fri, Dec 12, 2014 at 7:48 PM, Bruce Momjian br...@momjian.us wrote:

 On Fri, Dec 12, 2014 at 08:27:59AM -0500, Robert Haas wrote:
  On Thu, Dec 11, 2014 at 11:34 AM, Bruce Momjian br...@momjian.us
 wrote:
   compression = 'on'  : 1838 secs
   = 'off' : 1701 secs
  
   Different is around 140 secs.
  
   OK, so the compression took 2x the cpu and was 8% slower.  The only
   benefit is WAL files are 35% smaller?
 
  Compression didn't take 2x the CPU.  It increased user CPU from 354.20
  s to 562.67 s over the course of the run, so it took about 60% more
  CPU.
 
  But I wouldn't be too discouraged by that.  At least AIUI, there are
  quite a number of users for whom WAL volume is a serious challenge,
  and they might be willing to pay that price to have less of it.  Also,
  we have talked a number of times before about incorporating Snappy or
  LZ4, which I'm guessing would save a fair amount of CPU -- but the
  decision was made to leave that out of the first version, and just use
  pg_lz, to keep the initial patch simple.  I think that was a good
  decision.

 Well, the larger question is why wouldn't we just have the user compress
 the entire WAL file before archiving --- why have each backend do it?
 Is it the write volume we are saving?  I though this WAL compression
 gave better performance in some cases.

 --
   Bruce Momjian  br...@momjian.us        http://momjian.us
   EnterpriseDB http://enterprisedb.com

   + Everyone has their own god. +



Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-12 Thread Bruce Momjian
On Fri, Dec 12, 2014 at 03:27:33PM +0100, Andres Freund wrote:
 On 2014-12-12 09:24:27 -0500, Bruce Momjian wrote:
  On Fri, Dec 12, 2014 at 03:22:24PM +0100, Andres Freund wrote:
Well, the larger question is why wouldn't we just have the user compress
the entire WAL file before archiving --- why have each backend do it? 
Is it the write volume we are saving?  I though this WAL compression
gave better performance in some cases.
   
   Err. Streaming?
  
  Well, you can already set up SSL for compression while streaming.  In
  fact, I assume many are already using SSL for streaming as the majority
  of SSL overhead is from connection start.
 
 That's not really true. The overhead of SSL during streaming is
 *significant*. Both the kind of compression it does (which is far more
 expensive than pglz or lz4) and the encryption itself. In many cases
 it's prohibitively expensive - there's even a fair number of on-list
 reports about this.

Well, I am just trying to understand when someone would benefit from WAL
compression.  Are we saying it is only useful for non-SSL streaming?

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-12 Thread Michael Paquier
On Wed, Dec 10, 2014 at 11:25 PM, Bruce Momjian br...@momjian.us wrote:

 On Wed, Dec 10, 2014 at 07:40:46PM +0530, Rahila Syed wrote:
  The tests ran for around 30 mins. Manual checkpoint was run before each
 test.
 
  Compression  WAL generated  %compression  Latency-avg  CPU usage (seconds)                       TPS     Latency stddev
  on           1531.4 MB      ~35 %         7.351 ms     user diff: 562.67s system diff: 41.40s    135.96  13.759 ms
  off          2373.1 MB                    6.781 ms     user diff: 354.20s system diff: 39.67s    147.40  14.152 ms
 
  The compression obtained is quite high close to 35 %.
  CPU usage at user level when compression is on is quite noticeably high
 as
  compared to that when compression is off. But gain in terms of reduction
 of WAL
  is also high.

 I am sorry but I can't understand the above results due to wrapping.
 Are you saying compression was twice as slow?


I got curious to see how the compression of an entire record would perform
and how it compares for small WAL records, and here are some numbers based
on the patch attached, this patch compresses the whole record including the
block headers, letting only XLogRecord out of it with a flag indicating
that the record is compressed (note that this patch contains a portion for
replay untested, still this patch gives an idea on how much compression of
the whole record affects user CPU in this test case). It uses a buffer of 4
* BLCKSZ, if the record is longer than that compression is simply given up.
Those tests are using the hack upthread calculating user and system CPU
using getrusage() when a backend.
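The getrusage() hack boils down to something like the following; this is a
standalone illustration, not the actual instrumentation patch used for the
numbers below:

#include <stdio.h>
#include <sys/time.h>
#include <sys/resource.h>

static double
tv_seconds(struct timeval tv)
{
    return tv.tv_sec + tv.tv_usec / 1000000.0;
}

/* Call getrusage(RUSAGE_SELF, &start) before the workload and
 * getrusage(RUSAGE_SELF, &end) after it, then print the difference. */
static void
report_cpu_diff(const struct rusage *start, const struct rusage *end)
{
    printf("user diff: %.6fs system diff: %.6fs\n",
           tv_seconds(end->ru_utime) - tv_seconds(start->ru_utime),
           tv_seconds(end->ru_stime) - tv_seconds(start->ru_stime));
}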

Here is the simple test case I used with 512MB of shared_buffers and small
records, filling up a bunch of buffers, dirtying them and them compressing
FPWs with a checkpoint.
#!/bin/bash
psql <<EOF
SELECT pg_backend_pid();
CREATE TABLE aa (a int);
CREATE TABLE results (phase text, position pg_lsn);
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
ALTER TABLE aa SET (FILLFACTOR = 50);
INSERT INTO results VALUES ('pre-insert', pg_current_xlog_location());
INSERT INTO aa VALUES (generate_series(1,7000000)); -- 484MB
SELECT pg_size_pretty(pg_relation_size('aa'::regclass));
SELECT pg_prewarm('aa'::regclass);
CHECKPOINT;
INSERT INTO results VALUES ('pre-update', pg_current_xlog_location());
UPDATE aa SET a = 7000000 + a;
CHECKPOINT;
INSERT INTO results VALUES ('post-update', pg_current_xlog_location());
SELECT * FROM results;
EOF

Note that autovacuum and fsync are off.
=# select phase, user_diff, system_diff,
pg_size_pretty(pre_update - pre_insert),
pg_size_pretty(post_update - pre_update) from results;
        phase        | user_diff | system_diff | pg_size_pretty | pg_size_pretty
---------------------+-----------+-------------+----------------+----------------
 Compression FPW     | 42.990799 |    0.868179 | 429 MB         | 567 MB
 No compression      | 25.688731 |    1.236551 | 429 MB         | 727 MB
 Compression record  | 56.376750 |    0.769603 | 429 MB         | 566 MB
(3 rows)
If we do record-level compression, we'll need to be very careful in
defining a lower-bound to not eat unnecessary CPU resources, perhaps
something that should be controlled with a GUC. I presume that this stands
true as well for the upper bound.
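In code the lower bound would amount to little more than this
(wal_compress_min_len and should_compress_record are hypothetical names
used only for the sketch):

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical GUC: records shorter than this stay uncompressed. */
int         wal_compress_min_len = 128;     /* bytes */

static bool
should_compress_record(uint32_t rec_len)
{
    return wal_compress_min_len > 0 &&
        rec_len >= (uint32_t) wal_compress_min_len;
}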

Regards,
-- 
Michael
From f1579d37a9f293d7cc911ea048b68d3270b2cdf5 Mon Sep 17 00:00:00 2001
From: Michael Paquier mich...@otacoo.com
Date: Wed, 10 Dec 2014 22:10:16 +0900
Subject: [PATCH] Prototype to support record-level compression

This will be enough for tests with compression.
---
 src/backend/access/transam/xlog.c   |  1 +
 src/backend/access/transam/xloginsert.c | 64 +
 src/backend/access/transam/xlogreader.c | 17 +
 src/backend/utils/misc/guc.c| 10 ++
 src/include/access/xlog.h   |  1 +
 src/include/access/xlogrecord.h |  5 +++
 6 files changed, 98 insertions(+)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 0f09add..a0e15be 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -88,6 +88,7 @@ char	   *XLogArchiveCommand = NULL;
 bool		EnableHotStandby = false;
 bool		fullPageWrites = true;
 bool		wal_log_hints = false;
+bool		wal_compression = false;
 bool		log_checkpoints = false;
 int			sync_method = DEFAULT_SYNC_METHOD;
 int			wal_level = WAL_LEVEL_MINIMAL;
diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index f3d610f..a395842 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -29,6 +29,7 @@
 #include storage/proc.h
 #include utils/memutils.h
 #include pg_trace.h
+#include utils/pg_lzcompress.h
 
 /*
  * For each block reference registered with XLogRegisterBuffer, we fill in
@@ -56,6 +57,9 @@ static 

Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-12 Thread Andres Freund
On 2014-12-12 09:46:13 -0500, Bruce Momjian wrote:
 On Fri, Dec 12, 2014 at 03:27:33PM +0100, Andres Freund wrote:
  On 2014-12-12 09:24:27 -0500, Bruce Momjian wrote:
   On Fri, Dec 12, 2014 at 03:22:24PM +0100, Andres Freund wrote:
 Well, the larger question is why wouldn't we just have the user 
 compress
 the entire WAL file before archiving --- why have each backend do it? 
 Is it the write volume we are saving?  I though this WAL compression
 gave better performance in some cases.

Err. Streaming?
   
   Well, you can already set up SSL for compression while streaming.  In
   fact, I assume many are already using SSL for streaming as the majority
   of SSL overhead is from connection start.
  
  That's not really true. The overhead of SSL during streaming is
  *significant*. Both the kind of compression it does (which is far more
  expensive than pglz or lz4) and the encryption itself. In many cases
  it's prohibitively expensive - there's even a fair number of on-list
  reports about this.
 
 Well, I am just trying to understand when someone would benefit from WAL
 compression.  Are we saying it is only useful for non-SSL streaming?

No, not at all. It's useful in a lot more situations:

* The amount of WAL in pg_xlog can make up a significant portion of a
  database's size. Especially in large OLTP databases. Compressing
  archives doesn't help with that.
* The original WAL volume itself can be quite problematic because at
  some point its exhausting the underlying IO subsystem. Both due to the
  pure write rate and to the fsync()s regularly required.
* ssl compression can often not be used for WAL streaming because it's
  too slow as it's uses a much more expensive algorithm. Which is why we
  even have a GUC to disable it.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-12 Thread Andres Freund
On 2014-12-12 23:50:43 +0900, Michael Paquier wrote:
 I got curious to see how the compression of an entire record would perform
 and how it compares for small WAL records, and here are some numbers based
 on the patch attached, this patch compresses the whole record including the
 block headers, letting only XLogRecord out of it with a flag indicating
 that the record is compressed (note that this patch contains a portion for
 replay untested, still this patch gives an idea on how much compression of
 the whole record affects user CPU in this test case). It uses a buffer of 4
 * BLCKSZ, if the record is longer than that compression is simply given up.
 Those tests are using the hack upthread calculating user and system CPU
 using getrusage() when a backend.
 
 Here is the simple test case I used with 512MB of shared_buffers and small
 records, filling up a bunch of buffers, dirtying them and them compressing
 FPWs with a checkpoint.
 #!/bin/bash
 psql <<EOF
 SELECT pg_backend_pid();
 CREATE TABLE aa (a int);
 CREATE TABLE results (phase text, position pg_lsn);
 CREATE EXTENSION IF NOT EXISTS pg_prewarm;
 ALTER TABLE aa SET (FILLFACTOR = 50);
 INSERT INTO results VALUES ('pre-insert', pg_current_xlog_location());
 INSERT INTO aa VALUES (generate_series(1,7000000)); -- 484MB
 SELECT pg_size_pretty(pg_relation_size('aa'::regclass));
 SELECT pg_prewarm('aa'::regclass);
 CHECKPOINT;
 INSERT INTO results VALUES ('pre-update', pg_current_xlog_location());
 UPDATE aa SET a = 7000000 + a;
 CHECKPOINT;
 INSERT INTO results VALUES ('post-update', pg_current_xlog_location());
 SELECT * FROM results;
 EOF
 
 Note that autovacuum and fsync are off.
 =# select phase, user_diff, system_diff,
 pg_size_pretty(pre_update - pre_insert),
 pg_size_pretty(post_update - pre_update) from results;
         phase        | user_diff | system_diff | pg_size_pretty | pg_size_pretty
 ---------------------+-----------+-------------+----------------+----------------
  Compression FPW     | 42.990799 |    0.868179 | 429 MB         | 567 MB
  No compression      | 25.688731 |    1.236551 | 429 MB         | 727 MB
  Compression record  | 56.376750 |    0.769603 | 429 MB         | 566 MB
 (3 rows)
 If we do record-level compression, we'll need to be very careful in
 defining a lower-bound to not eat unnecessary CPU resources, perhaps
 something that should be controlled with a GUC. I presume that this stands
 true as well for the upper bound.

Record level compression pretty obviously would need a lower boundary
for when to use compression. It won't be useful for small heapam/btree
records, but it'll be rather useful for large multi_insert, clean or
similar records...

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-12 Thread Robert Haas
On Fri, Dec 12, 2014 at 10:04 AM, Andres Freund and...@anarazel.de wrote:
 Note that autovacuum and fsync are off.
 =# select phase, user_diff, system_diff,
 pg_size_pretty(pre_update - pre_insert),
 pg_size_pretty(post_update - pre_update) from results;
phase| user_diff | system_diff | pg_size_pretty |
 pg_size_pretty
 +---+-++
  Compression FPW| 42.990799 |0.868179 | 429 MB | 567 MB
  No compression | 25.688731 |1.236551 | 429 MB | 727 MB
  Compression record | 56.376750 |0.769603 | 429 MB | 566 MB
 (3 rows)
 If we do record-level compression, we'll need to be very careful in
 defining a lower-bound to not eat unnecessary CPU resources, perhaps
 something that should be controlled with a GUC. I presume that this stands
 true as well for the upper bound.

 Record level compression pretty obviously would need a lower boundary
 for when to use compression. It won't be useful for small heapam/btree
 records, but it'll be rather useful for large multi_insert, clean or
 similar records...

Unless I'm missing something, this test is showing that FPW
compression saves 298MB of WAL for 17.3 seconds of CPU time, as
against master.  And compressing the whole record saves a further 1MB
of WAL for a further 13.39 seconds of CPU time.  That makes
compressing the whole record sound like a pretty terrible idea - even
if you get more benefit by reducing the lower boundary, you're still
burning a ton of extra CPU time for almost no gain on the larger
records.  Ouch!

(Of course, I'm assuming that Michael's patch is reasonably efficient,
which might not be true.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-12 Thread Andres Freund
On 2014-12-12 11:08:52 -0500, Robert Haas wrote:
 Unless I'm missing something, this test is showing that FPW
 compression saves 298MB of WAL for 17.3 seconds of CPU time, as
 against master.  And compressing the whole record saves a further 1MB
 of WAL for a further 13.39 seconds of CPU time.  That makes
 compressing the whole record sound like a pretty terrible idea - even
 if you get more benefit by reducing the lower boundary, you're still
 burning a ton of extra CPU time for almost no gain on the larger
 records.  Ouch!

Well, that test pretty much doesn't have any large records besides FPWs
afaics. So it's unsurprising that it's not beneficial.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-12 Thread Robert Haas
On Fri, Dec 12, 2014 at 11:12 AM, Andres Freund and...@anarazel.de wrote:
 On 2014-12-12 11:08:52 -0500, Robert Haas wrote:
 Unless I'm missing something, this test is showing that FPW
 compression saves 298MB of WAL for 17.3 seconds of CPU time, as
 against master.  And compressing the whole record saves a further 1MB
 of WAL for a further 13.39 seconds of CPU time.  That makes
 compressing the whole record sound like a pretty terrible idea - even
 if you get more benefit by reducing the lower boundary, you're still
 burning a ton of extra CPU time for almost no gain on the larger
 records.  Ouch!

 Well, that test pretty much doesn't have any large records besides FPWs
 afaics. So it's unsurprising that it's not beneficial.

Not beneficial is rather an understatement.  It's actively harmful,
and not by a small margin.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-12 Thread Andres Freund
On 2014-12-12 11:15:46 -0500, Robert Haas wrote:
 On Fri, Dec 12, 2014 at 11:12 AM, Andres Freund and...@anarazel.de wrote:
  On 2014-12-12 11:08:52 -0500, Robert Haas wrote:
  Unless I'm missing something, this test is showing that FPW
  compression saves 298MB of WAL for 17.3 seconds of CPU time, as
  against master.  And compressing the whole record saves a further 1MB
  of WAL for a further 13.39 seconds of CPU time.  That makes
  compressing the whole record sound like a pretty terrible idea - even
  if you get more benefit by reducing the lower boundary, you're still
  burning a ton of extra CPU time for almost no gain on the larger
  records.  Ouch!
 
  Well, that test pretty much doesn't have any large records besides FPWs
  afaics. So it's unsurprising that it's not beneficial.
 
 Not beneficial is rather an understatement.  It's actively harmful,
 and not by a small margin.

Sure, but that's just because it's too simplistic. I don't think it
makes sense to make any inference about the worthiness of the general
approach from the, nearly obvious, fact that compressing every tiny
record is a bad idea.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-12 Thread Bruce Momjian
On Fri, Dec 12, 2014 at 05:19:42PM +0100, Andres Freund wrote:
 On 2014-12-12 11:15:46 -0500, Robert Haas wrote:
  On Fri, Dec 12, 2014 at 11:12 AM, Andres Freund and...@anarazel.de wrote:
   On 2014-12-12 11:08:52 -0500, Robert Haas wrote:
   Unless I'm missing something, this test is showing that FPW
   compression saves 298MB of WAL for 17.3 seconds of CPU time, as
   against master.  And compressing the whole record saves a further 1MB
   of WAL for a further 13.39 seconds of CPU time.  That makes
   compressing the whole record sound like a pretty terrible idea - even
   if you get more benefit by reducing the lower boundary, you're still
   burning a ton of extra CPU time for almost no gain on the larger
   records.  Ouch!
  
   Well, that test pretty much doesn't have any large records besides FPWs
   afaics. So it's unsurprising that it's not beneficial.
  
  Not beneficial is rather an understatement.  It's actively harmful,
  and not by a small margin.
 
 Sure, but that's just because it's too simplistic. I don't think it
  makes sense to make any inference about the worthiness of the general
 approach from the, nearly obvious, fact that compressing every tiny
 record is a bad idea.

Well, it seems we need to see some actual cases where compression does
help before moving forward.  I thought Amit had some amazing numbers for
WAL compression --- has that changed?

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-12 Thread Simon Riggs
On 12 December 2014 at 18:04, Bruce Momjian br...@momjian.us wrote:

 Well, it seems we need to see some actual cases where compression does
 help before moving forward.  I thought Amit had some amazing numbers for
 WAL compression --- has that changed?

For background processes, like VACUUM, WAL compression will be
helpful. The numbers show that this only applies to FPWs.

I remain concerned about the cost in foreground processes, especially
since the cost will be paid immediately after checkpoint, making our
spikes worse.

What I don't understand is why we aren't working on double buffering,
since that cost would be paid in a background process and would be
evenly spread out across a checkpoint. Plus we'd be able to remove
FPWs altogether, which is like 100% compression.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-12 Thread Robert Haas
On Fri, Dec 12, 2014 at 1:51 PM, Simon Riggs si...@2ndquadrant.com wrote:
 What I don't understand is why we aren't working on double buffering,
 since that cost would be paid in a background process and would be
 evenly spread out across a checkpoint. Plus we'd be able to remove
 FPWs altogether, which is like 100% compression.

The previous patch to implement that - by somebody at vmware - was an
epic fail.  I'm not opposed to seeing somebody try again, but it's a
tricky problem.  When the double buffer fills up, then you've got to
finish flushing the pages whose images are stored in the buffer to
disk before you can overwrite it, which acts like a kind of
mini-checkpoint.  That problem might be solvable, but let's use this
thread to discuss this patch, not some other patch that someone might
have chosen to write but didn't.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-12 Thread Michael Paquier
On Sat, Dec 13, 2014 at 1:08 AM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Dec 12, 2014 at 10:04 AM, Andres Freund and...@anarazel.de wrote:
 Note that autovacuum and fsync are off.
 =# select phase, user_diff, system_diff,
 pg_size_pretty(pre_update - pre_insert),
 pg_size_pretty(post_update - pre_update) from results;
         phase        | user_diff | system_diff | pg_size_pretty | pg_size_pretty
 ---------------------+-----------+-------------+----------------+----------------
  Compression FPW     | 42.990799 |    0.868179 | 429 MB         | 567 MB
  No compression      | 25.688731 |    1.236551 | 429 MB         | 727 MB
  Compression record  | 56.376750 |    0.769603 | 429 MB         | 566 MB
 (3 rows)
 If we do record-level compression, we'll need to be very careful in
 defining a lower-bound to not eat unnecessary CPU resources, perhaps
 something that should be controlled with a GUC. I presume that this stands
 true as well for the upper bound.

 Record level compression pretty obviously would need a lower boundary
 for when to use compression. It won't be useful for small heapam/btree
 records, but it'll be rather useful for large multi_insert, clean or
 similar records...

 Unless I'm missing something, this test is showing that FPW
 compression saves 298MB of WAL for 17.3 seconds of CPU time, as
 against master.  And compressing the whole record saves a further 1MB
 of WAL for a further 13.39 seconds of CPU time.  That makes
 compressing the whole record sound like a pretty terrible idea - even
 if you get more benefit by reducing the lower boundary, you're still
 burning a ton of extra CPU time for almost no gain on the larger
 records.  Ouch!

 (Of course, I'm assuming that Michael's patch is reasonably efficient,
 which might not be true.)
Note that I was curious about the worst-case ever, aka how much CPU
pg_lzcompress would use if everything is compressed, even the smallest
records. So we'll surely need a lower-bound. I think that doing some
tests with a lower bound set as a multiple of SizeOfXLogRecord would
be fine, but in this case what we'll see is a result similar to what
FPW compression does.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-12 Thread Claudio Freire
On Fri, Dec 12, 2014 at 7:25 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Sat, Dec 13, 2014 at 1:08 AM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Dec 12, 2014 at 10:04 AM, Andres Freund and...@anarazel.de wrote:
 Note that autovacuum and fsync are off.
 =# select phase, user_diff, system_diff,
 pg_size_pretty(pre_update - pre_insert),
 pg_size_pretty(post_update - pre_update) from results;
         phase        | user_diff | system_diff | pg_size_pretty | pg_size_pretty
 ---------------------+-----------+-------------+----------------+----------------
  Compression FPW     | 42.990799 |    0.868179 | 429 MB         | 567 MB
  No compression      | 25.688731 |    1.236551 | 429 MB         | 727 MB
  Compression record  | 56.376750 |    0.769603 | 429 MB         | 566 MB
 (3 rows)
 If we do record-level compression, we'll need to be very careful in
 defining a lower-bound to not eat unnecessary CPU resources, perhaps
 something that should be controlled with a GUC. I presume that this stands
 true as well for the upper bound.

 Record level compression pretty obviously would need a lower boundary
 for when to use compression. It won't be useful for small heapam/btree
 records, but it'll be rather useful for large multi_insert, clean or
 similar records...

 Unless I'm missing something, this test is showing that FPW
 compression saves 298MB of WAL for 17.3 seconds of CPU time, as
 against master.  And compressing the whole record saves a further 1MB
 of WAL for a further 13.39 seconds of CPU time.  That makes
 compressing the whole record sound like a pretty terrible idea - even
 if you get more benefit by reducing the lower boundary, you're still
 burning a ton of extra CPU time for almost no gain on the larger
 records.  Ouch!

 (Of course, I'm assuming that Michael's patch is reasonably efficient,
 which might not be true.)
 Note that I was curious about the worst-case ever, aka how much CPU
 pg_lzcompress would use if everything is compressed, even the smallest
 records. So we'll surely need a lower-bound. I think that doing some
 tests with a lower bound set as a multiple of SizeOfXLogRecord would
 be fine, but in this case what we'll see is a result similar to what
 FPW compression does.


In general, lz4 (and pg_lz is similar to lz4) compresses very poorly
anything below about 128b in length. Of course there are outliers,
with some very compressible stuff, but with regular text or JSON data,
it's quite unlikely to compress at all with smaller input. Compression
is modest up to about 1k when it starts to really pay off.

That's at least my experience with lots of JSON-ish, text-ish and CSV
data sets, compressible but not so much in small bits.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

