Re: [Proposal] Page Compression for OLTP

2023-01-31 Thread vignesh C
On Fri, 4 Nov 2022 at 07:02, Ian Lawrence Barwick  wrote:
>
> 2022年7月27日(水) 2:47 chenhj :
> >
> > Hi hackers,
> >
> > I have rebased this patch and made some improvements.
> >
> >
> > 1. A header is added to each chunk in the pcd file, which records which 
> > block the chunk belongs to and the checksum of the chunk.
> >
> >   Accordingly, all pages in a compressed relation are stored in compressed 
> > format, even if the compressed page is larger than BLCKSZ.
> >
> >   The maximum space occupied by a compressed page is BLCKSZ + chunk_size 
> > (exceeding this range will report an error when writing the page).
> >
> > 2. Repair the pca file using the information recorded in the pcd file when 
> > recovering from a crash.
> >
> > 3. For compressed relations, do not release the free blocks at the end of 
> > the relation (similar to what old_snapshot_threshold does), reducing the 
> > risk of data inconsistency between the pcd and pca files.
> >
> > 4. During backup, only check the checksum in the chunk header for the pcd 
> > file, and avoid assembling and decompressing chunks into the original page.
> >
> > 5. bugfix, doc, code style and so on
> >
> >
> > See src/backend/storage/smgr/README.compression for details.
> >
> >
> > Other
> >
> > 1. Removed support for a default compression option on tablespaces; I'm not 
> > sure this feature is necessary, so it is not supported for now.
> >
> > 2. pg_rewind currently does not support copying only changed blocks from 
> > the pcd file. This feature is relatively independent and could be 
> > implemented later.
>
> Hi
>
> cfbot reports the patch no longer applies.  As CommitFest 2022-11 is
> currently underway, this would be an excellent time to update the patch.

There have been no updates on this thread for some time, so it has
been marked as Returned with Feedback. Feel free to register it in the
next commitfest if you plan to continue working on this.

Regards,
Vignesh




Re: [Proposal] Page Compression for OLTP

2022-11-03 Thread Ian Lawrence Barwick
2022年7月27日(水) 2:47 chenhj :
>
> Hi hackers,
>
> I have rebased this patch and made some improvements.
>
>
> 1. A header is added to each chunk in the pcd file, which records which 
> block the chunk belongs to and the checksum of the chunk.
>
>   Accordingly, all pages in a compressed relation are stored in compressed 
> format, even if the compressed page is larger than BLCKSZ.
>
>   The maximum space occupied by a compressed page is BLCKSZ + chunk_size 
> (exceeding this range will report an error when writing the page).
>
> 2. Repair the pca file using the information recorded in the pcd file when 
> recovering from a crash.
>
> 3. For compressed relations, do not release the free blocks at the end of the 
> relation (similar to what old_snapshot_threshold does), reducing the risk of 
> data inconsistency between the pcd and pca files.
>
> 4. During backup, only check the checksum in the chunk header for the pcd 
> file, and avoid assembling and decompressing chunks into the original page.
>
> 5. bugfix, doc, code style and so on
>
>
> See src/backend/storage/smgr/README.compression for details.
>
>
> Other
>
> 1. Removed support for a default compression option on tablespaces; I'm not 
> sure this feature is necessary, so it is not supported for now.
>
> 2. pg_rewind currently does not support copying only changed blocks from the 
> pcd file. This feature is relatively independent and could be implemented later.

Hi

cfbot reports the patch no longer applies.  As CommitFest 2022-11 is
currently underway, this would be an excellent time to update the patch.

Thanks

Ian Barwick




Re: [Proposal] Page Compression for OLTP

2021-02-18 Thread David Fetter
On Tue, Feb 16, 2021 at 11:15:36PM +0800, chenhj wrote:
> At 2021-02-16 21:51:14, "Daniel Gustafsson"  wrote:
> 
> >> On 16 Feb 2021, at 15:45, chenhj  wrote:
> >
> >> I want to know whether this patch can be accepted by the community, that 
> >> is, whether it is worth continuing to work on this patch. 
> >> If you have any suggestions, please give me feedback.
> >
> >It doesn't seem like the patch has been registered in the commitfest app, so
> >it may have been forgotten about; the number of proposed patches often
> >outnumbers the code review bandwidth.  Please register it at:
> >
> > https://commitfest.postgresql.org/32/
> >
> >..to make sure it doesn't get lost.
> >
> >--
> 
> >Daniel Gustafsson   https://vmware.com/
> 
> 
> Thanks, I will complete this patch and register it later.
> Chen Huajun

The simplest way forward is to register it now so it doesn't miss the
window for the upcoming commitfest (CF), which closes at the end of
this month. That way, everybody has the entire time between now and
the end of the CF to review the patch, work on it, etc., and the CF bot
will be testing it against the changing code base to ensure people
know if such a change causes it to need a rebase.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [Proposal] Page Compression for OLTP

2021-02-16 Thread chenhj
At 2021-02-16 21:51:14, "Daniel Gustafsson"  wrote:

>> On 16 Feb 2021, at 15:45, chenhj  wrote:
>
>> I want to know whether this patch can be accepted by the community, that is, 
>> whether it is worth continuing to work on this patch. 
>> If you have any suggestions, please give me feedback.
>
>It doesn't seem like the patch has been registered in the commitfest app, so it
>may have been forgotten about; the number of proposed patches often outnumbers
>the code review bandwidth.  Please register it at:
>
>   https://commitfest.postgresql.org/32/
>
>..to make sure it doesn't get lost.
>
>--

>Daniel Gustafsson  https://vmware.com/


Thanks, I will complete this patch and register it later.
Chen Huajun

Re: [Proposal] Page Compression for OLTP

2021-02-16 Thread Daniel Gustafsson
> On 16 Feb 2021, at 15:45, chenhj  wrote:

> I want to know whether this patch can be accepted by the community, that is, 
> whether it is worth continuing to work on this patch. 
> If you have any suggestions, please give me feedback.

It doesn't seem like the patch has been registered in the commitfest app, so it
may have been forgotten about; the number of proposed patches often outnumbers
the code review bandwidth.  Please register it at:

https://commitfest.postgresql.org/32/

..to make sure it doesn't get lost.

--
Daniel Gustafsson   https://vmware.com/





Re: [Proposal] Page Compression for OLTP

2021-02-16 Thread chenhj
Hi, hackers


I want to know whether this patch can be accepted by the community, that is, 
whether it is worth continuing to work on this patch. 
If you have any suggestions, please give me feedback.


Best Regards
Chen Huajun



Re: [Proposal] Page Compression for OLTP

2020-06-05 Thread chenhj
Hi hackers,


> # Page storage (Plan C)
>
> Further, since the size of the compressed address file is fixed, the above 
> address file and data file can also be combined into one file:
>
>    0       1       2            123071          0         1         2
> +=======+=======+=======+ ... +=======+     +=========+=========+ ...
> | head  |   1   |   2   | ... |       |     | data1   | data2   | ...
> +=======+=======+=======+ ... +=======+     +=========+=========+ ...
> |  head |            address           |              data

I made a prototype according to the above storage method. Any suggestions are 
welcome.

# Page compress file storage related definitions

/*
 * layout of Page Compress file:
 *
 * - PageCompressHeader
 * - PageCompressAddr[]
 * - chunks of PageCompressData
 *
 */
typedef struct PageCompressHeader
{
    pg_atomic_uint32    nblocks;            /* number of total blocks in this segment */
    pg_atomic_uint32    allocated_chunks;   /* number of total allocated chunks in data area */
    uint16              chunk_size;         /* size of each chunk, must be 1/2, 1/4 or 1/8 of BLCKSZ */
    uint8               algorithm;          /* compress algorithm, 1=pglz, 2=lz4 */
} PageCompressHeader;

typedef struct PageCompressAddr
{
    uint8               nchunks;            /* number of chunks for this block */
    uint8               allocated_chunks;   /* number of allocated chunks for this block */

    /* variable-length field: 1-based chunk number array for this block;
     * the size of the array must be 2, 4 or 8 */
    pc_chunk_number_t   chunknos[FLEXIBLE_ARRAY_MEMBER];
} PageCompressAddr;

typedef struct PageCompressData
{
    char    page_header[SizeOfPageHeaderData];  /* page header */
    uint32  size;                               /* size of compressed data */
    char    data[FLEXIBLE_ARRAY_MEMBER];        /* compressed page, except for the page header */
} PageCompressData;
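
To make the layout concrete, here is a minimal sketch (my own illustration, not
code from the patch) of how a chunk number from PageCompressAddr could be turned
into a byte offset in the combined file. The assumptions that the header area is
padded to one BLCKSZ, that pc_chunk_number_t is 4 bytes wide, and that every
address entry reserves room for the worst-case number of chunk numbers are mine,
only for this example; the actual patch may lay things out differently.

#include <stdint.h>
#include <stddef.h>
#include <sys/types.h>

#define BLCKSZ        8192
#define RELSEG_SIZE   131072                    /* blocks per 1 GB segment */

typedef uint32_t pc_chunk_number_t;             /* assumed to be 4 bytes wide */

/* size of one fixed-width address entry: two 1-byte counters plus room for
 * the worst case of BLCKSZ/chunk_size chunk numbers */
static inline size_t
pc_addr_entry_size(uint16_t chunk_size)
{
    return 2 * sizeof(uint8_t) + sizeof(pc_chunk_number_t) * (BLCKSZ / chunk_size);
}

/* byte offset at which the data (chunk) area starts in the combined file */
static inline off_t
pc_data_area_start(uint16_t chunk_size)
{
    return BLCKSZ + (off_t) RELSEG_SIZE * pc_addr_entry_size(chunk_size);
}

/* byte offset of chunk number 'chunkno' (1-based, as in PageCompressAddr) */
static inline off_t
pc_chunk_offset(uint16_t chunk_size, pc_chunk_number_t chunkno)
{
    return pc_data_area_start(chunk_size) + (off_t) (chunkno - 1) * chunk_size;
}

The point is only that, with a fixed-size address area, the location of any
chunk is a simple closed-form computation and no extra lookup structure is needed.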


# Usage

Whether to use compression is set through the following storage parameters of 
tables and indexes:

- compress_type

  Sets whether to compress and which compression algorithm to use. Supported 
  values: none, pglz, zstd

- compress_chunk_size

  A chunk is the smallest unit of storage space allocated for compressed pages.
  The chunk size can only be 1/2, 1/4 or 1/8 of BLCKSZ.

- compress_prealloc_chunks

  The number of chunks pre-allocated for each page. The maximum value allowed 
  is BLCKSZ/compress_chunk_size - 1.
  If the number of chunks required for a compressed page is less than 
  compress_prealloc_chunks, it allocates compress_prealloc_chunks chunks, to 
  avoid future storage fragmentation when the page later needs more storage 
  space (see the sketch below).
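
For illustration, the allocation rule above could look roughly like this (my
sketch; the function name is hypothetical and not taken from the patch):

/* how many chunks to allocate when writing a compressed page of
 * 'compressed_size' bytes (including the PageCompressData header) */
static inline unsigned
pc_chunks_to_allocate(unsigned compressed_size,
                      unsigned compress_chunk_size,
                      unsigned compress_prealloc_chunks)
{
    unsigned needed = (compressed_size + compress_chunk_size - 1)
                      / compress_chunk_size;            /* ceiling division */

    /* round small pages up to the preallocation floor to limit fragmentation */
    return (needed < compress_prealloc_chunks) ? compress_prealloc_chunks : needed;
}

So a page that needs fewer chunks than compress_prealloc_chunks still consumes
the preallocated amount, trading some space for less fragmentation later.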


# Sample

## requirement

- zstd

## build

./configure --with-zstd
make
make install

## create compressed table and index

create table tb1(id int,c1 text);
create table tb1_zstd(id int,c1 text) 
with(compress_type=zstd,compress_chunk_size=1024);
create table tb1_zstd_4(id int,c1 text) 
with(compress_type=zstd,compress_chunk_size=1024,compress_prealloc_chunks=4);

create index tb1_idx_id on tb1(id);
create index tb1_idx_id_zstd on tb1(id) 
with(compress_type=zstd,compress_chunk_size=1024);
create index tb1_idx_id_zstd_4 on tb1(id) 
with(compress_type=zstd,compress_chunk_size=1024,compress_prealloc_chunks=4);

create index tb1_idx_c1 on tb1(c1);
create index tb1_idx_c1_zstd on tb1(c1) 
with(compress_type=zstd,compress_chunk_size=1024);
create index tb1_idx_c1_zstd_4 on tb1(c1) 
with(compress_type=zstd,compress_chunk_size=1024,compress_prealloc_chunks=4);

insert into tb1 select generate_series(1,100),md5(random()::text);
insert into tb1_zstd select generate_series(1,100),md5(random()::text);
insert into tb1_zstd_4 select generate_series(1,100),md5(random()::text);

## show size of table and index

postgres=# \d+
                          List of relations
 Schema |    Name    | Type  |  Owner   | Persistence | Size  | Description
--------+------------+-------+----------+-------------+-------+-------------
 public | tb1        | table | postgres | permanent   | 65 MB |
 public | tb1_zstd   | table | postgres | permanent   | 37 MB |
 public | tb1_zstd_4 | table | postgres | permanent   | 37 MB |
(3 rows)

postgres=# \di+
                                  List of relations
 Schema |       Name        | Type  |  Owner   | Table | Persistence | Size  | Description
--------+-------------------+-------+----------+-------+-------------+-------+-------------
 public | tb1_idx_c1        | index | postgres | tb1   | permanent   | 73 MB |
 public | tb1_idx_c1_zstd   | index | postgres | tb1   | permanent   | 36 MB |
 public | tb1_idx_c1_zstd_4 | index | postgres | tb1   | permanent   | 41 MB |
 public | tb1_idx_id        | index | postgres | tb1   | permanent   | 21 MB |
 public | tb1_idx_id_zstd   | index | postgres | tb1   | permanent   | 13 MB |
 public | tb1_idx_id_zstd_4 | index | postgres | tb1   | permanent   | 15 MB |
(6 rows)


# pgbench performance testing (TPC-B)


Re: [Proposal] Page Compression for OLTP

2020-05-22 Thread chenhj
Sorry, there may have been a problem with the display format of the previous 
mail, so I am resending it.


At 2020-05-21 15:04:55, "Fabien COELHO"  wrote:

>
>Hello,
>
>My 0.02, some of which may just show some misunderstanding on my part:
>
>  - Could this be proposed as some kind of extension, provided that enough
>hooks are available? ISTM that foreign tables and/or alternative
>storage engine (aka ACCESS METHOD) provide convenient APIs which could
>fit the need for these? Or are they not appropriate? You seem to
>suggest that there are not.
>
>If not, what could be done to improve API to allow what you are seeking
>to do? Maybe you need a somehow lower-level programmable API which does
>not exist already, or at least is not exported already, but could be
>specified and implemented with limited effort? Basically you would like
>to read/write pg pages to somewhere, and then there is the syncing
>issue to consider. Maybe such a "page storage" API could provide
>benefit for some specialized hardware, eg persistent memory stores,
>so there would be more reason to define it anyway? I think it might
>be valuable to give it some thoughts.

Thank you for giving so many comments.
In my opinion, developing a foreign table or a new storage engine would require 
a lot of extra work in addition to the compression itself.
A similar explanation was mentioned in Nikolay P's email.

The "page storage" API may be a good choice, and I will consider it, but I have 
not yet figured out how to implement it.

>  - Could you maybe elaborate on how your plan differs from [4] and [5]?

My solution is similar to CFS: it is also embedded in the file access layer 
(fd.c, md.c) to implement the mapping from a block number to the file and 
location where the compressed data is stored.

However, the most important difference is that I hope to avoid the need for GC 
through the design of the page layout.

https://www.postgresql.org/message-id/flat/11996861554042351%40iva4-dd95b404a60b.qloud-c.yandex.net

>> The most difficult thing in CFS development is certainly
>> defragmentation. In CFS it is done using background garbage collection,
>> by one or more
>> GC worker processes. The main challenges were to minimize its
>> interaction with normal work of the system, make it fault tolerant and
>> prevent unlimited growth of data segments.

>> CFS is not introducing its own storage manager; it is mostly embedded in
>> the existing Postgres file access layer (fd.c, md.c). It allows reusing the
>> code responsible for mapping relations and the file descriptor cache. As it
>> was recently discussed in hackers, it may be a good idea to separate the
>> questions "how to map blocks to filenames and offsets" and "how to
>> actually perform IO". With this it will be easier to implement a compressed
>> storage manager.


>  - Have you considered keeping page headers and compressing tuple data
>only?

In that case, we would have to add some additional information to the page 
header to identify whether the page is compressed or not.
Whenever a compressed page becomes uncompressed, or vice versa, the original 
page header would have to be modified.
This is unacceptable because it requires modifying the shared buffer and 
recalculating the checksum.

However, it should be feasible to put this flag in the compressed address file.
The problem with this is that even if a page occupies only a single compressed 
block, the address file still needs to be read, so a read goes from 1 I/O to 2 
I/Os.
Since the address file is very small, reading it is basically a memory access, 
so this cost may not be as large as I had imagined.
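
To illustrate the 1 I/O vs. 2 I/O point, a rough sketch of the read path for
the common single-chunk case might look like the following. This is my own
illustration, not code from the patch; the layout constants (header padded to
one BLCKSZ, 4-byte chunk numbers, worst-case-sized address entries) are
assumptions made only for the example.

#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>

#define BLCKSZ      8192
#define RELSEG_SIZE 131072

static int
pc_read_one_chunk(int fd, uint32_t blkno, uint16_t chunk_size, char *chunkbuf)
{
    size_t   entry_size = 2 + 4 * (BLCKSZ / chunk_size); /* counters + chunkno array */
    uint8_t  entry[2 + 4 * (BLCKSZ / 1024)];             /* worst case: 1 kB chunks */
    uint32_t chunkno;
    off_t    data_start = BLCKSZ + (off_t) RELSEG_SIZE * entry_size;

    /* I/O #1: the per-block address entry -- a few dozen bytes, normally cached */
    if (pread(fd, entry, entry_size, BLCKSZ + (off_t) blkno * entry_size) != (ssize_t) entry_size)
        return -1;
    memcpy(&chunkno, entry + 2, sizeof(chunkno));        /* first (1-based) chunk number */

    /* I/O #2: the compressed chunk itself */
    if (pread(fd, chunkbuf, chunk_size,
              data_start + (off_t) (chunkno - 1) * chunk_size) != (ssize_t) chunk_size)
        return -1;

    return 0;    /* caller decompresses chunkbuf into an 8 kB page */
}

In the common case the address entry sits in the OS page cache, so the second
read is usually the only one that actually touches disk.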

>  - I'm not sure there is a point in going below the underlying file
>system blocksize, quite often 4 KiB? Or maybe yes? Or is there
>a benefit to aim at 1/4 even if most pages overflow?

My solution is mainly optimized for scenarios where the original page can be 
compressed into a single compressed block of storage.
The case where the original page is stored in multiple compressed blocks is 
intended for scenarios that are not particularly sensitive to performance but 
care more about the compression ratio, such as cold data.

In addition, users can also choose to compile PostgreSQL with a 16KB or 32KB 
BLCKSZ.

>  - ISTM that your approach entails 3 "files". Could it be done with 2?
>I'd suggest that the possible overflow pointers (coa) could be part of
>the headers so that when reading the 3.1 page, then the header would
>tell where to find the overflow 3.2, without requiring an additional
>independent structure with very small data in it, most of it zeros.
>Possibly this is not possible, because it would require some available
>space in standard headers when the page is not compressible, and
>there is not enough. 


Re: [Proposal] Page Compression for OLTP

2020-05-21 Thread Fabien COELHO


Hello,

My 0.02€, some of which may just show some misunderstanding on my part:

 - you have clearly given quite a few thoughts about the what and how…
   which makes your message an interesting read.

 - Could this be proposed as some kind of extension, provided that enough
   hooks are available? ISTM that foreign tables and/or alternative
   storage engine (aka ACCESS METHOD) provide convenient APIs which could
   fit the need for these? Or are they not appropriate? You seem to
   suggest that there are not.

   If not, what could be done to improve API to allow what you are seeking
   to do? Maybe you need a somehow lower-level programmable API which does
   not exist already, or at least is not exported already, but could be
   specified and implemented with limited effort? Basically you would like
   to read/write pg pages to somewhere, and then there is the syncing
   issue to consider. Maybe such a "page storage" API could provide
   benefit for some specialized hardware, eg persistent memory stores,
   so there would be more reason to define it anyway? I think it might
   be valuable to give it some thoughts.

 - Could you maybe elaborate on how your plan differs from [4] and [5]?

 - Have you considered keeping page headers and compressing tuple data
   only?

 - I'm not sure there is a point in going below the underlying file
   system blocksize, quite often 4 KiB? Or maybe yes? Or is there
   a benefit to aim at 1/4 even if most pages overflow?

 - ISTM that your approach entails 3 "files". Could it be done with 2?
   I'd suggest that the possible overflow pointers (coa) could be part of
   the headers so that when reading the 3.1 page, then the header would
   tell where to find the overflow 3.2, without requiring an additional
   independent structure with very small data in it, most of it zeros.
   Possibly this is not possible, because it would require some available
   space in standard headers when the page is not compressible, and
   there is not enough. Maybe creating a little room for that in
   existing headers (4 bytes could be enough?) would be a good compromise.
   Hmmm. Maybe the approach I suggest would only work for 1/2 compression,
   but not for other target ratios, but I think it could be made to work
   if the pointer can entail several blocks in the overflow table.

 - If one page is split in 3 parts, could it create problems on syncing,
   if 1/3 or 2/3 pages get written? But maybe that is manageable with WAL,
   as it would note that the page was not synced and that is enough for
   replay.

 - I'm unclear how you would manage the 2 representations of a page in
   memory. I'm afraid that a 8 KiB page compressed to 4 KiB would
   basically take 12 KiB, i.e. reduce the available memory for caching
   purposes. Hmmm. The current status is that a written page probably
   takes 16 KiB, once in shared buffers and once in the system caches,
   so it would be an improvement anyway.

 - Maybe the compressed and overflow table could become bloated somehow,
   which would require a vacuuming implementation and add to the
   complexity of the implementation?

 - External tools should be available to allow page inspection, eg for
   debugging purposes.

--
Fabien.