Re: [HACKERS] Custom compression methods

2021-04-08 Thread Robert Haas
On Thu, Apr 8, 2021 at 3:38 PM Justin Pryzby  wrote:
> It looks like this should not remove the word "data" ?

Oh, yes, right.

>  The compression technique used for either in-line or out-of-line compressed
> -data is a fairly simple and very fast member
> -of the LZ family of compression techniques.  See
> -src/common/pg_lzcompress.c for the details.
> +can be selected using the COMPRESSION option on a 
> per-column
> +basis when creating a table. The default for columns with no explicit setting
> +is taken from the value of .
>
> I thought this patch would need to update parts about borrowing 2 spare bits,
> but maybe that's the wrong header..before.

We're not borrowing any more bits from the places where we were
borrowing 2 bits before. We are borrowing 2 bits from places that
don't seem to be discussed in detail here, where no bits were borrowed
before.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [HACKERS] Custom compression methods

2021-04-08 Thread Justin Pryzby
On Thu, Apr 08, 2021 at 02:58:04PM -0400, Robert Haas wrote:
> On Thu, Apr 8, 2021 at 11:32 AM David Steele  wrote:
> > It looks like this CF entry should have been marked as committed so I
> > did that.
> 
> Thanks.
> 
> Here's a patch for the doc update which was mentioned as an open item 
> upthread.

Thanks too.

It looks like this should not remove the word "data" ?

 The compression technique used for either in-line or out-of-line compressed
-data is a fairly simple and very fast member
-of the LZ family of compression techniques.  See
-src/common/pg_lzcompress.c for the details.
+can be selected using the COMPRESSION option on a per-column
+basis when creating a table. The default for columns with no explicit setting
+is taken from the value of .

I thought this patch would need to update parts about borrowing 2 spare bits,
but maybe that's the wrong header..

-- 
Justin




Re: [HACKERS] Custom compression methods

2021-04-08 Thread Robert Haas
On Thu, Apr 8, 2021 at 11:32 AM David Steele  wrote:
> It looks like this CF entry should have been marked as committed so I
> did that.

Thanks.

Here's a patch for the doc update which was mentioned as an open item upthread.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


update-storage-docs.patch
Description: Binary data


Re: [HACKERS] Custom compression methods

2021-04-08 Thread David Steele

On 3/30/21 10:30 AM, Tom Lane wrote:

Robert Haas  writes:

On Wed, Mar 24, 2021 at 2:15 PM Tom Lane  wrote:

But let's ignore the case of pg_upgrade and just consider a dump/restore.
I'd still say that unless you give --no-toast-compression then I would
expect the dump/restore to preserve the tables' old compression behavior.
Robert's argument that the pre-v14 database had no particular compression
behavior seems nonsensical to me.  We know exactly which compression
behavior it has.



I said that it didn't have a state, not that it didn't have a
behavior. That's not exactly the same thing. But I don't want to argue
about it, either. It's a judgement call what's best here, and I don't
pretend to have all the answers. If you're sure you've got it right
... great!


I've not heard any other comments about this, but I'm pretty sure that
preserving a table's old toast behavior is in line with what we'd normally
expect pg_dump to do --- especially in light of the fact that we did not
provide any --preserve-toast-compression switch to tell it to do so.
So I'm going to go change it.


It looks like this CF entry should have been marked as committed so I 
did that.


Regards,
--
-David
da...@pgmasters.net




Re: [HACKERS] Custom compression methods

2021-03-30 Thread Tom Lane
Robert Haas  writes:
> On Wed, Mar 24, 2021 at 2:15 PM Tom Lane  wrote:
>> But let's ignore the case of pg_upgrade and just consider a dump/restore.
>> I'd still say that unless you give --no-toast-compression then I would
>> expect the dump/restore to preserve the tables' old compression behavior.
>> Robert's argument that the pre-v14 database had no particular compression
>> behavior seems nonsensical to me.  We know exactly which compression
>> behavior it has.

> I said that it didn't have a state, not that it didn't have a
> behavior. That's not exactly the same thing. But I don't want to argue
> about it, either. It's a judgement call what's best here, and I don't
> pretend to have all the answers. If you're sure you've got it right
> ... great!

I've not heard any other comments about this, but I'm pretty sure that
preserving a table's old toast behavior is in line with what we'd normally
expect pg_dump to do --- especially in light of the fact that we did not
provide any --preserve-toast-compression switch to tell it to do so.
So I'm going to go change it.

regards, tom lane




Re: [HACKERS] Custom compression methods (buildfarm xupgrade)

2021-03-28 Thread Justin Pryzby
On Sun, Mar 28, 2021 at 04:48:29PM -0400, Andrew Dunstan wrote:
> Nothing is hidden here - the diffs are reported, see for example
> 
> What we're comparing here is target pg_dumpall against the original
> source vs target pg_dumpall against the upgraded source.

The command being run is:

https://github.com/PGBuildFarm/client-code/blob/master/PGBuild/Modules/TestUpgradeXversion.pm#L610
system( "diff -I '^-- ' -u $upgrade_loc/origin-$oversion.sql "
  . "$upgrade_loc/converted-$oversion-to-$this_branch.sql "
  . "> $upgrade_loc/dumpdiff-$oversion 2>&1");
...
my $difflines = `wc -l < $upgrade_loc/dumpdiff-$oversion`;

where -I means: --ignore-matching-lines=RE

I think wc -l should actually be grep -c '^[-+]'
otherwise context lines count for as much as diff lines.
You could write that with diff -U0 |wc -l, except the context is useful to
humans.

With some more effort, the number of lines of diff can be very small, allowing
a smaller fudge factor.  

For upgrade from v10:
time make -C src/bin/pg_upgrade check oldsrc=`pwd`/10 
oldbindir=`pwd`/10/tmp_install/usr/local/pgsql/bin

$ diff -u src/bin/pg_upgrade/tmp_check/dump1.sql 
src/bin/pg_upgrade/tmp_check/dump2.sql |wc -l
622

Without context:
$ diff -u src/bin/pg_upgrade/tmp_check/dump1.sql 
src/bin/pg_upgrade/tmp_check/dump2.sql |grep -c '^[-+]'
142

Without comments:
$ diff -I '^-- ' -u src/bin/pg_upgrade/tmp_check/dump1.sql 
src/bin/pg_upgrade/tmp_check/dump2.sql |grep -c '^[-+]'
130

Without SET default stuff:
diff -I '^$' -I "SET default_table_access_method = heap;" -I "^SET 
default_toast_compression = 'pglz';$" -I '^-- ' -u 
/home/pryzbyj/src/postgres/src/bin/pg_upgrade/tmp_check/dump1.sql 
/home/pryzbyj/src/postgres/src/bin/pg_upgrade/tmp_check/dump2.sql |less |grep 
-c '^[-+]'
117

Without trigger function call noise:
diff -I "^CREATE TRIGGER [_[:alnum:]]\+ .* FOR EACH \(ROW\|STATEMENT\) EXECUTE 
\(PROCEDURE\|FUNCTION\)" -I '^$' -I "SET default_table_access_method = heap;" 
-I "^SET default_toast_compression = 'pglz';$" -I '^-- ' -u 
/home/pryzbyj/src/postgres/src/bin/pg_upgrade/tmp_check/dump1.sql 
/home/pryzbyj/src/postgres/src/bin/pg_upgrade/tmp_check/dump2.sql |grep -c 
'^[-+]'
11

Maybe it's important not to totally ignore that, and instead perhaps clean up
the known/accepted changes like s/FUNCTION/PROCEDURE/:



Re: [HACKERS] Custom compression methods

2021-03-28 Thread Andrew Dunstan


On 3/24/21 12:45 PM, Tom Lane wrote:
> Robert Haas  writes:
>> On Wed, Mar 24, 2021 at 11:42 AM Tom Lane  wrote:
>>> On reflection, though, I wonder if we've made pg_dump do the right
>>> thing anyway.  There is a strong case to be made for the idea that
>>> when dumping from a pre-14 server, it should emit
>>> SET default_toast_compression = 'pglz';
>>> rather than omitting any mention of the variable, which is what
>>> I made it do in aa25d1089.
>> But also ... aren't we just doing this to work around a test case that
>> isn't especially good in the first place? Counting the number of lines
>> in the diff between A and B is an extremely crude proxy for "they're
>> similar enough that we probably haven't broken anything."
> I wouldn't be proposing this if the xversion failures were the only
> reason; making them go away is just a nice side-effect.  The core
> point is that the charter of pg_dump is to reproduce the source
> database's state, and as things stand we're failing to ensure we
> do that.
>
> (But yeah, we really need a better way of making this check in
> the xversion tests.  I don't like the arbitrary "n lines of diff
> is probably OK" business one bit.)
>
>   



Well, I ran this module for years privately and used to have a matrix of
the exact number of diff lines expected for each combination of source
and target branch. If I didn't get that exact number of lines I reported
an error on stderr. That was fine when we weren't reporting the results
on the server, and I just sent an email to -hackers if I found an error.
I kept this matrix by examining the diffs to make sure they were all
benign. That was a pretty laborious process. So I decided to try a
heuristic approach instead, and by trial and error came up with this
2000 lines measurement. When this appeared to be working and stable the
module was released into the wild for other buildfarm owners to deploy.

Nothing is hidden here - the diffs are reported, see for example

What we're comparing here is target pg_dumpall against the original
source vs target pg_dumpall against the upgraded source.

If someone wants to come up with a better rule for detecting that
nothing has gone wrong, I'll be happy to implement it. I don't
particularly like the current rule either, it's there faute de mieux.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: [HACKERS] Custom compression methods

2021-03-25 Thread Justin Pryzby
On Mon, Mar 22, 2021 at 10:41:33AM -0400, Robert Haas wrote:
> trying to explain how TOAST works here, I added a link. It looks,
> though, like that documentation also needs to be patched for this
> change. I'll look into that, and your remaining patches, next.

I added an Opened Item for any necessary updates to the toast docs.

https://wiki.postgresql.org/wiki/PostgreSQL_14_Open_Items

-- 
Justin




Re: [HACKERS] Custom compression methods

2021-03-25 Thread Robert Haas
On Thu, Mar 25, 2021 at 5:44 AM Dilip Kumar  wrote:
> Okay got it.  Fixed as suggested.

Committed with a bit of editing of the comments.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [HACKERS] Custom compression methods

2021-03-25 Thread Robert Haas
On Wed, Mar 24, 2021 at 2:15 PM Tom Lane  wrote:
> But let's ignore the case of pg_upgrade and just consider a dump/restore.
> I'd still say that unless you give --no-toast-compression then I would
> expect the dump/restore to preserve the tables' old compression behavior.
> Robert's argument that the pre-v14 database had no particular compression
> behavior seems nonsensical to me.  We know exactly which compression
> behavior it has.

I said that it didn't have a state, not that it didn't have a
behavior. That's not exactly the same thing. But I don't want to argue
about it, either. It's a judgement call what's best here, and I don't
pretend to have all the answers. If you're sure you've got it right
... great!

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [HACKERS] Custom compression methods

2021-03-25 Thread Dilip Kumar
On Wed, Mar 24, 2021 at 10:57 PM Robert Haas  wrote:
>
> On Wed, Mar 24, 2021 at 12:14 PM Dilip Kumar  wrote:
> > Actually, we are already doing this, I mean ALTER TABLE .. ALTER
> > COLUMN .. SET COMPRESSION is already updating the compression method
> > of the index attribute.  So 0003 doesn't make sense, sorry for the
> > noise.  However, 0001 and 0002 are still valid, or do you think that
> > we don't want 0001 also?  If we don't need 0001 also then we need to
> > update the test output for 0002 slightly.
>
> It seems to me that 0002 is still not right. We can't fix the
> attcompression to whatever the default is at the time the index is
> created, because the default can be changed later, and there's no way
> to fix index afterward. I mean, it would be fine to do it that way if
> we were going to go with the other model, where the index state is
> separate from the table state, either can be changed independently,
> and it all gets dumped and restored. But, as it is, I think we should
> be deciding how to compress new values for an expression column based
> on the default_toast_compression setting at the time of compression,
> not the time of index creation.
>

Okay got it.  Fixed as suggested.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
From c478a311b3410d5360946e953e1bfd53bbef7197 Mon Sep 17 00:00:00 2001
From: Dilip Kumar 
Date: Wed, 24 Mar 2021 15:08:14 +0530
Subject: [PATCH v4 1/2] Show compression method in index describe

---
 src/bin/psql/describe.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index eeac0ef..9d15324 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -1897,6 +1897,7 @@ describeOneTableDetails(const char *schemaname,
 		if (pset.sversion >= 14 &&
 			!pset.hide_compression &&
 			(tableinfo.relkind == RELKIND_RELATION ||
+			 tableinfo.relkind == RELKIND_INDEX ||
 			 tableinfo.relkind == RELKIND_PARTITIONED_TABLE ||
 			 tableinfo.relkind == RELKIND_MATVIEW))
 		{
-- 
1.8.3.1

From 75dbb3094973a4e69dafc213b760f5f13ab33a91 Mon Sep 17 00:00:00 2001
From: Dilip Kumar 
Date: Wed, 24 Mar 2021 14:02:06 +0530
Subject: [PATCH v4 2/2] Fix attcompression for index expression columns

For the expression columns, set the attcompression to the invalid
compression method and while actually compressing the data if the
attcompression is not valid then use the default compression method.

Basically, attcompression for the index attribute is always in sync with
that of the table attribute but for the expression columns that is not
possible so for them always use the current default method while
compressing the data.
---
 src/backend/access/brin/brin_tuple.c|  8 +---
 src/backend/access/common/indextuple.c  | 16 ++--
 src/backend/catalog/index.c |  9 +
 src/test/regress/expected/compression.out   |  6 ++
 src/test/regress/expected/compression_1.out | 13 +
 src/test/regress/sql/compression.sql|  7 +++
 6 files changed, 54 insertions(+), 5 deletions(-)

diff --git a/src/backend/access/brin/brin_tuple.c b/src/backend/access/brin/brin_tuple.c
index 8d03e60..32ffd9f 100644
--- a/src/backend/access/brin/brin_tuple.c
+++ b/src/backend/access/brin/brin_tuple.c
@@ -220,10 +220,12 @@ brin_form_tuple(BrinDesc *brdesc, BlockNumber blkno, BrinMemTuple *tuple,
 
 /*
  * If the BRIN summary and indexed attribute use the same data
- * type, we can use the same compression method. Otherwise we
- * have to use the default method.
+ * type and it has a valid compression method, we can use the
+ * same compression method. Otherwise we have to use the
+ * default method.
  */
-if (att->atttypid == atttype->type_id)
+if (att->atttypid == atttype->type_id &&
+	CompressionMethodIsValid(att->attcompression))
 	compression = att->attcompression;
 else
 	compression = GetDefaultToastCompression();
diff --git a/src/backend/access/common/indextuple.c b/src/backend/access/common/indextuple.c
index 1f6b7b7..4d58644 100644
--- a/src/backend/access/common/indextuple.c
+++ b/src/backend/access/common/indextuple.c
@@ -103,8 +103,20 @@ index_form_tuple(TupleDesc tupleDescriptor,
 			(att->attstorage == TYPSTORAGE_EXTENDED ||
 			 att->attstorage == TYPSTORAGE_MAIN))
 		{
-			Datum		cvalue = toast_compress_datum(untoasted_values[i],
-	  att->attcompression);
+			Datum	cvalue;
+			char	compression = att->attcompression;
+
+			/*
+			 * If the compression method is not valid then use the default
+			 * compression method. This can happen because, for the expression
+			 * columns, we set the attcompression to the invalid compression
+			 * method, while creating the index so that we can use the current
+			 * default compression method when we actually need to compress.
+			 */
+			if (!CompressionMethodIsValid(compression))
+compression = GetDefaultToastCompression();
+
+			cvalue 

Re: [HACKERS] Custom compression methods

2021-03-24 Thread Tom Lane
Justin Pryzby  writes:
> On Wed, Mar 24, 2021 at 01:30:26PM -0400, Robert Haas wrote:
>> ... So --no-toast-compression is just fine for people who are
>> dumping and restoring, but it's no help at all if you want to switch
>> TOAST compression methods while doing a pg_upgrade. However, what does
>> help with that is sticking with what Tom committed before rather than
>> changing to what he's proposing now.

> I don't know what/any other cases support using pg_upgrade to change stuff 
> like
> the example (changing to lz4).  The way to do it is to make the changes either
> before or after.  It seems weird to think that pg_upgrade would handle that.

Yeah; I think the charter of pg_upgrade is to reproduce the old database
state.  If you try to twiddle the process to incorporate some changes
in that state, maybe it will work, but if it breaks you get to keep both
pieces.  I surely don't wish to consider such shenanigans as supported.

But let's ignore the case of pg_upgrade and just consider a dump/restore.
I'd still say that unless you give --no-toast-compression then I would
expect the dump/restore to preserve the tables' old compression behavior.
Robert's argument that the pre-v14 database had no particular compression
behavior seems nonsensical to me.  We know exactly which compression
behavior it has.

regards, tom lane




Re: [HACKERS] Custom compression methods

2021-03-24 Thread Justin Pryzby
On Wed, Mar 24, 2021 at 01:30:26PM -0400, Robert Haas wrote:
> On Wed, Mar 24, 2021 at 1:24 PM Justin Pryzby  wrote:
> > I think it's not specific to pg_upgrade, but any pg_dump |pg_restore.
> >
> > The analogy with tablespaces is restoring from a cluster where the 
> > tablespace
> > is named "vast" to one where it's named "huge".  I do this by running
> > PGOPTIONS=-cdefault_tablespace=huge pg_restore --no-tablespaces
> >
> > So I thinks as long as --no-toast-compression does the corresponding thing, 
> > the
> > "restore with alternate compression" case is handled fine.
> 
> I think you might be missing the point. If you're using pg_dump and
> pg_restore, you can pass --no-toast-compression if you want. But if

Actually, I forgot that pg_restore doesn't (can't) have --no-toast-compression.
So my analogy is broken.

> you're using pg_upgrade, and it's internally calling pg_dump
> --binary-upgrade, then you don't have control over what options get
> passed. So --no-toast-compression is just fine for people who are
> dumping and restoring, but it's no help at all if you want to switch
> TOAST compression methods while doing a pg_upgrade. However, what does
> help with that is sticking with what Tom committed before rather than
> changing to what he's proposing now.

I don't know what/any other cases support using pg_upgrade to change stuff like
the example (changing to lz4).  The way to do it is to make the changes either
before or after.  It seems weird to think that pg_upgrade would handle that.

I'm going to risk making the analogy that it's not supported to pass
--no-tablespaces from pg_upgrade to pg_dump/restore.  It certainly can't work
for --link (except in the weird case that the dirs are on the same filesystem). 
 

-- 
Justin




Re: [HACKERS] Custom compression methods

2021-03-24 Thread Robert Haas
On Wed, Mar 24, 2021 at 1:24 PM Justin Pryzby  wrote:
> I think it's not specific to pg_upgrade, but any pg_dump |pg_restore.
>
> The analogy with tablespaces is restoring from a cluster where the tablespace
> is named "vast" to one where it's named "huge".  I do this by running
> PGOPTIONS=-cdefault_tablespace=huge pg_restore --no-tablespaces
>
> So I thinks as long as --no-toast-compression does the corresponding thing, 
> the
> "restore with alternate compression" case is handled fine.

I think you might be missing the point. If you're using pg_dump and
pg_restore, you can pass --no-toast-compression if you want. But if
you're using pg_upgrade, and it's internally calling pg_dump
--binary-upgrade, then you don't have control over what options get
passed. So --no-toast-compression is just fine for people who are
dumping and restoring, but it's no help at all if you want to switch
TOAST compression methods while doing a pg_upgrade. However, what does
help with that is sticking with what Tom committed before rather than
changing to what he's proposing now.

If you like his current proposal, that's fine with me, as long as
we're on the same page about what happens if we adopt it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [HACKERS] Custom compression methods

2021-03-24 Thread Robert Haas
On Wed, Mar 24, 2021 at 12:14 PM Dilip Kumar  wrote:
> Actually, we are already doing this, I mean ALTER TABLE .. ALTER
> COLUMN .. SET COMPRESSION is already updating the compression method
> of the index attribute.  So 0003 doesn't make sense, sorry for the
> noise.  However, 0001 and 0002 are still valid, or do you think that
> we don't want 0001 also?  If we don't need 0001 also then we need to
> update the test output for 0002 slightly.

It seems to me that 0002 is still not right. We can't fix the
attcompression to whatever the default is at the time the index is
created, because the default can be changed later, and there's no way
to fix index afterward. I mean, it would be fine to do it that way if
we were going to go with the other model, where the index state is
separate from the table state, either can be changed independently,
and it all gets dumped and restored. But, as it is, I think we should
be deciding how to compress new values for an expression column based
on the default_toast_compression setting at the time of compression,
not the time of index creation.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [HACKERS] Custom compression methods

2021-03-24 Thread Justin Pryzby
On Wed, Mar 24, 2021 at 12:24:38PM -0400, Robert Haas wrote:
> On Wed, Mar 24, 2021 at 11:42 AM Tom Lane  wrote:
> > On reflection, though, I wonder if we've made pg_dump do the right
> > thing anyway.  There is a strong case to be made for the idea that
> > when dumping from a pre-14 server, it should emit
> > SET default_toast_compression = 'pglz';
> > rather than omitting any mention of the variable, which is what
> > I made it do in aa25d1089.  If we changed that, I think all these
> > diffs would go away.  Am I right in thinking that what's being
> > compared here is new pg_dump's dump from old server versus new
> > pg_dump's dump from new server?
> >
> > The "strong case" goes like this: initdb a v14 cluster, change
> > default_toast_compression to lz4 in its postgresql.conf, then
> > try to pg_upgrade from an old server.  If the dump script doesn't
> > set default_toast_compression = 'pglz' then the upgrade will
> > do the wrong thing because all the tables will be recreated with
> > a different behavior than they had before.  IIUC, this wouldn't
> > result in broken data, but it still seems to me to be undesirable.
> > dump/restore ought to do its best to preserve the old DB state,
> > unless you explicitly tell it --no-toast-compression or the like.
> 
> This feels a bit like letting the tail wag the dog, because one might
> reasonably guess that the user's intention in such a case was to
> switch to using LZ4, and we've subverted that intention by deciding
> that we know better. I wouldn't blame someone for thinking that using
> --no-toast-compression with a pre-v14 server ought to have no effect,
> but with your proposal here, it would. Furthermore, IIUC, the user has
> no way of passing --no-toast-compression through to pg_upgrade, so
> they're just going to have to do the upgrade and then fix everything
> manually afterward to the state that they intended to have all along.
> Now, on the other hand, if they wanted to make practically any other
> kind of change while upgrading, they'd have to do something like that
> anyway, so I guess this is no worse.

I think it's not specific to pg_upgrade, but any pg_dump |pg_restore.

The analogy with tablespaces is restoring from a cluster where the tablespace
is named "vast" to one where it's named "huge".  I do this by running
PGOPTIONS=-cdefault_tablespace=huge pg_restore --no-tablespaces

So I thinks as long as --no-toast-compression does the corresponding thing, the
"restore with alternate compression" case is handled fine.

-- 
Justin




Re: [HACKERS] Custom compression methods

2021-03-24 Thread Robert Haas
On Wed, Mar 24, 2021 at 12:45 PM Tom Lane  wrote:
> I wouldn't be proposing this if the xversion failures were the only
> reason; making them go away is just a nice side-effect.  The core
> point is that the charter of pg_dump is to reproduce the source
> database's state, and as things stand we're failing to ensure we
> do that.

Well, that state is just a mental construct, right? In reality, there
is no such state stored anywhere in the old database. You're choosing
to attribute to it an implicit state that matches what would need to
be configured in the newer version to get the same behavior, which is
a reasonable thing to do, but it is an interpretive choice rather than
a bare fact.

I don't care very much if you want to change this, but to me it seems
slightly worse than the status quo. It's hard to imagine that someone
is going to create a new cluster, set the default to lz4, run
pg_upgrade, and then complain that the new columns ended up with lz4
as the default. It seems much more likely that they're going to
complain if the new columns *don't* end up with lz4 as the default.
And I also can't see any other scenario where imagining that the TOAST
compression property of the old database simply does not exist, rather
than being pglz implicitly, is worse.

But I could be wrong, and even if I'm right it's not a hill upon which
I wish to die.

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: [HACKERS] Custom compression methods

2021-03-24 Thread Tom Lane
Robert Haas  writes:
> On Wed, Mar 24, 2021 at 11:42 AM Tom Lane  wrote:
>> On reflection, though, I wonder if we've made pg_dump do the right
>> thing anyway.  There is a strong case to be made for the idea that
>> when dumping from a pre-14 server, it should emit
>> SET default_toast_compression = 'pglz';
>> rather than omitting any mention of the variable, which is what
>> I made it do in aa25d1089.

> But also ... aren't we just doing this to work around a test case that
> isn't especially good in the first place? Counting the number of lines
> in the diff between A and B is an extremely crude proxy for "they're
> similar enough that we probably haven't broken anything."

I wouldn't be proposing this if the xversion failures were the only
reason; making them go away is just a nice side-effect.  The core
point is that the charter of pg_dump is to reproduce the source
database's state, and as things stand we're failing to ensure we
do that.

(But yeah, we really need a better way of making this check in
the xversion tests.  I don't like the arbitrary "n lines of diff
is probably OK" business one bit.)

regards, tom lane




Re: [HACKERS] Custom compression methods

2021-03-24 Thread Robert Haas
On Mon, Mar 22, 2021 at 4:57 PM Robert Haas  wrote:
> > Fixed.
>
> Fixed some more.

Committed.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [HACKERS] Custom compression methods

2021-03-24 Thread Robert Haas
On Wed, Mar 24, 2021 at 11:42 AM Tom Lane  wrote:
> On reflection, though, I wonder if we've made pg_dump do the right
> thing anyway.  There is a strong case to be made for the idea that
> when dumping from a pre-14 server, it should emit
> SET default_toast_compression = 'pglz';
> rather than omitting any mention of the variable, which is what
> I made it do in aa25d1089.  If we changed that, I think all these
> diffs would go away.  Am I right in thinking that what's being
> compared here is new pg_dump's dump from old server versus new
> pg_dump's dump from new server?
>
> The "strong case" goes like this: initdb a v14 cluster, change
> default_toast_compression to lz4 in its postgresql.conf, then
> try to pg_upgrade from an old server.  If the dump script doesn't
> set default_toast_compression = 'pglz' then the upgrade will
> do the wrong thing because all the tables will be recreated with
> a different behavior than they had before.  IIUC, this wouldn't
> result in broken data, but it still seems to me to be undesirable.
> dump/restore ought to do its best to preserve the old DB state,
> unless you explicitly tell it --no-toast-compression or the like.

This feels a bit like letting the tail wag the dog, because one might
reasonably guess that the user's intention in such a case was to
switch to using LZ4, and we've subverted that intention by deciding
that we know better. I wouldn't blame someone for thinking that using
--no-toast-compression with a pre-v14 server ought to have no effect,
but with your proposal here, it would. Furthermore, IIUC, the user has
no way of passing --no-toast-compression through to pg_upgrade, so
they're just going to have to do the upgrade and then fix everything
manually afterward to the state that they intended to have all along.
Now, on the other hand, if they wanted to make practically any other
kind of change while upgrading, they'd have to do something like that
anyway, so I guess this is no worse.

But also ... aren't we just doing this to work around a test case that
isn't especially good in the first place? Counting the number of lines
in the diff between A and B is an extremely crude proxy for "they're
similar enough that we probably haven't broken anything."

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [HACKERS] Custom compression methods

2021-03-24 Thread Dilip Kumar
On Wed, Mar 24, 2021 at 9:32 PM Robert Haas  wrote:
>
> On Wed, Mar 24, 2021 at 11:41 AM Dilip Kumar  wrote:
> > Okay, that sounds like a reasonable design idea.  But the problem is
> > that in index_form_tuple we only have index tuple descriptor, not the
> > heap tuple descriptor. Maybe we will have to pass the heap tuple
> > descriptor as a parameter to index_form_tuple.   I will think more
> > about this that how can we do that.
>
> Another option might be to decide that the pg_attribute tuples for the
> index columns always have to match the corresponding table columns.
> So, if you alter with ALTER TABLE, it runs around and updates all of
> the indexes to match. For expression index columns, we could store
> InvalidCompressionMethod, causing index_form_tuple() to substitute the
> run-time default. That kinda sucks, because it's a significant
> impediment to ever reducing the lock level for ALTER TABLE .. ALTER
> COLUMN .. SET COMPRESSION, but I'm not sure we have the luxury of
> worrying about that problem right now.

Actually, we are already doing this, I mean ALTER TABLE .. ALTER
COLUMN .. SET COMPRESSION is already updating the compression method
of the index attribute.  So 0003 doesn't make sense, sorry for the
noise.  However, 0001 and 0002 are still valid, or do you think that
we don't want 0001 also?  If we don't need 0001 also then we need to
update the test output for 0002 slightly.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Custom compression methods

2021-03-24 Thread Robert Haas
On Wed, Mar 24, 2021 at 11:41 AM Dilip Kumar  wrote:
> Okay, that sounds like a reasonable design idea.  But the problem is
> that in index_form_tuple we only have index tuple descriptor, not the
> heap tuple descriptor. Maybe we will have to pass the heap tuple
> descriptor as a parameter to index_form_tuple.   I will think more
> about this that how can we do that.

Another option might be to decide that the pg_attribute tuples for the
index columns always have to match the corresponding table columns.
So, if you alter with ALTER TABLE, it runs around and updates all of
the indexes to match. For expression index columns, we could store
InvalidCompressionMethod, causing index_form_tuple() to substitute the
run-time default. That kinda sucks, because it's a significant
impediment to ever reducing the lock level for ALTER TABLE .. ALTER
COLUMN .. SET COMPRESSION, but I'm not sure we have the luxury of
worrying about that problem right now.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [HACKERS] Custom compression methods

2021-03-24 Thread Tom Lane
Andrew Dunstan  writes:
> On 3/20/21 3:03 PM, Tom Lane wrote:
>> I fixed up some issues in 0008/0009 (mostly cosmetic, except that
>> you forgot a server version check in dumpToastCompression) and
>> pushed that, so we can see if it makes crake happy.

> It's still produced a significant amount more difference between the
> dumps. For now I've increased the fuzz factor a bit like this:
> -   if (   ($oversion ne $this_branch && $difflines < 2000)
> +   if (   ($oversion ne $this_branch && $difflines < 2700)
> I'll try to come up with something better. Maybe just ignore lines like
> SET default_toast_compression = 'pglz';
> when taking the diff.

I see that some other buildfarm animals besides your own critters
are still failing the xversion tests, presumably because they lack
this hack :-(.

On reflection, though, I wonder if we've made pg_dump do the right
thing anyway.  There is a strong case to be made for the idea that
when dumping from a pre-14 server, it should emit
SET default_toast_compression = 'pglz';
rather than omitting any mention of the variable, which is what
I made it do in aa25d1089.  If we changed that, I think all these
diffs would go away.  Am I right in thinking that what's being
compared here is new pg_dump's dump from old server versus new
pg_dump's dump from new server?

The "strong case" goes like this: initdb a v14 cluster, change
default_toast_compression to lz4 in its postgresql.conf, then
try to pg_upgrade from an old server.  If the dump script doesn't
set default_toast_compression = 'pglz' then the upgrade will
do the wrong thing because all the tables will be recreated with
a different behavior than they had before.  IIUC, this wouldn't
result in broken data, but it still seems to me to be undesirable.
dump/restore ought to do its best to preserve the old DB state,
unless you explicitly tell it --no-toast-compression or the like.

regards, tom lane




Re: [HACKERS] Custom compression methods

2021-03-24 Thread Dilip Kumar
On Wed, Mar 24, 2021 at 8:41 PM Robert Haas  wrote:
>
> On Wed, Mar 24, 2021 at 7:45 AM Dilip Kumar  wrote:
> > I have anyway created a patch for this as well.  Including all three
> > patches so we don't lose track.
> >
> > 0001 ->shows compression method for the index attribute in index describe
> > 0002 -> fix the reported bug (test case included)
> > (optional) 0003-> Alter set compression for index column
>
> As I understand it, the design idea here up until now has been that
> the index's attcompression values are irrelevant and ignored and that
> any compression which happens for index attributes is based either on
> the table attribute's assigned attcompression value, or the default.
> If that's the idea, then all of these patches are wrong.

The current design is that whenever we create an index, the index's
attribute copies the attcompression from the table's attribute.  And,
while compressing the index tuple we will use the attcompression from
the index attribute.

> Now, a possible alternative design would be that the index's
> attcompression controls compression for the index same as a table's
> does for the table. But in that case, it seems to me that these
> patches are insufficient, because then we'd also need to, for example,
> dump and restore the setting, which I don't think anything in these
> patches or the existing code will do.

Yeah, you are right.

> My vote, as of now, is for the first design, in which case you need to
> forget about trying to get pg_attribute to have the right contents -
> in fact, I think we should set all the values there to
> InvalidCompressionMethod to make sure we're not relying on them
> anywhere. And then you need to make sure that everything that tries to
> compress an index value uses the setting from the table column or the
> default, not the setting on the index column.

Okay, that sounds like a reasonable design idea.  But the problem is
that in index_form_tuple we only have index tuple descriptor, not the
heap tuple descriptor. Maybe we will have to pass the heap tuple
descriptor as a parameter to index_form_tuple.   I will think more
about this that how can we do that.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Custom compression methods

2021-03-24 Thread Robert Haas
On Wed, Mar 24, 2021 at 7:45 AM Dilip Kumar  wrote:
> I have anyway created a patch for this as well.  Including all three
> patches so we don't lose track.
>
> 0001 ->shows compression method for the index attribute in index describe
> 0002 -> fix the reported bug (test case included)
> (optional) 0003-> Alter set compression for index column

As I understand it, the design idea here up until now has been that
the index's attcompression values are irrelevant and ignored and that
any compression which happens for index attributes is based either on
the table attribute's assigned attcompression value, or the default.
If that's the idea, then all of these patches are wrong.

Now, a possible alternative design would be that the index's
attcompression controls compression for the index same as a table's
does for the table. But in that case, it seems to me that these
patches are insufficient, because then we'd also need to, for example,
dump and restore the setting, which I don't think anything in these
patches or the existing code will do.

My vote, as of now, is for the first design, in which case you need to
forget about trying to get pg_attribute to have the right contents -
in fact, I think we should set all the values there to
InvalidCompressionMethod to make sure we're not relying on them
anywhere. And then you need to make sure that everything that tries to
compress an index value uses the setting from the table column or the
default, not the setting on the index column.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [HACKERS] Custom compression methods

2021-03-24 Thread Dilip Kumar
On Wed, Mar 24, 2021 at 3:40 PM Dilip Kumar  wrote:
>
> 0001 ->shows compression method for the index attribute in index describe
> 0002 -> fix the reported bug (test case included)
>
> Apart from this, I was thinking that currently, we are allowing to
> ALTER SET COMPRESSION only for the table and matview,  IMHO it makes
> sense to allow to alter the compression method for the index column as
> well?  I mean it is just a one-line change, but just wanted to know
> the opinion from others.  It is not required for the storage because
> indexes can not have a toast table but index attributes can be
> compressed so it makes sense to allow to alter the compression method.
> Thought?

I have anyway created a patch for this as well.  Including all three
patches so we don't lose track.

0001 ->shows compression method for the index attribute in index describe
0002 -> fix the reported bug (test case included)
(optional) 0003-> Alter set compression for index column

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
From 6f83c026ddfe9cda14a9c5e965841b6cd2df0a2a Mon Sep 17 00:00:00 2001
From: Dilip Kumar 
Date: Wed, 24 Mar 2021 17:08:31 +0530
Subject: [PATCH v3 3/3] ALTER SET COMPRESSION for index columns

---
 src/backend/commands/tablecmds.c|  2 +-
 src/bin/psql/tab-complete.c |  4 ++--
 src/test/regress/expected/compression.out   | 16 
 src/test/regress/expected/compression_1.out |  6 ++
 src/test/regress/sql/compression.sql|  4 
 5 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 3349bcf..6d5b79d 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -4335,7 +4335,7 @@ ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
 			pass = AT_PASS_MISC;
 			break;
 		case AT_SetCompression:	/* ALTER COLUMN SET COMPRESSION */
-			ATSimplePermissions(rel, ATT_TABLE | ATT_MATVIEW);
+			ATSimplePermissions(rel, ATT_TABLE | ATT_MATVIEW | ATT_INDEX);
 			/* This command never recurses */
 			/* No command-specific prep needed */
 			pass = AT_PASS_MISC;
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index b67f4ea..305665e 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -1772,10 +1772,10 @@ psql_completion(const char *text, int start, int end)
 	}
 	/* ALTER INDEX  ALTER COLUMN  */
 	else if (Matches("ALTER", "INDEX", MatchAny, "ALTER", "COLUMN", MatchAny))
-		COMPLETE_WITH("SET STATISTICS");
+		COMPLETE_WITH("SET");
 	/* ALTER INDEX  ALTER COLUMN  SET */
 	else if (Matches("ALTER", "INDEX", MatchAny, "ALTER", "COLUMN", MatchAny, "SET"))
-		COMPLETE_WITH("STATISTICS");
+		COMPLETE_WITH("COMPRESSION", "STATISTICS");
 	/* ALTER INDEX  ALTER COLUMN  SET STATISTICS */
 	else if (Matches("ALTER", "INDEX", MatchAny, "ALTER", "COLUMN", MatchAny, "SET", "STATISTICS"))
 	{
diff --git a/src/test/regress/expected/compression.out b/src/test/regress/expected/compression.out
index 19707fb..000983f 100644
--- a/src/test/regress/expected/compression.out
+++ b/src/test/regress/expected/compression.out
@@ -326,6 +326,22 @@ generate_series(1, 50) g), VERSION());
  expr   | text | yes  | (f1 || f2) | extended | pglz| 
 unique, btree, for table "public.cmdata2"
 
+CREATE INDEX idx2 ON cmdata2(f2);
+\d+ idx2
+Index "public.idx2"
+ Column | Type | Key? | Definition | Storage  | Compression | Stats target 
++--+--++--+-+--
+ f2 | text | yes  | f2 | extended | lz4 | 
+btree, for table "public.cmdata2"
+
+ALTER INDEX idx2 ALTER COLUMN f2 SET COMPRESSION pglz;
+\d+ idx2
+Index "public.idx2"
+ Column | Type | Key? | Definition | Storage  | Compression | Stats target 
++--+--++--+-+--
+ f2 | text | yes  | f2 | extended | pglz| 
+btree, for table "public.cmdata2"
+
 -- check data is ok
 SELECT length(f1) FROM cmdata;
  length 
diff --git a/src/test/regress/expected/compression_1.out b/src/test/regress/expected/compression_1.out
index 84b933d..761fee4 100644
--- a/src/test/regress/expected/compression_1.out
+++ b/src/test/regress/expected/compression_1.out
@@ -324,6 +324,12 @@ ERROR:  relation "cmdata2" does not exist
 LINE 1: INSERT INTO cmdata2 VALUES((SELECT array_agg(md5(g::TEXT))::...
 ^
 \d+ idx1
+CREATE INDEX idx2 ON cmdata2(f2);
+ERROR:  relation "cmdata2" does not exist
+\d+ idx2
+ALTER INDEX idx2 ALTER COLUMN f2 SET COMPRESSION pglz;
+ERROR:  relation "idx2" does not exist
+\d+ idx2
 -- check data is ok
 SELECT length(f1) FROM cmdata;
  length 
diff --git a/src/test/regress/sql/compression.sql b/src/test/regress/sql/compression.sql
index 4afd5a2..6777f11 100644
--- a/src/test/regress/sql/compression.sql
+++ b/src/test/regress/sql/compression.sql
@@ -137,6 

Re: [HACKERS] Custom compression methods

2021-03-24 Thread Dilip Kumar
On Wed, Mar 24, 2021 at 3:10 PM Dilip Kumar  wrote:
>
> On Wed, Mar 24, 2021 at 2:49 PM Justin Pryzby  wrote:
> >
> > On Wed, Mar 24, 2021 at 02:24:41PM +0530, Dilip Kumar wrote:
> > > On Wed, Mar 24, 2021 at 1:43 PM Dilip Kumar  wrote:
> > > > > create table t1 (col1 text, col2 text);
> > > > > create unique index on t1 ((col1 || col2));
> > > > > insert into t1 values((select array_agg(md5(g::text))::text from
> > > > > generate_series(1, 256) g), version());
> > > > >
> > > > > Attached is a backtrace from current HEAD
> > > >
> > > > Thanks for reporting this issue.  Actually, I missed setting the
> > > > attcompression for the expression index and that is causing this
> > > > assert.  I will send a patch in some time.
> > >
> > > PFA, patch to fix the issue.
> >
> > Could you include a test case exercizing this code path ?
> > Like Jaime's reproducer.
>
> I will do that.

0001 ->shows compression method for the index attribute in index describe
0002 -> fix the reported bug (test case included)

Apart from this, I was thinking that currently, we are allowing to
ALTER SET COMPRESSION only for the table and matview,  IMHO it makes
sense to allow to alter the compression method for the index column as
well?  I mean it is just a one-line change, but just wanted to know
the opinion from others.  It is not required for the storage because
indexes can not have a toast table but index attributes can be
compressed so it makes sense to allow to alter the compression method.
Thought?


--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
From fa3ee05e21af64af86bedcc919488300c4f9a92d Mon Sep 17 00:00:00 2001
From: Dilip Kumar 
Date: Wed, 24 Mar 2021 15:08:14 +0530
Subject: [PATCH v2 1/2] Show compression method in index describe

---
 src/bin/psql/describe.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index eeac0ef..9d15324 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -1897,6 +1897,7 @@ describeOneTableDetails(const char *schemaname,
 		if (pset.sversion >= 14 &&
 			!pset.hide_compression &&
 			(tableinfo.relkind == RELKIND_RELATION ||
+			 tableinfo.relkind == RELKIND_INDEX ||
 			 tableinfo.relkind == RELKIND_PARTITIONED_TABLE ||
 			 tableinfo.relkind == RELKIND_MATVIEW))
 		{
-- 
1.8.3.1

From 54edcec2d44cfac01e1c2d20cc413dac57496ca9 Mon Sep 17 00:00:00 2001
From: Dilip Kumar 
Date: Wed, 24 Mar 2021 14:02:06 +0530
Subject: [PATCH v2 2/2] Fix attcompression for index expression columns

For expression columns the attcompression was not set.  So this patch
set it to the default compression method for compressible types otherwise
to the invalid compression method.
---
 src/backend/catalog/index.c | 11 +++
 src/test/regress/expected/compression.out   | 13 +
 src/test/regress/expected/compression_1.out | 14 ++
 src/test/regress/sql/compression.sql|  8 
 4 files changed, 46 insertions(+)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 397d70d..b5a79ce 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -30,6 +30,7 @@
 #include "access/relscan.h"
 #include "access/sysattr.h"
 #include "access/tableam.h"
+#include "access/toast_compression.h"
 #include "access/transam.h"
 #include "access/visibilitymap.h"
 #include "access/xact.h"
@@ -379,6 +380,16 @@ ConstructTupleDescriptor(Relation heapRelation,
 			to->attalign = typeTup->typalign;
 			to->atttypmod = exprTypmod(indexkey);
 
+			/*
+			 * For expression column, if attribute type storage is compressible
+			 * then set the default compression method, otherwise invalid
+			 * compression method.
+			 */
+			if (IsStorageCompressible(typeTup->typstorage))
+to->attcompression = GetDefaultToastCompression();
+			else
+to->attcompression = InvalidCompressionMethod;
+
 			ReleaseSysCache(tuple);
 
 			/*
diff --git a/src/test/regress/expected/compression.out b/src/test/regress/expected/compression.out
index c2f2e0e..19707fb 100644
--- a/src/test/regress/expected/compression.out
+++ b/src/test/regress/expected/compression.out
@@ -313,6 +313,19 @@ SELECT pg_column_compression(f1) FROM cmdata;
  lz4
 (2 rows)
 
+-- test expression index
+DROP TABLE cmdata2;
+CREATE TABLE cmdata2 (f1 TEXT COMPRESSION pglz, f2 TEXT COMPRESSION lz4);
+CREATE UNIQUE INDEX idx1 ON cmdata2 ((f1 || f2));
+INSERT INTO cmdata2 VALUES((SELECT array_agg(md5(g::TEXT))::TEXT FROM
+generate_series(1, 50) g), VERSION());
+\d+ idx1
+Index "public.idx1"
+ Column | Type | Key? | Definition | Storage  | Compression | Stats target 
++--+--++--+-+--
+ expr   | text | yes  | (f1 || f2) | extended | pglz| 
+unique, btree, for table "public.cmdata2"
+
 -- check data is ok
 SELECT length(f1) FROM cmdata;
  length 
diff --git a/src/test/regress/expected/compression_1.out 

Re: [HACKERS] Custom compression methods

2021-03-24 Thread Dilip Kumar
On Wed, Mar 24, 2021 at 2:49 PM Justin Pryzby  wrote:
>
> On Wed, Mar 24, 2021 at 02:24:41PM +0530, Dilip Kumar wrote:
> > On Wed, Mar 24, 2021 at 1:43 PM Dilip Kumar  wrote:
> > > > create table t1 (col1 text, col2 text);
> > > > create unique index on t1 ((col1 || col2));
> > > > insert into t1 values((select array_agg(md5(g::text))::text from
> > > > generate_series(1, 256) g), version());
> > > >
> > > > Attached is a backtrace from current HEAD
> > >
> > > Thanks for reporting this issue.  Actually, I missed setting the
> > > attcompression for the expression index and that is causing this
> > > assert.  I will send a patch in some time.
> >
> > PFA, patch to fix the issue.
>
> Could you include a test case exercizing this code path ?
> Like Jaime's reproducer.

I will do that.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Custom compression methods

2021-03-24 Thread Justin Pryzby
On Wed, Mar 24, 2021 at 02:24:41PM +0530, Dilip Kumar wrote:
> On Wed, Mar 24, 2021 at 1:43 PM Dilip Kumar  wrote:
> > > create table t1 (col1 text, col2 text);
> > > create unique index on t1 ((col1 || col2));
> > > insert into t1 values((select array_agg(md5(g::text))::text from
> > > generate_series(1, 256) g), version());
> > >
> > > Attached is a backtrace from current HEAD
> >
> > Thanks for reporting this issue.  Actually, I missed setting the
> > attcompression for the expression index and that is causing this
> > assert.  I will send a patch in some time.
> 
> PFA, patch to fix the issue.

Could you include a test case exercizing this code path ?
Like Jaime's reproducer.

-- 
Justin




Re: [HACKERS] Custom compression methods

2021-03-24 Thread Dilip Kumar
On Wed, Mar 24, 2021 at 1:43 PM Dilip Kumar  wrote:

> >
> > """
> > create table t1 (col1 text, col2 text);
> > create unique index on t1 ((col1 || col2));
> > insert into t1 values((select array_agg(md5(g::text))::text from
> > generate_series(1, 256) g), version());
> > """
> >
> > Attached is a backtrace from current HEAD
>
> Thanks for reporting this issue.  Actually, I missed setting the
> attcompression for the expression index and that is causing this
> assert.  I will send a patch in some time.

PFA, patch to fix the issue.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


v1-0001-Fix-attcompression-for-index-expression-columns.patch
Description: Binary data


Re: [HACKERS] Custom compression methods

2021-03-24 Thread Dilip Kumar
On Wed, Mar 24, 2021 at 1:22 PM Jaime Casanova
 wrote:
>
> On Fri, Mar 19, 2021 at 2:44 PM Robert Haas  wrote:
> >
> > I committed the core patch (0003) with a bit more editing.  Let's see
> > what the buildfarm thinks.
> >
>
> I think this is bbe0a81db69bd10bd166907c3701492a29aca294, right?
> This introduced a new assert failure, steps to reproduce:
>
> """
> create table t1 (col1 text, col2 text);
> create unique index on t1 ((col1 || col2));
> insert into t1 values((select array_agg(md5(g::text))::text from
> generate_series(1, 256) g), version());
> """
>
> Attached is a backtrace from current HEAD

Thanks for reporting this issue.  Actually, I missed setting the
attcompression for the expression index and that is causing this
assert.  I will send a patch in some time.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Custom compression methods

2021-03-24 Thread Jaime Casanova
On Fri, Mar 19, 2021 at 2:44 PM Robert Haas  wrote:
>
> I committed the core patch (0003) with a bit more editing.  Let's see
> what the buildfarm thinks.
>

I think this is bbe0a81db69bd10bd166907c3701492a29aca294, right?
This introduced a new assert failure, steps to reproduce:

"""
create table t1 (col1 text, col2 text);
create unique index on t1 ((col1 || col2));
insert into t1 values((select array_agg(md5(g::text))::text from
generate_series(1, 256) g), version());
"""

Attached is a backtrace from current HEAD

-- 
Jaime Casanova
Director de Servicios Profesionales
SYSTEMGUARDS - Consultores de PostgreSQL
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
set = {__val = {4194304, 140726735034080, 2, 6, 6515006, 
94155769016304, 4611686018427388799, 140459852548774, 0, 281470681751456, 0, 0, 
0, 0, 0, 0}}
pid = 
tid = 
ret = 
#1  0x7fbf5b657535 in __GI_abort () at abort.c:79
save_stage = 1
act = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, 
sa_mask = {__val = {0, 0, 0, 0, 0, 140459850305525, 2, 3630241276386869248, 
7003431909949847861, 94155769016304, 
  7003772758621645568, 0, 1523958137291918592, 140726735034320, 0, 
140726735035184}}, sa_flags = 1495953392, sa_restorer = 0x0}
sigs = {__val = {32, 0 }}
#2  0x55a25992a983 in ExceptionalCondition (conditionName=0x55a2599a9248 
"CompressionMethodIsValid(cmethod)", errorType=0x55a2599a91d2 
"FailedAssertion", fileName=0x55a2599a91c0 "toast_internals.c", 
lineNumber=56) at assert.c:69
No locals.
#3  0x55a2592c0709 in toast_compress_datum (value=94155801611904, cmethod=0 
'\000') at toast_internals.c:56
tmp = 0x0
valsize = 1528257472
cmid = TOAST_INVALID_COMPRESSION_ID
__func__ = "toast_compress_datum"
#4  0x55a2592b9532 in index_form_tuple (tupleDescriptor=0x7fbf524881a8, 
values=0x7ffd7f0d5e10, isnull=0x7ffd7f0d5df0) at indextuple.c:106
cvalue = 140726735035664
att = 0x7fbf524881c0
tp = 0x25b19e830 
tuple = 0x55a25b1a4608
size = 94155801511432
data_size = 0
hoff = 0
i = 0
infomask = 0
hasnull = false
tupmask = 0
numberOfAttributes = 1
untoasted_values = {94155801611904, 94155771761321, 140726735035376, 
94155801506904, 94155779258656, 140726735035488, 94155779258656, 
140726735035488, 0, 140459848948121, 140726735035488, 
  94155779258656, 140726735035520, 0, 140726735035520, 94155769016304, 
94155801611904, 8589934592, 94155801512576, 94155801512544, 94155801511432, 
94155801506904, 94155771775058, 94155801352832, 0, 
  94155779258656, 6426549472, 94155771784441, 94155801487408, 0, 0, 0}
untoasted_free = {false, false, false, false, false, false, false, 
false, false, 4, false, false, false, false, false, false, 8, 69, 26, 91, 162, 
85, false, false, 192, 91, 23, 91, 162, 85, false, 
  false}
__func__ = "index_form_tuple"
#5  0x55a259355563 in btinsert (rel=0x7fbf5248b3f0, values=0x7ffd7f0d5e10, 
isnull=0x7ffd7f0d5df0, ht_ctid=0x55a25b19e860, heapRel=0x7fbf524840a8, 
checkUnique=UNIQUE_CHECK_YES, indexUnchanged=false, 
indexInfo=0x55a25b17d968) at nbtree.c:196
result = false
itup = 0x55a2593eb96d 
#6  0x55a259341253 in index_insert (indexRelation=0x7fbf5248b3f0, 
values=0x7ffd7f0d5e10, isnull=0x7ffd7f0d5df0, heap_t_ctid=0x55a25b19e860, 
heapRelation=0x7fbf524840a8, checkUnique=UNIQUE_CHECK_YES, 
indexUnchanged=false, indexInfo=0x55a25b17d968) at indexam.c:193
__func__ = "index_insert"
#7  0x55a259551d6b in ExecInsertIndexTuples (resultRelInfo=0x55a25b17d0a8, 
slot=0x55a25b19e830, estate=0x55a25b175ce0, update=false, noDupErr=false, 
specConflict=0x0, arbiterIndexes=0x0)
at execIndexing.c:416
applyNoDupErr = false
checkUnique = UNIQUE_CHECK_YES
indexRelation = 0x7fbf5248b3f0
indexInfo = 0x55a25b17d968
indexUnchanged = false
satisfiesConstraint = false
tupleid = 0x55a25b19e860
result = 0x0
i = 0
numIndices = 1
relationDescs = 0x55a25b17d900
heapRelation = 0x7fbf524840a8
indexInfoArray = 0x55a25b17d920
econtext = 0x55a25b1a2f88
values = {94155801611904, 94155801320384, 23002747632, 94155801505976, 
94155801505952, 94155801320384, 140726735036016, 94155776090255, 
94155800974064, 94155801505976, 140726735036048, 94155801320384, 
  140726735036048, 94155769089800, 5801161712, 94155801505976, 
140726735036192, 94155769471556, 0, 0, 94155801603232, 140459695947944, 
140726735036224, 6764221304040416, 1, 23351616, 140459705036672, 
  856226916400, 94155801505976, 2325076142599, 549755813894, 0}
isnull = {false, true, false, false, false, false, false, false, 184, 
48, 26, 91, 162, 85, false, false, 64, 94, 13, 127, 253, 127, false, false, 
135, 177, 149, 89, 162, 85, 

Re: [HACKERS] Custom compression methods

2021-03-22 Thread Tom Lane
Justin Pryzby  writes:
> On Mon, Mar 22, 2021 at 03:47:58PM -0400, Robert Haas wrote:
>> Are you sure you're looking at the patch I sent,
>> toast-compression-guc-rmh.patch? I can't help wondering if you applied
>> it to a dirty source tree or got the wrong file or something, because
>> otherwise I don't understand why you're seeing things that I'm not
>> seeing.

> I'm guessing Tom read this hunk as being changes to
> check_default_toast_compression() rather than removing the function ?

Yeah, after looking closer, the diff looks like
check_default_toast_compression is being modified in-place,
whereas actually it's getting replaced by CompressionNameToMethod
which does something entirely different.  I'd also not looked
closely enough at where NO_LZ4_SUPPORT() was being moved to.
My apologies --- I can only plead -ENOCAFFEINE.

regards, tom lane




Re: [HACKERS] Custom compression methods

2021-03-22 Thread Justin Pryzby
On Mon, Mar 22, 2021 at 03:47:58PM -0400, Robert Haas wrote:
> On Mon, Mar 22, 2021 at 2:10 PM Tom Lane  wrote:
> > Robert Haas  writes:
> > > I think this is significantly cleaner than what we have now, and I
> > > also prefer it to your proposal.
> >
> > +1 in general.  However, I suspect that you did not try to compile
> > this without --with-lz4, because if you had you'd have noticed the
> > other uses of NO_LZ4_SUPPORT() that you broke.  I think you need
> > to leave that macro where it is.
> 
> You're correct that I hadn't tried this without --with-lz4, but I did
> grep for other uses of NO_LZ4_SUPPORT() and found none. I also just
> tried it without --with-lz4 just now, and it worked fine.
> 
> > Also, it's not nice for GUC check
> > functions to throw ereport(ERROR); we prefer the caller to be able
> > to decide if it's a hard error or not.  That usage should be using
> > GUC_check_errdetail() or a cousin, so it can't share the macro anyway.
> 
> I agree that these are valid points about GUC check functions in
> general, but the patch I sent adds 0 GUC check functions and removes
> 1, and it didn't do the stuff you describe here anyway.
> 
> Are you sure you're looking at the patch I sent,
> toast-compression-guc-rmh.patch? I can't help wondering if you applied
> it to a dirty source tree or got the wrong file or something, because
> otherwise I don't understand why you're seeing things that I'm not
> seeing.

I'm guessing Tom read this hunk as being changes to
check_default_toast_compression() rather than removing the function ?


- * Validate a new value for the default_toast_compression GUC.
+ * CompressionNameToMethod - Get compression method from compression name
+ *
+ * Search in the available built-in methods.  If the compression not found
+ * in the built-in methods then return InvalidCompressionMethod.
  */
-bool
-check_default_toast_compression(char **newval, void **extra, GucSource source)
+char
+CompressionNameToMethod(const char *compression)
 {
-   if (**newval == '\0')
+   if (strcmp(compression, "pglz") == 0)
+   return TOAST_PGLZ_COMPRESSION;
+   else if (strcmp(compression, "lz4") == 0)
{
-   GUC_check_errdetail("%s cannot be empty.",
-   
"default_toast_compression");
-   return false;
+#ifndef USE_LZ4
+   NO_LZ4_SUPPORT();
+#endif
+   return TOAST_LZ4_COMPRESSION;

-- 
Justin




Re: [HACKERS] Custom compression methods

2021-03-22 Thread Robert Haas
On Mon, Mar 22, 2021 at 4:33 PM Robert Haas  wrote:
> On Mon, Mar 22, 2021 at 1:58 PM Justin Pryzby  wrote:
> > guc.c should not longer define this as extern:
> > default_toast_compression_options
>
> Fixed.

Fixed some more.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v3-0001-Tidy-up-more-loose-ends-related-to-configurable-T.patch
Description: Binary data


Re: [HACKERS] Custom compression methods

2021-03-22 Thread Robert Haas
On Mon, Mar 22, 2021 at 1:58 PM Justin Pryzby  wrote:
> guc.c should not longer define this as extern:
> default_toast_compression_options

Fixed.

> I think you should comment that default_toast_compression is an int as far as
> guc.c is concerned, but storing one of the char value of TOAST_*_COMPRESSION

Done.

> Shouldn't varlena.c pg_column_compression() call GetCompressionMethodName () ?
> I guess it should already have done that.

It has a 0-3 integer, not a char value.

> Maybe pg_dump.c can't use those constants, though (?)

Hmm, toast_compression.h might actually be safe for frontend code now,
or if necessary we could add #ifdef FRONTEND stanzas to make it so. I
don't know if that is really this patch's job, but I guess it could
be.

A couple of other things:

- Upon further reflection, I think the NO_LZ4_SUPPORT() message is
kinda not great. I'm thinking we should change it to say "LZ4 is not
supported by this build" instead of "unsupported LZ4 compression
method" and drop the hint and detail. That seems more like how we've
handled other such cases.

- It is not very nice that the three possible values of attcompression
are TOAST_PGLZ_COMPRESSION, TOAST_LZ4_COMPRESSION, and
InvalidCompressionMethod. One of those three identifiers looks very
little like the other two, and there's no real good reason for that. I
think we should try to standardize on something, but I'm not sure what
it should be. It would also be nice if these names were more visually
distinct from the related but very different enum values
TOAST_PGLZ_COMPRESSION_ID and TOAST_LZ4_COMPRESSION_ID. Really, as the
comments I added explain, we want to minimize the amount of code that
knows about the 0-3 "ID" values, and use the char values whenever we
can.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v2-0001-Tidy-up-more-loose-ends-related-to-configurable-T.patch
Description: Binary data


Re: [HACKERS] Custom compression methods

2021-03-22 Thread Robert Haas
On Mon, Mar 22, 2021 at 2:10 PM Tom Lane  wrote:
> Robert Haas  writes:
> > I think this is significantly cleaner than what we have now, and I
> > also prefer it to your proposal.
>
> +1 in general.  However, I suspect that you did not try to compile
> this without --with-lz4, because if you had you'd have noticed the
> other uses of NO_LZ4_SUPPORT() that you broke.  I think you need
> to leave that macro where it is.

You're correct that I hadn't tried this without --with-lz4, but I did
grep for other uses of NO_LZ4_SUPPORT() and found none. I also just
tried it without --with-lz4 just now, and it worked fine.

> Also, it's not nice for GUC check
> functions to throw ereport(ERROR); we prefer the caller to be able
> to decide if it's a hard error or not.  That usage should be using
> GUC_check_errdetail() or a cousin, so it can't share the macro anyway.

I agree that these are valid points about GUC check functions in
general, but the patch I sent adds 0 GUC check functions and removes
1, and it didn't do the stuff you describe here anyway.

Are you sure you're looking at the patch I sent,
toast-compression-guc-rmh.patch? I can't help wondering if you applied
it to a dirty source tree or got the wrong file or something, because
otherwise I don't understand why you're seeing things that I'm not
seeing.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [HACKERS] Custom compression methods

2021-03-22 Thread Tom Lane
Robert Haas  writes:
> I think this is significantly cleaner than what we have now, and I
> also prefer it to your proposal.

+1 in general.  However, I suspect that you did not try to compile
this without --with-lz4, because if you had you'd have noticed the
other uses of NO_LZ4_SUPPORT() that you broke.  I think you need
to leave that macro where it is.  Also, it's not nice for GUC check
functions to throw ereport(ERROR); we prefer the caller to be able
to decide if it's a hard error or not.  That usage should be using
GUC_check_errdetail() or a cousin, so it can't share the macro anyway.

regards, tom lane




Re: [HACKERS] Custom compression methods

2021-03-22 Thread Justin Pryzby
On Mon, Mar 22, 2021 at 01:38:36PM -0400, Robert Haas wrote:
> On Mon, Mar 22, 2021 at 12:16 PM Robert Haas  wrote:
> > But, what about giving the default_toast_compression_method GUC an
> > assign hook that sets a global variable of type "char" to the
> > appropriate value? Then GetDefaultToastCompression() goes away
> > entirely. That might be worth exploring.
> 
> Actually, we can do even better. We should just make the values
> actually assigned to the GUC be TOAST_PGLZ_COMPRESSION etc. rather
> than TOAST_PGLZ_COMPRESSION_ID etc. Then a whole lot of complexity
> just goes away. I added some comments explaining why using
> TOAST_PGLZ_COMPRESSION is the wrong thing anyway. Then I got hacking
> and rearranged a few other things. So the attached patch does these
> thing:
> 
> - Changes default_toast_compression to an enum, as in your patch, but
> now with values that are the same as what ultimately gets stored in
> attcompression.
> - Adds a comment warning against incautious use of
> TOAST_PGLZ_COMPRESSION_ID, etc.
> - Moves default_toast_compression_options to guc.c.
> - After doing the above two things, we can remove the #include of
> utils/guc.h into access/toast_compression.h, so the patch does that.
> - Moves NO_LZ4_SUPPORT, GetCompressionMethodName, and
> CompressionNameToMethod to guc.c. Making these inline functions
> doesn't save anything meaningful; it's more important not to export a
> bunch of random identifiers.
> - Removes an unnecessary cast to bool from the definition of
> CompressionMethodIsValid.
> 
> I think this is significantly cleaner than what we have now, and I
> also prefer it to your proposal.

+1

guc.c should not longer define this as extern:
default_toast_compression_options

I think you should comment that default_toast_compression is an int as far as
guc.c is concerned, but storing one of the char value of TOAST_*_COMPRESSION

Shouldn't varlena.c pg_column_compression() call GetCompressionMethodName () ?
I guess it should already have done that.

Maybe pg_dump.c can't use those constants, though (?)

-- 
Justin




Re: [HACKERS] Custom compression methods

2021-03-22 Thread Robert Haas
On Mon, Mar 22, 2021 at 12:16 PM Robert Haas  wrote:
> But, what about giving the default_toast_compression_method GUC an
> assign hook that sets a global variable of type "char" to the
> appropriate value? Then GetDefaultToastCompression() goes away
> entirely. That might be worth exploring.

Actually, we can do even better. We should just make the values
actually assigned to the GUC be TOAST_PGLZ_COMPRESSION etc. rather
than TOAST_PGLZ_COMPRESSION_ID etc. Then a whole lot of complexity
just goes away. I added some comments explaining why using
TOAST_PGLZ_COMPRESSION is the wrong thing anyway. Then I got hacking
and rearranged a few other things. So the attached patch does these
thing:

- Changes default_toast_compression to an enum, as in your patch, but
now with values that are the same as what ultimately gets stored in
attcompression.
- Adds a comment warning against incautious use of
TOAST_PGLZ_COMPRESSION_ID, etc.
- Moves default_toast_compression_options to guc.c.
- After doing the above two things, we can remove the #include of
utils/guc.h into access/toast_compression.h, so the patch does that.
- Moves NO_LZ4_SUPPORT, GetCompressionMethodName, and
CompressionNameToMethod to guc.c. Making these inline functions
doesn't save anything meaningful; it's more important not to export a
bunch of random identifiers.
- Removes an unnecessary cast to bool from the definition of
CompressionMethodIsValid.

I think this is significantly cleaner than what we have now, and I
also prefer it to your proposal.

Comments?

-- 
Robert Haas
EDB: http://www.enterprisedb.com


toast-compression-guc-rmh.patch
Description: Binary data


Re: [HACKERS] Custom compression methods

2021-03-22 Thread Robert Haas
On Mon, Mar 22, 2021 at 11:13 AM Justin Pryzby  wrote:
> The first iteration was pretty rough, and there's still some question in my
> mind about where default_toast_compression_options[] should be defined.  If
> it's in the header file, then I could use lengthof() - but then it probably
> gets multiply defined.

What do you want to use lengthof() for?

> In the latest patch, there's multiple "externs".  Maybe
> guc.c doesn't need the extern, since it includes toast_compression.h.  But 
> then
> it's the only "struct config_enum_entry" which has an "extern" outside of
> guc.c.

Oh, yeah, we certainly shouldn't have an extern in guc.c itself, if
we've already got it in the header file.

As to the more general question of where to put stuff, I don't think
there's any conceptual problem with putting it in a header file rather
than in guc.c. It's not very scalable to just keeping inventing new
GUCs and sticking all their accoutrement into guc.c. That might have
kind of made sense when guc.c was invented, since there were probably
fewer settings there and guc.c itself was new, but at this point it's
a well-established part of the infrastructure and having other
subsystems cater to what it needs rather than the other way around
seems logical. However, it's not great to have "utils/guc.h" included
in "access/toast_compression.h", because then anything that includes
"access/toast_compression.h" or "access/toast_internals.h" sucks in
"utils/guc.h" even though it's not really topically related to what
they intended to include. We can't avoid that just by choosing to put
this enum in guc.c, because GetDefaultToastCompression() also uses it.
But, what about giving the default_toast_compression_method GUC an
assign hook that sets a global variable of type "char" to the
appropriate value? Then GetDefaultToastCompression() goes away
entirely. That might be worth exploring.

> Also, it looks like you added default_toast_compression out of order, so maybe
> you'd fix that at the same time.

You know, I looked at where you had it and said to myself, "surely
this is a silly place to put this, it would make much more sense to
move this up a bit." Now I feel dumb.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [HACKERS] Custom compression methods (mac+lz4.h)

2021-03-22 Thread Tom Lane
Robert Haas  writes:
> On Mon, Mar 22, 2021 at 11:48 AM Tom Lane  wrote:
>> Possibly the former names should survive and the latter become
>> wrappers around them, not sure.  But we shouldn't be using the "4B"
>> terminology anyplace except this part of postgres.h.

> I would argue that it shouldn't be used any place at all, and that we
> ought to go the other direction and get rid of the existing macros -
> e.g. change #define VARATT_IS_1B_E to #define VARATT_IS_EXTERNAL
> instead of defining the latter as a no-value-added wrapper around the
> former. Maybe at one time somebody thought that the test for
> VARATT_IS_EXTERNAL might someday have more cases than just
> VARATT_IS_1B_E, but that's not looking like a good bet in 2021.

Maybe.  I think the original idea was exactly what the comment says,
to have a layer of macros that'd deal with endianness issues and no more.
That still seems like a reasonable plan to me, though perhaps it wasn't
executed very well.

regards, tom lane




Re: [HACKERS] Custom compression methods (mac+lz4.h)

2021-03-22 Thread Robert Haas
On Mon, Mar 22, 2021 at 11:48 AM Tom Lane  wrote:
> > Anyway, this particular macro name was chosen, it seems, for symmetry
> > with VARDATA_4B_C, but if you want to change it to something else, I'm
> > OK with that, too.
>
> After looking at postgres.h for a bit, I'm thinking that what these
> should have been symmetric with is the considerably-less-terrible
> names used for the corresponding VARATT_EXTERNAL cases.  Thus,
> something like
>
> s/VARRAWSIZE_4B_C/VARDATA_COMPRESSED_GET_RAWSIZE/
> s/VARCOMPRESS_4B_C/VARDATA_COMPRESSED_GET_COMPRESSION/

Works for me.

> Possibly the former names should survive and the latter become
> wrappers around them, not sure.  But we shouldn't be using the "4B"
> terminology anyplace except this part of postgres.h.

I would argue that it shouldn't be used any place at all, and that we
ought to go the other direction and get rid of the existing macros -
e.g. change #define VARATT_IS_1B_E to #define VARATT_IS_EXTERNAL
instead of defining the latter as a no-value-added wrapper around the
former. Maybe at one time somebody thought that the test for
VARATT_IS_EXTERNAL might someday have more cases than just
VARATT_IS_1B_E, but that's not looking like a good bet in 2021.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [HACKERS] Custom compression methods (mac+lz4.h)

2021-03-22 Thread Tom Lane
Robert Haas  writes:
> On Mon, Mar 22, 2021 at 10:41 AM Tom Lane  wrote:
>> Yeah, I thought about that too, but do we want to assume that
>> VARRAWSIZE_4B_C is the correct way to get the decompressed size
>> for all compression methods?

> I think it's OK to assume this.

OK, cool.

>> (If so, I think it would be better style to have a less opaque macro
>> name for the purpose.)

> Complaining about the name of one particular TOAST-related macro name
> seems a bit like complaining about the greenhouse gasses emitted by
> one particular car.

Maybe, but that's not a reason to make it worse.  Anyway, my understanding
of that is that the really opaque names are *only* meant to be used in
this very stretch of postgres.h, ie they are just intermediate steps on
the way to the macros below them.  As an example, the only use of
VARDATA_1B_E() is in VARDATA_EXTERNAL().

> Anyway, this particular macro name was chosen, it seems, for symmetry
> with VARDATA_4B_C, but if you want to change it to something else, I'm
> OK with that, too.

After looking at postgres.h for a bit, I'm thinking that what these
should have been symmetric with is the considerably-less-terrible
names used for the corresponding VARATT_EXTERNAL cases.  Thus,
something like

s/VARRAWSIZE_4B_C/VARDATA_COMPRESSED_GET_RAWSIZE/
s/VARCOMPRESS_4B_C/VARDATA_COMPRESSED_GET_COMPRESSION/

Possibly the former names should survive and the latter become
wrappers around them, not sure.  But we shouldn't be using the "4B"
terminology anyplace except this part of postgres.h.

regards, tom lane




Re: [HACKERS] Custom compression methods (mac+lz4.h)

2021-03-22 Thread Robert Haas
On Mon, Mar 22, 2021 at 10:41 AM Tom Lane  wrote:
> > Okay, the fix makes sense.  In fact, IMHO, in general also this fix
> > looks like an optimization, I mean when slicelength >=
> > VARRAWSIZE_4B_C(value), then why do we need to allocate extra memory
> > even in the case of pglz.  So shall we put this check directly in
> > toast_decompress_datum_slice instead of handling it at the lz4 level?
>
> Yeah, I thought about that too, but do we want to assume that
> VARRAWSIZE_4B_C is the correct way to get the decompressed size
> for all compression methods?

I think it's OK to assume this. If and when we add a third compression
method, it seems certain to just grab one of the two remaining bit
patterns. Now, things get a bit more complicated if and when we want
to add a fourth method, because at that point you've got to ask
yourself how comfortable you feel about stealing the last bit pattern
for your feature. But, if the solution to that problem were to decide
that whenever that last bit pattern is used, we will add an extra byte
(or word) after va_tcinfo indicating the real compression method, then
using VARRAWSIZE_4B_C here would still be correct. To imagine this
decision being wrong, you have to posit a world in which one of the
two remaining bit patterns for the high 2 bits cause the low 30 bits
to be interpreted as something other than the size, which I guess is
not totally impossible, but my first reaction is to think that such a
design would be (1) hard to make work and (2) unnecessarily painful.

> (If so, I think it would be better style to have a less opaque macro
> name for the purpose.)

Complaining about the name of one particular TOAST-related macro name
seems a bit like complaining about the greenhouse gasses emitted by
one particular car. They're pretty uniformly terrible. Does anyone
really know when to use VARATT_IS_1B_E or VARATT_IS_4B_U or any of
that cruft? Like, who decided that "is this varatt 1B E?" would be a
perfectly reasonable way of asking "is this varlena is TOAST
pointer?". While I'm complaining, it's hard to say enough bad things
about the fact that we have 12 consecutive completely obscure macro
definitions for which the only comments are (a) that they are
endian-dependent - which isn't even true for all of them - and (b)
that they are "considered internal." Apparently, they're SO internal
that they don't even need to be understandable to other developers.

Anyway, this particular macro name was chosen, it seems, for symmetry
with VARDATA_4B_C, but if you want to change it to something else, I'm
OK with that, too.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [HACKERS] Custom compression methods (mac+lz4.h)

2021-03-22 Thread Dilip Kumar
On Mon, Mar 22, 2021 at 8:11 PM Tom Lane  wrote:
>
> Dilip Kumar  writes:
> > On Mon, Mar 22, 2021 at 5:22 AM Tom Lane  wrote:
> >> Also, after studying the documentation for LZ4_decompress_safe
> >> and LZ4_decompress_safe_partial, I realized that liblz4 is also
> >> counting on the *output* buffer size to not be a lie.  So we
> >> cannot pass it a number larger than the chunk's true decompressed
> >> size.  The attached patch resolves the issue I'm seeing.
>
> > Okay, the fix makes sense.  In fact, IMHO, in general also this fix
> > looks like an optimization, I mean when slicelength >=
> > VARRAWSIZE_4B_C(value), then why do we need to allocate extra memory
> > even in the case of pglz.  So shall we put this check directly in
> > toast_decompress_datum_slice instead of handling it at the lz4 level?
>
> Yeah, I thought about that too, but do we want to assume that
> VARRAWSIZE_4B_C is the correct way to get the decompressed size
> for all compression methods?

Yeah, VARRAWSIZE_4B_C is the macro getting the rawsize of the data
stored in the compressed varlena.

> (If so, I think it would be better style to have a less opaque macro
> name for the purpose.)

Okay, I have added another macro that is less opaque and came up with
this patch.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


v1-0001-fix-slice-decompression.patch
Description: Binary data


Re: [HACKERS] Custom compression methods

2021-03-22 Thread Justin Pryzby
On Mon, Mar 22, 2021 at 11:05:19AM -0400, Robert Haas wrote:
> On Mon, Mar 22, 2021 at 10:44 AM Justin Pryzby  wrote:
> > Thanks.  I just realized that if you also push the GUC change, then the docs
> > should change from  to 
> >
> > doc/src/sgml/config.sgml:  
> > default_toast_compression (string)
> 
> I've now also committed your 0005. As for 0006, aside from the note
> above, which is a good one, is there any particular reason why this
> patch is labelled as WIP? I think this change makes sense and we
> should just do it unless there's some problem with it.

The first iteration was pretty rough, and there's still some question in my
mind about where default_toast_compression_options[] should be defined.  If
it's in the header file, then I could use lengthof() - but then it probably
gets multiply defined.  In the latest patch, there's multiple "externs".  Maybe
guc.c doesn't need the extern, since it includes toast_compression.h.  But then
it's the only "struct config_enum_entry" which has an "extern" outside of
guc.c.

Also, it looks like you added default_toast_compression out of order, so maybe
you'd fix that at the same time.

-- 
Justin




Re: [HACKERS] Custom compression methods

2021-03-22 Thread Robert Haas
On Mon, Mar 22, 2021 at 10:44 AM Justin Pryzby  wrote:
> Thanks.  I just realized that if you also push the GUC change, then the docs
> should change from  to 
>
> doc/src/sgml/config.sgml:  
> default_toast_compression (string)

I've now also committed your 0005. As for 0006, aside from the note
above, which is a good one, is there any particular reason why this
patch is labelled as WIP? I think this change makes sense and we
should just do it unless there's some problem with it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [HACKERS] Custom compression methods

2021-03-22 Thread Justin Pryzby
On Mon, Mar 22, 2021 at 10:41:33AM -0400, Robert Haas wrote:
> On Sun, Mar 21, 2021 at 7:55 PM Justin Pryzby  wrote:
> > Rebased again.
> 
> Thanks, Justin. I committed 0003 and 0004 together as
> 226e2be3876d0bda3dc33d16dfa0bed246b7b74f. I also committed 0001 and
> 0002 together as 24f0e395ac5892cd12e8914646fe921fac5ba23d, but with
> some revisions, because your text was not clear that this is setting
> the default for new tables, not new values; it also implied that this
> only affects out-of-line compression, which is not true. In lieu of
> trying to explain how TOAST works here, I added a link. It looks,
> though, like that documentation also needs to be patched for this
> change. I'll look into that, and your remaining patches, next.

Thanks.  I just realized that if you also push the GUC change, then the docs
should change from  to 

doc/src/sgml/config.sgml:  
default_toast_compression (string)

-- 
Justin




Re: [HACKERS] Custom compression methods (mac+lz4.h)

2021-03-22 Thread Tom Lane
Dilip Kumar  writes:
> On Mon, Mar 22, 2021 at 5:22 AM Tom Lane  wrote:
>> Also, after studying the documentation for LZ4_decompress_safe
>> and LZ4_decompress_safe_partial, I realized that liblz4 is also
>> counting on the *output* buffer size to not be a lie.  So we
>> cannot pass it a number larger than the chunk's true decompressed
>> size.  The attached patch resolves the issue I'm seeing.

> Okay, the fix makes sense.  In fact, IMHO, in general also this fix
> looks like an optimization, I mean when slicelength >=
> VARRAWSIZE_4B_C(value), then why do we need to allocate extra memory
> even in the case of pglz.  So shall we put this check directly in
> toast_decompress_datum_slice instead of handling it at the lz4 level?

Yeah, I thought about that too, but do we want to assume that
VARRAWSIZE_4B_C is the correct way to get the decompressed size
for all compression methods?

(If so, I think it would be better style to have a less opaque macro
name for the purpose.)

regards, tom lane




Re: [HACKERS] Custom compression methods

2021-03-22 Thread Robert Haas
On Sun, Mar 21, 2021 at 7:55 PM Justin Pryzby  wrote:
> Rebased again.

Thanks, Justin. I committed 0003 and 0004 together as
226e2be3876d0bda3dc33d16dfa0bed246b7b74f. I also committed 0001 and
0002 together as 24f0e395ac5892cd12e8914646fe921fac5ba23d, but with
some revisions, because your text was not clear that this is setting
the default for new tables, not new values; it also implied that this
only affects out-of-line compression, which is not true. In lieu of
trying to explain how TOAST works here, I added a link. It looks,
though, like that documentation also needs to be patched for this
change. I'll look into that, and your remaining patches, next.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [HACKERS] Custom compression methods

2021-03-22 Thread Dilip Kumar
On Mon, Mar 22, 2021 at 5:25 AM Justin Pryzby  wrote:
>
> On Sat, Mar 20, 2021 at 06:20:39PM -0500, Justin Pryzby wrote:
> > Rebased on HEAD.
> > 0005 forgot to update compression_1.out.
> > Included changes to ./configure.ac and some other patches, but not Tomas's,
> > since it'll make CFBOT get mad as soon as that's pushed.
>
> Rebased again.
> Renamed "t" to a badcompresstbl to avoid name conflicts.
> Polish the enum GUC patch some.
>
> I noticed that TOAST_INVALID_COMPRESSION_ID was unused ... but then I found a
> use for it.

Yeah, it is used in toast_compress_datum, toast_get_compression_id,
reform_and_rewrite_tuple and pg_column_compression function.  Your
patches look fine to me.  I agree that v3-0006 also makes sense as it
is simplifying the GUC handling.  Thanks for fixing these.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Custom compression methods (mac+lz4.h)

2021-03-22 Thread Dilip Kumar
On Mon, Mar 22, 2021 at 5:22 AM Tom Lane  wrote:
> Actually, after reading that closer, the problem only affects the
> case where the compressed-data-length passed to the function is
> a lie.  So it shouldn't be a problem for our usage.
>
> Also, after studying the documentation for LZ4_decompress_safe
> and LZ4_decompress_safe_partial, I realized that liblz4 is also
> counting on the *output* buffer size to not be a lie.  So we
> cannot pass it a number larger than the chunk's true decompressed
> size.  The attached patch resolves the issue I'm seeing.

Okay, the fix makes sense.  In fact, IMHO, in general also this fix
looks like an optimization, I mean when slicelength >=
VARRAWSIZE_4B_C(value), then why do we need to allocate extra memory
even in the case of pglz.  So shall we put this check directly in
toast_decompress_datum_slice instead of handling it at the lz4 level?

Like this.

diff --git a/src/backend/access/common/detoast.c
b/src/backend/access/common/detoast.c
index bed50e8..099ac15 100644
--- a/src/backend/access/common/detoast.c
+++ b/src/backend/access/common/detoast.c
@@ -506,6 +506,10 @@ toast_decompress_datum_slice(struct varlena
*attr, int32 slicelength)

Assert(VARATT_IS_COMPRESSED(attr));

+   /* liblz4 assumes that slicelength is not an overestimate */
+   if (slicelength >= VARRAWSIZE_4B_C(attr))
+   return toast_decompress_datum(attr);
+
/*
 * Fetch the compression method id stored in the compression header and
 * decompress the data slice using the appropriate
decompression routine.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Custom compression methods

2021-03-21 Thread Justin Pryzby
On Sat, Mar 20, 2021 at 06:20:39PM -0500, Justin Pryzby wrote:
> Rebased on HEAD.
> 0005 forgot to update compression_1.out.
> Included changes to ./configure.ac and some other patches, but not Tomas's,
> since it'll make CFBOT get mad as soon as that's pushed.

Rebased again.
Renamed "t" to a badcompresstbl to avoid name conflicts.
Polish the enum GUC patch some.

I noticed that TOAST_INVALID_COMPRESSION_ID was unused ... but then I found a
use for it.
>From edc0e6ab248600c6d33dd681359554550c88a679 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 19 Mar 2021 19:12:53 -0500
Subject: [PATCH v3 01/12] Add docs for default_toast_compression..

bbe0a81db69bd10bd166907c3701492a29aca294
---
 doc/src/sgml/config.sgml | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index ee4925d6d9..5cb851a5eb 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8151,6 +8151,29 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
  
 
+ 
+  default_toast_compression (string)
+  
+   default_toast_compression configuration parameter
+  
+  
+  
+   
+This parameter specifies the default compression method to use
+when compressing data in TOAST tables.
+It applies only to variable-width data types.
+It may be overriden by compression clauses in the
+CREATE command, or changed after the relation is
+created by ALTER TABLE ... SET COMPRESSION.
+
+The supported compression methods are pglz and
+(if configured at the time PostgreSQL was
+built) lz4.
+The default is pglz.
+   
+  
+ 
+
  
   temp_tablespaces (string)
   
-- 
2.17.0

>From 02a9038381bdfe577aa32b67bcd231acd61f2c20 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 19 Mar 2021 21:52:28 -0500
Subject: [PATCH v3 02/12] doc: pg_dump --no-toast-compression

---
 doc/src/sgml/ref/pg_dump.sgml | 12 
 1 file changed, 12 insertions(+)

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index bcbb7a25fb..989b8e2381 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -931,6 +931,18 @@ PostgreSQL documentation
   
  
 
+ 
+  --no-toast-compression
+  
+   
+Do not output commands to set TOAST compression
+methods.
+With this option, all objects will be created using whichever
+compression method is the default during restore.
+   
+  
+ 
+
  
   --no-unlogged-table-data
   
-- 
2.17.0

>From 7e9b4b3c6f12abaf7531ef5109e084dc188e7dda Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 19 Mar 2021 20:23:40 -0500
Subject: [PATCH v3 03/12] Compression method is an char not an OID

---
 src/backend/commands/tablecmds.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 877c08814e..22f3c5efc0 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -7847,6 +7847,7 @@ SetIndexStorageProperties(Relation rel, Relation attrelation,
 		index_close(indrel, lockmode);
 	}
 }
+
 /*
  * ALTER TABLE ALTER COLUMN SET STORAGE
  *
@@ -15070,7 +15071,7 @@ ATExecSetCompression(AlteredTableInfo *tab,
 	AttrNumber	attnum;
 	char	   *compression;
 	char		typstorage;
-	Oid			cmoid;
+	char		cmethod;
 	ObjectAddress address;
 
 	Assert(IsA(newValue, String));
@@ -15104,10 +15105,10 @@ ATExecSetCompression(AlteredTableInfo *tab,
 		format_type_be(atttableform->atttypid;
 
 	/* get the attribute compression method. */
-	cmoid = GetAttributeCompression(atttableform, compression);
+	cmethod = GetAttributeCompression(atttableform, compression);
 
 	/* update pg_attribute entry */
-	atttableform->attcompression = cmoid;
+	atttableform->attcompression = cmethod;
 	CatalogTupleUpdate(attrel, >t_self, tuple);
 
 	InvokeObjectPostAlterHook(RelationRelationId,
@@ -15118,7 +15119,7 @@ ATExecSetCompression(AlteredTableInfo *tab,
 	 * Apply the change to indexes as well (only for simple index columns,
 	 * matching behavior of index.c ConstructTupleDescriptor()).
 	 */
-	SetIndexStorageProperties(rel, attrel, attnum, cmoid, '\0', lockmode);
+	SetIndexStorageProperties(rel, attrel, attnum, cmethod, '\0', lockmode);
 
 	heap_freetuple(tuple);
 
-- 
2.17.0

>From 2262c8ffb041e8a7452bda724c676b6118895f4b Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 19 Mar 2021 20:07:31 -0500
Subject: [PATCH v3 04/12] Remove duplicative macro

---
 src/include/access/toast_compression.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/src/include/access/toast_compression.h b/src/include/access/toast_compression.h
index 5c9220c171..44b73bd57c 100644
--- a/src/include/access/toast_compression.h
+++ b/src/include/access/toast_compression.h
@@ -50,8 +50,6 @@ typedef enum 

Re: [HACKERS] Custom compression methods (mac+lz4.h)

2021-03-21 Thread Tom Lane
Justin Pryzby  writes:
> On Sun, Mar 21, 2021 at 07:11:50PM -0400, Tom Lane wrote:
>> I hate to be the bearer of bad news, but this suggests that
>> LZ4_decompress_safe_partial is seriously broken in 1.9.2
>> as well:
>> https://github.com/lz4/lz4/issues/783

> Ouch

Actually, after reading that closer, the problem only affects the
case where the compressed-data-length passed to the function is
a lie.  So it shouldn't be a problem for our usage.

Also, after studying the documentation for LZ4_decompress_safe
and LZ4_decompress_safe_partial, I realized that liblz4 is also
counting on the *output* buffer size to not be a lie.  So we
cannot pass it a number larger than the chunk's true decompressed
size.  The attached patch resolves the issue I'm seeing.

regards, tom lane
diff --git a/src/backend/access/common/toast_compression.c b/src/backend/access/common/toast_compression.c
index 00af1740cf..74e449992a 100644
--- a/src/backend/access/common/toast_compression.c
+++ b/src/backend/access/common/toast_compression.c
@@ -220,6 +220,10 @@ lz4_decompress_datum_slice(const struct varlena *value, int32 slicelength)
 	if (LZ4_versionNumber() < 10803)
 		return lz4_decompress_datum(value);
 
+	/* liblz4 assumes that slicelength is not an overestimate */
+	if (slicelength >= VARRAWSIZE_4B_C(value))
+		return lz4_decompress_datum(value);
+
 	/* allocate memory for the uncompressed data */
 	result = (struct varlena *) palloc(slicelength + VARHDRSZ);
 


Re: [HACKERS] Custom compression methods (mac+lz4.h)

2021-03-21 Thread Justin Pryzby
On Sun, Mar 21, 2021 at 07:11:50PM -0400, Tom Lane wrote:
> I wrote:
> > Justin Pryzby  writes:
> >> On Sun, Mar 21, 2021 at 04:32:31PM -0400, Tom Lane wrote:
> >>> This seems somewhat repeatable (three identical failures in three
> >>> attempts).  Not sure why I did not see it yesterday; but anyway,
> >>> there is something wrong with partial detoasting for LZ4.
> 
> >> With what version of LZ4 ?
> 
> > RHEL8's, which is
> > lz4-1.8.3-2.el8.x86_64
> 
> I hate to be the bearer of bad news, but this suggests that
> LZ4_decompress_safe_partial is seriously broken in 1.9.2
> as well:
> 
> https://github.com/lz4/lz4/issues/783

Ouch

> Maybe we cannot rely on that function for a few more years yet.
> 
> Also, I don't really understand why this code:
> 
>   /* slice decompression not supported prior to 1.8.3 */
>   if (LZ4_versionNumber() < 10803)
>   return lz4_decompress_datum(value);
> 
> It seems likely to me that we'd get a flat out build failure
> from library versions lacking LZ4_decompress_safe_partial,
> and thus that this run-time test is dead code and we should
> better be using a configure probe if we intend to allow old
> liblz4 versions.  Though that might be moot.

The function existed before 1.8.3, but didn't handle slicing.
https://github.com/lz4/lz4/releases/tag/v1.8.3
|Finally, an existing function, LZ4_decompress_safe_partial(), has been 
enhanced to make it possible to decompress only the beginning of an LZ4 block, 
up to a specified number of bytes. Partial decoding can be useful to save CPU 
time and memory, when the objective is to extract a limited portion from a 
larger block.

Possibly we could allow v >= 1.9.3 || (ver >= 1.8.3 && ver < 1.9.2).

Or maybe not: the second half apparently worked "by accident", and we shouldn't
need to have intimate knowledge of someone else's patchlevel releases, 

-- 
Justin




Re: [HACKERS] Custom compression methods (mac+lz4.h)

2021-03-21 Thread Tom Lane
I wrote:
> Justin Pryzby  writes:
>> On Sun, Mar 21, 2021 at 04:32:31PM -0400, Tom Lane wrote:
>>> This seems somewhat repeatable (three identical failures in three
>>> attempts).  Not sure why I did not see it yesterday; but anyway,
>>> there is something wrong with partial detoasting for LZ4.

>> With what version of LZ4 ?

> RHEL8's, which is
> lz4-1.8.3-2.el8.x86_64

I hate to be the bearer of bad news, but this suggests that
LZ4_decompress_safe_partial is seriously broken in 1.9.2
as well:

https://github.com/lz4/lz4/issues/783

Maybe we cannot rely on that function for a few more years yet.

Also, I don't really understand why this code:

/* slice decompression not supported prior to 1.8.3 */
if (LZ4_versionNumber() < 10803)
return lz4_decompress_datum(value);

It seems likely to me that we'd get a flat out build failure
from library versions lacking LZ4_decompress_safe_partial,
and thus that this run-time test is dead code and we should
better be using a configure probe if we intend to allow old
liblz4 versions.  Though that might be moot.

regards, tom lane




Re: [HACKERS] Custom compression methods

2021-03-21 Thread Tom Lane
... btw, now that I look at this, why are we expending a configure
probe for  ?  If we need to cater for that spelling of
the header name, the C code proper is not ready for it.

regards, tom lane




Re: [HACKERS] Custom compression methods

2021-03-21 Thread Tom Lane
Justin Pryzby  writes:
> Rebased on HEAD.
> 0005 forgot to update compression_1.out.
> Included changes to ./configure.ac and some other patches, but not Tomas's,
> since it'll make CFBOT get mad as soon as that's pushed.

I pushed a version of the configure fixes that passes my own sanity
checks, and removes the configure warning with MacPorts.  That
obsoletes your 0006.  Of the rest, I prefer the 0009 approach
(make the GUC an enum) to 0008, and the others seem sane but I haven't
studied the code, so I'll leave it to Robert to handle them.

regards, tom lane




Re: [HACKERS] Custom compression methods (mac+lz4.h)

2021-03-21 Thread Tom Lane
Justin Pryzby  writes:
> On Sun, Mar 21, 2021 at 04:32:31PM -0400, Tom Lane wrote:
>> This seems somewhat repeatable (three identical failures in three
>> attempts).  Not sure why I did not see it yesterday; but anyway,
>> there is something wrong with partial detoasting for LZ4.

> With what version of LZ4 ?

RHEL8's, which is
lz4-1.8.3-2.el8.x86_64

regards, tom lane




Re: [HACKERS] Custom compression methods (mac+lz4.h)

2021-03-21 Thread Justin Pryzby
On Sun, Mar 21, 2021 at 04:32:31PM -0400, Tom Lane wrote:
> This seems somewhat repeatable (three identical failures in three
> attempts).  Not sure why I did not see it yesterday; but anyway,
> there is something wrong with partial detoasting for LZ4.

With what version of LZ4 ?

-- 
Justin




Re: [HACKERS] Custom compression methods (mac+lz4.h)

2021-03-21 Thread Tom Lane
Dilip Kumar  writes:
>> Yeah, we need to set the default_toast_compression in the beginning of
>> the test as attached.
> In the last patch, I did not adjust the compression_1.out so fixed
> that in the attached patch.

Pushed that; however, while testing that it works as expected,
I saw a new and far more concerning regression diff:

diff -U3 /home/postgres/pgsql/src/test/regress/expected/strings.out 
/home/postgres/pgsql/src/test/regress/results/strings.out
--- /home/postgres/pgsql/src/test/regress/expected/strings.out  2021-02-18 
10:34:58.190304138 -0500
+++ /home/postgres/pgsql/src/test/regress/results/strings.out   2021-03-21 
16:27:22.029402834 -0400
@@ -1443,10 +1443,10 @@
 -- If start plus length is > string length, the result is truncated to
 -- string length
 SELECT substr(f1, 5, 10) from toasttest;
- substr 
-
- 567890
- 567890
+ substr 
+
+ 567890\x7F\x7F\x7F\x7F
+ 567890\x7F\x7F\x7F\x7F
  567890
  567890
 (4 rows)
@@ -1520,10 +1520,10 @@
 -- If start plus length is > string length, the result is truncated to
 -- string length
 SELECT substr(f1, 5, 10) from toasttest;
- substr 
-
- 567890
- 567890
+ substr 
+
+ 567890\177\177\177\177
+ 567890\177\177\177\177
  567890
  567890
 (4 rows)

This seems somewhat repeatable (three identical failures in three
attempts).  Not sure why I did not see it yesterday; but anyway,
there is something wrong with partial detoasting for LZ4.

regards, tom lane




Re: [HACKERS] Custom compression methods (mac+lz4.h)

2021-03-21 Thread Dilip Kumar
On Sun, Mar 21, 2021 at 7:03 AM Tom Lane  wrote:
> Also, I see some diffs in the
> indirect_toast test, which seems perhaps worthy of investigation.
> (The diffs look to be just row ordering, but why?)

I have investigated that,  actually in the below insert, after
compression the data size of (repeat('1234567890',5)) is 1980
bytes with the lz4 whereas with pglz it is 5737 bytes.  So with lz4,
the compressed data are stored inline whereas with pglz those are
getting externalized.  Due to this for one of the update statements
followed by an insert, there was no space on the first page as data
are stored inline so the new tuple is stored on the next page and that
is what affecting the order.  I hope this makes sense.

INSERT INTO indtoasttest(descr, f1, f2) VALUES('one-toasted,one-null',
NULL, repeat('1234567890',5));


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Custom compression methods (mac+lz4.h)

2021-03-20 Thread Dilip Kumar
On Sun, Mar 21, 2021 at 9:10 AM Dilip Kumar  wrote:
>
> On Sun, Mar 21, 2021 at 7:03 AM Tom Lane  wrote:
> >
> > BTW, I tried doing "make installcheck" after having adjusted
> > default_toast_compression to be "lz4".  The compression test
> > itself fails because it's expecting the other setting; that
> > ought to be made more robust.
>
> Yeah, we need to set the default_toast_compression in the beginning of
> the test as attached.

In the last patch, I did not adjust the compression_1.out so fixed
that in the attached patch.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


v1-0001-fix-compression-test.patch
Description: Binary data


Re: [HACKERS] Custom compression methods (mac+lz4.h)

2021-03-20 Thread Dilip Kumar
On Sun, Mar 21, 2021 at 7:03 AM Tom Lane  wrote:
>
> BTW, I tried doing "make installcheck" after having adjusted
> default_toast_compression to be "lz4".  The compression test
> itself fails because it's expecting the other setting; that
> ought to be made more robust.

Yeah, we need to set the default_toast_compression in the beginning of
the test as attached.

  Also, I see some diffs in the
> indirect_toast test, which seems perhaps worthy of investigation.
> (The diffs look to be just row ordering, but why?)

I will look into this.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


0001-fix-compression-test.patch
Description: Binary data


Re: [HACKERS] Custom compression methods (mac+lz4.h)

2021-03-20 Thread Tom Lane
BTW, I tried doing "make installcheck" after having adjusted
default_toast_compression to be "lz4".  The compression test
itself fails because it's expecting the other setting; that
ought to be made more robust.  Also, I see some diffs in the
indirect_toast test, which seems perhaps worthy of investigation.
(The diffs look to be just row ordering, but why?)

regards, tom lane




Re: [HACKERS] Custom compression methods (mac+lz4.h)

2021-03-20 Thread Tom Lane
Justin Pryzby  writes:
> On Sat, Mar 20, 2021 at 05:37:07PM -0400, Tom Lane wrote:
>> Digging around, it looks like the "-I/opt/local/include" bit came
>> from LZ4_CFLAGS, which we then stuck into CFLAGS, but it needed
>> to be put in CPPFLAGS in order to make this test work.

> If it's the same as the issue Andrey reported, then it causes a ./configure
> WARNING, which is resolved by the ac_save hack, which I copied from ICU.

I think probably what we need to do, rather than shove the pkg-config
results willy-nilly into our flags, is to disassemble them like we do
with the same results for xml2.  If you ask me, the way we are handling
ICU flags is a poor precedent that is going to blow up at some point;
the only reason it hasn't is that people aren't building --with-icu that
much yet.

regards, tom lane




Re: [HACKERS] Custom compression methods

2021-03-20 Thread Tomas Vondra
On 3/20/21 4:21 PM, Justin Pryzby wrote:
> On Sat, Mar 20, 2021 at 04:13:47PM +0100, Tomas Vondra wrote:
>> +++ b/src/backend/access/brin/brin_tuple.c
>> @@ -213,10 +213,20 @@ brin_form_tuple(BrinDesc *brdesc, BlockNumber blkno, 
>> BrinMemTuple *tuple,
>>  (atttype->typstorage == TYPSTORAGE_EXTENDED ||
>>   atttype->typstorage == TYPSTORAGE_MAIN))
>>  {
>> +Datum   cvalue;
>> +charcompression = 
>> GetDefaultToastCompression();
>>  Form_pg_attribute att = 
>> TupleDescAttr(brdesc->bd_tupdesc,
>>  
>>   keyno);
>> -Datum   cvalue = 
>> toast_compress_datum(value,
>> -
>>   att->attcompression);
>> +
>> +/*
>> + * If the BRIN summary and indexed attribute 
>> use the same data
>> + * type, we can the same compression method. 
>> Otherwise we have
> 
> can *use ?
> 
>> + * to use the default method.
>> + */
>> +if (att->atttypid == atttype->type_id)
>> +compression = att->attcompression;
> 
> It would be more obvious to me if this said here: 
> | else: compression = GetDefaultToastCompression
> 

Thanks. I've pushed a patch tweaked per your feedback.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Custom compression methods

2021-03-20 Thread Justin Pryzby
Rebased on HEAD.
0005 forgot to update compression_1.out.
Included changes to ./configure.ac and some other patches, but not Tomas's,
since it'll make CFBOT get mad as soon as that's pushed.

-- 
Justin
>From bf1336d284792b29e30b42af2ec820bb0c256916 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 19 Mar 2021 19:12:53 -0500
Subject: [PATCH v2 01/10] Add docs for default_toast_compression..

bbe0a81db69bd10bd166907c3701492a29aca294
---
 doc/src/sgml/config.sgml | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index ee4925d6d9..5cb851a5eb 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8151,6 +8151,29 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
  
 
+ 
+  default_toast_compression (string)
+  
+   default_toast_compression configuration parameter
+  
+  
+  
+   
+This parameter specifies the default compression method to use
+when compressing data in TOAST tables.
+It applies only to variable-width data types.
+It may be overriden by compression clauses in the
+CREATE command, or changed after the relation is
+created by ALTER TABLE ... SET COMPRESSION.
+
+The supported compression methods are pglz and
+(if configured at the time PostgreSQL was
+built) lz4.
+The default is pglz.
+   
+  
+ 
+
  
   temp_tablespaces (string)
   
-- 
2.17.0

>From 3184c87f129ec935fa46773728869c7c6af88025 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 19 Mar 2021 21:52:28 -0500
Subject: [PATCH v2 02/10] doc: pg_dump --no-toast-compression

---
 doc/src/sgml/ref/pg_dump.sgml | 12 
 1 file changed, 12 insertions(+)

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index bcbb7a25fb..4a521186fb 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -931,6 +931,18 @@ PostgreSQL documentation
   
  
 
+ 
+  --no-toast-compression
+  
+   
+Do not output commands to set TOASTcompression
+methods.
+With this option, all objects will be created using whichever
+compression method is the default during restore.
+   
+  
+ 
+
  
   --no-unlogged-table-data
   
-- 
2.17.0

>From 2212020a50478c66a830f15b23c3410e4d7bb501 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 19 Mar 2021 20:23:40 -0500
Subject: [PATCH v2 03/10] Compression method is an char not an OID

---
 src/backend/commands/tablecmds.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 9b2800bf5e..8e756e59d5 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -7847,6 +7847,7 @@ SetIndexStorageProperties(Relation rel, Relation attrelation,
 		index_close(indrel, lockmode);
 	}
 }
+
 /*
  * ALTER TABLE ALTER COLUMN SET STORAGE
  *
@@ -15070,7 +15071,7 @@ ATExecSetCompression(AlteredTableInfo *tab,
 	AttrNumber	attnum;
 	char	   *compression;
 	char		typstorage;
-	Oid			cmoid;
+	char		cmethod;
 	Datum		values[Natts_pg_attribute];
 	bool		nulls[Natts_pg_attribute];
 	bool		replace[Natts_pg_attribute];
@@ -15111,9 +15112,9 @@ ATExecSetCompression(AlteredTableInfo *tab,
 	memset(replace, false, sizeof(replace));
 
 	/* get the attribute compression method. */
-	cmoid = GetAttributeCompression(atttableform, compression);
+	cmethod = GetAttributeCompression(atttableform, compression);
 
-	atttableform->attcompression = cmoid;
+	atttableform->attcompression = cmethod;
 	CatalogTupleUpdate(attrel, >t_self, tuple);
 
 	InvokeObjectPostAlterHook(RelationRelationId,
@@ -15123,7 +15124,7 @@ ATExecSetCompression(AlteredTableInfo *tab,
 	ReleaseSysCache(tuple);
 
 	/* apply changes to the index column as well */
-	SetIndexStorageProperties(rel, attrel, attnum, cmoid, '\0', lockmode);
+	SetIndexStorageProperties(rel, attrel, attnum, cmethod, '\0', lockmode);
 	table_close(attrel, RowExclusiveLock);
 
 	/* make changes visible */
-- 
2.17.0

>From d73643d7569ef74b9008d232f8c0ea33cb96ff3c Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 19 Mar 2021 20:07:31 -0500
Subject: [PATCH v2 04/10] Remove duplicative macro

---
 src/include/access/toast_compression.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/src/include/access/toast_compression.h b/src/include/access/toast_compression.h
index 514df0bed1..09af6e7810 100644
--- a/src/include/access/toast_compression.h
+++ b/src/include/access/toast_compression.h
@@ -50,8 +50,6 @@ typedef enum ToastCompressionId
 			 errdetail("This functionality requires the server to be built with lz4 support."), \
 			 errhint("You need to rebuild PostgreSQL using --with-lz4.")))
 
-#define IsValidCompression(cm)  ((cm) != InvalidCompressionMethod)
-
 #define 

Re: [HACKERS] Custom compression methods (mac+lz4.h)

2021-03-20 Thread Justin Pryzby
On Sat, Mar 20, 2021 at 05:37:07PM -0400, Tom Lane wrote:
> Justin Pryzby  writes:
> > On Fri, Mar 19, 2021 at 02:07:31PM -0400, Robert Haas wrote:
> >> On Fri, Mar 19, 2021 at 1:44 PM Justin Pryzby  wrote:
> >>> configure: WARNING: lz4.h: accepted by the compiler, rejected by the 
> >>> preprocessor!
> >>> configure: WARNING: lz4.h: proceeding with the compiler's result
> 
> >> No, I don't see this. I wonder whether this could possibly be an
> >> installation issue on Andrey's machine? If not, it must be
> >> version-dependent or installation-dependent in some way.
> 
> > Andrey, can you check if latest HEAD (bbe0a81db) has these ./configure 
> > warnings ?
> 
> FWIW, I also saw that, when building HEAD against MacPorts' version
> of liblz4 on an M1 Mac.  config.log has
...
> Digging around, it looks like the "-I/opt/local/include" bit came
> from LZ4_CFLAGS, which we then stuck into CFLAGS, but it needed
> to be put in CPPFLAGS in order to make this test work.

If it's the same as the issue Andrey reported, then it causes a ./configure
WARNING, which is resolved by the ac_save hack, which I copied from ICU.

I'll shortly send a patchset including my tentative fix for that.

The configure.ac bits are also on this other thread:
https://www.postgresql.org/message-id/20210315180918.GW29463%40telsasoft.com
0005-re-add-wal_compression_method-lz4.patch 

+if test "$with_lz4" = yes; then
+  ac_save_CPPFLAGS=$CPPFLAGS
+  CPPFLAGS="$LZ4_CFLAGS $CPPFLAGS"
+
+  # Verify we have LZ4's header files
+  AC_CHECK_HEADERS(lz4/lz4.h, [],
+   [AC_CHECK_HEADERS(lz4.h, [], [AC_MSG_ERROR([lz4.h header file is 
required for LZ4])])])
+
+  CPPFLAGS=$ac_save_CPPFLAGS
+fi

-- 
Justin




Re: [HACKERS] Custom compression methods (mac+lz4.h)

2021-03-20 Thread Tom Lane
Justin Pryzby  writes:
> On Fri, Mar 19, 2021 at 02:07:31PM -0400, Robert Haas wrote:
>> On Fri, Mar 19, 2021 at 1:44 PM Justin Pryzby  wrote:
>>> configure: WARNING: lz4.h: accepted by the compiler, rejected by the 
>>> preprocessor!
>>> configure: WARNING: lz4.h: proceeding with the compiler's result

>> No, I don't see this. I wonder whether this could possibly be an
>> installation issue on Andrey's machine? If not, it must be
>> version-dependent or installation-dependent in some way.

> Andrey, can you check if latest HEAD (bbe0a81db) has these ./configure 
> warnings ?

FWIW, I also saw that, when building HEAD against MacPorts' version
of liblz4 on an M1 Mac.  config.log has

configure:13536: checking lz4.h usability
configure:13536: ccache clang -c -I/opt/local/include -Wall -Wmissing-prototype\
s -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmi\
ssing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -Wno-unus\
ed-command-line-argument -g -O2 -isysroot /Applications/Xcode.app/Contents/Deve\
loper/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk   conftest.c >&5
configure:13536: $? = 0
configure:13536: result: yes
configure:13536: checking lz4.h presence
configure:13536: ccache clang -E -isysroot /Applications/Xcode.app/Contents/Dev\
eloper/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk   conftest.c
conftest.c:67:10: fatal error: 'lz4.h' file not found
#include 
 ^~~
1 error generated.
configure:13536: $? = 1

Digging around, it looks like the "-I/opt/local/include" bit came
from LZ4_CFLAGS, which we then stuck into CFLAGS, but it needed
to be put in CPPFLAGS in order to make this test work.

regards, tom lane




Re: [HACKERS] Custom compression methods (mac+lz4.h)

2021-03-20 Thread Justin Pryzby
On Fri, Mar 19, 2021 at 02:07:31PM -0400, Robert Haas wrote:
> On Fri, Mar 19, 2021 at 1:44 PM Justin Pryzby  wrote:
> > Working with one of Andrey's patches on another thread, he reported offlist
> > getting this message, resolved by this patch.  Do you see this warning 
> > during
> > ./configure ?  The latest CI is of a single patch without the LZ4 stuff, so 
> > I
> > can't check its log.
> >
> > configure: WARNING: lz4.h: accepted by the compiler, rejected by the 
> > preprocessor!
> > configure: WARNING: lz4.h: proceeding with the compiler's result
> 
> No, I don't see this. I wonder whether this could possibly be an
> installation issue on Andrey's machine? If not, it must be
> version-dependent or installation-dependent in some way.

Andrey, can you check if latest HEAD (bbe0a81db) has these ./configure warnings 
?

If so, can you check if your environment is clean, specifically lz4.h - if
you've installed LZ4 from source, I have to imagine that's relevant.
Most users won't do that, but it should be a supported configuration, too.

-- 
Justin




Re: [HACKERS] Custom compression methods (./configure)

2021-03-20 Thread Tom Lane
Justin Pryzby  writes:
> On Fri, Mar 19, 2021 at 05:35:58PM -0300, Alvaro Herrera wrote:
>> I find this behavior confusing; I'd rather have configure error out if
>> it can't find the package support I requested, than continuing with a
>> set of configure options different from what I gave.

> That's clearly wrong, but that's not the behavior I see:

Yeah, it errors out as-expected for me too, on a couple of different
machines (see sifaka's latest run for documentation).

regards, tom lane




Re: [HACKERS] Custom compression methods (./configure)

2021-03-20 Thread Alvaro Herrera
On 2021-Mar-20, Justin Pryzby wrote:

> On Fri, Mar 19, 2021 at 05:35:58PM -0300, Alvaro Herrera wrote:
> > Hmm, if I use configure --with-lz4, I get this:
> > 
> > checking whether to build with LZ4 support... yes
> > checking for liblz4... no
> > configure: error: Package requirements (liblz4) were not met:
> > 
> > No package 'liblz4' found
> ...
> > See the pkg-config man page for more details.
> > running CONFIG_SHELL=/bin/bash /bin/bash /pgsql/source/master/configure 
> > --enable-debug --enable-depend --enable-cassert --enable-nls 
> > --cache-file=/home/alvherre/run/pgconfig.master.cache 
> > --enable-thread-safety --with-python --with-perl --with-tcl --with-openssl 
> > --with-libxml --enable-tap-tests --with-tclconfig=/usr/lib/tcl8.6 
> > PYTHON=/usr/bin/python3 --with-llvm --prefix=/pgsql/install/master 
> > --with-pgport=55432 --no-create --no-recursion

> I can't reproduce the behavior - is it because of your --cache-file or
> something ?

Argh, yeah, you're right -- my custom scripting was confusing the issue,
by rerunning configure automatically with the options previously in the
cache file.  I had the equivalent of "configure ; make" so when
configure failed, the make step re-ran configure using the options in
the cache file, which did not have --with-lz4.

-- 
Álvaro Herrera   Valdivia, Chile




Re: [HACKERS] Custom compression methods (./configure)

2021-03-20 Thread Justin Pryzby
On Fri, Mar 19, 2021 at 05:35:58PM -0300, Alvaro Herrera wrote:
> Hmm, if I use configure --with-lz4, I get this:
> 
>   checking whether to build with LZ4 support... yes
>   checking for liblz4... no
>   configure: error: Package requirements (liblz4) were not met:
> 
>   No package 'liblz4' found
...
>   See the pkg-config man page for more details.
>   running CONFIG_SHELL=/bin/bash /bin/bash /pgsql/source/master/configure 
> --enable-debug --enable-depend --enable-cassert --enable-nls 
> --cache-file=/home/alvherre/run/pgconfig.master.cache --enable-thread-safety 
> --with-python --with-perl --with-tcl --with-openssl --with-libxml 
> --enable-tap-tests --with-tclconfig=/usr/lib/tcl8.6 PYTHON=/usr/bin/python3 
> --with-llvm --prefix=/pgsql/install/master --with-pgport=55432 --no-create 
> --no-recursion
>   ...
> 
> I find this behavior confusing; I'd rather have configure error out if
> it can't find the package support I requested, than continuing with a
> set of configure options different from what I gave.

That's clearly wrong, but that's not the behavior I see:

|$ ./configure --with-lz4 ; echo $?
|...
|checking for liblz4... no
|configure: error: Package requirements (liblz4) were not met:
|
|No package 'liblz4' found
|
|Consider adjusting the PKG_CONFIG_PATH environment variable if you
|installed software in a non-standard prefix.
|
|Alternatively, you may set the environment variables LZ4_CFLAGS
|and LZ4_LIBS to avoid the need to call pkg-config.
|See the pkg-config man page for more details.
|1

I can't reproduce the behavior - is it because of your --cache-file or
something ?

-- 
Justin




Re: [HACKERS] Custom compression methods

2021-03-20 Thread Tom Lane
Andrew Dunstan  writes:
> On 3/20/21 3:03 PM, Tom Lane wrote:
>> I fixed up some issues in 0008/0009 (mostly cosmetic, except that
>> you forgot a server version check in dumpToastCompression) and
>> pushed that, so we can see if it makes crake happy.

> It's still produced a significant amount more difference between the
> dumps. For now I've increased the fuzz factor a bit like this:

Ah, our emails crossed.

> I'll try to come up with something better. Maybe just ignore lines like
> SET default_toast_compression = 'pglz';
> when taking the diff.

I noticed that there were a fair number of other diffs besides those.
Seems like we need some better comparison technology, really, but I'm
not certain what.

regards, tom lane




Re: [HACKERS] Custom compression methods

2021-03-20 Thread Tom Lane
I wrote:
> I fixed up some issues in 0008/0009 (mostly cosmetic, except that
> you forgot a server version check in dumpToastCompression) and
> pushed that, so we can see if it makes crake happy.

crake was still unhappy with that:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake=2021-03-20%2019%3A03%3A56

but I see it just went green ... did you do something to adjust
the expected output?

regards, tom lane




Re: [HACKERS] Custom compression methods

2021-03-20 Thread Andrew Dunstan


On 3/20/21 3:03 PM, Tom Lane wrote:
> I wrote:
>> Yeah, _doSetFixedOutputState is the wrong place: that runs on the
>> pg_restore side of the fence, and would not have access to the
>> necessary info in a separated dump/restore run.
>> It might be necessary to explicitly pass the state through in a TOC item,
>> as we do for things like the standard_conforming_strings setting.
> Ah, now that I read your patch I see that's exactly what you did.
>
> I fixed up some issues in 0008/0009 (mostly cosmetic, except that
> you forgot a server version check in dumpToastCompression) and
> pushed that, so we can see if it makes crake happy.
>
>   


It's still produced a significant amount more difference between the
dumps. For now I've increased the fuzz factor a bit like this:


diff --git a/PGBuild/Modules/TestUpgradeXversion.pm
b/PGBuild/Modules/TestUpgradeXversion.pm
index 1d1d313..567d7cb 100644
--- a/PGBuild/Modules/TestUpgradeXversion.pm
+++ b/PGBuild/Modules/TestUpgradeXversion.pm
@@ -621,7 +621,7 @@ sub test_upgrade    ## no critic
(Subroutines::ProhibitManyArgs)
    # generally from reordering of larg object output.
    # If not we heuristically allow up to 2000 lines of diffs
 
-   if (   ($oversion ne $this_branch && $difflines < 2000)
+   if (   ($oversion ne $this_branch && $difflines < 2700)
    || ($oversion eq $this_branch) && $difflines < 50)
    {
    return 1;


I'll try to come up with something better. Maybe just ignore lines like

SET default_toast_compression = 'pglz';

when taking the diff.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: [HACKERS] Custom compression methods

2021-03-20 Thread Tom Lane
I wrote:
> Yeah, _doSetFixedOutputState is the wrong place: that runs on the
> pg_restore side of the fence, and would not have access to the
> necessary info in a separated dump/restore run.

> It might be necessary to explicitly pass the state through in a TOC item,
> as we do for things like the standard_conforming_strings setting.

Ah, now that I read your patch I see that's exactly what you did.

I fixed up some issues in 0008/0009 (mostly cosmetic, except that
you forgot a server version check in dumpToastCompression) and
pushed that, so we can see if it makes crake happy.

regards, tom lane




Re: [HACKERS] Custom compression methods

2021-03-20 Thread Justin Pryzby
On Sat, Mar 20, 2021 at 01:36:15PM -0400, Tom Lane wrote:
> Justin Pryzby  writes:
> > On Sat, Mar 20, 2021 at 04:37:53PM +0530, Dilip Kumar wrote:
> >> - And, 0008 and 0009, I think my
> >> 0001-Fixup-dump-toast-compression-method.patch[1] is doing this in a
> >> much simpler way, please have a look and let me know if you think that
> >> has any problems and we need to do the way you are doing here?
> 
> > I tested and saw that your patch doesn't output "SET 
> > default_toast_compression"
> > in non-text dumps (pg_dump -Fc).
> 
> Yeah, _doSetFixedOutputState is the wrong place: that runs on the
> pg_restore side of the fence, and would not have access to the
> necessary info in a separated dump/restore run.
> 
> It might be necessary to explicitly pass the state through in a TOC item,
> as we do for things like the standard_conforming_strings setting.

My patches do this in 0008 and 0009 - I'd appreciate if you'd take a look.
0009 edits parts of 0008, and if that's all correct then they should be
squished together.

https://www.postgresql.org/message-id/20210320074420.GR11765%40telsasoft.com

-- 
Justin




Re: [HACKERS] Custom compression methods

2021-03-20 Thread Tom Lane
Justin Pryzby  writes:
> On Sat, Mar 20, 2021 at 04:37:53PM +0530, Dilip Kumar wrote:
>> - And, 0008 and 0009, I think my
>> 0001-Fixup-dump-toast-compression-method.patch[1] is doing this in a
>> much simpler way, please have a look and let me know if you think that
>> has any problems and we need to do the way you are doing here?

> I tested and saw that your patch doesn't output "SET 
> default_toast_compression"
> in non-text dumps (pg_dump -Fc).

Yeah, _doSetFixedOutputState is the wrong place: that runs on the
pg_restore side of the fence, and would not have access to the
necessary info in a separated dump/restore run.

It might be necessary to explicitly pass the state through in a TOC item,
as we do for things like the standard_conforming_strings setting.

regards, tom lane




Re: [HACKERS] Custom compression methods

2021-03-20 Thread Justin Pryzby
On Sat, Mar 20, 2021 at 04:37:53PM +0530, Dilip Kumar wrote:
> - 0006 is fine but not sure what is the advantage over what we have today?

The advantage is that it's dozens of lines shorter, and automatically includes
a HINT.
 SET default_toast_compression = 'I do not exist compression';  

   
 ERROR:  invalid value for parameter "default_toast_compression": "I do not 
exist compression"  
   
-DETAIL:  Compression method "I do not exist compression" does not exist.   

   
+HINT:  Available values: pglz, lz4.

   

If we use a GUC hook, I think it should be to special case lz4 to say:
"..must be enabled when PG was built".

> - And, 0008 and 0009, I think my
> 0001-Fixup-dump-toast-compression-method.patch[1] is doing this in a
> much simpler way, please have a look and let me know if you think that
> has any problems and we need to do the way you are doing here?

I tested and saw that your patch doesn't output "SET default_toast_compression"
in non-text dumps (pg_dump -Fc).  Also, I think the internal newline should be
removed:

ALTER TABLE public.t ALTER COLUMN b
SET COMPRESSION lz4;

-- 
Justin




Re: [HACKERS] Custom compression methods

2021-03-20 Thread Justin Pryzby
On Sat, Mar 20, 2021 at 04:13:47PM +0100, Tomas Vondra wrote:
> +++ b/src/backend/access/brin/brin_tuple.c
> @@ -213,10 +213,20 @@ brin_form_tuple(BrinDesc *brdesc, BlockNumber blkno, 
> BrinMemTuple *tuple,
>   (atttype->typstorage == TYPSTORAGE_EXTENDED ||
>atttype->typstorage == TYPSTORAGE_MAIN))
>   {
> + Datum   cvalue;
> + charcompression = 
> GetDefaultToastCompression();
>   Form_pg_attribute att = 
> TupleDescAttr(brdesc->bd_tupdesc,
>   
>   keyno);
> - Datum   cvalue = 
> toast_compress_datum(value,
> - 
>   att->attcompression);
> +
> + /*
> +  * If the BRIN summary and indexed attribute 
> use the same data
> +  * type, we can the same compression method. 
> Otherwise we have

can *use ?

> +  * to use the default method.
> +  */
> + if (att->atttypid == atttype->type_id)
> + compression = att->attcompression;

It would be more obvious to me if this said here: 
| else: compression = GetDefaultToastCompression

-- 
Justin




Re: [HACKERS] Custom compression methods

2021-03-20 Thread Tomas Vondra


On 3/20/21 11:45 AM, Tomas Vondra wrote:
> 
> 
> On 3/20/21 11:18 AM, Dilip Kumar wrote:
>> On Sat, Mar 20, 2021 at 3:05 PM Tomas Vondra
>>  wrote:
>>>
>>> Hi,
>>>
>>> I think this bit in brin_tuple.c is wrong:
>>>
>>> ...
>>> Form_pg_attribute att = TupleDescAttr(brdesc->bd_tupdesc,
>>>   keyno);
>>> Datum   cvalue = toast_compress_datum(value,
>>>   att->attcompression);
>>>
>>> The problem is that this is looking at the index descriptor (i.e. what
>>> types are indexed) instead of the stored type. For BRIN those may be
>>> only loosely related, which is why the code does this a couple lines above:
>>>
>>> /* We must look at the stored type, not at the index descriptor. */
>>> TypeCacheEntry *atttype
>>> = brdesc->bd_info[keyno]->oi_typcache[datumno];
>>
>> Ok, I was not aware of this.
>>
> 
> Yeah, the BRIN internal structure is not obvious, and the fact that all
> the built-in BRIN variants triggers the issue makes it harder to spot.
> 
>>> For the built-in BRIN opclasses this happens to work, because e.g.
>>> minmax stores two values of the original type. But it may not work for
>>> other out-of-core opclasses, and it certainly doesn't work for the new
>>> BRIN opclasses (bloom and minmax-multi).
>>
>> Okay
>>
>>> Unfortunately, the only thing we have here is the type OID, so I guess
>>> the only option is using GetDefaultToastCompression(). Perhaps we might
>>> include that into BrinOpcInfo too, in the future.
>>
>> Right, I think for now we can use default compression for this case.
>>
> 
> Good. I wonder if we might have "per type" preferred compression in the
> future, which would address this. But for now just using the default
> compression seems fine.
> 

Actually, we can be a bit smarter - when the data types match, we can
use the compression method defined for the attribute. That works fine
for all built-in BRIN opclasses, and it seems quite reasonable - if the
user picked a particular compression method for a column, it's likely
because the data compress better with that method. So why not use that
for the BRIN summary, when possible (even though the BRIN indexes tend
to be tiny).

Attached is a patch doing this. Barring objection I'll push that soon,
so that I can push the BRIN index improvements (bloom etc.).

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
>From 7f3b29dae25b08f28cd84dbbeb069d0c7759fafc Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Sat, 20 Mar 2021 10:24:00 +0100
Subject: [PATCH] Use valid compression method in brin_form_tuple

When compressing the BRIN summary, we can't simply use the compression
method from the indexed attribute.  The summary may use a different data
type, e.g. fixed-length attribute may have varlena summary, leading to
compression failures.  For the built-in BRIN opclasses this happens to
work, because the summary uses the same data type as the attribute.

When the data types match, we can inherit use the compression method
specified for the attribute (it's copied into the index descriptor).
Otherwise we don't have much choice and have to use the default one.

Author: Tomas Vondra
Discussion: https://postgr.es/m/e0367f27-392c-321a-7411-a58e1a7e4817%40enterprisedb.com
---
 src/backend/access/brin/brin_tuple.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/brin/brin_tuple.c b/src/backend/access/brin/brin_tuple.c
index 0ab5712c71..279a7e5970 100644
--- a/src/backend/access/brin/brin_tuple.c
+++ b/src/backend/access/brin/brin_tuple.c
@@ -213,10 +213,20 @@ brin_form_tuple(BrinDesc *brdesc, BlockNumber blkno, BrinMemTuple *tuple,
 (atttype->typstorage == TYPSTORAGE_EXTENDED ||
  atttype->typstorage == TYPSTORAGE_MAIN))
 			{
+Datum	cvalue;
+char	compression = GetDefaultToastCompression();
 Form_pg_attribute att = TupleDescAttr(brdesc->bd_tupdesc,
 	  keyno);
-Datum		cvalue = toast_compress_datum(value,
-		  att->attcompression);
+
+/*
+ * If the BRIN summary and indexed attribute use the same data
+ * type, we can the same compression method. Otherwise we have
+ * to use the default method.
+ */
+if (att->atttypid == atttype->type_id)
+	compression = att->attcompression;
+
+cvalue = toast_compress_datum(value, compression);
 
 if (DatumGetPointer(cvalue) != NULL)
 {
-- 
2.30.2



Re: [HACKERS] Custom compression methods

2021-03-20 Thread Dilip Kumar
On Sat, Mar 20, 2021 at 1:14 PM Justin Pryzby  wrote:
>
> See attached.

I have looked into your patches

- 0001 to 0005 and 0007 look fine to me so maybe you can merge them
all and create a fixup patch.  Thanks for fixing this, these were some
silly mistakes I made in my patch.
- 0006 is fine but not sure what is the advantage over what we have today?
- And, 0008 and 0009, I think my
0001-Fixup-dump-toast-compression-method.patch[1] is doing this in a
much simpler way, please have a look and let me know if you think that
has any problems and we need to do the way you are doing here?

[1] 
https://www.postgresql.org/message-id/CAFiTN-v7EULPqVJ-6J%3DzH6n0%2BkO%3DYFtgpte%2BFTre%3DWrwcWBBTA%40mail.gmail.com

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Custom compression methods

2021-03-20 Thread Tomas Vondra



On 3/20/21 11:18 AM, Dilip Kumar wrote:
> On Sat, Mar 20, 2021 at 3:05 PM Tomas Vondra
>  wrote:
>>
>> Hi,
>>
>> I think this bit in brin_tuple.c is wrong:
>>
>> ...
>> Form_pg_attribute att = TupleDescAttr(brdesc->bd_tupdesc,
>>   keyno);
>> Datum   cvalue = toast_compress_datum(value,
>>   att->attcompression);
>>
>> The problem is that this is looking at the index descriptor (i.e. what
>> types are indexed) instead of the stored type. For BRIN those may be
>> only loosely related, which is why the code does this a couple lines above:
>>
>> /* We must look at the stored type, not at the index descriptor. */
>> TypeCacheEntry *atttype
>> = brdesc->bd_info[keyno]->oi_typcache[datumno];
> 
> Ok, I was not aware of this.
> 

Yeah, the BRIN internal structure is not obvious, and the fact that all
the built-in BRIN variants triggers the issue makes it harder to spot.

>> For the built-in BRIN opclasses this happens to work, because e.g.
>> minmax stores two values of the original type. But it may not work for
>> other out-of-core opclasses, and it certainly doesn't work for the new
>> BRIN opclasses (bloom and minmax-multi).
> 
> Okay
> 
>> Unfortunately, the only thing we have here is the type OID, so I guess
>> the only option is using GetDefaultToastCompression(). Perhaps we might
>> include that into BrinOpcInfo too, in the future.
> 
> Right, I think for now we can use default compression for this case.
> 

Good. I wonder if we might have "per type" preferred compression in the
future, which would address this. But for now just using the default
compression seems fine.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Custom compression methods

2021-03-20 Thread Dilip Kumar
On Sat, Mar 20, 2021 at 3:05 PM Tomas Vondra
 wrote:
>
> Hi,
>
> I think this bit in brin_tuple.c is wrong:
>
> ...
> Form_pg_attribute att = TupleDescAttr(brdesc->bd_tupdesc,
>   keyno);
> Datum   cvalue = toast_compress_datum(value,
>   att->attcompression);
>
> The problem is that this is looking at the index descriptor (i.e. what
> types are indexed) instead of the stored type. For BRIN those may be
> only loosely related, which is why the code does this a couple lines above:
>
> /* We must look at the stored type, not at the index descriptor. */
> TypeCacheEntry *atttype
> = brdesc->bd_info[keyno]->oi_typcache[datumno];

Ok, I was not aware of this.

> For the built-in BRIN opclasses this happens to work, because e.g.
> minmax stores two values of the original type. But it may not work for
> other out-of-core opclasses, and it certainly doesn't work for the new
> BRIN opclasses (bloom and minmax-multi).

Okay

> Unfortunately, the only thing we have here is the type OID, so I guess
> the only option is using GetDefaultToastCompression(). Perhaps we might
> include that into BrinOpcInfo too, in the future.

Right, I think for now we can use default compression for this case.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Custom compression methods

2021-03-20 Thread Tomas Vondra
Hi,

I think this bit in brin_tuple.c is wrong:

...
Form_pg_attribute att = TupleDescAttr(brdesc->bd_tupdesc,
  keyno);
Datum   cvalue = toast_compress_datum(value,
  att->attcompression);

The problem is that this is looking at the index descriptor (i.e. what
types are indexed) instead of the stored type. For BRIN those may be
only loosely related, which is why the code does this a couple lines above:

/* We must look at the stored type, not at the index descriptor. */
TypeCacheEntry *atttype
= brdesc->bd_info[keyno]->oi_typcache[datumno];

For the built-in BRIN opclasses this happens to work, because e.g.
minmax stores two values of the original type. But it may not work for
other out-of-core opclasses, and it certainly doesn't work for the new
BRIN opclasses (bloom and minmax-multi).

Unfortunately, the only thing we have here is the type OID, so I guess
the only option is using GetDefaultToastCompression(). Perhaps we might
include that into BrinOpcInfo too, in the future.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Custom compression methods

2021-03-20 Thread Andres Freund
On 2021-03-19 15:44:34 -0400, Robert Haas wrote:
> I committed the core patch (0003) with a bit more editing.  Let's see
> what the buildfarm thinks.

Congrats Dilip, Robert, All. The slow toast compression has been a
significant issue for a long time.




Re: [HACKERS] Custom compression methods

2021-03-20 Thread Justin Pryzby
On Fri, Mar 19, 2021 at 04:38:03PM -0400, Robert Haas wrote:
> On Fri, Mar 19, 2021 at 4:20 PM Tom Lane  wrote:
> > Nope ... crake's displeased with your assumption that it's OK to
> > clutter dumps with COMPRESSION clauses.  As am I: that is going to
> > be utterly fatal for cross-version transportation of dumps.
> 
> Regarding your point, that does look like clutter. We don't annotate
> the dump with a storage clause unless it's non-default, so probably we
> should do the same thing here. I think I gave Dilip bad advice here...

On Fri, Mar 19, 2021 at 05:49:37PM -0400, Robert Haas wrote:
> Here's a patch for that. It's a little strange because you're going to
> skip dumping the toast compression based on the default value on the
> source system, but that might not be the default on the system where
> the dump is being restored, so you could fail to recreate the state
> you had. That is avoidable if you understand how things work, but some
> people might not. I don't have a better idea, though, so let me know
> what you think of this.

On Fri, Mar 19, 2021 at 07:22:42PM -0300, Alvaro Herrera wrote:
> Do you mean the column storage strategy, attstorage?  I don't think
> that's really related, because the difference there is not a GUC setting
> but a compiled-in default for the type.  In the case of compression, I'm
> not sure it makes sense to do it like that, but I can see the clutter
> argument: if we dump compression for all columns, it's going to be super
> noisy.
> 
> (At least, for binary upgrade surely you must make sure to apply the
> correct setting regardless of defaults on either system).
> 
> Maybe it makes sense to dump the compression clause if it is different
> from pglz, regardless of the default on the source server.  Then, if the
> target server has chosen lz4 as default, *all* columns are going to end
> up as lz4, and if it hasn't, then only the ones that were lz4 in the
> source server are going to.  That seems reasonable behavior.  Also, if
> some columns are lz4 in source, and target does not have lz4, then
> everything is going to work out to not-lz4 with just a bunch of errors
> in the output.

I think what's missing is dumping the GUC value itself, and then also dump any
columns that differ from the GUC's setting.  An early version of the GUC patch
actually had an "XXX" comment about pg_dump support, and I was waiting for a
review before polishing it.  This was modelled after default_tablespace and
default_table_access_method - I've mentioned that before that there's no
pg_restore --no-table-am, and I have an unpublished patch to add it.  That may
be how I missed this until now.

Then, this will output COMPRESSION on "a" (x)or "b" depending on the current
default:
| CREATE TABLE a(a text compression lz4, b text compression pglz);
When we restore it, we set the default before restoring columns.

I think it may be a good idea to document that dumps of columns with
non-default compression aren't portable to older server versions, or servers
--without-lz4.  This is a consequence of the CREATE command being a big text
blob, so pg_restore can't reasonably elide the COMPRESSION clause.

While looking at this, I realized that someone added the GUC to
postgresql.conf.sample, but not to doc/ - this was a separate patch until
yesterday.

I think since we're not doing catalog access for "pluggable" compression, this
should just be an enum GUC, with #ifdef LZ4.  Then we don't need a hook to
validate it.

ALTER and CREATE are silently accepting bogus compression names.

I can write patches for these later.

-- 
Justin




Re: [HACKERS] Custom compression methods

2021-03-20 Thread Dilip Kumar
On Sat, Mar 20, 2021 at 1:22 PM Dilip Kumar  wrote:
>
> On Sat, Mar 20, 2021 at 8:11 AM Robert Haas  wrote:
> >
> > On Fri, Mar 19, 2021 at 8:25 PM Tom Lane  wrote:
> > > Extrapolating from the way we've dealt with similar issues
> > > in the past, I think the structure of pg_dump's output ought to be:
> > >
> > > 1. SET default_toast_compression = 'source system's value'
> > > in among the existing passel of SETs at the top.  Doesn't
> > > matter whether or not that is the compiled-in value.
> > >
> > > 2. No mention of compression in any CREATE TABLE command.
> > >
> > > 3. For any column having a compression option different from
> > > the default, emit ALTER TABLE SET ... to set that option after
> > > the CREATE TABLE.  (You did implement such a SET, I trust.)
> >
> > Actually, *I* didn't implement any of this. But ALTER TABLE sometab
> > ALTER somecol SET COMPRESSION somealgo works.
> >
> > This sounds like a reasonable approach.
>
> The attached patch implements that.

After sending this, just saw Justin also included patches for this.  I
think the ALTER ..SET COMPRESSION is more or less similar, I just
fetched it from the older version of the patch set.  But SET
default_toast_compression are slightly different.  I will look into
your version and provide my opinion on which one looks better and we
can commit that and feel free to share your thoughts.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Custom compression methods

2021-03-20 Thread Dilip Kumar
On Sat, Mar 20, 2021 at 8:11 AM Robert Haas  wrote:
>
> On Fri, Mar 19, 2021 at 8:25 PM Tom Lane  wrote:
> > Extrapolating from the way we've dealt with similar issues
> > in the past, I think the structure of pg_dump's output ought to be:
> >
> > 1. SET default_toast_compression = 'source system's value'
> > in among the existing passel of SETs at the top.  Doesn't
> > matter whether or not that is the compiled-in value.
> >
> > 2. No mention of compression in any CREATE TABLE command.
> >
> > 3. For any column having a compression option different from
> > the default, emit ALTER TABLE SET ... to set that option after
> > the CREATE TABLE.  (You did implement such a SET, I trust.)
>
> Actually, *I* didn't implement any of this. But ALTER TABLE sometab
> ALTER somecol SET COMPRESSION somealgo works.
>
> This sounds like a reasonable approach.

The attached patch implements that.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
From 40b53e24932b1f9203092a7f6972804af5a7a45b Mon Sep 17 00:00:00 2001
From: Dilip Kumar 
Date: Sat, 20 Mar 2021 13:14:39 +0530
Subject: [PATCH] Fixup, dump toast compression method

---
 src/bin/pg_dump/pg_backup.h  |  1 +
 src/bin/pg_dump/pg_backup_archiver.c |  4 +++
 src/bin/pg_dump/pg_dump.c| 66 ++--
 src/bin/pg_dump/t/002_pg_dump.pl | 12 +++
 4 files changed, 52 insertions(+), 31 deletions(-)

diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index 0296b9b..02019fc 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -210,6 +210,7 @@ typedef struct Archive
 	/* other important stuff */
 	char	   *searchpath;		/* search_path to set during restore */
 	char	   *use_role;		/* Issue SET ROLE to this */
+	char	   *default_toast_compression;
 
 	/* error handling */
 	bool		exit_on_error;	/* whether to exit on SQL errors... */
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 1f82c64..a25f4d1 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -3152,6 +3152,10 @@ _doSetFixedOutputState(ArchiveHandle *AH)
 	else
 		ahprintf(AH, "SET row_security = off;\n");
 
+	/* Select the dump-time default toast compression */
+	if (AH->public.default_toast_compression)
+		ahprintf(AH, "SET default_toast_compression = '%s';\n",
+ AH->public.default_toast_compression);
 	ahprintf(AH, "\n");
 }
 
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index f8bec3f..7bbcaac 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -1248,6 +1248,21 @@ setup_connection(Archive *AH, const char *dumpencoding,
 
 		AH->sync_snapshot_id = get_synchronized_snapshot(AH);
 	}
+
+	/*
+	 * Get default TOAST compression method, but not if the server's too
+	 * old to support the feature or if the user doesn't want to dump that
+	 * information anyway.
+	 */
+	if (AH->remoteVersion >= 14 && !dopt->no_toast_compression)
+	{
+		PGresult   *res;
+
+		res = ExecuteSqlQueryForSingleRow(AH,
+		  "SHOW default_toast_compression");
+		AH->default_toast_compression = pg_strdup(PQgetvalue(res, 0, 0));
+		PQclear(res);
+	}
 }
 
 /* Set up connection for a parallel worker process */
@@ -15905,31 +15920,6 @@ dumpTableSchema(Archive *fout, const TableInfo *tbinfo)
 		  tbinfo->atttypnames[j]);
 	}
 
-	/*
-	 * Attribute compression
-	 */
-	if (!dopt->no_toast_compression &&
-		tbinfo->attcompression != NULL)
-	{
-		char	   *cmname;
-
-		switch (tbinfo->attcompression[j])
-		{
-			case 'p':
-cmname = "pglz";
-break;
-			case 'l':
-cmname = "lz4";
-break;
-			default:
-cmname = NULL;
-break;
-		}
-
-		if (cmname != NULL)
-			appendPQExpBuffer(q, " COMPRESSION %s", cmname);
-	}
-
 	if (print_default)
 	{
 		if (tbinfo->attgenerated[j] == ATTRIBUTE_GENERATED_STORED)
@@ -16348,6 +16338,32 @@ dumpTableSchema(Archive *fout, const TableInfo *tbinfo)
   qualrelname,
   fmtId(tbinfo->attnames[j]),
   tbinfo->attfdwoptions[j]);
+
+			/*
+			 * Dump per-column compression options
+			 */
+			if (!dopt->no_toast_compression && tbinfo->attcompression != NULL)
+			{
+char	   *cmname;
+
+switch (tbinfo->attcompression[j])
+{
+	case 'p':
+		cmname = "pglz";
+		break;
+	case 'l':
+		cmname = "lz4";
+		break;
+	default:
+		cmname = NULL;
+		break;
+}
+
+if (cmname != NULL &&
+	strcmp(cmname, fout->default_toast_compression) != 0)
+	appendPQExpBuffer(q, "ALTER TABLE %s ALTER COLUMN %s\nSET COMPRESSION %s;\n",
+		qualrelname, fmtId(tbinfo->attnames[j]), cmname);
+			}
 		}
 
 		if (ftoptions)
diff --git a/src/bin/pg_dump/t/002_pg_dump.pl b/src/bin/pg_dump/t/002_pg_dump.pl
index bc91bb1..737e464 100644
--- a/src/bin/pg_dump/t/002_pg_dump.pl
+++ 

Re: [HACKERS] Custom compression methods

2021-03-20 Thread Justin Pryzby
See attached.

One issue is that the pg_dump tests are no longer exercising the COMPRESSION
clause.  I don't know how to improve on that, since lz4 may not be available.

..unless we changed attcompression='\0' to mean (for varlena) "the default
compression".  Rather than "resolving" to the default compression at the time
the table is created, columns without an explicit compression set would "defer"
to the GUC (of course, that only affects newly-inserted data).

Then, I think pg_dump would generate an COMPRESSION clause for columns with any
compression other than a null byte, and then the tests could "SET COMPRESSION
pglz" and check the output, since it's set to a specific compression, not just
inheriting the default.

I'm not sure if that'd be desirable, but I think that's similar to tablespaces,
where (if I recall) reltablespace=0 means "this database's default tblspc".

-- 
Justin
>From 58335318a9e72d0dbdf8c2009ba6d195a6cba862 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 19 Mar 2021 19:12:53 -0500
Subject: [PATCH 1/9] Add docs for default_toast_compression..

bbe0a81db69bd10bd166907c3701492a29aca294
---
 doc/src/sgml/config.sgml | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index ee4925d6d9..5cb851a5eb 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8151,6 +8151,29 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
  
 
+ 
+  default_toast_compression (string)
+  
+   default_toast_compression configuration parameter
+  
+  
+  
+   
+This parameter specifies the default compression method to use
+when compressing data in TOAST tables.
+It applies only to variable-width data types.
+It may be overriden by compression clauses in the
+CREATE command, or changed after the relation is
+created by ALTER TABLE ... SET COMPRESSION.
+
+The supported compression methods are pglz and
+(if configured at the time PostgreSQL was
+built) lz4.
+The default is pglz.
+   
+  
+ 
+
  
   temp_tablespaces (string)
   
-- 
2.17.0

>From ae52d1db7fa5f7731bc5c1170dfb2ec5c39b111a Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 19 Mar 2021 21:52:28 -0500
Subject: [PATCH 2/9] doc: pg_dump --no-toast-compression

---
 doc/src/sgml/ref/pg_dump.sgml | 12 
 1 file changed, 12 insertions(+)

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index bcbb7a25fb..4a521186fb 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -931,6 +931,18 @@ PostgreSQL documentation
   
  
 
+ 
+  --no-toast-compression
+  
+   
+Do not output commands to set TOASTcompression
+methods.
+With this option, all objects will be created using whichever
+compression method is the default during restore.
+   
+  
+ 
+
  
   --no-unlogged-table-data
   
-- 
2.17.0

>From c47fafc376f847f4463b146f467c6046e44e5afa Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 19 Mar 2021 20:23:40 -0500
Subject: [PATCH 3/9] Compression method is an char not an OID

---
 src/backend/commands/tablecmds.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 9b2800bf5e..8e756e59d5 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -7847,6 +7847,7 @@ SetIndexStorageProperties(Relation rel, Relation attrelation,
 		index_close(indrel, lockmode);
 	}
 }
+
 /*
  * ALTER TABLE ALTER COLUMN SET STORAGE
  *
@@ -15070,7 +15071,7 @@ ATExecSetCompression(AlteredTableInfo *tab,
 	AttrNumber	attnum;
 	char	   *compression;
 	char		typstorage;
-	Oid			cmoid;
+	char		cmethod;
 	Datum		values[Natts_pg_attribute];
 	bool		nulls[Natts_pg_attribute];
 	bool		replace[Natts_pg_attribute];
@@ -15111,9 +15112,9 @@ ATExecSetCompression(AlteredTableInfo *tab,
 	memset(replace, false, sizeof(replace));
 
 	/* get the attribute compression method. */
-	cmoid = GetAttributeCompression(atttableform, compression);
+	cmethod = GetAttributeCompression(atttableform, compression);
 
-	atttableform->attcompression = cmoid;
+	atttableform->attcompression = cmethod;
 	CatalogTupleUpdate(attrel, >t_self, tuple);
 
 	InvokeObjectPostAlterHook(RelationRelationId,
@@ -15123,7 +15124,7 @@ ATExecSetCompression(AlteredTableInfo *tab,
 	ReleaseSysCache(tuple);
 
 	/* apply changes to the index column as well */
-	SetIndexStorageProperties(rel, attrel, attnum, cmoid, '\0', lockmode);
+	SetIndexStorageProperties(rel, attrel, attnum, cmethod, '\0', lockmode);
 	table_close(attrel, RowExclusiveLock);
 
 	/* make changes visible */
-- 
2.17.0

>From f577d25165815993b0289c75d5395d23409863cc Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 19 Mar 2021 

Re: [HACKERS] Custom compression methods

2021-03-19 Thread Robert Haas
On Fri, Mar 19, 2021 at 6:38 PM Alvaro Herrera  wrote:
> I suggest to add comments to this effect,
> perhaps as the attached (feel free to reword, I think mine is awkward.)

It's not bad, although "the decompressed version of the full datum"
might be a little better. I'd probably say instead: "This method might
decompress the entire datum rather than just a slice, if slicing is
not supported." or something of to that effect. Feel free to commit
something you like.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [HACKERS] Custom compression methods

2021-03-19 Thread Robert Haas
On Fri, Mar 19, 2021 at 8:25 PM Tom Lane  wrote:
> Extrapolating from the way we've dealt with similar issues
> in the past, I think the structure of pg_dump's output ought to be:
>
> 1. SET default_toast_compression = 'source system's value'
> in among the existing passel of SETs at the top.  Doesn't
> matter whether or not that is the compiled-in value.
>
> 2. No mention of compression in any CREATE TABLE command.
>
> 3. For any column having a compression option different from
> the default, emit ALTER TABLE SET ... to set that option after
> the CREATE TABLE.  (You did implement such a SET, I trust.)

Actually, *I* didn't implement any of this. But ALTER TABLE sometab
ALTER somecol SET COMPRESSION somealgo works.

This sounds like a reasonable approach.

> There might be scope for a dump option to suppress mention
> of compression altogether (comparable to, eg, --no-tablespaces).
> But I think that's optional.  In any case, we don't want
> to put people in a position where they should have used such
> an option and now they have no good way to recover their
> dump to the system they want to recover to.

The patch already has --no-toast-compression.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [HACKERS] Custom compression methods

2021-03-19 Thread Andrew Dunstan


On 3/19/21 8:25 PM, Tom Lane wrote:
> Alvaro Herrera  writes:
>> On 2021-Mar-19, Robert Haas wrote:
>>> Well, I really do hope that some day in the bright future, pglz will
>>> no longer be the thing we're shipping as the postgresql.conf default.
>>> So we'd just be postponing the noise until then. I think we need a
>>> better idea than that.
>> Hmm, why?  In that future, we can just change the pg_dump behavior to no
>> longer dump the compression clause if it's lz4 or whatever better
>> algorithm we choose.  So I think I'm clarifying my proposal to be "dump
>> the compression clause if it's different from the compiled-in default"
>> rather than "different from the GUC default".
> Extrapolating from the way we've dealt with similar issues
> in the past, I think the structure of pg_dump's output ought to be:
>
> 1. SET default_toast_compression = 'source system's value'
> in among the existing passel of SETs at the top.  Doesn't
> matter whether or not that is the compiled-in value.
>
> 2. No mention of compression in any CREATE TABLE command.
>
> 3. For any column having a compression option different from
> the default, emit ALTER TABLE SET ... to set that option after
> the CREATE TABLE.  (You did implement such a SET, I trust.)
>
> This minimizes the chatter for the normal case where all or most
> columns have the same setting, and more importantly it allows the
> dump to be read by older PG systems (or non-PG systems, or newer
> systems built without --with-lz4) that would fail altogether
> if the CREATE TABLE commands contained compression options.
> To use the dump that way, you do have to be willing to ignore
> errors from the SET and the ALTERs ... but that beats the heck
> out of having to manually edit the dump script to get rid of
> embedded COMPRESSION clauses.
>
> I'm not sure whether we'd still need to mess around beyond
> that to make the buildfarm's existing upgrade tests happy.
> But we *must* do this much in any case, because as it stands
> this patch has totally destroyed some major use-cases for
> pg_dump.
>
> There might be scope for a dump option to suppress mention
> of compression altogether (comparable to, eg, --no-tablespaces).
> But I think that's optional.  In any case, we don't want
> to put people in a position where they should have used such
> an option and now they have no good way to recover their
> dump to the system they want to recover to.
>
>   


I'm fairly sure this prescription would satisfy the buildfarm. It sounds
pretty sane to me - I'd independently come to a very similar conclusion
before reading the above.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: [HACKERS] Custom compression methods

2021-03-19 Thread Tom Lane
Alvaro Herrera  writes:
> On 2021-Mar-19, Robert Haas wrote:
>> Well, I really do hope that some day in the bright future, pglz will
>> no longer be the thing we're shipping as the postgresql.conf default.
>> So we'd just be postponing the noise until then. I think we need a
>> better idea than that.

> Hmm, why?  In that future, we can just change the pg_dump behavior to no
> longer dump the compression clause if it's lz4 or whatever better
> algorithm we choose.  So I think I'm clarifying my proposal to be "dump
> the compression clause if it's different from the compiled-in default"
> rather than "different from the GUC default".

Extrapolating from the way we've dealt with similar issues
in the past, I think the structure of pg_dump's output ought to be:

1. SET default_toast_compression = 'source system's value'
in among the existing passel of SETs at the top.  Doesn't
matter whether or not that is the compiled-in value.

2. No mention of compression in any CREATE TABLE command.

3. For any column having a compression option different from
the default, emit ALTER TABLE SET ... to set that option after
the CREATE TABLE.  (You did implement such a SET, I trust.)

This minimizes the chatter for the normal case where all or most
columns have the same setting, and more importantly it allows the
dump to be read by older PG systems (or non-PG systems, or newer
systems built without --with-lz4) that would fail altogether
if the CREATE TABLE commands contained compression options.
To use the dump that way, you do have to be willing to ignore
errors from the SET and the ALTERs ... but that beats the heck
out of having to manually edit the dump script to get rid of
embedded COMPRESSION clauses.

I'm not sure whether we'd still need to mess around beyond
that to make the buildfarm's existing upgrade tests happy.
But we *must* do this much in any case, because as it stands
this patch has totally destroyed some major use-cases for
pg_dump.

There might be scope for a dump option to suppress mention
of compression altogether (comparable to, eg, --no-tablespaces).
But I think that's optional.  In any case, we don't want
to put people in a position where they should have used such
an option and now they have no good way to recover their
dump to the system they want to recover to.

regards, tom lane




Re: [HACKERS] Custom compression methods

2021-03-19 Thread David Steele

On 3/19/21 8:00 PM, Andres Freund wrote:

On 2021-03-19 15:44:34 -0400, Robert Haas wrote:

I committed the core patch (0003) with a bit more editing.  Let's see
what the buildfarm thinks.


Congrats Dilip, Robert, All. The slow toast compression has been a
significant issue for a long time.


Yes, congratulations! This is a terrific improvement.

Plus, now that lz4 is part of configure it lowers the bar for other 
features that want to use it. I'm guessing there will be a few.


Thanks!
--
-David
da...@pgmasters.net




Re: [HACKERS] Custom compression methods

2021-03-19 Thread Alvaro Herrera
On 2021-Mar-19, Robert Haas wrote:

> On Fri, Mar 19, 2021 at 6:22 PM Alvaro Herrera  
> wrote:

> > (At least, for binary upgrade surely you must make sure to apply the
> > correct setting regardless of defaults on either system).
> 
> It's not critical from a system integrity point of view; the catalog
> state just dictates what happens to new data.

Oh, okay.

> You could argue that if, in a future release, we change the default to
> lz4, it's good for pg_upgrade to migrate users to a set of column
> definitions that will use that for new data.

Agreed, that seems a worthy goal.

> > Maybe it makes sense to dump the compression clause if it is different
> > from pglz, regardless of the default on the source server.
> 
> Well, I really do hope that some day in the bright future, pglz will
> no longer be the thing we're shipping as the postgresql.conf default.
> So we'd just be postponing the noise until then. I think we need a
> better idea than that.

Hmm, why?  In that future, we can just change the pg_dump behavior to no
longer dump the compression clause if it's lz4 or whatever better
algorithm we choose.  So I think I'm clarifying my proposal to be "dump
the compression clause if it's different from the compiled-in default"
rather than "different from the GUC default".

-- 
Álvaro Herrera   Valdivia, Chile
"Para tener más hay que desear menos"




Re: [HACKERS] Custom compression methods

2021-03-19 Thread Robert Haas
On Fri, Mar 19, 2021 at 6:22 PM Alvaro Herrera  wrote:
> Do you mean the column storage strategy, attstorage?  I don't think
> that's really related, because the difference there is not a GUC setting
> but a compiled-in default for the type.  In the case of compression, I'm
> not sure it makes sense to do it like that, but I can see the clutter
> argument: if we dump compression for all columns, it's going to be super
> noisy.

I agree.

> (At least, for binary upgrade surely you must make sure to apply the
> correct setting regardless of defaults on either system).

It's not critical from a system integrity point of view; the catalog
state just dictates what happens to new data. You could argue that if,
in a future release, we change the default to lz4, it's good for
pg_upgrade to migrate users to a set of column definitions that will
use that for new data.

> Maybe it makes sense to dump the compression clause if it is different
> from pglz, regardless of the default on the source server.  Then, if the
> target server has chosen lz4 as default, *all* columns are going to end
> up as lz4, and if it hasn't, then only the ones that were lz4 in the
> source server are going to.  That seems reasonable behavior.  Also, if
> some columns are lz4 in source, and target does not have lz4, then
> everything is going to work out to not-lz4 with just a bunch of errors
> in the output.

Well, I really do hope that some day in the bright future, pglz will
no longer be the thing we're shipping as the postgresql.conf default.
So we'd just be postponing the noise until then. I think we need a
better idea than that.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




  1   2   3   4   >