Re: PATCH: add "--config-file=" option to pg_rewind

2022-04-07 Thread Gunnar "Nick" Bluth
Am 07.04.22 um 01:54 schrieb Michael Paquier:
> On Wed, Apr 06, 2022 at 02:32:44PM +0200, Gunnar "Nick" Bluth wrote:
>> Ta! Sorry, been ridiculously busy these days, and I do confess that I've
>> been struggling a bit with those tests before ;-)
> 
> No problem.  Double-checked this morning and applied, then.
> --
> Michael

Wonderful!

So, @cyberdemn, who would have thought we get this fixed so quickly?
Time to update the Patroni PR once more! ;-)

Thanks & best regards,
-- 
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bl...@pro-open.de
__
"Ceterum censeo SystemD esse delendam" - Cato




Re: [PATCH] pg_stat_toast

2022-04-07 Thread Gunnar "Nick" Bluth
Am 06.04.22 um 17:49 schrieb Alvaro Herrera:
> On 2022-Apr-06, Robert Haas wrote:
> 
>> Now if we're only incurring that overhead when this feature is
>> enabled, then in fairness that problem is a lot less of an issue,
>> especially if this is also disabled by default. People who want the
>> data can get it and pay the cost, and others aren't much impacted.
>> However, experience has taught me that a lot of skepticism is
>> warranted when it comes to claims about how cheap extensions to the
>> statistics system will be.
> 
> Maybe this feature should provide a way to be enabled for tables
> individually, so you pay the overhead only where you need it and don't
> swamp the system with stats for uninteresting tables.
> 
That would obviously be very nice (and Georgios pushed heavily in that
direction ;-).

However, I intentionally bound those stats to the database level (see my
very first mail).

The changes to get them bound to attributes (or tables) would have
required mangling with quite a lot of very elemental stuff, (e.g.
attribute stats only get refreshed by ANALYZE and their structure would
have to be changed significantly, bloating them even if the feature is
inactive).

It also would have made the stats updates synchronous (at TX end), would
have been "blind" for all TOAST efforts done by rolled back TXs etc.

What you can do is of course (just like track_functions):
  ALTER DATABASE under_surveillance SET track_toast = [on|off];

Best,
-- 
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bl...@pro-open.de
__
"Ceterum censeo SystemD esse delendam" - Cato




Re: [PATCH] pg_stat_toast

2022-04-07 Thread Gunnar "Nick" Bluth
Am 06.04.22 um 18:55 schrieb Andres Freund:
> Hi,
> 
> On 2022-04-06 12:24:20 -0400, Robert Haas wrote:
>> On Wed, Apr 6, 2022 at 12:01 PM Gunnar "Nick" Bluth
>>  wrote:
>>> Fair enough. At that point, a lot of things become unexpectedly painful.
>>> How many % of the installed base may that be though?
>>
>> I don't have statistics on that, but it's large enough that the
>> expense associated with the statistics collector is a reasonably
>> well-known pain point, and for some users, a really severe one.
> 
> Yea. I've seen well over 100MB/s of write IO solely due to stats files writes
> on production systems, years ago.

Wow. Yeah, I tend to forget there's systems like ads' out there ;-)


>> I'm fairly sure it's not going to make things so cheap that we can afford to
>> add all the statistics anybody wants, but it's so painful that even modest
>> relief would be more than welcome.
> 
> It definitely doesn't make stats free. But I'm hopefull that avoiding the
> regular writing out / readin back in, and the ability to only optionally store
> some stats (by varying allocation size or just having different kinds of
> stats), will reduce the cost sufficiently that we can start keeping more
> stats.

Knock on wood!


> Which is not to say that these stats are the right ones (nor that they're the
> wrong ones).

;-)


> I think if I were to tackle providing more information about toasting, I'd
> start not by adding a new stats view, but by adding a function to pgstattuple
> that scans the relation and collects stats for each toasted column. An SRF
> returning one row for each toastable column. With information like
> 
> - column name
> - #inline datums
> - #compressed inline datums
> - sum(uncompressed inline datum size)
> - sum(compressed inline datum size)
> - #external datums
> - #compressed external datums
> - sum(uncompressed external datum size)
> - sum(compressed external datum size)
> 
> IIRC this shouldn't require visiting the toast table itself.

But it would still require a seqscan and quite some cycles. However,
sure, something like that is an option.

> Perhaps also an SRF that returns information about each compression method
> separately (i.e. collect above information, but split by compression method)?
> Perhaps even with the ability to measure how big the gains of recompressing
> into another method would be?

Even more of the above, but yeah, sound nifty.

>>>> However, experience has taught me that a lot of skepticism is
>>>> warranted when it comes to claims about how cheap extensions to the
>>>> statistics system will be.
>>>
>>> Again, fair enough!
>>> Maybe we first need statistics about statistics collection and handling? ;-)
>>
>> Heh.
> 
> I've wondered about adding pg_stat_stats the other day, actually :)
> https://postgr.es/m/20220404193435.hf3vybaajlpfmbmt%40alap3.anarazel.de

OMG LOL!

-- 
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bl...@pro-open.de
__
"Ceterum censeo SystemD esse delendam" - Cato




Re: [PATCH] pg_stat_toast

2022-04-06 Thread Gunnar "Nick" Bluth
Am 06.04.22 um 17:49 schrieb Robert Haas:

> I feel like if you want to know whether externalization made a
> difference, you can look at the size of the TOAST table. And by
> selecting directly from that table, you can even see how many chunks
> it contains, and how many are full-sized chunks vs. partial chunks,
> and stuff like that. And for compression, how about looking at
> pg_column_size(col1) vs. pg_column_size(col1||'') or something like
> that? You might get a 1-byte varlena header on the former and a 4-byte
> varlena header on the latter even if there's no compression, but any
> gains beyond 3 bytes have to be due to compression.

I'll probably give that a shot!

It does feel a bit like noting your mileage on fuel receipts though, as
I've done until I got my first decent car; works and will work perfectly
well up to the day, but  certainly is a bit out-of-time (and requires
some basic math skills ;-).

Best,
-- 
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bl...@pro-open.de
__
"Ceterum censeo SystemD esse delendam" - Cato




Re: [PATCH] pg_stat_toast

2022-04-06 Thread Gunnar "Nick" Bluth
Am 06.04.22 um 17:22 schrieb Robert Haas:
> On Tue, Apr 5, 2022 at 10:34 PM Andres Freund  wrote:
>>> Anyway, my (undisputed up to now!) understanding still is that only
>>> backends _looking_ at these stats (so, e.g., accessing the pg_stat_toast
>>> view) actually read the data. So, the 10-15% more space used for pg_stat
>>> only affect the stats collector and _some few_ backends.
>>
>> It's not so simple. That stats collector constantly writes these stats out to
>> disk. And disk bandwidth / space is of course a shared resource.
> 
> Yeah, and just to make it clear, this really becomes an issue if you
> have hundreds of thousands or even millions of tables. It's a lot of
> extra data to be writing, and in some cases we're rewriting it all,
> like, once per second.

Fair enough. At that point, a lot of things become unexpectedly painful.
How many % of the installed base may that be though?

I'm far from done reading the patch and mail thread Andres mentioned,
but I think the general idea is to move the stats to shared memory, so
that reading (and thus, persisting) pg_stats is required far less often,
right?

> Now if we're only incurring that overhead when this feature is
> enabled, then in fairness that problem is a lot less of an issue,
> especially if this is also disabled by default. People who want the
> data can get it and pay the cost, and others aren't much impacted.

That's the idea, yes. I reckon folks with millions of tables will scan
through the docs (and postgresql.conf) very thoroughly anyway. Hence the
note there.

> However, experience has taught me that a lot of skepticism is
> warranted when it comes to claims about how cheap extensions to the
> statistics system will be.

Again, fair enough!
Maybe we first need statistics about statistics collection and handling? ;-)

Best,
-- 
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bl...@pro-open.de
__
"Ceterum censeo SystemD esse delendam" - Cato




Re: PATCH: add "--config-file=" option to pg_rewind

2022-04-06 Thread Gunnar "Nick" Bluth
Am 06.04.22 um 14:04 schrieb Michael Paquier:
> On Tue, Apr 05, 2022 at 11:10:30AM +0900, Michael Paquier wrote:
>> In the tests, the only difference between the modes "archive_cli" and
>> "archive" is the extra option given to the pg_rewind command, and
>> that's a simple redirect to what pg_rewind would choose by default
>> anyway.  A more elegant solution would be to have an extra option in
>> RewindTest::run_pg_rewind(), where any caller of the routine can feed
>> extra options to the pg_rewind command.  Now, in this case, we are
>> not going to lose any coverage if the existing "archive" mode is
>> changed so as we move away postgresql.conf from the target data folder
>> and just use --config-file by default, no?  I would make the choice of
>> simplicity, by giving up on "archive_cli" and using
>> primary-postgresql.conf.tmp as value for --config-file.  There could
>> be an argument for using --config-file for all the modes, as well.
> 
> The clock is ticking, so I have looked at this patch by myself.  I

Ta! Sorry, been ridiculously busy these days, and I do confess that I've
been struggling a bit with those tests before ;-)

> have finished by tweaking the docs, the code and the tests as of the 
> attached.  One thing that I found annoying is the lack of description
> about the fact that this option affects pg_rewind when it internally
> starts the target cluster, as of when we retrieve restore_command and
> when we enforce crash recovery to have a target cluster with a clean
> shutdown state.  The tests are heavily simplified, having no impact on
> the run-time while improving the coverage for the code paths changed
> here.
> --
> Michael

LGTM!

Thx again!
-- 
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bl...@pro-open.de
__
"Ceterum censeo SystemD esse delendam" - Cato




Re: [PATCH] pg_stat_toast

2022-04-05 Thread Gunnar "Nick" Bluth
Am 05.04.22 um 18:17 schrieb Robert Haas:
> On Thu, Mar 31, 2022 at 9:16 AM Gunnar "Nick" Bluth
>  wrote:
>> That was meant to say "v10", sorry!
> 
> Hi,

Hi Robert,

and thx for looking at this.

> From my point of view, at least, it would be preferable if you'd stop
> changing the subject line every time you post a new version.

Terribly sorry, I believed to do the right thing! I removed the "suffix"
now for good.

> Based on the test results in
> http://postgr.es/m/42bfa680-7998-e7dc-b50e-480cdd986...@pro-open.de
> and the comments from Andres in
> https://www.postgresql.org/message-id/20211212234113.6rhmqxi5uzgipwx2%40alap3.anarazel.de
> my judgement would be that, as things stand today, this patch has no
> chance of being accepted, due to overhead. Now, Andres is currently
> working on an overhaul of the statistics collector and perhaps that
> would reduce the overhead of something like this to an acceptable
> level. If it does, that would be great news; I just don't know whether
> that's the case.

AFAICT, Andres' work is more about the structure (e.g.
13619598f1080d7923454634a2570ca1bc0f2fec). Or I've missed something...?

The attached v11 incorporates the latest changes in the area, btw.


Anyway, my (undisputed up to now!) understanding still is that only
backends _looking_ at these stats (so, e.g., accessing the pg_stat_toast
view) actually read the data. So, the 10-15% more space used for pg_stat
only affect the stats collector and _some few_ backends.

And those 10-15% were gathered with 10.000 tables containing *only*
TOASTable attributes. So the actual percentage would probably go down
quite a bit once you add some INTs or such.
Back then, I was curious myself on the impact and just ran a few
syntetic tests quickly hacked together. I'll happily go ahead and run
some tests on real world schemas if that helps clarifying matters!

> As far as the statistics themselves are concerned, I am somewhat
> skeptical about whether it's really worth adding code for this.
> According to the documentation, the purpose of the patch is to allow
> you to assess choice of storage and compression method settings for a
> column and is not intended to be enabled permanently. However, it

TBTH, the wording there is probably a bit over-cautious. I very much
respect Andres and thus his reservations, and I know how careful the
project is about regressions of any kind (see below on some elobarations
on the latter).
I alleviated the  part a bit for v11.

> seems to me that you could assess that pretty easily without this
> patch: just create a couple of different tables with different
> settings, load up the same data via COPY into each one, and see what
> happens. Now you might answer that with the patch you would get more
> detailed and accurate statistics, and I think that's true, but it
> doesn't really look like the additional level of detail would be
> critical to have in order to make a proper assessment. You might also
> say that creating multiple copies of the table and loading the data
> multiple times would be expensive, and that's also true, but you don't
> really need to load it all. A representative sample of 1GB or so would
> probably suffice in most cases, and that doesn't seem likely to be a
> huge load on the system.

At the end of the day, one could argue like you did there for almost all
(non-attribute) stats. "Why track function execution times? Just set up
a benchmark and call the function 1 mio times and you'll know how long
it takes on average!". "Why track IO timings? Run a benchmark on your
system and ..." etc. pp.

I maintain a couple of DBs that house TBs of TOASTable data (mainly XML
containing encrypted payloads). In just a couple of columns per cluster.
I'm completely clueless if TOAST compression makes a difference there.
Or externalization.
And I'm not allowed to copy that data anywhere outside production
without unrolling a metric ton of red tape.
Guess why I started writing this patch ;-)
*I* would certainly leave the option on, just to get an idea of what's
happening...

> Also, as we add more compression options, it's going to be hard to
> assess this sort of thing without trying stuff anyway. For example if
> you can set the lz4 compression level, you're not going to know which
> level is actually going to work best without trying out a bunch of
> them and seeing what happens. If we allow access to other sorts of
> compression parameters like zstd's "long" option, similarly, if you
> really care, you're going to have to try it.

Funny that you mention it. When writing the first version, I was
thinking about the LZ4 patch authors and was wondering how they
tested/benchmarked all of it and why they didn't implement something
like this patch for their tests ;-)

Yes, you're gonna try it. And you're go

Re: [PATCH] pg_stat_toast v10

2022-03-31 Thread Gunnar "Nick" Bluth
Am 31.03.22 um 15:14 schrieb Gunnar "Nick" Bluth:
> Am 22.03.22 um 12:23 schrieb Gunnar "Nick" Bluth:
>> Am 22.03.22 um 02:17 schrieb Andres Freund:
>>> Hi,
>>>
>>> On 2022-03-08 19:32:03 +0100, Gunnar "Nick" Bluth wrote:
>>>> v8 (applies cleanly to today's HEAD/master) attached.
>>>
>>> This doesn't apply anymore, likely due to my recent pgstat changes - which
>>> you'd need to adapt to...
>>
>> Now, that's been quite an overhaul... kudos!
>>
>>
>>> http://cfbot.cputube.org/patch_37_3457.log
>>>
>>> Marked as waiting on author.
>>
>> v9 attached.
>>
>> TBTH, I don't fully understand all the external/static stuff, but it
>> applies to HEAD/master, compiles and passes all tests, so... ;-)
> 
> And v10 catches up to master once again.
> 
> Best,

That was meant to say "v10", sorry!

-- 
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bl...@pro-open.de
__
"Ceterum censeo SystemD esse delendam" - Cato doc/src/sgml/config.sgml  |  26 
 doc/src/sgml/monitoring.sgml  | 163 ++
 doc/src/sgml/storage.sgml |  12 +-
 src/backend/access/table/toast_helper.c   |  40 +++
 src/backend/catalog/system_views.sql  |  20 
 src/backend/postmaster/pgstat.c   | 161 -
 src/backend/utils/activity/Makefile   |   1 +
 src/backend/utils/activity/pgstat_toast.c | 157 +
 src/backend/utils/adt/pgstatfuncs.c   |  72 
 src/backend/utils/misc/guc.c  |   9 ++
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/catalog/pg_proc.dat   |  25 
 src/include/pgstat.h  | 110 -
 src/include/utils/pgstat_internal.h   |   1 +
 src/test/regress/expected/rules.out   |  17 +++
 src/test/regress/expected/stats.out   |  62 ++
 src/test/regress/sql/stats.sql|  28 +
 17 files changed, 897 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 43e4ade83e..e6f0768472 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7935,6 +7935,32 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
  
 
+ 
+  track_toast (boolean)
+  
+   track_toast configuration parameter
+  
+  
+  
+   
+Enables tracking of TOAST activities.
+Compressions and externalizations are tracked.
+The default is off.
+Only superusers can change this setting.
+   
+
+   
+
+Be aware that this feature, depending on the amount of TOASTable columns in
+your databases, may significantly increase the size of the statistics files
+and the workload of the statistics collector. It is recommended to only
+temporarily activate this to assess the right compression and storage method
+for a column.
+
+   
+  
+ 
+
  
   stats_temp_directory (string)
   
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3b9172f65b..cd0a5bea35 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -610,6 +610,17 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   yet included in pg_stat_user_functions).
  
 
+ 
+  pg_stat_toastpg_stat_toast
+  
+   One row for each column that has ever been TOASTed (compressed and/or externalized).
+   Showing the number of externalizations, compression attempts / successes, compressed and
+   uncompressed sizes etc.
+   
+   pg_stat_toast for details.
+  
+ 
+
  
   pg_stat_slrupg_stat_slru
   One row per SLRU, showing statistics of operations. See
@@ -4946,6 +4957,158 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
 
  
 
+ 
+  pg_stat_toast
+
+  
+   pg_stat_toast
+  
+
+  
+   The pg_stat_toast view will contain
+   one row for each column of variable size that has been TOASTed since 
+   the last statistics reset. The  parameter
+   controls whether TOAST activities are tracked or not.
+  
+
+  
+   pg_stat_toast View
+   
+
+ 
+  
+   Column Type
+  
+  
+   Description
+  
+ 
+
+
+
+ 
+  
+   schemaname name
+  
+  
+   Name of the schema the relation is in
+  
+ 
+
+ 
+  
+   reloid oid
+  
+  
+   OID of the relation
+  
+ 
+
+ 
+  
+   attnum int
+  
+  
+

Re: [PATCH] pg_stat_toast v9

2022-03-31 Thread Gunnar "Nick" Bluth
Am 22.03.22 um 12:23 schrieb Gunnar "Nick" Bluth:
> Am 22.03.22 um 02:17 schrieb Andres Freund:
>> Hi,
>>
>> On 2022-03-08 19:32:03 +0100, Gunnar "Nick" Bluth wrote:
>>> v8 (applies cleanly to today's HEAD/master) attached.
>>
>> This doesn't apply anymore, likely due to my recent pgstat changes - which
>> you'd need to adapt to...
> 
> Now, that's been quite an overhaul... kudos!
> 
> 
>> http://cfbot.cputube.org/patch_37_3457.log
>>
>> Marked as waiting on author.
> 
> v9 attached.
> 
> TBTH, I don't fully understand all the external/static stuff, but it
> applies to HEAD/master, compiles and passes all tests, so... ;-)

And v10 catches up to master once again.

Best,
-- 
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bl...@pro-open.de
__
"Ceterum censeo SystemD esse delendam" - Cato doc/src/sgml/config.sgml  |  26 
 doc/src/sgml/monitoring.sgml  | 163 ++
 doc/src/sgml/storage.sgml |  12 +-
 src/backend/access/table/toast_helper.c   |  40 +++
 src/backend/catalog/system_views.sql  |  20 
 src/backend/postmaster/pgstat.c   | 161 -
 src/backend/utils/activity/Makefile   |   1 +
 src/backend/utils/activity/pgstat_toast.c | 157 +
 src/backend/utils/adt/pgstatfuncs.c   |  72 
 src/backend/utils/misc/guc.c  |   9 ++
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/catalog/pg_proc.dat   |  25 
 src/include/pgstat.h  | 110 -
 src/include/utils/pgstat_internal.h   |   1 +
 src/test/regress/expected/rules.out   |  17 +++
 src/test/regress/expected/stats.out   |  62 ++
 src/test/regress/sql/stats.sql|  28 +
 17 files changed, 897 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 43e4ade83e..e6f0768472 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7935,6 +7935,32 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
  
 
+ 
+  track_toast (boolean)
+  
+   track_toast configuration parameter
+  
+  
+  
+   
+Enables tracking of TOAST activities.
+Compressions and externalizations are tracked.
+The default is off.
+Only superusers can change this setting.
+   
+
+   
+
+Be aware that this feature, depending on the amount of TOASTable columns in
+your databases, may significantly increase the size of the statistics files
+and the workload of the statistics collector. It is recommended to only
+temporarily activate this to assess the right compression and storage method
+for a column.
+
+   
+  
+ 
+
  
   stats_temp_directory (string)
   
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3b9172f65b..cd0a5bea35 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -610,6 +610,17 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   yet included in pg_stat_user_functions).
  
 
+ 
+  pg_stat_toastpg_stat_toast
+  
+   One row for each column that has ever been TOASTed (compressed and/or externalized).
+   Showing the number of externalizations, compression attempts / successes, compressed and
+   uncompressed sizes etc.
+   
+   pg_stat_toast for details.
+  
+ 
+
  
   pg_stat_slrupg_stat_slru
   One row per SLRU, showing statistics of operations. See
@@ -4946,6 +4957,158 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
 
  
 
+ 
+  pg_stat_toast
+
+  
+   pg_stat_toast
+  
+
+  
+   The pg_stat_toast view will contain
+   one row for each column of variable size that has been TOASTed since 
+   the last statistics reset. The  parameter
+   controls whether TOAST activities are tracked or not.
+  
+
+  
+   pg_stat_toast View
+   
+
+ 
+  
+   Column Type
+  
+  
+   Description
+  
+ 
+
+
+
+ 
+  
+   schemaname name
+  
+  
+   Name of the schema the relation is in
+  
+ 
+
+ 
+  
+   reloid oid
+  
+  
+   OID of the relation
+  
+ 
+
+ 
+  
+   attnum int
+  
+  
+   Attribute (column) number in the relation
+  
+ 
+
+ 
+  
+   relname name
+  
+  
+   Name of the relation
+  
+ 
+
+ 
+  
+   attname name
+  
+  
+  

Re: [PATCH] pg_stat_toast v9

2022-03-22 Thread Gunnar "Nick" Bluth
Am 22.03.22 um 02:17 schrieb Andres Freund:
> Hi,
> 
> On 2022-03-08 19:32:03 +0100, Gunnar "Nick" Bluth wrote:
>> v8 (applies cleanly to today's HEAD/master) attached.
> 
> This doesn't apply anymore, likely due to my recent pgstat changes - which
> you'd need to adapt to...

Now, that's been quite an overhaul... kudos!


> http://cfbot.cputube.org/patch_37_3457.log
> 
> Marked as waiting on author.

v9 attached.

TBTH, I don't fully understand all the external/static stuff, but it
applies to HEAD/master, compiles and passes all tests, so... ;-)

Best regards,
-- 
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bl...@pro-open.de
__
"Ceterum censeo SystemD esse delendam" - Cato doc/src/sgml/config.sgml  |  26 
 doc/src/sgml/monitoring.sgml  | 163 ++
 doc/src/sgml/storage.sgml |  12 +-
 src/backend/access/table/toast_helper.c   |  40 +++
 src/backend/catalog/system_views.sql  |  20 
 src/backend/postmaster/pgstat.c   | 161 -
 src/backend/utils/activity/Makefile   |   1 +
 src/backend/utils/activity/pgstat_toast.c | 157 +
 src/backend/utils/adt/pgstatfuncs.c   |  72 
 src/backend/utils/misc/guc.c  |   9 ++
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/catalog/pg_proc.dat   |  25 
 src/include/pgstat.h  | 110 -
 src/include/utils/pgstat_internal.h   |   1 +
 src/test/regress/expected/rules.out   |  17 +++
 src/test/regress/expected/stats.out   |  62 ++
 src/test/regress/sql/stats.sql|  28 +
 17 files changed, 897 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 7a48973b3c..64b4219ded 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7892,6 +7892,32 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
  
 
+ 
+  track_toast (boolean)
+  
+   track_toast configuration parameter
+  
+  
+  
+   
+Enables tracking of TOAST activities.
+Compressions and externalizations are tracked.
+The default is off.
+Only superusers can change this setting.
+   
+
+   
+
+Be aware that this feature, depending on the amount of TOASTable columns in
+your databases, may significantly increase the size of the statistics files
+and the workload of the statistics collector. It is recommended to only
+temporarily activate this to assess the right compression and storage method
+for a column.
+
+   
+  
+ 
+
  
   stats_temp_directory (string)
   
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 35b2923c5e..7bbacd67fb 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -610,6 +610,17 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   yet included in pg_stat_user_functions).
  
 
+ 
+  pg_stat_toastpg_stat_toast
+  
+   One row for each column that has ever been TOASTed (compressed and/or externalized).
+   Showing the number of externalizations, compression attempts / successes, compressed and
+   uncompressed sizes etc.
+   
+   pg_stat_toast for details.
+  
+ 
+
  
   pg_stat_slrupg_stat_slru
   One row per SLRU, showing statistics of operations. See
@@ -4942,6 +4953,158 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
 
  
 
+ 
+  pg_stat_toast
+
+  
+   pg_stat_toast
+  
+
+  
+   The pg_stat_toast view will contain
+   one row for each column of variable size that has been TOASTed since 
+   the last statistics reset. The  parameter
+   controls whether TOAST activities are tracked or not.
+  
+
+  
+   pg_stat_toast View
+   
+
+ 
+  
+   Column Type
+  
+  
+   Description
+  
+ 
+
+
+
+ 
+  
+   schemaname name
+  
+  
+   Name of the schema the relation is in
+  
+ 
+
+ 
+  
+   reloid oid
+  
+  
+   OID of the relation
+  
+ 
+
+ 
+  
+   attnum int
+  
+  
+   Attribute (column) number in the relation
+  
+ 
+
+ 
+  
+   relname name
+  
+  
+   Name of the relation
+  
+ 
+
+ 
+  
+   attname name
+  
+  
+   Name of the attribute (column)
+  
+ 
+
+ 
+  
+   storagemethod char
+  
+  
+   Storage method of the attribute
+  
+ 
+
+ 
+  

Re: PATCH: add "--config-file=" option to pg_rewind

2022-03-10 Thread Gunnar "Nick" Bluth
Am 10.03.22 um 14:43 schrieb Michael Banck:
> Heya,
> 
> some minor comments, I didn't look at the added test and I did not test
> the patch at all, but (as part of the Debian/Ubuntu packaging team) I
> think this patch is really important:

Much appreciated!

[...]

> I think we usually just say "data directory"; pgdata was previously only
> used as part of the option name, not like this in the text.

Fixed.

[...]

> target-pgdata is the name of the option, but maybe it is useful to spell
> out "target data directory" here, even if that means we spill over to
> two lines (which we might have to do anyway, that new line is pretty
> long).

Fixed.

> 
>> @@ -121,6 +123,7 @@ main(int argc, char **argv)
>>  {"no-sync", no_argument, NULL, 'N'},
>>  {"progress", no_argument, NULL, 'P'},
>>  {"debug", no_argument, NULL, 3},
>> +{"config-file", required_argument, NULL, 5},
> 
> Not sure what our policy is, but the way the numbered options are 1, 2,
> 4, 3, 5 after this is weird; this was already off before this patch
> though.

That one has been triggering my inner Monk too ;-)

Fixed!


[...]

> You removed an empty line here. Maybe rather prepend this with a comment
> on what's going on, and have en empty line before and afterwards.

> Kinde same here.

It does smell a bit like "stating the obvious", but alas! Both fixed.

Cheers,
-- 
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bl...@pro-open.de
__
"Ceterum censeo SystemD esse delendam" - Cato doc/src/sgml/ref/pg_rewind.sgml   |  12 +
 src/bin/pg_rewind/pg_rewind.c |  24 -
 src/bin/pg_rewind/t/001_basic.pl  |   1 +
 src/bin/pg_rewind/t/RewindTest.pm | 105 +++---
 4 files changed, 101 insertions(+), 41 deletions(-)

diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index 33e6bb64ad..62fcb71825 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -241,6 +241,18 @@ PostgreSQL documentation
   
  
 
+ 
+  --config-file=filepath
+  
+   
+When using the -c / --restore-target-wal option, the restore_command
+is extracted from the configuration of the target cluster. If the postgresql.conf
+of that cluster is not in the target data directory (or if you want to use an alternative config),
+use this option to provide a (relative or absolute) path to the postgresql.conf to be used.
+   
+  
+ 
+
  
   --debug
   
diff --git a/src/bin/pg_rewind/pg_rewind.c b/src/bin/pg_rewind/pg_rewind.c
index b39b5c1aac..2e83c7ee8e 100644
--- a/src/bin/pg_rewind/pg_rewind.c
+++ b/src/bin/pg_rewind/pg_rewind.c
@@ -61,6 +61,7 @@ char	   *datadir_target = NULL;
 char	   *datadir_source = NULL;
 char	   *connstr_source = NULL;
 char	   *restore_command = NULL;
+char	   *config_file = NULL;
 
 static bool debug = false;
 bool		showprogress = false;
@@ -87,6 +88,8 @@ usage(const char *progname)
 	printf(_("Options:\n"));
 	printf(_("  -c, --restore-target-wal   use restore_command in target configuration to\n"
 			 " retrieve WAL files from archives\n"));
+	printf(_("  --config-file=FILE path to postgresql.conf if it resides outside\n"));
+	printf(_(" the target data directory (for -c)\n"));
 	printf(_("  -D, --target-pgdata=DIRECTORY  existing data directory to modify\n"));
 	printf(_("  --source-pgdata=DIRECTORY  source data directory to synchronize with\n"));
 	printf(_("  --source-server=CONNSTRsource server to synchronize with\n"));
@@ -114,13 +117,14 @@ main(int argc, char **argv)
 		{"write-recovery-conf", no_argument, NULL, 'R'},
 		{"source-pgdata", required_argument, NULL, 1},
 		{"source-server", required_argument, NULL, 2},
+		{"debug", no_argument, NULL, 3},
 		{"no-ensure-shutdown", no_argument, NULL, 4},
+		{"config-file", required_argument, NULL, 5},
 		{"version", no_argument, NULL, 'V'},
 		{"restore-target-wal", no_argument, NULL, 'c'},
 		{"dry-run", no_argument, NULL, 'n'},
 		{"no-sync", no_argument, NULL, 'N'},
 		{"progress", no_argument, NULL, 'P'},
-		{"debug", no_argument, NULL, 3},
 		{NULL, 0, NULL, 0}
 	};
 	int			option_index;
@@ -205,6 +209,10 @@ main(int argc, char **argv)
 			case 4:
 no_ensure_shutdown = true;
 break;
+
+			case 5:
+config_file = pg_strdup(optarg);
+break;
 		}
 	}
 
@@ -1061,6 +1069,13 @@ getRestoreCommand(c

Re: [PATCH] pg_stat_toast v8

2022-03-08 Thread Gunnar "Nick" Bluth
Am 04.01.22 um 12:29 schrieb Gunnar "Nick" Bluth:
> Am 03.01.22 um 22:23 schrieb Alvaro Herrera:
>> Overall I think this is a good feature to have; assessing the need for
>> compression is important for tuning, so +1 for the goal of the patch.
> 
> Much appreciated!
> 
> 
>> I didn't look into the patch carefully, but here are some minor
>> comments:
>>
>> On 2022-Jan-03, Gunnar "Nick" Bluth wrote:
>>
>>> @@ -229,7 +230,9 @@ toast_tuple_try_compression(ToastTupleContext
>>> *ttc, int attribute)
>>>   Datum   *value = >ttc_values[attribute];
>>>   Datum    new_value;
>>>   ToastAttrInfo *attr = >ttc_attr[attribute];
>>> +    instr_time    start_time;
>>>   +    INSTR_TIME_SET_CURRENT(start_time);
>>>   new_value = toast_compress_datum(*value, attr->tai_compression);
>>>     if (DatumGetPointer(new_value) != NULL)
>>
>> Don't INSTR_TIME_SET_CURRENT unconditionally; in some systems it's an
>> expensive syscall.  Find a way to only do it if the feature is enabled.
> 
> Yeah, I was worried about that (and asking if it would be required)
> already.
> Adding the check was easier than I expected, though I'm absolutely
> clueless if I did it right!
> 
> #include "pgstat.h"
> extern PGDLLIMPORT bool pgstat_track_toast;
> 

As it works and nobody objected, it seems to be the right way...

v8 (applies cleanly to today's HEAD/master) attached.

Any takers?

-- 
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bl...@pro-open.de
__
"Ceterum censeo SystemD esse delendam" - Cato doc/src/sgml/config.sgml  |  26 +++
 doc/src/sgml/monitoring.sgml  | 163 ++
 doc/src/sgml/storage.sgml |  12 +-
 src/backend/access/table/toast_helper.c   |  40 
 src/backend/catalog/system_views.sql  |  20 ++
 src/backend/postmaster/pgstat.c   | 302 +-
 src/backend/utils/adt/pgstatfuncs.c   |  72 ++
 src/backend/utils/misc/guc.c  |   9 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/catalog/pg_proc.dat   |  25 +++
 src/include/pgstat.h  | 107 -
 src/test/regress/expected/rules.out   |  17 ++
 src/test/regress/expected/stats.out   |  62 ++
 src/test/regress/sql/stats.sql|  28 +++
 14 files changed, 877 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 7ed8c82a9d..adbd516373 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7889,6 +7889,32 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
  
 
+ 
+  track_toast (boolean)
+  
+   track_toast configuration parameter
+  
+  
+  
+   
+Enables tracking of TOAST activities.
+Compressions and externalizations are tracked.
+The default is off.
+Only superusers can change this setting.
+   
+
+   
+
+Be aware that this feature, depending on the amount of TOASTable columns in
+your databases, may significantly increase the size of the statistics files
+and the workload of the statistics collector. It is recommended to only
+temporarily activate this to assess the right compression and storage method
+for a column.
+
+   
+  
+ 
+
  
   stats_temp_directory (string)
   
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 9fb62fec8e..06258e777a 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -610,6 +610,17 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   yet included in pg_stat_user_functions).
  
 
+ 
+  pg_stat_toastpg_stat_toast
+  
+   One row for each column that has ever been TOASTed (compressed and/or externalized).
+   Showing the number of externalizations, compression attempts / successes, compressed and
+   uncompressed sizes etc.
+   
+   pg_stat_toast for details.
+  
+ 
+
  
   pg_stat_slrupg_stat_slru
   One row per SLRU, showing statistics of operations. See
@@ -4933,6 +4944,158 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
 
  
 
+ 
+  pg_stat_toast
+
+  
+   pg_stat_toast
+  
+
+  
+   The pg_stat_toast view will contain
+   one row for each column of variable size that has been TOASTed since 
+   the last statistics reset. The  parameter
+   controls whether TOAST activities are tracked or n

Re: PATCH: add "--config-file=" option to pg_rewind

2022-03-01 Thread Gunnar "Nick" Bluth

Am 01.03.22 um 05:18 schrieb Michael Paquier:

On Mon, Feb 28, 2022 at 08:48:23PM +0100, Gunnar "Nick" Bluth wrote:

So, how should we call the global "find the * 'postgres' executable and
boil out if that fails" function?
 char postgres_exec_path[MAXPGPATH] = findPostgresExec();
?


That would mean only a couple of lines gained, and the readability
gained is minimal, so I'd be fine to keep the code as-is.  I am not
sure about the full patch set yet, but the refactoring of the commands
to use PQExpBuffer is good by itself, so I have extracted this part of
the patch and applied that for now.


Which made the remaining patch v5 (which also appends the --config-file 
for ensureCleanShutdown) much cleaner :)


Cheers,
--
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bl...@pro-open.de
__
"Ceterum censeo SystemD esse delendam" - Catodiff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index 33e6bb64ad..d0278e9b46 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -241,6 +241,18 @@ PostgreSQL documentation
   
  
 
+ 
+  --config-file=filepath
+  
+   
+When using the -c / --restore-target-wal option, the restore_command
+is extracted from the configuration of the target cluster. If the postgresql.conf
+of that cluster is not in the target pgdata directory (or if you want to use an alternative config),
+use this option to provide a (relative or absolute) path to the postgresql.conf to be used.
+   
+  
+ 
+
  
   --debug
   
diff --git a/src/bin/pg_rewind/pg_rewind.c b/src/bin/pg_rewind/pg_rewind.c
index b39b5c1aac..294353a9d5 100644
--- a/src/bin/pg_rewind/pg_rewind.c
+++ b/src/bin/pg_rewind/pg_rewind.c
@@ -61,6 +61,7 @@ char	   *datadir_target = NULL;
 char	   *datadir_source = NULL;
 char	   *connstr_source = NULL;
 char	   *restore_command = NULL;
+char	   *config_file = NULL;
 
 static bool debug = false;
 bool		showprogress = false;
@@ -87,6 +88,7 @@ usage(const char *progname)
 	printf(_("Options:\n"));
 	printf(_("  -c, --restore-target-wal   use restore_command in target configuration to\n"
 			 " retrieve WAL files from archives\n"));
+	printf(_("  --config-file=FILE path to postgresql.conf if outside target-pgdata (for -c)\n"));
 	printf(_("  -D, --target-pgdata=DIRECTORY  existing data directory to modify\n"));
 	printf(_("  --source-pgdata=DIRECTORY  source data directory to synchronize with\n"));
 	printf(_("  --source-server=CONNSTRsource server to synchronize with\n"));
@@ -121,6 +123,7 @@ main(int argc, char **argv)
 		{"no-sync", no_argument, NULL, 'N'},
 		{"progress", no_argument, NULL, 'P'},
 		{"debug", no_argument, NULL, 3},
+		{"config-file", required_argument, NULL, 5},
 		{NULL, 0, NULL, 0}
 	};
 	int			option_index;
@@ -205,6 +208,10 @@ main(int argc, char **argv)
 			case 4:
 no_ensure_shutdown = true;
 break;
+
+			case 5:
+config_file = pg_strdup(optarg);
+break;
 		}
 	}
 
@@ -1060,7 +1067,11 @@ getRestoreCommand(const char *argv0)
 	/* add -D switch, with properly quoted data directory */
 	appendPQExpBufferStr(postgres_cmd, " -D ");
 	appendShellString(postgres_cmd, datadir_target);
-
+	if (config_file != NULL)
+	{
+		appendPQExpBufferStr(postgres_cmd, " --config_file=");
+		appendShellString(postgres_cmd, config_file);
+	}
 	/* add -C switch, for restore_command */
 	appendPQExpBufferStr(postgres_cmd, " -C restore_command");
 
@@ -1138,6 +1149,11 @@ ensureCleanShutdown(const char *argv0)
 	/* add set of options with properly quoted data directory */
 	appendPQExpBufferStr(postgres_cmd, " --single -F -D ");
 	appendShellString(postgres_cmd, datadir_target);
+	if (config_file != NULL)
+	{
+		appendPQExpBufferStr(postgres_cmd, " --config_file=");
+		appendShellString(postgres_cmd, config_file);
+	}
 
 	/* finish with the database name, and a properly quoted redirection */
 	appendPQExpBufferStr(postgres_cmd, " template1 < ");
diff --git a/src/bin/pg_rewind/t/001_basic.pl b/src/bin/pg_rewind/t/001_basic.pl
index db9201f38e..3c395ece12 100644
--- a/src/bin/pg_rewind/t/001_basic.pl
+++ b/src/bin/pg_rewind/t/001_basic.pl
@@ -190,5 +190,6 @@ in primary, before promotion
 run_test('local');
 run_test('remote');
 run_test('archive');
+run_test('archive_cli');
 
 done_testing();
diff --git a/src/bin/pg_rewind/t/RewindTest.pm b/src/bin/pg_rewind/t/RewindTest.pm
index 5651602858..f93e138b35 100644
--- a/src/bin/pg_rewind/t/RewindTest.pm
+++ b/src/bin/pg_rewind/t/RewindTest.pm
@@ -208,6 +208,68 @@ sub promote_standby
 	return;
 }
 
+

Re: PATCH: add "--config-file=" option to pg_rewind

2022-02-28 Thread Gunnar &quot;Nick&quot; Bluth

Am 28.02.22 um 12:56 schrieb Michael Paquier:

On Sun, Feb 27, 2022 at 02:30:33PM +0100, Gunnar "Nick" Bluth wrote:

That's universally true ;-)


-# Internal routine to enable archive recovery command on a standby node
+# Internal routine to enable archive recovery command on a standby node.
+# Returns generated restore_command.
  sub enable_restoring
  {
my ($self, $root_node, $standby) = @_;
@@ -1103,7 +1104,7 @@ restore_command = '$copy_command'
{
$self->set_recovery_mode();
}
-   return;
+   return $copy_command;

I don't like this change.  This makes an existing routine do some
extra work for something it is not designed for.  You could get this
information with a postgres -C command, for example.  Now, you cannot
use postgres -C because of..  Reasons (1a9d802).  But you could use a
pg_ctl command for the same purpose.  I guess that we'd better
refactor this into a new routine of Cluster.pm where we execute a
pg_ctl command and return both stdout and stderr back to the caller?


Turns out this change isn't needed anyway (it was copied from the other 
patch). Afterall, the patch is about extracting the restore_command from 
the config, so...



The TAP part could be refactored on its own.


Agreed. I've struggled quite a bit even understanding what's happening 
in there ;-)




+In case the postgresql.conf of your
target cluster is not in the
+target pgdata and you want to use the -c /
--restore-target-wal option,
+provide a (relative or absolute) path to the
postgresql.conf using this option.

This could do a better job to explain in which cases this option can
be used for.  You could say something like that:
"This option specifies a path to a configuration file to be used when
starting the target cluster.  This makes easier the use of
-c/--restore-target-wal by setting restore_command in the extra
configuration file specified via this option."


I reviewed the wording and think it is pretty universal now.



Hmm.  What about the target cluster started in --single mode as of
ensureCleanShutdown()?  Should we try to apply this option also in
this case?


I'd say so; as far as I can tell, "postgres" would deny starting a 
(Debian/non-standard) cluster the way the code is right now.


Which led me to realize we have

/* find postgres executable */
rc = find_other_exec(argv0, "postgres",
 PG_BACKEND_VERSIONSTR,
 postgres_exec_path);
in getRestoreCommand _and_

/* locate postgres binary */
if ((ret = find_other_exec(argv0, "postgres", 


PG_BACKEND_VERSIONSTR,
exec_path)) < 0)
in ensureCleanShutdown.
Both with the same error messages, AFAICS. D'oh... can of worms.

So, how should we call the global "find the * 'postgres' executable 
and boil out if that fails" function?

char postgres_exec_path[MAXPGPATH] = findPostgresExec();
?


Anyway, v4 attached so we can settle on the tests and wording...

--
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bl...@pro-open.de
__
"Ceterum censeo SystemD esse delendam" - Catodiff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index 33e6bb64ad..d0278e9b46 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -241,6 +241,18 @@ PostgreSQL documentation
   
  
 
+ 
+  --config-file=filepath
+  
+   
+When using the -c / --restore-target-wal option, the restore_command
+is extracted from the configuration of the target cluster. If the postgresql.conf
+of that cluster is not in the target pgdata directory (or if you want to use an alternative config),
+use this option to provide a (relative or absolute) path to the postgresql.conf to be used.
+   
+  
+ 
+
  
   --debug
   
diff --git a/src/bin/pg_rewind/pg_rewind.c b/src/bin/pg_rewind/pg_rewind.c
index efb82a4034..b8c92c5590 100644
--- a/src/bin/pg_rewind/pg_rewind.c
+++ b/src/bin/pg_rewind/pg_rewind.c
@@ -23,6 +23,7 @@
 #include "common/restricted_token.h"
 #include "common/string.h"
 #include "fe_utils/recovery_gen.h"
+#include "fe_utils/string_utils.h"
 #include "file_ops.h"
 #include "filemap.h"
 #include "getopt_long.h"
@@ -60,6 +61,7 @@ char	   *datadir_target = NULL;
 char	   *datadir_source = NULL;
 char	   *connstr_source = NULL;
 char	   *restore_command = NULL;
+char	   *config_file = NULL;
 
 static bool debug = false;
 bool		showprogress = false;
@@ -86,6 +88,7 @@ usage(const char *progname)
 	printf(_(&q

Re: PATCH: add "--config-file=" option to pg_rewind

2022-02-27 Thread Gunnar &quot;Nick&quot; Bluth

Am 27.02.22 um 13:06 schrieb Michael Paquier:

On Sat, Feb 26, 2022 at 09:55:20AM +0100, Gunnar "Nick" Bluth wrote:

Am 26.02.22 um 06:51 schrieb Michael Paquier:

Shouldn't this one use appendShellString() on config_file?


It probably should, yes. I don't fancy this repetitive code myself.
But then, pg_rewind as a whole could use an overhaul. I don't see any use of
PQExpBuffer in it, but a lot of char* ...


Having a lot of char* does not necessarily mean that all of them have
to be changed to accomodate with PQExpBuffer.  My guess that this is a
case-by-case, where we should apply that in places where it makes the
code cleaner to follow.  In the case of your patch though, the two
areas changed would make the logic correct, instead, because we have
to apply correct quoting rules to any commands executed.


Alas! v3 attached.



GSOC project? ;-)


It does not seem so, though there are surely more areas that could
gain in clarity.


That's universally true ;-)


Best regards,
--
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bl...@pro-open.de
__
"Ceterum censeo SystemD esse delendam" - Catodiff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index 33e6bb64ad..31a1a1dedb 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -241,6 +241,17 @@ PostgreSQL documentation
   
  
 
+ 
+  --config-file=filepath
+  
+   
+In case the postgresql.conf of your target cluster is not in the 
+target pgdata and you want to use the -c / --restore-target-wal option,
+provide a (relative or absolute) path to the postgresql.conf using this option.
+   
+  
+ 
+
  
   --debug
   
diff --git a/src/bin/pg_rewind/pg_rewind.c b/src/bin/pg_rewind/pg_rewind.c
index efb82a4034..b8c92c5590 100644
--- a/src/bin/pg_rewind/pg_rewind.c
+++ b/src/bin/pg_rewind/pg_rewind.c
@@ -23,6 +23,7 @@
 #include "common/restricted_token.h"
 #include "common/string.h"
 #include "fe_utils/recovery_gen.h"
+#include "fe_utils/string_utils.h"
 #include "file_ops.h"
 #include "filemap.h"
 #include "getopt_long.h"
@@ -60,6 +61,7 @@ char	   *datadir_target = NULL;
 char	   *datadir_source = NULL;
 char	   *connstr_source = NULL;
 char	   *restore_command = NULL;
+char	   *config_file = NULL;
 
 static bool debug = false;
 bool		showprogress = false;
@@ -86,6 +88,7 @@ usage(const char *progname)
 	printf(_("Options:\n"));
 	printf(_("  -c, --restore-target-wal   use restore_command in target configuration to\n"
 			 " retrieve WAL files from archives\n"));
+	printf(_("  --config-file=FILE path to postgresql.conf if outside target-pgdata (for -c)\n"));
 	printf(_("  -D, --target-pgdata=DIRECTORY  existing data directory to modify\n"));
 	printf(_("  --source-pgdata=DIRECTORY  source data directory to synchronize with\n"));
 	printf(_("  --source-server=CONNSTRsource server to synchronize with\n"));
@@ -120,6 +123,7 @@ main(int argc, char **argv)
 		{"no-sync", no_argument, NULL, 'N'},
 		{"progress", no_argument, NULL, 'P'},
 		{"debug", no_argument, NULL, 3},
+		{"config-file", required_argument, NULL, 5},
 		{NULL, 0, NULL, 0}
 	};
 	int			option_index;
@@ -204,6 +208,10 @@ main(int argc, char **argv)
 			case 4:
 no_ensure_shutdown = true;
 break;
+
+			case 5:
+config_file = pg_strdup(optarg);
+break;
 		}
 	}
 
@@ -1016,8 +1024,8 @@ getRestoreCommand(const char *argv0)
 {
 	int			rc;
 	char		postgres_exec_path[MAXPGPATH],
-postgres_cmd[MAXPGPATH],
 cmd_output[MAXPGPATH];
+	PQExpBuffer	postgres_cmd;
 
 	if (!restore_wal)
 		return;
@@ -1051,11 +1059,21 @@ getRestoreCommand(const char *argv0)
 	 * Build a command able to retrieve the value of GUC parameter
 	 * restore_command, if set.
 	 */
-	snprintf(postgres_cmd, sizeof(postgres_cmd),
-			 "\"%s\" -D \"%s\" -C restore_command",
-			 postgres_exec_path, datadir_target);
+	postgres_cmd = createPQExpBuffer();
 
-	if (!pipe_read_line(postgres_cmd, cmd_output, sizeof(cmd_output)))
+	/* path to postgres, properly quoted */
+	appendShellString(postgres_cmd, postgres_exec_path);
+
+	appendPQExpBufferStr(postgres_cmd, " -D ");
+	appendShellString(postgres_cmd, datadir_target);
+	if (config_file != NULL)
+	{
+		appendPQExpBufferStr(postgres_cmd, " --config_file=");
+		appendShellString(postgres_cmd, config_file);
+	}
+	appendPQExpBufferStr(postgres_cmd, " -C restore_command");
+
+	if (!pipe_read_line(postgres_cmd->data, cmd_output, sizeof(cmd_output)))
 		exit(1);
 
 	(void) pg_strip_crlf(cmd_output);
diff --g

Re: PATCH: add "--config-file=" option to pg_rewind

2022-02-26 Thread Gunnar &quot;Nick&quot; Bluth

Am 26.02.22 um 06:51 schrieb Michael Paquier:

On Fri, Feb 25, 2022 at 10:35:49AM +0100, Gunnar "Nick" Bluth wrote:

Am 24.02.22 um 14:46 schrieb Daniel Gustafsson:

Actually, I think this looks like a saner approach.  Putting a config setting
in two place (postgresql.conf and on the commandline for pg_rewind) is a recipe
for them diverging.


FWIW, I have a bad feeling about passing down directly a command
through an option itself part of a command, so what's discussed on
this thread is refreshing.


Ta.



+   } else {
+   snprintf(postgres_cmd, sizeof(postgres_cmd),
+"\"%s\" -D \"%s\" --config_file=\"%s\" -C 
restore_command",
+postgres_exec_path, datadir_target, config_file);
+   }

Shouldn't this one use appendShellString() on config_file?


It probably should, yes. I don't fancy this repetitive code myself.
But then, pg_rewind as a whole could use an overhaul. I don't see any 
use of PQExpBuffer in it, but a lot of char* ...


I myself am not a C hacker (as you can probably tell already) and don't 
really feel fit (enough) to rewrite pg_rewind.c to use 
fe_utils/string_utils.h :/


I might give it a try anyhow, but that may be a bit... heavy,,, just to 
add one option?


While we're at it (I'm really good at opening cans of worms):
09:47 $ grep -r -l --include \*.c fe_utils/string_utils.h src/bin/ | wc -l
28
09:48 $ grep -r -L --include \*.c fe_utils/string_utils.h src/bin/ | wc -l
66

GSOC project? ;-)

Best,
--
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bl...@pro-open.de
__
"Ceterum censeo SystemD esse delendam" - Cato




Re: PATCH: add "--config-file=" option to pg_rewind

2022-02-25 Thread Gunnar &quot;Nick&quot; Bluth

Am 24.02.22 um 14:46 schrieb Daniel Gustafsson:

On 24 Feb 2022, at 14:43, Gunnar Nick Bluth  wrote:



That looks just as good to me, and it already has tests, so I'll happily step 
down!


Actually, I think this looks like a saner approach.  Putting a config setting
in two place (postgresql.conf and on the commandline for pg_rewind) is a recipe
for them diverging.


Well, well,

reverted https://commitfest.postgresql.org/37/3562/ back to "needs 
review" and moved it to "Replication & Recovery".


Patch v2 includes:
* doc improvements as suggested
* tests shamelessly copied & adapted from 
https://commitfest.postgresql.org/37/3213/


Since the need is pretty obvious (I *think* the Debian-ish userbase is 
quite significant alone), I'd love to see one of the approaches making 
it into 15!


Best regards,
--
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bl...@pro-open.de
__
"Ceterum censeo SystemD esse delendam" - Catodiff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index 33e6bb64ad..31a1a1dedb 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -241,6 +241,17 @@ PostgreSQL documentation
   
  
 
+ 
+  --config-file=filepath
+  
+   
+In case the postgresql.conf of your target cluster is not in the 
+target pgdata and you want to use the -c / --restore-target-wal option,
+provide a (relative or absolute) path to the postgresql.conf using this option.
+   
+  
+ 
+
  
   --debug
   
diff --git a/src/bin/pg_rewind/pg_rewind.c b/src/bin/pg_rewind/pg_rewind.c
index efb82a4034..d57a3a6852 100644
--- a/src/bin/pg_rewind/pg_rewind.c
+++ b/src/bin/pg_rewind/pg_rewind.c
@@ -60,6 +60,7 @@ char	   *datadir_target = NULL;
 char	   *datadir_source = NULL;
 char	   *connstr_source = NULL;
 char	   *restore_command = NULL;
+char	   *config_file = NULL;
 
 static bool debug = false;
 bool		showprogress = false;
@@ -86,6 +87,7 @@ usage(const char *progname)
 	printf(_("Options:\n"));
 	printf(_("  -c, --restore-target-wal   use restore_command in target configuration to\n"
 			 " retrieve WAL files from archives\n"));
+	printf(_("  --config-file=FILE path to postgresql.conf if outside target-pgdata (for -c)\n"));
 	printf(_("  -D, --target-pgdata=DIRECTORY  existing data directory to modify\n"));
 	printf(_("  --source-pgdata=DIRECTORY  source data directory to synchronize with\n"));
 	printf(_("  --source-server=CONNSTRsource server to synchronize with\n"));
@@ -120,6 +122,7 @@ main(int argc, char **argv)
 		{"no-sync", no_argument, NULL, 'N'},
 		{"progress", no_argument, NULL, 'P'},
 		{"debug", no_argument, NULL, 3},
+		{"config-file", required_argument, NULL, 5},
 		{NULL, 0, NULL, 0}
 	};
 	int			option_index;
@@ -204,6 +207,10 @@ main(int argc, char **argv)
 			case 4:
 no_ensure_shutdown = true;
 break;
+
+			case 5:
+config_file = pg_strdup(optarg);
+break;
 		}
 	}
 
@@ -1051,9 +1058,16 @@ getRestoreCommand(const char *argv0)
 	 * Build a command able to retrieve the value of GUC parameter
 	 * restore_command, if set.
 	 */
-	snprintf(postgres_cmd, sizeof(postgres_cmd),
+	if (config_file == NULL)
+	{
+		snprintf(postgres_cmd, sizeof(postgres_cmd),
 			 "\"%s\" -D \"%s\" -C restore_command",
 			 postgres_exec_path, datadir_target);
+	} else {
+		snprintf(postgres_cmd, sizeof(postgres_cmd),
+			 "\"%s\" -D \"%s\" --config_file=\"%s\" -C restore_command",
+			 postgres_exec_path, datadir_target, config_file);
+	}
 
 	if (!pipe_read_line(postgres_cmd, cmd_output, sizeof(cmd_output)))
 		exit(1);
diff --git a/src/bin/pg_rewind/t/001_basic.pl b/src/bin/pg_rewind/t/001_basic.pl
index db9201f38e..3c395ece12 100644
--- a/src/bin/pg_rewind/t/001_basic.pl
+++ b/src/bin/pg_rewind/t/001_basic.pl
@@ -190,5 +190,6 @@ in primary, before promotion
 run_test('local');
 run_test('remote');
 run_test('archive');
+run_test('archive_cli');
 
 done_testing();
diff --git a/src/bin/pg_rewind/t/RewindTest.pm b/src/bin/pg_rewind/t/RewindTest.pm
index 5651602858..94d413dcb6 100644
--- a/src/bin/pg_rewind/t/RewindTest.pm
+++ b/src/bin/pg_rewind/t/RewindTest.pm
@@ -208,6 +208,69 @@ sub promote_standby
 	return;
 }
 
+sub run_pg_rewind_archive
+{
+	my $test_mode = shift;
+	my $restore_command;
+	my $primary_pgdata  = $node_primary->data_dir;
+
+	# Remove the existing archive directory and move all WAL
+	# segments from the old primary to the archives.  These
+	# will be used by pg_rewind.
+	rmtree($node_primary->archive_dir);
+	PostgreSQL::Test::RecursiveCopy::copypath($node_primary-&

Re: PATCH: add "--config-file=" option to pg_rewind

2022-02-24 Thread Gunnar &quot;Nick&quot; Bluth

Am 24.02.2022 um 14:08 schrieb Daniel Gustafsson:

On 24 Feb 2022, at 10:27, Alexander Kukushkin  wrote:

Hello Gunnar,

On Thu, 24 Feb 2022 at 10:03, Aleksander Alekseev  
wrote:
  

wants to use the "-c" option on a typical Debian/Ubuntu installation
(where the config resides below /etc/postgresql/), pg_rewind needs a way
to be told where the postgresql.conf actually is.

The attached patch implements this as an option.

[...]


Good catch!

Yeah, it is a known problem, and there was already another patch trying to 
address it [1].


There is yet another related patch which provides a parameter to pass in
restore_command in cases where postgresq.conf isn't reachable:

   https://commitfest.postgresql.org/37/3213/


That looks just as good to me, and it already has tests, so I'll happily 
step down!


(Note to myself: check the CF first next time ;-)
--
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bl...@pro-open.de
__
"Ceterum censeo SystemD esse delendam" - Cato




PATCH: add "--config-file=" option to pg_rewind

2022-02-23 Thread Gunnar &quot;Nick&quot; Bluth

Hi!

During a Patroni PR discussion 
(https://github.com/zalando/patroni/pull/2225), we realised that if one 
wants to use the "-c" option on a typical Debian/Ubuntu installation 
(where the config resides below /etc/postgresql/), pg_rewind needs a way 
to be told where the postgresql.conf actually is.


The attached patch implements this as an option.

I'm extremely unhappy that I called it "--config-file" here, while 
"postgres" itself wants "--config_file". But as the other dashed options 
of pg_rewind are, well, dashed and not underscored, it seemed to be 
better that the other way...


As the "-c" feature appeared in 12 and existing Debian installations are 
unable to use it right now, I suggest to even backport the patch.


Oh, and I'm still not subscribed to -hackers, so the usual "please CC 
me" applies!


Best regards,
--
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bl...@pro-open.de
__
"Ceterum censeo SystemD esse delendam" - Catodiff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index 33e6bb64ad..f80adbf5ae 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -241,6 +241,17 @@ PostgreSQL documentation
   
  
 
+ 
+  --config-file=filepath
+  
+   
+In case the postgresql.conf of your target cluster is not in the 
+target pgdata and you want to use the -c option,
+provide a (relative or absolute) path to the postgresql.conf using this option.
+   
+  
+ 
+
  
   --debug
   
diff --git a/src/bin/pg_rewind/pg_rewind.c b/src/bin/pg_rewind/pg_rewind.c
index efb82a4034..d57a3a6852 100644
--- a/src/bin/pg_rewind/pg_rewind.c
+++ b/src/bin/pg_rewind/pg_rewind.c
@@ -60,6 +60,7 @@ char	   *datadir_target = NULL;
 char	   *datadir_source = NULL;
 char	   *connstr_source = NULL;
 char	   *restore_command = NULL;
+char	   *config_file = NULL;
 
 static bool debug = false;
 bool		showprogress = false;
@@ -86,6 +87,7 @@ usage(const char *progname)
 	printf(_("Options:\n"));
 	printf(_("  -c, --restore-target-wal   use restore_command in target configuration to\n"
 			 " retrieve WAL files from archives\n"));
+	printf(_("  --config-file=FILE path to postgresql.conf if outside target-pgdata (for -c)\n"));
 	printf(_("  -D, --target-pgdata=DIRECTORY  existing data directory to modify\n"));
 	printf(_("  --source-pgdata=DIRECTORY  source data directory to synchronize with\n"));
 	printf(_("  --source-server=CONNSTRsource server to synchronize with\n"));
@@ -120,6 +122,7 @@ main(int argc, char **argv)
 		{"no-sync", no_argument, NULL, 'N'},
 		{"progress", no_argument, NULL, 'P'},
 		{"debug", no_argument, NULL, 3},
+		{"config-file", required_argument, NULL, 5},
 		{NULL, 0, NULL, 0}
 	};
 	int			option_index;
@@ -204,6 +207,10 @@ main(int argc, char **argv)
 			case 4:
 no_ensure_shutdown = true;
 break;
+
+			case 5:
+config_file = pg_strdup(optarg);
+break;
 		}
 	}
 
@@ -1051,9 +1058,16 @@ getRestoreCommand(const char *argv0)
 	 * Build a command able to retrieve the value of GUC parameter
 	 * restore_command, if set.
 	 */
-	snprintf(postgres_cmd, sizeof(postgres_cmd),
+	if (config_file == NULL)
+	{
+		snprintf(postgres_cmd, sizeof(postgres_cmd),
 			 "\"%s\" -D \"%s\" -C restore_command",
 			 postgres_exec_path, datadir_target);
+	} else {
+		snprintf(postgres_cmd, sizeof(postgres_cmd),
+			 "\"%s\" -D \"%s\" --config_file=\"%s\" -C restore_command",
+			 postgres_exec_path, datadir_target, config_file);
+	}
 
 	if (!pipe_read_line(postgres_cmd, cmd_output, sizeof(cmd_output)))
 		exit(1);


Re: [PATCH] pg_stat_toast v6

2022-01-04 Thread Gunnar &quot;Nick&quot; Bluth

Am 03.01.22 um 22:23 schrieb Alvaro Herrera:

Overall I think this is a good feature to have; assessing the need for
compression is important for tuning, so +1 for the goal of the patch.


Much appreciated!



I didn't look into the patch carefully, but here are some minor
comments:

On 2022-Jan-03, Gunnar "Nick" Bluth wrote:


@@ -229,7 +230,9 @@ toast_tuple_try_compression(ToastTupleContext *ttc, int 
attribute)
Datum  *value = >ttc_values[attribute];
Datum   new_value;
ToastAttrInfo *attr = >ttc_attr[attribute];
+   instr_time  start_time;
  
+	INSTR_TIME_SET_CURRENT(start_time);

new_value = toast_compress_datum(*value, attr->tai_compression);
  
  	if (DatumGetPointer(new_value) != NULL)


Don't INSTR_TIME_SET_CURRENT unconditionally; in some systems it's an
expensive syscall.  Find a way to only do it if the feature is enabled.


Yeah, I was worried about that (and asking if it would be required) already.
Adding the check was easier than I expected, though I'm absolutely 
clueless if I did it right!


#include "pgstat.h"
extern PGDLLIMPORT bool pgstat_track_toast;



This also suggests that perhaps it'd be a good idea to allow this to be
enabled for specific tables only, rather than system-wide.  (Maybe in
order for stats to be collected, the user should have to both set the
GUC option *and* set a per-table option?  Not sure.)


That would of course be nice, but I seriously doubt the required 
additional logic would be justified. The patch currently tampers with as 
few internal structures as possible, and for good reason... ;-)



@@ -82,10 +82,12 @@ typedef enum StatMsgType
PGSTAT_MTYPE_DEADLOCK,
PGSTAT_MTYPE_CHECKSUMFAILURE,
PGSTAT_MTYPE_REPLSLOT,
+   PGSTAT_MTYPE_CONNECTION,


I think this new enum value doesn't belong in this patch.


Yeah, did I mention I'm struggling with rebasing? ;-|



+/* --
+ * PgStat_ToastEntry   Per-TOAST-column info in a MsgFuncstat
+ * --


Not in "a MsgFuncstat", right?


Obviously... fixed!




+-- function to wait for counters to advance
+create function wait_for_stats() returns void as $$


I don't think we want a separate copy of wait_for_stats; see commit
fe60b67250a3 and the discussion leading to it.  Maybe you'll want to
move the test to stats.sql.  I'm not sure what to say about relying on


Did so.


LZ4; maybe you'll want to leave that part out, and just verify in an
LZ4-enabled build that some 'l' entry exists.  BTW, don't we have any
decent way to turn that 'l' into a more reasonable, descriptive string?



Also, perhaps make the view-defining query turn an empty compression
method into whatever the default is.


I'm not even sure that having it in there is useful at all. It's simply 
JOINed in from pg_attribute.

Which is where I'd see that "make it look nicer" change happening, tbth. ;-)


Or even better, stats collection
should store the real compression method used rather than empty string,
to avoid confusing things when some stats are collected, then the
default is changed, then some more stats are collected.


I was thinking about that already, but came to the conclusion that it a) 
would blow up the size of these statistics by quite a bit and b) would 
be quite tricky to display in a useful way.


I mean, the use case of track_toast is pretty limited anyway; you'll 
probably turn this feature on with a specific column in mind, of which 
you'll probably know which compression method is used at the time.


Thanks for the feedback!
v7 attached.
--
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bl...@pro-open.de
__
"Ceterum censeo SystemD esse delendam" - CatoFrom f07213d68c646ba64757e551e3587aab0ff221df Mon Sep 17 00:00:00 2001
From: "Gunnar \"Nick\" Bluth" 
Date: Tue, 4 Jan 2022 12:08:32 +0100
Subject: [PATCH v7] pg_stat_toast v7

---
 doc/src/sgml/config.sgml  |  26 ++
 doc/src/sgml/monitoring.sgml  | 163 ++
 doc/src/sgml/storage.sgml |  12 +-
 src/backend/access/table/toast_helper.c   |  44 ++-
 src/backend/catalog/system_views.sql  |  20 ++
 src/backend/postmaster/pgstat.c   | 305 +-
 src/backend/utils/adt/pgstatfuncs.c   |  72 +
 src/backend/utils/misc/guc.c  |   9 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/catalog/pg_proc.dat   |  25 ++
 src/include/pgstat.h  | 107 ++
 src/test/regress/expected/rules.out   |  17 +
 src/test/regress/expected/stats.out   |  62 
 src/test/regress/sql/stats.sql|  28 ++
 14 files changed, 882 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/

Re: [PATCH] pg_stat_toast v6

2022-01-04 Thread Gunnar &quot;Nick&quot; Bluth

Am 03.01.22 um 22:03 schrieb Justin Pryzby:

+pgstat_report_toast_activity(Oid relid, int attr,
+   bool externalized,
+   bool compressed,
+   int32 old_size,
+   int32 new_size,

...

+   if (new_size)
+   {
+   htabent->t_counts.t_size_orig+=old_size;
+   if (new_size)
+   {


I guess one of these is supposed to say old_size?


Didn't make a difference, tbth, as they'd both be 0 or have a value. 
Streamlined the whole block now.




+CREATE TABLE toast_test (cola TEXT, colb TEXT COMPRESSION lz4, colc TEXT , 
cold TEXT, cole TEXT);


Is there a reason this uses lz4 ?


I thought it might help later on, but alas! the LZ4 column mainly broke 
things, so I removed it for the time being.



If that's needed for stable results, I think you should use pglz, since that's
what's guaranteed to exist.  I imagine LZ4 won't be required any time soon,
seeing as zlib has never been required.


Yeah. It didn't prove anything whatsoever.


+Be aware that this feature, depending on the amount of TOASTable 
columns in
+your databases, may significantly increase the size of the statistics 
files
+and the workload of the statistics collector. It is recommended to only
+temporarily activate this to assess the right compression and storage 
method
+for (a) column(s).


saying "a column" is fine


Changed.




+   schemaname name
+   Attribute (column) number in the relation
+   relname name



+  
+   compressmethod char
+  
+  
+   Compression method of the attribute (empty means default)


One thing to keep in mind is that the current compression method is only used
for *new* data - old data can still use the old compression method.  It
probably doesn't need to be said here, but maybe you can refer to the docs
about that in alter_table.


+   Number of times the compression was successful (gained a size reduction)


It's more clear to say "was reduced in size"


Changed the wording a bit, I guess it is clear enough now.
The question is if the column should be there at all, as it's simply 
fetched from pg_attribute...





+   /* we assume this inits to all zeroes: */
+   static const PgStat_ToastCounts all_zeroes;


You don't have to assume; static/global allocations are always zero unless
otherwise specified.


Copy-pasta ;-)
Removed.

Thx for looking into this!
Patch v7 will be in the next mail.

--
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bl...@pro-open.de
__
"Ceterum censeo SystemD esse delendam" - Cato




Re: [PATCH] pg_stat_toast v6

2022-01-03 Thread Gunnar &quot;Nick&quot; Bluth

Am 03.01.22 um 20:11 schrieb Alvaro Herrera:

On 2022-Jan-03, Gunnar "Nick" Bluth wrote:


9:38 $ git format-patch PGDG/master -v5 -o ..
../v5-0001-ping-pong-of-thougths.patch
../v5-0002-ping-pong-of-thougths.patch
../v5-0003-adds-some-debugging-messages-in-toast_helper.c.patch
  ...


Hmm, in such cases I would suggest to create a separate branch and then
"git merge --squash" for submission.  You can keep your development
branch separate, with other merges if you want.

I've found this to be easier to manage, though I don't always follow
that workflow myself.



Using --stdout does help ;-)

I wonder why "track_toast.sql" test fails on Windows (with "ERROR: 
compression method lz4 not supported"), but "compression.sql" doesn't.

Any hints?

Anyway, I shamelessly copied "wait_for_stats()" from the "stats.sql" 
file and the tests _should_ now work at least on the platforms with lz4.


v6 attached!
--
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bl...@pro-open.de
__
"Ceterum censeo SystemD esse delendam" - CatoFrom e743587fbd8f6592bbfa15f53733f79c405000e2 Mon Sep 17 00:00:00 2001
From: "Gunnar \"Nick\" Bluth" 
Date: Mon, 3 Jan 2022 20:35:05 +0100
Subject: [PATCH v6] pg_stat_toast v6

---
 doc/src/sgml/config.sgml  |  26 ++
 doc/src/sgml/monitoring.sgml  | 163 +
 doc/src/sgml/storage.sgml |  12 +-
 src/backend/access/table/toast_helper.c   |  24 ++
 src/backend/catalog/system_views.sql  |  20 ++
 src/backend/postmaster/pgstat.c   | 309 +-
 src/backend/utils/adt/pgstatfuncs.c   |  72 
 src/backend/utils/misc/guc.c  |   9 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/catalog/pg_proc.dat   |  25 ++
 src/include/pgstat.h  | 108 ++
 src/test/regress/expected/rules.out   |  17 +
 src/test/regress/expected/track_toast.out | 102 ++
 src/test/regress/parallel_schedule|   2 +-
 src/test/regress/sql/track_toast.sql  |  64 
 15 files changed, 946 insertions(+), 8 deletions(-)
 create mode 100644 src/test/regress/expected/track_toast.out
 create mode 100644 src/test/regress/sql/track_toast.sql

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index afbb6c35e3..fa40befc16 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7668,6 +7668,32 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
  
 
+ 
+  track_toast (boolean)
+  
+   track_toast configuration parameter
+  
+  
+  
+   
+Enables tracking of TOAST activities.
+Compressions and externalizations are tracked.
+The default is off.
+Only superusers can change this setting.
+   
+
+   
+
+Be aware that this feature, depending on the amount of TOASTable columns in
+your databases, may significantly increase the size of the statistics files
+and the workload of the statistics collector. It is recommended to only
+temporarily activate this to assess the right compression and storage method
+for (a) column(s).
+
+   
+  
+ 
+
  
   stats_temp_directory (string)
   
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 62f2a3332b..32d7818096 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -610,6 +610,17 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   yet included in pg_stat_user_functions).
  
 
+ 
+  pg_stat_toastpg_stat_toast
+  
+   One row for each column that has ever been TOASTed (compressed and/or externalized).
+   Showing the number of externalizations, compression attempts / successes, compressed and
+   uncompressed sizes etc.
+   
+   pg_stat_toast for details.
+  
+ 
+
  
   pg_stat_slrupg_stat_slru
   One row per SLRU, showing statistics of operations. See
@@ -4969,6 +4980,158 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
 
  
 
+ 
+  pg_stat_toast
+
+  
+   pg_stat_toast
+  
+
+  
+   The pg_stat_toast view will contain
+   one row for each column of variable size that has been TOASTed since 
+   the last statistics reset. The  parameter
+   controls whether TOAST activities are tracked or not.
+  
+
+  
+   pg_stat_toast View
+   
+
+ 
+  
+   Column Type
+  
+  
+   Description
+  
+ 
+
+
+
+ 
+  
+   schemaname name
+  
+  
+   Name of the schema the relation is in
+  
+ 
+
+ 
+  
+  

Re: [PATCH] pg_stat_toast v0.4

2022-01-03 Thread Gunnar &quot;Nick&quot; Bluth

Am 03.01.22 um 19:30 schrieb Alvaro Herrera:

On 2022-Jan-03, Gunnar "Nick" Bluth wrote:


Am 03.01.22 um 17:50 schrieb Justin Pryzby:



Soon you will think this is fun :)


As long as you're happy with plain patches like the attached one, I may ;-)


Well, with a zero-byte patch, not very much ...


D'oh



BTW why do you number with a "0." prefix?  It could just be "4" and "5"
and so on.  There's no value in two-part version numbers for patches.
Also, may I suggest that "git format-patch -vN" with varying N is an
easier way to generate patches to attach?


Not when you have a metric ton of commits in the history... I'll 
hopefully find a way to start over soon :/


9:38 $ git format-patch PGDG/master -v5 -o ..
../v5-0001-ping-pong-of-thougths.patch
../v5-0002-ping-pong-of-thougths.patch
../v5-0003-adds-some-debugging-messages-in-toast_helper.c.patch
../v5-0004-adds-some-groundwork-for-pg_stat_toast-to-pgstat..patch
../v5-0005-fixes-wrong-type-for-pgstat_track_toast-GUC.patch
../v5-0006-introduces-PgStat_BackendAttrIdentifier-OID-attr-.patch
../v5-0007-implements-and-calls-pgstat_report_toast_activity.patch
../v5-0008-Revert-adds-some-debugging-messages-in-toast_help.patch
../v5-0009-adds-more-detail-to-logging.patch
../v5-0010-adds-toastactivity-to-table-stats-and-many-helper.patch
../v5-0011-fixes-missed-replacement-in-comment.patch
../v5-0012-adds-SQL-support-functions.patch
../v5-0013-Add-SQL-functions.patch
../v5-0014-reset-to-HEAD.patch
../v5-0015-makes-DEBUG2-messages-more-precise.patch
../v5-0016-adds-timing-information-to-stats-and-view.patch
../v5-0017-adds-a-basic-set-of-tests.patch
../v5-0018-adds-a-basic-set-of-tests.patch
../v5-0019-chooses-a-PGSTAT_TOAST_HASH_SIZE-of-64-changes-ha.patch
../v5-0020-removes-whitespace-trash.patch
../v5-0021-returns-to-PGDG-master-.gitignore.patch
../v5-0022-pg_stat_toast_v0.5.patch
../v5-0023-moves-tests-to-regress.patch

But alas! INT versioning it is from now on!
--
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bl...@pro-open.de
__
"Ceterum censeo SystemD esse delendam" - Catodiff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index afbb6c35e3..fa40befc16 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7668,6 +7668,32 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
  
 
+ 
+  track_toast (boolean)
+  
+   track_toast configuration parameter
+  
+  
+  
+   
+Enables tracking of TOAST activities.
+Compressions and externalizations are tracked.
+The default is off.
+Only superusers can change this setting.
+   
+
+   
+
+Be aware that this feature, depending on the amount of TOASTable columns in
+your databases, may significantly increase the size of the statistics files
+and the workload of the statistics collector. It is recommended to only
+temporarily activate this to assess the right compression and storage method
+for (a) column(s).
+
+   
+  
+ 
+
  
   stats_temp_directory (string)
   
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 62f2a3332b..32d7818096 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -610,6 +610,17 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   yet included in pg_stat_user_functions).
  
 
+ 
+  pg_stat_toastpg_stat_toast
+  
+   One row for each column that has ever been TOASTed (compressed and/or externalized).
+   Showing the number of externalizations, compression attempts / successes, compressed and
+   uncompressed sizes etc.
+   
+   pg_stat_toast for details.
+  
+ 
+
  
   pg_stat_slrupg_stat_slru
   One row per SLRU, showing statistics of operations. See
@@ -4969,6 +4980,158 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
 
  
 
+ 
+  pg_stat_toast
+
+  
+   pg_stat_toast
+  
+
+  
+   The pg_stat_toast view will contain
+   one row for each column of variable size that has been TOASTed since 
+   the last statistics reset. The  parameter
+   controls whether TOAST activities are tracked or not.
+  
+
+  
+   pg_stat_toast View
+   
+
+ 
+  
+   Column Type
+  
+  
+   Description
+  
+ 
+
+
+
+ 
+  
+   schemaname name
+  
+  
+   Name of the schema the relation is in
+  
+ 
+
+ 
+  
+   reloid oid
+  
+  
+   OID of the relation
+  
+ 
+
+ 
+  
+   attnum int
+  
+  
+   Attribute (column) number in the relation
+  
+ 
+
+ 
+  
+   relname name
+  
+  
+ 

Re: [PATCH] pg_stat_toast v0.4

2022-01-03 Thread Gunnar &quot;Nick&quot; Bluth

Am 03.01.22 um 17:50 schrieb Justin Pryzby:

On Mon, Jan 03, 2022 at 05:00:45PM +0100, Gunnar "Nick" Bluth wrote:

Am 03.01.22 um 16:52 schrieb Gunnar "Nick" Bluth:


pg_stat_toast_v0.4.patch attached.


Note that the cfbot says this fails under windows


Thanks for the heads up!




http://cfbot.cputube.org/gunnar-quotnickquot-bluth.html
...
[16:47:05.347] Could not determine contrib module type for track_toast
[16:47:05.347]  at src/tools/msvc/mkvcbuild.pl line 31.


Not only Window$... as it turns out, one of the checks was pretty bogus. 
Kicked that one and instead wrote two (hopefully) meaningful ones.


Also, I moved the tests to regress/, as they're not really for a module 
anyway.


Let's see how this fares!


Aaaand I attached a former version of the patch file... sorry, I'm kind of
struggling with all the squashing/rebasing...


Soon you will think this is fun :)


As long as you're happy with plain patches like the attached one, I may ;-)

All the best,
--
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bl...@pro-open.de
__
"Ceterum censeo SystemD esse delendam" - Cato

Re: [PATCH] pg_stat_toast v0.4

2022-01-03 Thread Gunnar &quot;Nick&quot; Bluth

Am 03.01.22 um 16:52 schrieb Gunnar "Nick" Bluth:


pg_stat_toast_v0.4.patch attached.


Aaaand I attached a former version of the patch file... sorry, I'm kind 
of struggling with all the squashing/rebasing...

--
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bl...@pro-open.de
__
"Ceterum censeo SystemD esse delendam" - CatoFrom 560cfb4082804709d952b3f741b505025f400f97 Mon Sep 17 00:00:00 2001
From: "Gunnar \"Nick\" Bluth" 
Date: Mon, 3 Jan 2022 16:18:36 +0100
Subject: [PATCH] pg_stat_toast_v0.4

---
 doc/src/sgml/config.sgml  |  26 ++
 doc/src/sgml/monitoring.sgml  | 163 +
 doc/src/sgml/storage.sgml |  12 +-
 src/backend/access/table/toast_helper.c   |  24 ++
 src/backend/catalog/system_views.sql  |  20 ++
 src/backend/postmaster/pgstat.c   | 309 +-
 src/backend/utils/adt/pgstatfuncs.c   |  72 
 src/backend/utils/misc/guc.c  |   9 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/catalog/pg_proc.dat   |  25 ++
 src/include/pgstat.h  | 108 ++
 src/test/modules/Makefile |   1 +
 src/test/modules/track_toast/Makefile |  18 +
 .../track_toast/expected/track_toast.out  |  55 
 .../modules/track_toast/sql/track_toast.sql   |  20 ++
 src/test/modules/track_toast/track_toast.conf |   1 +
 src/test/regress/expected/rules.out   |  17 +
 17 files changed, 874 insertions(+), 7 deletions(-)
 create mode 100644 src/test/modules/track_toast/Makefile
 create mode 100644 src/test/modules/track_toast/expected/track_toast.out
 create mode 100644 src/test/modules/track_toast/sql/track_toast.sql
 create mode 100644 src/test/modules/track_toast/track_toast.conf

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index afbb6c35e3..fa40befc16 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7668,6 +7668,32 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
  
 
+ 
+  track_toast (boolean)
+  
+   track_toast configuration parameter
+  
+  
+  
+   
+Enables tracking of TOAST activities.
+Compressions and externalizations are tracked.
+The default is off.
+Only superusers can change this setting.
+   
+
+   
+
+Be aware that this feature, depending on the amount of TOASTable columns in
+your databases, may significantly increase the size of the statistics files
+and the workload of the statistics collector. It is recommended to only
+temporarily activate this to assess the right compression and storage method
+for (a) column(s).
+
+   
+  
+ 
+
  
   stats_temp_directory (string)
   
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 62f2a3332b..32d7818096 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -610,6 +610,17 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   yet included in pg_stat_user_functions).
  
 
+ 
+  pg_stat_toastpg_stat_toast
+  
+   One row for each column that has ever been TOASTed (compressed and/or externalized).
+   Showing the number of externalizations, compression attempts / successes, compressed and
+   uncompressed sizes etc.
+   
+   pg_stat_toast for details.
+  
+ 
+
  
   pg_stat_slrupg_stat_slru
   One row per SLRU, showing statistics of operations. See
@@ -4969,6 +4980,158 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
 
  
 
+ 
+  pg_stat_toast
+
+  
+   pg_stat_toast
+  
+
+  
+   The pg_stat_toast view will contain
+   one row for each column of variable size that has been TOASTed since 
+   the last statistics reset. The  parameter
+   controls whether TOAST activities are tracked or not.
+  
+
+  
+   pg_stat_toast View
+   
+
+ 
+  
+   Column Type
+  
+  
+   Description
+  
+ 
+
+
+
+ 
+  
+   schemaname name
+  
+  
+   Name of the schema the relation is in
+  
+ 
+
+ 
+  
+   reloid oid
+  
+  
+   OID of the relation
+  
+ 
+
+ 
+  
+   attnum int
+  
+  
+   Attribute (column) number in the relation
+  
+ 
+
+ 
+  
+   relname name
+  
+  
+   Name of the relation
+  
+ 
+
+ 
+  
+   attname name
+  
+  
+   Name of the attribute (column)
+  
+ 
+
+ 
+  
+   storagemethod char
+  
+  
+   Storage method of the attribute
+  
+

Re: [PATCH] pg_stat_toast v0.4

2022-01-03 Thread Gunnar &quot;Nick&quot; Bluth

Am 21.12.21 um 13:51 schrieb kuroda.hay...@fujitsu.com:

Dear Gunnar,


Hayato-san, all,

thanks for the feedback!


* gathering timing information


Here is a minor comment:
I'm not sure when we should start to measure time.
If you want to record time spent for compressing, toast_compress_datum() should 
be
sandwiched by INSTR_TIME_SET_CURRENT() and caclurate the time duration.
Currently time_spent is calcurated in the pgstat_report_toast_activity(),
but this have a systematic error.
If you want to record time spent for inserting/updating, however,
I think we should start measuring at the upper layer, maybe 
heap_toast_insert_or_update().


Yes, both toast_compress_datum() and toast_save_datum() are sandwiched 
the way you mentioned, as that's exactly what we want to measure (time 
spent doing compression and/or externalizing data).


Implementation-wise, I (again) took "track_functions" as a template 
there, assuming that jumping into pgstat_report_toast_activity() and 
only then checking if "track_toast = on" isn't too costly (we call 
pgstat_init_function_usage() and pgstat_end_function_usage() a _lot_).


I did miss though that
INSTR_TIME_SET_CURRENT(time_spent);
should be called right after entering pgstat_report_toast_activity(), as 
that might need additional clock ticks for setting up the hash etc.

That's fixed now.

What I can't assess is the cost of the unconditional call to 
INSTR_TIME_SET_CURRENT(start_time) in both toast_tuple_try_compression() 
and toast_tuple_externalize().


Would it be wise (cheaper) to add a check like
if (pgstat_track_toast)
before querying the system clock?



Any hints on how to write meaningful tests would be much appreciated!
I.e., will a test


I searched hints from PG sources, and I thought that modules/ subdirectory can 
be used.
Following is the example:
https://github.com/postgres/postgres/tree/master/src/test/modules/commit_ts

I attached a patch to create a new test. Could you rewrite the sql and the 
output file?


Thanks for the kickstart ;-)

Some tests (as meaningful as they may get, I'm afraid) are now in
src/test/modules/track_toast/.
"make check-world" executes them successfully, although only after I 
introduced a "SELECT pg_sleep(1);" to them.



pg_stat_toast_v0.4.patch attached.

Best regards,
--
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bl...@pro-open.de
__
"Ceterum censeo SystemD esse delendam" - Catodiff --git a/.gitignore b/.gitignore
index 794e35b73c..ca2ee84742 100644
--- a/.gitignore
+++ b/.gitignore
@@ -31,6 +31,8 @@ win32ver.rc
 *.exe
 lib*dll.def
 lib*.pc
+tags
+/.vscode
 
 # Local excludes in root directory
 /GNUmakefile
@@ -42,3 +44,5 @@ lib*.pc
 /Debug/
 /Release/
 /tmp_install/
+.gitignore
+pg_stat_toast.*
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index afbb6c35e3..fa40befc16 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7668,6 +7668,32 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
  
 
+ 
+  track_toast (boolean)
+  
+   track_toast configuration parameter
+  
+  
+  
+   
+Enables tracking of TOAST activities.
+Compressions and externalizations are tracked.
+The default is off.
+Only superusers can change this setting.
+   
+
+   
+
+Be aware that this feature, depending on the amount of TOASTable columns in
+your databases, may significantly increase the size of the statistics files
+and the workload of the statistics collector. It is recommended to only
+temporarily activate this to assess the right compression and storage method
+for (a) column(s).
+
+   
+  
+ 
+
  
   stats_temp_directory (string)
   
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 62f2a3332b..32d7818096 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -610,6 +610,17 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   yet included in pg_stat_user_functions).
  
 
+ 
+  pg_stat_toastpg_stat_toast
+  
+   One row for each column that has ever been TOASTed (compressed and/or externalized).
+   Showing the number of externalizations, compression attempts / successes, compressed and
+   uncompressed sizes etc.
+   
+   pg_stat_toast for details.
+  
+ 
+
  
   pg_stat_slrupg_stat_slru
   One row per SLRU, showing statistics of operations. See
@@ -4969,6 +4980,158 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
 
  
 
+ 
+  pg_stat_toast
+
+  
+   pg_stat_toast
+  
+
+  
+   The pg_stat_toast view will contain
+   one row for each column of variab

[PATCH] pg_stat_toast v0.3

2021-12-20 Thread Gunnar &quot;Nick&quot; Bluth

Am 12.12.21 um 17:20 schrieb Gunnar "Nick" Bluth:

Hello -hackers!

Please have a look at the attached patch, which implements some 
statistics for TOAST.


The attached v0.3 includes
* a proper GUC "track_toast" incl. postgresql.conf.sample line
* gathering timing information
* the system view "pg_stat_toast"
 * naming improvements more than welcome!
 * columns "storagemethod" and "compressmethod" added as per Hayato 
Kuroda's suggestion
* documentation (pointing out the potential impacts as per Andres 
Freund's reservations)


Any hints on how to write meaningful tests would be much appreciated!
I.e., will a test

INSERTing some long text to a column raises the view counts and 
"compressedsize" is smaller than "originalsize"


suffice?!?

Cheers & best regards,
--
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bl...@pro-open.de
__
"Ceterum censeo SystemD esse delendam" - CatoFrom e4f6b8a9b398ff08f76e0c02ca7a1201062beb78 Mon Sep 17 00:00:00 2001
From: "Gunnar \"Nick\" Bluth" 
Date: Mon, 20 Dec 2021 14:09:36 +0100
Subject: [PATCH] pg_stat_toast v0.3


* adds timing information to stats and view
* adds system view
* adds documentation
* adds GUC parameter to postgresql.conf

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index afbb6c35e3..fa40befc16 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7668,6 +7668,32 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
  
 
+ 
+  track_toast (boolean)
+  
+   track_toast configuration parameter
+  
+  
+  
+   
+Enables tracking of TOAST activities.
+Compressions and externalizations are tracked.
+The default is off.
+Only superusers can change this setting.
+   
+
+   
+
+Be aware that this feature, depending on the amount of TOASTable columns in
+your databases, may significantly increase the size of the statistics files
+and the workload of the statistics collector. It is recommended to only
+temporarily activate this to assess the right compression and storage method
+for (a) column(s).
+
+   
+  
+ 
+
  
   stats_temp_directory (string)
   
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 62f2a3332b..32d7818096 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -610,6 +610,17 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   yet included in pg_stat_user_functions).
  
 
+ 
+  pg_stat_toastpg_stat_toast
+  
+   One row for each column that has ever been TOASTed (compressed and/or externalized).
+   Showing the number of externalizations, compression attempts / successes, compressed and
+   uncompressed sizes etc.
+   
+   pg_stat_toast for details.
+  
+ 
+
  
   pg_stat_slrupg_stat_slru
   One row per SLRU, showing statistics of operations. See
@@ -4969,6 +4980,158 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
 
  
 
+ 
+  pg_stat_toast
+
+  
+   pg_stat_toast
+  
+
+  
+   The pg_stat_toast view will contain
+   one row for each column of variable size that has been TOASTed since 
+   the last statistics reset. The  parameter
+   controls whether TOAST activities are tracked or not.
+  
+
+  
+   pg_stat_toast View
+   
+
+ 
+  
+   Column Type
+  
+  
+   Description
+  
+ 
+
+
+
+ 
+  
+   schemaname name
+  
+  
+   Name of the schema the relation is in
+  
+ 
+
+ 
+  
+   reloid oid
+  
+  
+   OID of the relation
+  
+ 
+
+ 
+  
+   attnum int
+  
+  
+   Attribute (column) number in the relation
+  
+ 
+
+ 
+  
+   relname name
+  
+  
+   Name of the relation
+  
+ 
+
+ 
+  
+   attname name
+  
+  
+   Name of the attribute (column)
+  
+ 
+
+ 
+  
+   storagemethod char
+  
+  
+   Storage method of the attribute
+  
+ 
+
+ 
+  
+   externalized bigint
+  
+  
+   Number of times this attribute was externalized (pushed to TOAST relation)
+  
+ 
+
+ 
+  
+   compressmethod char
+  
+  
+   Compression method of the attribute (empty means default)
+  
+ 
+
+ 
+  
+   compressattempts bigint
+  
+  
+   Number of times this attribute was compressed
+  
+ 
+
+ 
+  
+   compresssuccesses bigint
+  
+  
+   Number of times the 

Re: [PATCH] pg_stat_toast

2021-12-20 Thread Gunnar &quot;Nick&quot; Bluth

Am 20.12.2021 um 04:20 schrieb kuroda.hay...@fujitsu.com:

Dear Gunnar,


Hi Kuroda-San!


postgres=# CREATE TABLE test (i int, lz4 text COMPRESSION lz4, std text);
postgres=# INSERT INTO test  SELECT
i,repeat(md5(i::text),100),repeat(md5(i::text),100) FROM
generate_series(0,10) x(i);
postgres=# SELECT * FROM pg_stat_toast WHERE schemaname = 'public';
-[ RECORD 1 ]+--
schemaname   | public
reloid   | 16829
attnum   | 2
relname  | test
attname  | lz4
externalizations | 0
compressions | 11
compressionsuccesses | 11
compressionsizesum   | 6299710
originalsizesum  | 320403204
-[ RECORD 2 ]+--
schemaname   | public
reloid   | 16829
attnum   | 3
relname  | test
attname  | std
externalizations | 0
compressions | 11
compressionsuccesses | 11
compressionsizesum   | 8198819
originalsizesum  | 320403204


I'm not sure about TOAST, but currently compressions are configurable:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=bbe0a81db69bd10bd166907c3701492a29aca294

How about adding a new attribute "method" to pg_stat_toast?
ToastAttrInfo *attr->tai_compression represents how compress the data,
so I think it's easy to add.
Or, is it not needed because pg_attr has information?


That information could certainly be included in the view, grabbing the 
information from pg_attribute.attcompression. It probably should!


I guess the next step will be to include that view in the catalog 
anyway, so I'll do that next.


Thx for the feedback!
--
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bl...@pro-open.de
__
"Ceterum censeo SystemD esse delendam" - Cato




Re: [PATCH] pg_stat_toast

2021-12-13 Thread Gunnar &quot;Nick&quot; Bluth

Am 13.12.21 um 00:41 schrieb Andres Freund:

Hi,

On 2021-12-13 00:00:23 +0100, Gunnar "Nick" Bluth wrote:

Regarding stats size; it adds one PgStat_BackendToastEntry
(PgStat_BackendAttrIdentifier + PgStat_ToastCounts, should be 56-64 bytes or
something in that ballpark) per TOASTable attribute, I can't see that make
any system break sweat ;-)


That's actually a lot. The problem is that all the stats data for a database
is loaded into private memory for each connection to that database, and that
the stats collector regularly writes out all the stats data for a database.


My understanding is that the stats file is only pulled into the backend 
when the SQL functions (for the view) are used (see 
"pgstat_fetch_stat_toastentry()").


Otherwise, a backend just initializes an empty hash, right?

Of which I reduced the initial size from 512 to 32 for the below tests 
(I guess the "truth" lies somewhere in between here), along with making 
the GUC parameter an actual GUC parameter and disabling the elog() calls 
I scattered all over the place ;-) for the v0.2 patch attached.



A quick run comparing 1.000.000 INSERTs (2 TOASTable columns each) with and
without "pgstat_track_toast" resulted in 12792.882 ms vs. 12810.557 ms. So
at least the call overhead seems to be neglectible.


Yea, you'd probably need a few more tables and a few more connections for it
to have a chance of mattering meaningfully.


So, I went ahead and
* set up 2 clusters with "track_toast" off and on resp.
* created 100 DBs
 * each with 100 tables
 * with one TOASTable column in each table
 * filling those with 32000 bytes of md5 garbage

These clusters sum up to ~ 2GB each, so differences should _start to_ 
show up, I reckon.


$ du -s testdb*
2161208 testdb
2163240 testdb_tracking

$ du -s testdb*/pg_stat
4448testdb/pg_stat
4856testdb_tracking/pg_stat

The db_*.stat files are 42839 vs. 48767 bytes each (so confirmed, the 
differences do show).



No idea if this is telling us anything, tbth, but the 
/proc//smaps_rollup for a backend serving one of these DBs look 
like this ("0 kB" lines omitted):


track_toast OFF
===
Rss:   12428 kB
Pss:5122 kB
Pss_Anon:   1310 kB
Pss_File:   2014 kB
Pss_Shmem:  1797 kB
Shared_Clean:   5864 kB
Shared_Dirty:   3500 kB
Private_Clean:  1088 kB
Private_Dirty:  1976 kB
Referenced:11696 kB
Anonymous:  2120 kB

track_toast ON (view not called yet):
=
Rss:   12300 kB
Pss:4883 kB
Pss_Anon:   1309 kB
Pss_File:   1888 kB
Pss_Shmem:  1685 kB
Shared_Clean:   6040 kB
Shared_Dirty:   3468 kB
Private_Clean:   896 kB
Private_Dirty:  1896 kB
Referenced:11572 kB
Anonymous:  2116 kB

track_toast ON (view called):
=
Rss:   15408 kB
Pss:7482 kB
Pss_Anon:   2083 kB
Pss_File:   2572 kB
Pss_Shmem:  2826 kB
Shared_Clean:   6616 kB
Shared_Dirty:   3532 kB
Private_Clean:  1472 kB
Private_Dirty:  3788 kB
Referenced:14704 kB
Anonymous:  2884 kB

That backend used some memory for displaying the result too, of course...

A backend with just two TOAST columns in one table (filled with 
1.000.001 rows) looks like this before and after calling the 
"pg_stat_toast" view:

Rss:  146208 kB
Pss:  116181 kB
Pss_Anon:   2050 kB
Pss_File:   2787 kB
Pss_Shmem:111342 kB
Shared_Clean:   6636 kB
Shared_Dirty:  45928 kB
Private_Clean:  1664 kB
Private_Dirty: 91980 kB
Referenced:   145532 kB
Anonymous:  2844 kB

Rss:  147736 kB
Pss:  103296 kB
Pss_Anon:   2430 kB
Pss_File:   3147 kB
Pss_Shmem: 97718 kB
Shared_Clean:   6992 kB
Shared_Dirty:  74056 kB
Private_Clean:  1984 kB
Private_Dirty: 64704 kB
Referenced:   147092 kB
Anonymous:  3224 kB

After creating 10.000 more tables (view shows 10.007 rows now), before 
and after calling "TABLE pg_stat_toast":

Rss:   13816 kB
Pss:4898 kB
Pss_Anon:   1314 kB
Pss_File:   1755 kB
Pss_Shmem:  1829 kB
Shared_Clean:   5972 kB
Shared_Dirty:   5760 kB
Private_Clean:   832 kB
Private_Dirty:  1252 kB
Referenced:13088 kB
Anonymous:  2124 kB

Rss:  126816 kB
Pss:   55213 kB
Pss_Anon:   5383 kB
Pss_File:   2615 kB
Pss_Shmem: 47215 kB
Shared_Clean:   6460 kB
Shared_Dirty: 113028 kB
Private_Clean:  1600 kB
Private_Dirty:  5728 kB
Referenced:   126112 kB
Anonymous:  6184 kB


That DB's stat-file is now 4.119.254 bytes (3.547.439 without track_toast).

After VACUUM ANALYZE, the size goes up to 5.919.812 (5

Re: [PATCH] pg_stat_toast

2021-12-12 Thread Gunnar &quot;Nick&quot; Bluth

Am 12.12.21 um 22:52 schrieb Andres Freund:

Hi,

On 2021-12-12 17:20:58 +0100, Gunnar "Nick" Bluth wrote:

Please have a look at the attached patch, which implements some statistics
for TOAST.

The idea (and patch) have been lurking here for quite a while now, so I
decided to dust it off, rebase it to HEAD and send it out for review today.

A big shoutout to Georgios Kokolatos, who gave me a crash course in PG
hacking, some very useful hints and valueable feedback early this year.

I'd like to get some feedback about the general idea, approach, naming etc.
before refining this further.


I'm worried about the additional overhead this might impose. For some workload
it'll substantially increase the amount of stats traffic. Have you tried to
qualify the overheads? Both in stats size and in stats management overhead?


I'd lie if I claimed so...

Regarding stats size; it adds one PgStat_BackendToastEntry 
(PgStat_BackendAttrIdentifier + PgStat_ToastCounts, should be 56-64 
bytes or something in that ballpark) per TOASTable attribute, I can't 
see that make any system break sweat ;-)


A quick run comparing 1.000.000 INSERTs (2 TOASTable columns each) with 
and without "pgstat_track_toast" resulted in 12792.882 ms vs. 12810.557 
ms. So at least the call overhead seems to be neglectible.


Obviously, this was really a quick run and doesn't reflect real life.
I'll have the machine run some reasonable tests asap, also looking at 
stat size, of course!


Best regards,
--
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bl...@pro-open.de
__
"Ceterum censeo SystemD esse delendam" - Cato




[PATCH] pg_stat_toast

2021-12-12 Thread Gunnar &quot;Nick&quot; Bluth

Hello -hackers!

Please have a look at the attached patch, which implements some 
statistics for TOAST.


The idea (and patch) have been lurking here for quite a while now, so I 
decided to dust it off, rebase it to HEAD and send it out for review today.


A big shoutout to Georgios Kokolatos, who gave me a crash course in PG 
hacking, some very useful hints and valueable feedback early this year.


I'd like to get some feedback about the general idea, approach, naming 
etc. before refining this further.


I'm not a C person and I s**k at git, so please be kind with me! ;-)
Also, I'm not subscribed here, so a CC would be much appreciated!


Why gather TOAST statistics?

TOAST is transparent and opaque at the same time.
Whilst we know that it's there and we know _that_ it works, we cannot 
generally tell _how well_ it works.


What we can't answer (easily) are questions like e.g.
- how many datums have been externalized?
- how many datums have been compressed?
- how often has a compression failed (resulted in no space saving)?
- how effective is the compression algorithm used on a column?
- how much time did the DB spend compressing/decompressing TOAST values?

The patch adds some functionality that will eventually be able to answer 
these (and probably more) questions.


Currently, #1 - #4 can be answered based on the view contained in 
"pg_stats_toast.sql":


postgres=# CREATE TABLE test (i int, lz4 text COMPRESSION lz4, std text);
postgres=# INSERT INTO test  SELECT 
i,repeat(md5(i::text),100),repeat(md5(i::text),100) FROM 
generate_series(0,10) x(i);

postgres=# SELECT * FROM pg_stat_toast WHERE schemaname = 'public';
-[ RECORD 1 ]+--
schemaname   | public
reloid   | 16829
attnum   | 2
relname  | test
attname  | lz4
externalizations | 0
compressions | 11
compressionsuccesses | 11
compressionsizesum   | 6299710
originalsizesum  | 320403204
-[ RECORD 2 ]+--
schemaname   | public
reloid   | 16829
attnum   | 3
relname  | test
attname  | std
externalizations | 0
compressions | 11
compressionsuccesses | 11
compressionsizesum   | 8198819
originalsizesum  | 320403204


Implementation
==
I added some callbacks in backend/access/table/toast_helper.c to 
"pgstat_report_toast_activity" in backend/postmaster/pgstat.c.


The latter (and the other additions there) are essentially 1:1 copies of 
the function statistics.


Those were the perfect template, as IMHO the TOAST activities (well, 
what we're interested in at least) are very much comparable to function 
calls:
a) It doesn't really matter if the TOASTed data was committed, as "the 
damage is done" (i.e. CPU cycles were used) anyway
b) The information can (thus/best) be stored on DB level, no need to 
touch the relation or attribute statistics


I didn't find anything that could have been used as a hash key, so the
PgStat_StatToastEntry
uses the shiny new
PgStat_BackendAttrIdentifier
(containing relid Oid, attr int).

For persisting in the statsfile, I chose the identifier 'O' (as 'T' was 
taken).



What's working?
===
- Gathering of TOAST externalization and compression events
- collecting the sizes before and after compression
- persisting in statsfile
- not breaking "make check"
- not crashing anything (afaict)

What's missing (yet)?
===
- proper definition of the "pgstat_track_toast" GUC
- Gathering of times (for compression [and decompression?])
- improve "pg_stat_toast" view and include it in the catalog
- documentation (obviously)
- proper naming (of e.g. the hash key type, functions, view columns etc.)
- would it be necessary to implement overflow protection for the size & 
time sums?


Thanks in advance & best regards,
--
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bl...@pro-open.de
__
"Ceterum censeo SystemD esse delendam" - CatoFrom 05f81229fd2e81b8674649c8fd1e3857e2406fcd Mon Sep 17 00:00:00 2001
From: "Gunnar \"Nick\" Bluth" 
Date: Sun, 12 Dec 2021 15:35:35 +0100
Subject: [PATCH] initial patch of pg_stat_toast for -hackers

---
 pg_stat_toast.sql   |  19 ++
 src/backend/access/table/toast_helper.c |  19 ++
 src/backend/postmaster/pgstat.c | 311 +++-
 src/backend/utils/adt/pgstatfuncs.c |  60 +
 src/include/catalog/pg_proc.dat |  21 ++
 src/include/pgstat.h| 109 +
 6 files changed, 533 insertions(+), 6 deletions(-)
 create mode 100644 pg_stat_toast.sql

diff --git a/pg_stat_toast.sql b/pg_stat_toast.sql
new file mode 100644
index 00..1c653254ab
--- /dev/nu