Re: SLRU statistics

2020-05-14 Thread Tom Lane
Robert Haas  writes:
> I'm confused by why SLRU statistics are reported by messages sent to
> the stats collector rather than by just directly updating shared
> memory.

It would be better to consider that as an aspect of the WIP stats
collector redesign, rather than inventing a bespoke mechanism for
SLRU stats that's outside the stats collector (and, no doubt,
would have its own set of bugs).  We don't need to invent even more
pathways for this sort of data.

regards, tom lane




Re: SLRU statistics

2020-05-14 Thread Robert Haas
On Thu, May 14, 2020 at 2:27 AM Fujii Masao  wrote:
> Therefore, what we can do right now seems to be to make the checkpointer
> report the SLRU stats while it's running. The other issues need more time
> to investigate... Thoughts?

I'm confused by why SLRU statistics are reported by messages sent to
the stats collector rather than by just directly updating shared
memory. For database or table statistics there can be any number of
objects and we can't know in advance how many there will be, so we
can't set aside shared memory for the stats in advance. For SLRUs,
there's no such problem. Just having the individual backends
periodically merge their accumulated backend-local counters into the
shared counters seems like it would be way simpler and more
performant.
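
For illustration only, the merge scheme described above might look roughly
like this. This is a sketch, not the committed design; every name in it
(SlruSharedStats, shared_slru_stats, slru_merge_local_counts, the local
arrays) is hypothetical, and only the pg_atomic_* types and functions are
real PostgreSQL primitives:

	#include "port/atomics.h"

	#define SLRU_NUM_ELEMENTS 8		/* fixed number of tracked SLRUs */

	/* One set of shared counters per SLRU; lives in shared memory,
	 * initialized (zeroed) at shmem init. */
	typedef struct SlruSharedStats
	{
		pg_atomic_uint64 blocks_read[SLRU_NUM_ELEMENTS];
		pg_atomic_uint64 blocks_written[SLRU_NUM_ELEMENTS];
	} SlruSharedStats;

	static SlruSharedStats *shared_slru_stats;

	/* Backend-local accumulators, bumped cheaply on every SLRU operation. */
	static uint64 local_blocks_read[SLRU_NUM_ELEMENTS];
	static uint64 local_blocks_written[SLRU_NUM_ELEMENTS];

	/* Periodically (e.g. where pgstat_report_stat() runs today) merge the
	 * local counts into the shared counters and reset the local ones. */
	static void
	slru_merge_local_counts(void)
	{
		for (int i = 0; i < SLRU_NUM_ELEMENTS; i++)
		{
			if (local_blocks_read[i] > 0)
			{
				pg_atomic_fetch_add_u64(&shared_slru_stats->blocks_read[i],
										local_blocks_read[i]);
				local_blocks_read[i] = 0;
			}
			if (local_blocks_written[i] > 0)
			{
				pg_atomic_fetch_add_u64(&shared_slru_stats->blocks_written[i],
										local_blocks_written[i]);
				local_blocks_written[i] = 0;
			}
		}
	}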

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: SLRU statistics

2020-05-14 Thread Fujii Masao




On 2020/05/07 13:47, Fujii Masao wrote:



On 2020/05/03 1:59, Tomas Vondra wrote:

On Sat, May 02, 2020 at 12:55:00PM +0200, Tomas Vondra wrote:

On Sat, May 02, 2020 at 03:56:07PM +0900, Fujii Masao wrote:


...



Another thing I found: shouldn't pgstat_send_slru() also be called by
processes other than backends? For example, since clog data is basically
flushed by the checkpointer, the checkpointer seems to need to send SLRU
stats. Otherwise, pg_stat_slru.flushes would not be updated.



Hmmm, that's a good point. If I understand the issue correctly, the
checkpointer accumulates the stats but never really sends them because
it never calls pgstat_report_stat/pgstat_send_slru. That's only called
from PostgresMain, but not from CheckpointerMain.


Yes.


I think we could simply add pgstat_send_slru() right after the existing
call in CheckpointerMain, right?


Checkpointer sends off activity statistics to the stats collector in
two places, by calling pgstat_send_bgwriter(). What about calling
pgstat_send_slru() just after pgstat_send_bgwriter()?



Yep, that's what I proposed.


In my previous email, I mentioned the checkpointer just as an example.
So we probably need to investigate which processes other than the
checkpointer should send SLRU stats. I guess that at least the autovacuum
worker, logical replication walsender and parallel query worker (maybe
this is already covered by some parallel query mechanism; sorry, I've not
checked that) would need to send their SLRU stats.



Probably. Do you have any other process type in mind?


No. For now the ones I have in mind are just the checkpointer, autovacuum
worker, logical replication walsender and parallel query worker. It seems
the logical replication worker and tablesync worker already send SLRU
stats via pgstat_report_stat().


Let me go back to this topic. Reading the code again, the logical
walsender reports the stats at exit via the pgstat_beshutdown_hook()
process-exit callback, but it doesn't report the stats while it's running.
This is not a problem only for SLRU stats; we would need to consider how
to handle the stats from the logical walsender separately from SLRU stats.

The autovacuum worker also reports the stats at exit via
pgstat_beshutdown_hook(). Unlike the logical walsender, the autovacuum
worker is not a process that keeps running for the life of the service;
it exits after it finishes a vacuum or analyze. So reporting the stats
only at exit is not bad in the autovacuum worker case, and there is no
need to add extra code for SLRU stats reporting there.

A parallel worker is in the same situation as the autovacuum worker: its
lifetime is basically short and its stats are reported at exit via
pgstat_beshutdown_hook().

pgstat_beshutdown_hook() reports the stats only when MyDatabaseId is valid.
The checkpointer calls pgstat_beshutdown_hook() at exit but doesn't report
the stats because its MyDatabaseId is invalid. It also doesn't report the
SLRU stats while it's running. As we discussed upthread, we need to make
the checkpointer call pgstat_send_slru() just after pgstat_send_bgwriter().
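
Concretely, the proposed change would presumably amount to something like
this at each place where the checkpointer already reports activity (a
sketch of the proposal, not a committed patch; pgstat_send_slru() would
also have to be exported from pgstat.c, where the patch defines it as
static):

	/* In CheckpointerMain(), wherever activity stats are sent today: */

	/* Send off activity statistics to the stats collector. */
	pgstat_send_bgwriter();

	/* Likewise send the SLRU stats accumulated since the last report. */
	pgstat_send_slru();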

However, even if we do this, the stats updated during the checkpointer's
final activity (e.g., the shutdown checkpoint) seem not to be reported,
because pgstat_beshutdown_hook() doesn't report the stats in the
checkpointer case. Do we need to address this issue? If yes, we would need
to change pgstat_beshutdown_hook() or register another checkpointer-exit
callback that sends the stats. Thoughts?
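
If the extra exit callback were the way to go, it might be as simple as
the sketch below; on_shmem_exit() is the real registration API, but the
callback here is hypothetical:

	/* Hypothetical checkpointer-exit callback flushing pending SLRU stats. */
	static void
	checkpointer_report_stats_at_exit(int code, Datum arg)
	{
		pgstat_send_slru();
	}

	/* ...registered once during checkpointer startup: */
	on_shmem_exit(checkpointer_report_stats_at_exit, (Datum) 0);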

The startup process is in the same situation as the checkpointer: it
reports the stats neither at exit nor while it's running. But, like the
logical walsender, this seems not to be a problem only for SLRU stats; we
would need to consider how to handle the stats from the startup process
separately from SLRU stats.

Therefore, what we can do right now seems to be to make the checkpointer
report the SLRU stats while it's running. The other issues need more time
to investigate... Thoughts?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: SLRU statistics

2020-05-13 Thread Tom Lane
Fujii Masao  writes:
> On 2020/05/14 2:44, Tom Lane wrote:
>> I got through check-world with the assertion shown that we are not
>> counting any SLRU operations in the postmaster.  Don't know if we
>> want to commit that or not --- any thoughts?

> +1 to add this assertion, because basically it's not a good thing
> to access SLRU in the postmaster, and we may want to fix that if found.
> At least if we get rid of the SLRUStats initialization code,
> IMO it's better to add this assertion and ensure that the postmaster
> doesn't update the SLRU stats counters.

Seems reasonable --- I'll include it.

It might be nice to have similar assertions protecting BgWriterStats.
But given that we've made that public to be hacked on directly by several
different modules, I'm not sure that there's any simple way to do that.
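
One conceivable (not proposed) way would be to funnel all updates through
a guarded accessor, analogous to slru_entry() in the patch below; a sketch
with a hypothetical function name, and it would mean touching every module
that pokes at BgWriterStats directly today:

	/* Hypothetical guarded accessor for the bgwriter stats message. */
	static inline PgStat_MsgBgWriter *
	bgwriter_stats(void)
	{
		/* Complain if the postmaster itself tries to accumulate counts. */
		Assert(IsUnderPostmaster || !IsPostmasterEnvironment);
		return &BgWriterStats;
	}

	/* Call sites would then do, e.g.: */
	bgwriter_stats()->m_buf_written_checkpoints++;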

regards, tom lane




Re: SLRU statistics

2020-05-13 Thread Fujii Masao




On 2020/05/14 2:44, Tom Lane wrote:

I wrote:

(IIRC only the Async module is doing that.)



Hm, maybe we can fix that.


Yeah, it's quite easy to make async.c postpone its first write to the
async SLRU.  This seems like a win all around, because many installations
don't use NOTIFY and so will never need to do that work at all.  In
installations that do use notify, this costs an extra instruction or
two per NOTIFY, but that's down in the noise.


Looks good to me. Thanks for the patch!


I got through check-world with the assertion shown that we are not
counting any SLRU operations in the postmaster.  Don't know if we
want to commit that or not --- any thoughts?


+1 to add this assertion, because basically it's not a good thing
to access SLRU in the postmaster, and we may want to fix that if found.
At least if we get rid of the SLRUStats initialization code,
IMO it's better to add this assertion and ensure that the postmaster
doesn't update the SLRU stats counters.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: SLRU statistics

2020-05-13 Thread Ranier Vilela
On Wed, May 13, 2020 at 11:46 AM Fujii Masao <
masao.fu...@oss.nttdata.com> wrote:

>
>
> On 2020/05/13 23:26, Tom Lane wrote:
> > Fujii Masao  writes:
> >> On 2020/05/13 17:21, Tomas Vondra wrote:
> >>> On Wed, May 13, 2020 at 04:10:30PM +0900, Fujii Masao wrote:
> >>>> Also I found another minor issue: SLRUStats has not been initialized to 0,
> >>>> which could update the counters unexpectedly. Attached patch fixes
> >>>> this issue.
> >
> >> Pushed both. Thanks!
> >
> > Why is that necessary?  A static variable is defined by C to start off
> > as zeroes.
>
> Because SLRUStats is not a static variable. No?
>
IMHO, BgWriterStats has the same problem; shouldn't the same be done?

/* Initialize BgWriterStats to zero */
MemSet(&BgWriterStats, 0, sizeof(BgWriterStats));

/* Initialize SLRU statistics to zero */
memset(&SLRUStats, 0, sizeof(SLRUStats));

regards,
Ranier Vilela


Re: SLRU statistics

2020-05-13 Thread Tom Lane
I wrote:
>>> (IIRC only the Async module is doing that.)

> Hm, maybe we can fix that.

Yeah, it's quite easy to make async.c postpone its first write to the
async SLRU.  This seems like a win all around, because many installations
don't use NOTIFY and so will never need to do that work at all.  In
installations that do use notify, this costs an extra instruction or
two per NOTIFY, but that's down in the noise.

I got through check-world with the assertion shown that we are not
counting any SLRU operations in the postmaster.  Don't know if we
want to commit that or not --- any thoughts?

regards, tom lane

diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 0c9d20e..6ecea01 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -200,7 +200,10 @@ typedef struct QueuePosition
 	} while (0)
 
 #define QUEUE_POS_EQUAL(x,y) \
-	 ((x).page == (y).page && (x).offset == (y).offset)
+	((x).page == (y).page && (x).offset == (y).offset)
+
+#define QUEUE_POS_IS_ZERO(x) \
+	((x).page == 0 && (x).offset == 0)
 
 /* choose logically smaller QueuePosition */
 #define QUEUE_POS_MIN(x,y) \
@@ -515,7 +518,6 @@ void
 AsyncShmemInit(void)
 {
 	bool		found;
-	int			slotno;
 	Size		size;
 
 	/*
@@ -562,13 +564,6 @@ AsyncShmemInit(void)
 		 * During start or reboot, clean out the pg_notify directory.
 		 */
 		(void) SlruScanDirectory(AsyncCtl, SlruScanDirCbDeleteAll, NULL);
-
-		/* Now initialize page zero to empty */
-		LWLockAcquire(AsyncCtlLock, LW_EXCLUSIVE);
-		slotno = SimpleLruZeroPage(AsyncCtl, QUEUE_POS_PAGE(QUEUE_HEAD));
-		/* This write is just to verify that pg_notify/ is writable */
-		SimpleLruWritePage(AsyncCtl, slotno);
-		LWLockRelease(AsyncCtlLock);
 	}
 }
 
@@ -1470,9 +1465,18 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	 */
 	queue_head = QUEUE_HEAD;
 
-	/* Fetch the current page */
+	/*
+	 * If this is the first write since the postmaster started, we need to
+	 * initialize the first page of the async SLRU.  Otherwise, the current
+	 * page should be initialized already, so just fetch it.
+	 */
 	pageno = QUEUE_POS_PAGE(queue_head);
-	slotno = SimpleLruReadPage(AsyncCtl, pageno, true, InvalidTransactionId);
+	if (QUEUE_POS_IS_ZERO(queue_head))
+		slotno = SimpleLruZeroPage(AsyncCtl, pageno);
+	else
+		slotno = SimpleLruReadPage(AsyncCtl, pageno, true,
+   InvalidTransactionId);
+
 	/* Note we mark the page dirty before writing in it */
 	AsyncCtl->shared->page_dirty[slotno] = true;
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 80a06e5..e3c808b 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -6729,6 +6729,8 @@ slru_entry(int slru_idx)
 {
 	Assert((slru_idx >= 0) && (slru_idx < SLRU_NUM_ELEMENTS));
 
+	Assert(IsUnderPostmaster || !IsPostmasterEnvironment);
+
 	return &SLRUStats[slru_idx];
 }
 


Re: SLRU statistics

2020-05-13 Thread Tom Lane
Fujii Masao  writes:
> But since a counter that the postmaster incremented is propagated to
> child processes via fork, shouldn't it be zeroed in the postmaster or at
> the beginning of each child process? Otherwise that counter always starts
> out non-zero in the child.

Yes, if the postmaster is incrementing these counts then we would
have to reset them at the start of each child process.  I share
Alvaro's feeling that that's bad and we don't want to do it.

>> (IIRC only the Async module is doing that.)

Hm, maybe we can fix that.

regards, tom lane




Re: SLRU statistics

2020-05-13 Thread Fujii Masao




On 2020/05/14 1:14, Alvaro Herrera wrote:

On 2020-May-14, Fujii Masao wrote:


So I tried a similar test again and found that the postmaster does seem
able to increment the counters, unless I'm missing something.
For example,

 frame #2: 0x00010d93845f 
postgres`pgstat_count_slru_page_zeroed(ctl=0x00010de27320) at 
pgstat.c:6739:2
 frame #3: 0x00010d5922ba 
postgres`SimpleLruZeroPage(ctl=0x00010de27320, pageno=0) at slru.c:290:2
 frame #4: 0x00010d6b9ae2 postgres`AsyncShmemInit at async.c:568:12
 frame #5: 0x00010d9da9a6 postgres`CreateSharedMemoryAndSemaphores at 
ipci.c:265:2
 frame #6: 0x00010d93f679 postgres`reset_shared at postmaster.c:2664:2
 frame #7: 0x00010d93d253 postgres`PostmasterMain(argc=3, 
argv=0x7fad56402e00) at postmaster.c:1008:2


Umm.  I have the feeling that we'd rather avoid these updates in
postmaster, per our general rule that postmaster should not touch shared
memory.  However, it might be that it's okay in this case, as it only
happens just as shmem is being "created", so other processes have not
yet had any time to mess things up.


But since a counter that the postmaster incremented is propagated to
child processes via fork, shouldn't it be zeroed in the postmaster or at
the beginning of each child process? Otherwise that counter always starts
out non-zero in the child.


(IIRC only the Async module is
doing that.)


Yes, as far as my testing shows.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: SLRU statistics

2020-05-13 Thread Alvaro Herrera
On 2020-May-14, Fujii Masao wrote:

> So I tried a similar test again and found that the postmaster does seem
> able to increment the counters, unless I'm missing something.
> For example,
> 
> frame #2: 0x00010d93845f 
> postgres`pgstat_count_slru_page_zeroed(ctl=0x00010de27320) at 
> pgstat.c:6739:2
> frame #3: 0x00010d5922ba 
> postgres`SimpleLruZeroPage(ctl=0x00010de27320, pageno=0) at slru.c:290:2
> frame #4: 0x00010d6b9ae2 postgres`AsyncShmemInit at async.c:568:12
> frame #5: 0x00010d9da9a6 postgres`CreateSharedMemoryAndSemaphores at 
> ipci.c:265:2
> frame #6: 0x00010d93f679 postgres`reset_shared at postmaster.c:2664:2
> frame #7: 0x00010d93d253 postgres`PostmasterMain(argc=3, 
> argv=0x7fad56402e00) at postmaster.c:1008:2

Umm.  I have the feeling that we'd rather avoid these updates in
postmaster, per our general rule that postmaster should not touch shared
memory.  However, it might be that it's okay in this case, as it only
happens just as shmem is being "created", so other processes have not
yet had any time to mess things up.  (IIRC only the Async module is
doing that.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: SLRU statistics

2020-05-13 Thread Fujii Masao




On 2020/05/14 0:38, Tom Lane wrote:

Tomas Vondra  writes:

I think it counts as a variable with "static storage duration" per 6.7.8
(para 10), see [1]. I wasn't aware of this either, but it probably means
the memset is unnecessary.
Also, it seems a bit strange/confusing to handle this differently from
BgWriterStats. And that worked fine without the init for years ...


Yeah, exactly.

There might be merit in memsetting it if we thought that it could have
become nonzero in the postmaster during a previous shmem cycle-of-life.
But the postmaster really shouldn't be accumulating such counts; and
if it is, then we have a bigger problem, because child processes would
be inheriting those counts via fork.


In my previous test, I thought I observed that the counters were already
updated at the start of some processes, so I thought the counters needed
to be initialized. Sorry, that was my mistake...

So I tried a similar test again and found that the postmaster does seem
able to increment the counters, unless I'm missing something.
For example,

frame #2: 0x00010d93845f 
postgres`pgstat_count_slru_page_zeroed(ctl=0x00010de27320) at 
pgstat.c:6739:2
frame #3: 0x00010d5922ba 
postgres`SimpleLruZeroPage(ctl=0x00010de27320, pageno=0) at slru.c:290:2
frame #4: 0x00010d6b9ae2 postgres`AsyncShmemInit at async.c:568:12
frame #5: 0x00010d9da9a6 postgres`CreateSharedMemoryAndSemaphores at 
ipci.c:265:2
frame #6: 0x00010d93f679 postgres`reset_shared at postmaster.c:2664:2
frame #7: 0x00010d93d253 postgres`PostmasterMain(argc=3, 
argv=0x7fad56402e00) at postmaster.c:1008:2

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: SLRU statistics

2020-05-13 Thread Tom Lane
Tomas Vondra  writes:
> I think it counts as a variable with "static storage duration" per 6.7.8
> (para 10), see [1]. I wasn't aware of this either, but it probably means
> the memset is unnecessary.
> Also, it seems a bit strange/confusing to handle this differently from
> BgWriterStats. And that worked fine without the init for years ...

Yeah, exactly.

There might be merit in memsetting it if we thought that it could have
become nonzero in the postmaster during a previous shmem cycle-of-life.
But the postmaster really shouldn't be accumulating such counts; and
if it is, then we have a bigger problem, because child processes would
be inheriting those counts via fork.
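
The fork-inheritance hazard is easy to demonstrate outside PostgreSQL
entirely; a standalone illustration:

	#include <stdio.h>
	#include <sys/wait.h>
	#include <unistd.h>

	/* Statically allocated, so it starts as zero in the parent... */
	static long counter;

	int
	main(void)
	{
		counter++;			/* parent "accumulates" a count before forking */

		if (fork() == 0)
		{
			/* ...but the child inherits the parent's value, not zero. */
			printf("child sees counter = %ld\n", counter);	/* prints 1 */
			return 0;
		}
		wait(NULL);
		return 0;
	}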

I think this change is unnecessary and should be reverted to avoid
future confusion about whether somehow it is necessary.

regards, tom lane




Re: SLRU statistics

2020-05-13 Thread Tomas Vondra

On Wed, May 13, 2020 at 11:46:30PM +0900, Fujii Masao wrote:



On 2020/05/13 23:26, Tom Lane wrote:

Fujii Masao  writes:

On 2020/05/13 17:21, Tomas Vondra wrote:

On Wed, May 13, 2020 at 04:10:30PM +0900, Fujii Masao wrote:

Also I found another minor issue: SLRUStats has not been initialized to 0,
which could update the counters unexpectedly. Attached patch fixes
this issue.



Pushed both. Thanks!


Why is that necessary?  A static variable is defined by C to start off
as zeroes.


Because SLRUStats is not a static variable. No?



I think it counts as a variable with "static storage duration" per 6.7.8
(para 10), see [1]. I wasn't aware of this either, but it probably means
the memset is unnecessary.

Also, it seems a bit strange/confusing to handle this differently from
BgWriterStats. And that worked fine without the init for years ...


[1] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: SLRU statistics

2020-05-13 Thread Tomas Vondra

On Wed, May 13, 2020 at 11:01:47AM -0400, Tom Lane wrote:

Tomas Vondra  writes:

On Wed, May 13, 2020 at 10:26:39AM -0400, Tom Lane wrote:

Why is that necessary?  A static variable is defined by C to start off
as zeroes.



But is it a static variable? It's not declared as 'static' but maybe we
can assume it inits to zeroes anyway? I see we do that for
BgWriterStats.


Sorry, by "static" I meant "statically allocated", not "private to
this module".  I'm sure the C standard has some more precise terminology
for this distinction, but I forget what it is.



Ah, I see. I'm no expert at reading the C standard (or any other
standard), but a quick Google search yielded this section of the C99
standard:

-
If an object that has static storage duration is not initialized
explicitly, then:

- if it has pointer type, it is initialized to a null pointer;

- if it has arithmetic type, it is initialized to (positive or unsigned)
  zero;

- if it is an aggregate, every member is initialized (recursively)
  according to these rules;

- if it is a union, the first named member is initialized (recursively)
  according to these rules
-

I assume the SLRU variable counts as an aggregate, with members having
arithmetic types, in which case it really should be initialized to 0.
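
That reading is easy to confirm with a standalone snippet: a file-scope
object has static storage duration even without the 'static' keyword, so C
guarantees every member starts out zero:

	#include <stdio.h>

	/* No 'static' keyword and no initializer, but file scope gives this
	 * aggregate static storage duration, so all members are zeroed. */
	struct
	{
		long	zeroed;
		long	hit;
		double	ratio;
	} Stats;

	int
	main(void)
	{
		/* prints: 0 0 0.000000 */
		printf("%ld %ld %f\n", Stats.zeroed, Stats.hit, Stats.ratio);
		return 0;
	}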

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: SLRU statistics

2020-05-13 Thread Tom Lane
Tomas Vondra  writes:
> On Wed, May 13, 2020 at 10:26:39AM -0400, Tom Lane wrote:
>> Why is that necessary?  A static variable is defined by C to start off
>> as zeroes.

> But is it a static variable? It's not declared as 'static' but maybe we
> can assume it inits to zeroes anyway? I see we do that for
> BgWriterStats.

Sorry, by "static" I meant "statically allocated", not "private to
this module".  I'm sure the C standard has some more precise terminology
for this distinction, but I forget what it is.

regards, tom lane




Re: SLRU statistics

2020-05-13 Thread Tomas Vondra

On Wed, May 13, 2020 at 10:26:39AM -0400, Tom Lane wrote:

Fujii Masao  writes:

On 2020/05/13 17:21, Tomas Vondra wrote:

On Wed, May 13, 2020 at 04:10:30PM +0900, Fujii Masao wrote:

Also I found another minor issue: SLRUStats has not been initialized to 0,
which could update the counters unexpectedly. Attached patch fixes
this issue.



Pushed both. Thanks!


Why is that necessary?  A static variable is defined by C to start off
as zeroes.



But is it a static variable? It's not declared as 'static' but maybe we
can assume it inits to zeroes anyway? I see we do that for
BgWriterStats.



regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: SLRU statistics

2020-05-13 Thread Fujii Masao




On 2020/05/13 23:26, Tom Lane wrote:

Fujii Masao  writes:

On 2020/05/13 17:21, Tomas Vondra wrote:

On Wed, May 13, 2020 at 04:10:30PM +0900, Fujii Masao wrote:

Also I found another minor issue: SLRUStats has not been initialized to 0,
which could update the counters unexpectedly. Attached patch fixes
this issue.



Pushed both. Thanks!


Why is that necessary?  A static variable is defined by C to start off
as zeroes.


Because SLRUStats is not a static variable. No?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: SLRU statistics

2020-05-13 Thread Tom Lane
Fujii Masao  writes:
> On 2020/05/13 17:21, Tomas Vondra wrote:
>> On Wed, May 13, 2020 at 04:10:30PM +0900, Fujii Masao wrote:
> >>> Also I found another minor issue: SLRUStats has not been initialized to 0,
> >>> which could update the counters unexpectedly. Attached patch fixes
> >>> this issue.

> Pushed both. Thanks!

Why is that necessary?  A static variable is defined by C to start off
as zeroes.

regards, tom lane




Re: SLRU statistics

2020-05-13 Thread Fujii Masao




On 2020/05/13 17:21, Tomas Vondra wrote:

On Wed, May 13, 2020 at 04:10:30PM +0900, Fujii Masao wrote:



On 2020/05/07 13:47, Fujii Masao wrote:



On 2020/05/03 1:59, Tomas Vondra wrote:

On Sat, May 02, 2020 at 12:55:00PM +0200, Tomas Vondra wrote:

On Sat, May 02, 2020 at 03:56:07PM +0900, Fujii Masao wrote:


...



Another thing I found: shouldn't pgstat_send_slru() also be called by
processes other than backends? For example, since clog data is basically
flushed by the checkpointer, the checkpointer seems to need to send SLRU
stats. Otherwise, pg_stat_slru.flushes would not be updated.



Hmmm, that's a good point. If I understand the issue correctly, the
checkpointer accumulates the stats but never really sends them because
it never calls pgstat_report_stat/pgstat_send_slru. That's only called
from PostgresMain, but not from CheckpointerMain.


Yes.


I think we could simply add pgstat_send_slru() right after the existing
call in CheckpointerMain, right?


Checkpointer sends off activity statistics to the stats collector in
two places, by calling pgstat_send_bgwriter(). What about calling
pgstat_send_slru() just after pgstat_send_bgwriter()?



Yep, that's what I proposed.


In my previous email, I mentioned the checkpointer just as an example.
So we probably need to investigate which processes other than the
checkpointer should send SLRU stats. I guess that at least the autovacuum
worker, logical replication walsender and parallel query worker (maybe
this is already covered by some parallel query mechanism; sorry, I've not
checked that) would need to send their SLRU stats.



Probably. Do you have any other process type in mind?


No. For now the ones I have in mind are just the checkpointer, autovacuum
worker, logical replication walsender and parallel query worker. It seems
the logical replication worker and tablesync worker already send SLRU
stats via pgstat_report_stat().


I've looked at places calling pgstat_send_* functions, and I found
these places:

src/backend/postmaster/bgwriter.c

- AFAIK it merely writes out dirty shared buffers, so likely irrelevant.

src/backend/postmaster/checkpointer.c

- This is what we're already discussing here.

src/backend/postmaster/pgarch.c

- Seems irrelevant.


I'm a bit puzzled why we're not sending any stats from walsender, which
I suppose could do various stuff during logical decoding.


Not sure why, but that seems an oversight...


Also I found another minor issue: SLRUStats has not been initialized to 0,
which could update the counters unexpectedly. Attached patch fixes
this issue.


This is a minor issue, but basically it's better to fix it before the
v13 beta1 release. So barring any objection, I will commit the patch.

+    values[8] = Int64GetDatum(stat.stat_reset_timestamp);

Also I found another small issue: pg_stat_get_slru() returns the timestamp
at which pg_stat_slru was reset using Int64GetDatum(). This happens to work
because the timestamp is also an int64, but TimestampTzGetDatum() should be
used here instead. Patch attached. Thoughts?



I agree with both fixes.


Pushed both. Thanks!

Regards,


--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: SLRU statistics

2020-05-13 Thread Tomas Vondra

On Wed, May 13, 2020 at 04:10:30PM +0900, Fujii Masao wrote:



On 2020/05/07 13:47, Fujii Masao wrote:



On 2020/05/03 1:59, Tomas Vondra wrote:

On Sat, May 02, 2020 at 12:55:00PM +0200, Tomas Vondra wrote:

On Sat, May 02, 2020 at 03:56:07PM +0900, Fujii Masao wrote:


...



Another thing I found: shouldn't pgstat_send_slru() also be called by
processes other than backends? For example, since clog data is basically
flushed by the checkpointer, the checkpointer seems to need to send SLRU
stats. Otherwise, pg_stat_slru.flushes would not be updated.



Hmmm, that's a good point. If I understand the issue correctly, the
checkpointer accumulates the stats but never really sends them because
it never calls pgstat_report_stat/pgstat_send_slru. That's only called
from PostgresMain, but not from CheckpointerMain.


Yes.


I think we could simply add pgstat_send_slru() right after the existing
call in CheckpointerMain, right?


Checkpointer sends off activity statistics to the stats collector in
two places, by calling pgstat_send_bgwriter(). What about calling
pgstat_send_slru() just after pgstat_send_bgwriter()?



Yep, that's what I proposed.


In my previous email, I mentioned the checkpointer just as an example.
So we probably need to investigate which processes other than the
checkpointer should send SLRU stats. I guess that at least the autovacuum
worker, logical replication walsender and parallel query worker (maybe
this is already covered by some parallel query mechanism; sorry, I've not
checked that) would need to send their SLRU stats.



Probably. Do you have any other process type in mind?


No. For now the ones I have in mind are just the checkpointer, autovacuum
worker, logical replication walsender and parallel query worker. It seems
the logical replication worker and tablesync worker already send SLRU
stats via pgstat_report_stat().


I've looked at places calling pgstat_send_* functions, and I found
these places:

src/backend/postmaster/bgwriter.c

- AFAIK it merely writes out dirty shared buffers, so likely irrelevant.

src/backend/postmaster/checkpointer.c

- This is what we're already discussing here.

src/backend/postmaster/pgarch.c

- Seems irrelevant.


I'm a bit puzzled why we're not sending any stats from walsender, which
I suppose could do various stuff during logical decoding.


Not sure why, but that seems an oversight...


Also I found another minor issue: SLRUStats has not been initialized to 0,
which could update the counters unexpectedly. Attached patch fixes
this issue.


This is a minor issue, but basically it's better to fix it before the
v13 beta1 release. So barring any objection, I will commit the patch.

+   values[8] = Int64GetDatum(stat.stat_reset_timestamp);

Also I found another small issue: pg_stat_get_slru() returns the timestamp
at which pg_stat_slru was reset using Int64GetDatum(). This happens to work
because the timestamp is also an int64, but TimestampTzGetDatum() should be
used here instead. Patch attached. Thoughts?



I agree with both fixes.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: SLRU statistics

2020-05-13 Thread Fujii Masao



On 2020/05/07 13:47, Fujii Masao wrote:



On 2020/05/03 1:59, Tomas Vondra wrote:

On Sat, May 02, 2020 at 12:55:00PM +0200, Tomas Vondra wrote:

On Sat, May 02, 2020 at 03:56:07PM +0900, Fujii Masao wrote:


...



Another thing I found: shouldn't pgstat_send_slru() also be called by
processes other than backends? For example, since clog data is basically
flushed by the checkpointer, the checkpointer seems to need to send SLRU
stats. Otherwise, pg_stat_slru.flushes would not be updated.



Hmmm, that's a good point. If I understand the issue correctly, the
checkpointer accumulates the stats but never really sends them because
it never calls pgstat_report_stat/pgstat_send_slru. That's only called
from PostgresMain, but not from CheckpointerMain.


Yes.


I think we could simply add pgstat_send_slru() right after the existing
call in CheckpointerMain, right?


Checkpointer sends off activity statistics to the stats collector in
two places, by calling pgstat_send_bgwriter(). What about calling
pgstat_send_slru() just after pgstat_send_bgwriter()?



Yep, that's what I proposed.


In my previous email, I mentioned the checkpointer just as an example.
So we probably need to investigate which processes other than the
checkpointer should send SLRU stats. I guess that at least the autovacuum
worker, logical replication walsender and parallel query worker (maybe
this is already covered by some parallel query mechanism; sorry, I've not
checked that) would need to send their SLRU stats.



Probably. Do you have any other process type in mind?


No. For now the ones I have in mind are just the checkpointer, autovacuum
worker, logical replication walsender and parallel query worker. It seems
the logical replication worker and tablesync worker already send SLRU
stats via pgstat_report_stat().


I've looked at places calling pgstat_send_* functions, and I found
these places:

src/backend/postmaster/bgwriter.c

- AFAIK it merely writes out dirty shared buffers, so likely irrelevant.

src/backend/postmaster/checkpointer.c

- This is what we're already discussing here.

src/backend/postmaster/pgarch.c

- Seems irrelevant.


I'm a bit puzzled why we're not sending any stats from walsender, which
I suppose could do various stuff during logical decoding.


Not sure why, but that seems an oversight...


Also I found another minor issue: SLRUStats has not been initialized to 0,
which could update the counters unexpectedly. Attached patch fixes
this issue.


This is a minor issue, but basically it's better to fix it before the
v13 beta1 release. So barring any objection, I will commit the patch.

+   values[8] = Int64GetDatum(stat.stat_reset_timestamp);

Also I found another small issue: pg_stat_get_slru() returns the timestamp
at which pg_stat_slru was reset using Int64GetDatum(). This happens to work
because the timestamp is also an int64, but TimestampTzGetDatum() should be
used here instead. Patch attached. Thoughts?

Regards,


--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 446044609e..6b47617328 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1757,7 +1757,7 @@ pg_stat_get_slru(PG_FUNCTION_ARGS)
 		values[5] = Int64GetDatum(stat.blocks_exists);
 		values[6] = Int64GetDatum(stat.flush);
 		values[7] = Int64GetDatum(stat.truncate);
-		values[8] = Int64GetDatum(stat.stat_reset_timestamp);
+		values[8] = TimestampTzGetDatum(stat.stat_reset_timestamp);
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
 	}


Re: SLRU statistics

2020-05-06 Thread Fujii Masao



On 2020/05/03 1:59, Tomas Vondra wrote:

On Sat, May 02, 2020 at 12:55:00PM +0200, Tomas Vondra wrote:

On Sat, May 02, 2020 at 03:56:07PM +0900, Fujii Masao wrote:


...



Another thing I found: shouldn't pgstat_send_slru() also be called by
processes other than backends? For example, since clog data is basically
flushed by the checkpointer, the checkpointer seems to need to send SLRU
stats. Otherwise, pg_stat_slru.flushes would not be updated.



Hmmm, that's a good point. If I understand the issue correctly, the
checkpointer accumulates the stats but never really sends them because
it never calls pgstat_report_stat/pgstat_send_slru. That's only called
from PostgresMain, but not from CheckpointerMain.


Yes.


I think we could simply add pgstat_send_slru() right after the existing
call in CheckpointerMain, right?


Checkpointer sends off activity statistics to the stats collector in
two places, by calling pgstat_send_bgwriter(). What about calling
pgstat_send_slru() just after pgstat_send_bgwriter()?



Yep, that's what I proposed.


In my previous email, I mentioned the checkpointer just as an example.
So we probably need to investigate which processes other than the
checkpointer should send SLRU stats. I guess that at least the autovacuum
worker, logical replication walsender and parallel query worker (maybe
this is already covered by some parallel query mechanism; sorry, I've not
checked that) would need to send their SLRU stats.



Probably. Do you have any other process type in mind?


No. For now the ones I have in mind are just the checkpointer, autovacuum
worker, logical replication walsender and parallel query worker. It seems
the logical replication worker and tablesync worker already send SLRU
stats via pgstat_report_stat().


I've looked at places calling pgstat_send_* functions, and I found
these places:

src/backend/postmaster/bgwriter.c

- AFAIK it merely writes out dirty shared buffers, so likely irrelevant.

src/backend/postmaster/checkpointer.c

- This is what we're already discussing here.

src/backend/postmaster/pgarch.c

- Seems irrelevant.


I'm a bit puzzled why we're not sending any stats from walsender, which
I suppose could do various stuff during logical decoding.


Not sure why, but that seems an oversight...


Also I found another minor issue: SLRUStats has not been initialized to 0,
which could update the counters unexpectedly. Attached patch fixes
this issue.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 3f8105c6eb..416f86fbd6 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -2900,6 +2900,9 @@ pgstat_initialize(void)
 		MyBEEntry = &BackendStatusArray[MaxBackends + MyAuxProcType];
 	}
 
+	/* Initialize SLRU statistics to zero */
+	memset(&SLRUStats, 0, sizeof(SLRUStats));
+
 	/* Set up a process-exit hook to clean up */
 	on_shmem_exit(pgstat_beshutdown_hook, 0);
 }


Re: SLRU statistics

2020-05-02 Thread Tomas Vondra

On Sat, May 02, 2020 at 12:55:00PM +0200, Tomas Vondra wrote:

On Sat, May 02, 2020 at 03:56:07PM +0900, Fujii Masao wrote:


...



Another thing I found: shouldn't pgstat_send_slru() also be called by
processes other than backends? For example, since clog data is basically
flushed by the checkpointer, the checkpointer seems to need to send SLRU
stats. Otherwise, pg_stat_slru.flushes would not be updated.



Hmmm, that's a good point. If I understand the issue correctly, the
checkpointer accumulates the stats but never really sends them because
it never calls pgstat_report_stat/pgstat_send_slru. That's only called
from PostgresMain, but not from CheckpointerMain.


Yes.


I think we could simply add pgstat_send_slru() right after the existing
call in CheckpointerMain, right?


Checkpointer sends off activity statistics to the stats collector in
two places, by calling pgstat_send_bgwriter(). What about calling
pgstat_send_slru() just after pgstat_send_bgwriter()?



Yep, that's what I proposed.


In my previous email, I mentioned the checkpointer just as an example.
So we probably need to investigate which processes other than the
checkpointer should send SLRU stats. I guess that at least the autovacuum
worker, logical replication walsender and parallel query worker (maybe
this is already covered by some parallel query mechanism; sorry, I've not
checked that) would need to send their SLRU stats.



Probably. Do you have any other process type in mind?



I've looked at places calling pgstat_send_* functions, and I found
these places:

src/backend/postmaster/bgwriter.c

- AFAIK it merely writes out dirty shared buffers, so likely irrelevant.

src/backend/postmaster/checkpointer.c

- This is what we're already discussing here.

src/backend/postmaster/pgarch.c

- Seems irrelevant.


I'm a bit puzzled why we're not sending any stats from walsender, which
I suppose could do various stuff during logical decoding.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: SLRU statistics

2020-05-02 Thread Tomas Vondra

On Sat, May 02, 2020 at 03:56:07PM +0900, Fujii Masao wrote:



On 2020/05/02 9:08, Tomas Vondra wrote:

On Fri, May 01, 2020 at 11:49:51AM +0900, Fujii Masao wrote:



On 2020/05/01 3:19, Tomas Vondra wrote:

On Fri, May 01, 2020 at 03:02:59AM +0900, Fujii Masao wrote:



On 2020/04/02 9:41, Tomas Vondra wrote:

Hi,

I've pushed this after some minor cleanup and improvements.


+static char *slru_names[] = {"async", "clog", "commit_timestamp",
+  "multixact_offset", "multixact_member",
+  "oldserxid", "pg_xact", "subtrans",
+  "other" /* has to be last */};

When I tried pg_stat_slru, I found that it returns a row for "pg_xact".
But since there is no "pg_xact" SLRU (the "clog" SLRU exists instead),
shouldn't "pg_xact" be removed? Patch attached.



Yeah, I think I got confused and accidentally added both "clog" and
"pg_xact". I'll get "pg_xact" removed.


Thanks!



OK, pushed. Thanks!


Thanks a lot!

But, as the patch I attached in the previous email does, "pg_xact" should
also be removed from the description of pg_stat_reset_slru() in
monitoring.sgml.



Whooops. My bad, will fix.


Another thing I found: shouldn't pgstat_send_slru() also be called by
processes other than backends? For example, since clog data is basically
flushed by the checkpointer, the checkpointer seems to need to send SLRU
stats. Otherwise, pg_stat_slru.flushes would not be updated.



Hmmm, that's a good point. If I understand the issue correctly, the
checkpointer accumulates the stats but never really sends them because
it never calls pgstat_report_stat/pgstat_send_slru. That's only called
from PostgresMain, but not from CheckpointerMain.


Yes.


I think we could simply add pgstat_send_slru() right after the existing
call in CheckpointerMain, right?


Checkpointer sends off activity statistics to the stats collector in
two places, by calling pgstat_send_bgwriter(). What about calling
pgstat_send_slru() just after pgstat_send_bgwriter()?



Yep, that's what I proposed.


In my previous email, I mentioned the checkpointer just as an example.
So we probably need to investigate which processes other than the
checkpointer should send SLRU stats. I guess that at least the autovacuum
worker, logical replication walsender and parallel query worker (maybe
this is already covered by some parallel query mechanism; sorry, I've not
checked that) would need to send their SLRU stats.



Probably. Do you have any other process type in mind?


Atsushi-san reported another issue in pg_stat_slru.
Are you planning to work on that?
https://postgr.es/m/cacz0uyfe16pjzxqyatn53mspym7dgmpyl3djljjpw69gmcc...@mail.gmail.com



Yes, I'll investigate.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: SLRU statistics

2020-05-02 Thread Fujii Masao




On 2020/05/02 9:08, Tomas Vondra wrote:

On Fri, May 01, 2020 at 11:49:51AM +0900, Fujii Masao wrote:



On 2020/05/01 3:19, Tomas Vondra wrote:

On Fri, May 01, 2020 at 03:02:59AM +0900, Fujii Masao wrote:



On 2020/04/02 9:41, Tomas Vondra wrote:

Hi,

I've pushed this after some minor cleanup and improvements.


+static char *slru_names[] = {"async", "clog", "commit_timestamp",
+  "multixact_offset", "multixact_member",
+  "oldserxid", "pg_xact", "subtrans",
+  "other" /* has to be last */};

When I tried pg_stat_slru, I found that it returns a row for "pg_xact".
But since there is no "pg_xact" SLRU (the "clog" SLRU exists instead),
shouldn't "pg_xact" be removed? Patch attached.



Yeah, I think I got confused and accidentally added both "clog" and
"pg_xact". I'll get "pg_xact" removed.


Thanks!



OK, pushed. Thanks!


Thanks a lot!

But, as the patch I attached in the previous email does, "pg_xact" should
also be removed from the description of pg_stat_reset_slru() in
monitoring.sgml.


Another thing I found: shouldn't pgstat_send_slru() also be called by
processes other than backends? For example, since clog data is basically
flushed by the checkpointer, the checkpointer seems to need to send SLRU
stats. Otherwise, pg_stat_slru.flushes would not be updated.



Hmmm, that's a good point. If I understand the issue correctly, the
checkpointer accumulates the stats but never really sends them because
it never calls pgstat_report_stat/pgstat_send_slru. That's only called
from PostgresMain, but not from CheckpointerMain.


Yes.


I think we could simply add pgstat_send_slru() right after the existing
call in CheckpointerMain, right?


Checkpointer sends off activity statistics to the stats collector in
two places, by calling pgstat_send_bgwriter(). What about calling
pgstat_send_slru() just after pgstat_send_bgwriter()?

In my previous email, I mentioned the checkpointer just as an example.
So we probably need to investigate which processes other than the
checkpointer should send SLRU stats. I guess that at least the autovacuum
worker, logical replication walsender and parallel query worker (maybe
this is already covered by some parallel query mechanism; sorry, I've not
checked that) would need to send their SLRU stats.

Atsushi-san reported another issue in pg_stat_slru.
Are you planning to work on that?
https://postgr.es/m/cacz0uyfe16pjzxqyatn53mspym7dgmpyl3djljjpw69gmcc...@mail.gmail.com

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: SLRU statistics

2020-05-01 Thread Tomas Vondra

On Fri, May 01, 2020 at 11:49:51AM +0900, Fujii Masao wrote:



On 2020/05/01 3:19, Tomas Vondra wrote:

On Fri, May 01, 2020 at 03:02:59AM +0900, Fujii Masao wrote:



On 2020/04/02 9:41, Tomas Vondra wrote:

Hi,

I've pushed this after some minor cleanup and improvements.


+static char *slru_names[] = {"async", "clog", "commit_timestamp",
+  "multixact_offset", "multixact_member",
+  "oldserxid", "pg_xact", "subtrans",
+  "other" /* has to be last */};

When I tried pg_stat_slru, I found that it returns a row for "pg_xact".
But since there is no "pg_xact" SLRU (the "clog" SLRU exists instead),
shouldn't "pg_xact" be removed? Patch attached.



Yeah, I think I got confused and accidentally added both "clog" and
"pg_xact". I'll get "pg_xact" removed.


Thanks!



OK, pushed. Thanks!


Another thing I found: shouldn't pgstat_send_slru() also be called by
processes other than backends? For example, since clog data is basically
flushed by the checkpointer, the checkpointer seems to need to send SLRU
stats. Otherwise, pg_stat_slru.flushes would not be updated.



Hmmm, that's a good point. If I understand the issue correctly, the
checkpointer accumulates the stats but never really sends them because
it never calls pgstat_report_stat/pgstat_send_slru. That's only called
from PostgresMain, but not from CheckpointerMain.

I think we could simply add pgstat_send_slru() right after the existing
call in CheckpointerMain, right?

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: SLRU statistics

2020-04-30 Thread Fujii Masao




On 2020/05/01 3:19, Tomas Vondra wrote:

On Fri, May 01, 2020 at 03:02:59AM +0900, Fujii Masao wrote:



On 2020/04/02 9:41, Tomas Vondra wrote:

Hi,

I've pushed this after some minor cleanup and improvements.


+static char *slru_names[] = {"async", "clog", "commit_timestamp",
+  "multixact_offset", "multixact_member",
+  "oldserxid", "pg_xact", "subtrans",
+  "other" /* has to be last */};

When I tried pg_stat_slru, I found that it returns a row for "pg_xact".
But since there is no "pg_xact" SLRU (the "clog" SLRU exists instead),
shouldn't "pg_xact" be removed? Patch attached.



Yeah, I think I got confused and accidentally added both "clog" and
"pg_xact". I'll get "pg_xact" removed.


Thanks!

Another thing I found: shouldn't pgstat_send_slru() also be called by
processes other than backends? For example, since clog data is basically
flushed by the checkpointer, the checkpointer seems to need to send SLRU
stats. Otherwise, pg_stat_slru.flushes would not be updated.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: SLRU statistics

2020-04-30 Thread Fujii Masao



On 2020/04/02 9:41, Tomas Vondra wrote:

Hi,

I've pushed this after some minor cleanup and improvements.


+static char *slru_names[] = {"async", "clog", "commit_timestamp",
+                             "multixact_offset", "multixact_member",
+                             "oldserxid", "pg_xact", "subtrans",
+                             "other" /* has to be last */};

When I tried pg_stat_slru, I found that it returns a row for "pg_xact".
But since there is no "pg_xact" SLRU (the "clog" SLRU exists instead),
shouldn't "pg_xact" be removed? Patch attached.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 6562cc400b..ba6d8d2123 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3483,7 +3483,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
     predefined list (async, clog,
     commit_timestamp, multixact_offset,
     multixact_member, oldserxid,
-    pg_xact, subtrans and
+    subtrans and
     other) resets counters for only that entry.
     Names not included in this list are treated as other.
   
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 50eea2e8a8..2ba3858d31 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -152,7 +152,7 @@ PgStat_MsgBgWriter BgWriterStats;
  */
 static char *slru_names[] = {"async", "clog", "commit_timestamp",
                              "multixact_offset", "multixact_member",
-                             "oldserxid", "pg_xact", "subtrans",
+                             "oldserxid", "subtrans",
                              "other" /* has to be last */};
 
 /* number of elements of slru_name array */


Re: SLRU statistics

2020-04-07 Thread Tomas Vondra

On Tue, Apr 07, 2020 at 05:01:37PM +0530, Kuntal Ghosh wrote:

Hello Tomas,

On Thu, Apr 2, 2020 at 5:59 PM Tomas Vondra
 wrote:

Thank you for the patch, pushed.


In SimpleLruReadPage_ReadOnly, we first try to find the SLRU page in
shared buffer under shared lock, then conditionally visit
SimpleLruReadPage if reading is necessary. IMHO, we should update
hit_count if we can find the buffer in SimpleLruReadPage_ReadOnly
directly. Am I missing something?

Attached a patch for the same.



Yes, I think that's correct - without this we fail to account for
(possibly) a quite significant number of hits. Thanks for the report,
I'll get this pushed later today.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: SLRU statistics

2020-04-07 Thread Kuntal Ghosh
Hello Tomas,

On Thu, Apr 2, 2020 at 5:59 PM Tomas Vondra
 wrote:
> Thank you for the patch, pushed.
>
In SimpleLruReadPage_ReadOnly, we first try to find the SLRU page in
shared buffer under shared lock, then conditionally visit
SimpleLruReadPage if reading is necessary. IMHO, we should update
hit_count if we can find the buffer in SimpleLruReadPage_ReadOnly
directly. Am I missing something?

Attached a patch for the same.
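
The gist of the change: count a hit when the read-only fast path finds the
page already resident, rather than only inside SimpleLruReadPage().
Roughly like the sketch below (an illustration of the idea, not the
attached patch verbatim, assuming the counter helper is named
pgstat_slru_count_page_hit() in the style of the other committed counters):

	/* In SimpleLruReadPage_ReadOnly(), scanning buffers under shared lock: */
	for (slotno = 0; slotno < shared->num_slots; slotno++)
	{
		if (shared->page_number[slotno] == pageno &&
			shared->page_status[slotno] != SLRU_PAGE_EMPTY &&
			shared->page_status[slotno] != SLRU_PAGE_READ_IN_PROGRESS)
		{
			/* See comments for SlruRecentlyUsed macro */
			SlruRecentlyUsed(shared, slotno);

			/* update the stats counter of pages found in the SLRU */
			pgstat_slru_count_page_hit(ctl);

			return slotno;
		}
	}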

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


v1-0001-Update-stats-in-SimpleLruReadPage_ReadOnly.patch
Description: Binary data


Re: SLRU statistics

2020-04-02 Thread Tomas Vondra

On Thu, Apr 02, 2020 at 02:04:10AM +0000, Shinoda, Noriyoshi (PN Japan A Delivery) wrote:

Hi,

Thank you for developing great features.
The attached patch is a small fix to the committed documentation for the data 
type name of blks_hit column.



Thank you for the patch, pushed.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




RE: SLRU statistics

2020-04-01 Thread Shinoda, Noriyoshi (PN Japan A Delivery)
Hi, 

Thank you for developing great features.
The attached patch is a small fix to the committed documentation for the data 
type name of blks_hit column.

Best regards,
Noriyoshi Shinoda

-Original Message-
From: Tomas Vondra [mailto:tomas.von...@2ndquadrant.com] 
Sent: Thursday, April 2, 2020 9:42 AM
To: Alvaro Herrera 
Cc: Masahiko Sawada ; 
tsunakawa.ta...@fujitsu.com; pgsql-hack...@postgresql.org
Subject: Re: SLRU statistics

Hi,

I've pushed this after some minor cleanup and improvements.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com 
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




monitoring_pg_stat_slru.diff
Description: monitoring_pg_stat_slru.diff


Re: SLRU statistics

2020-04-01 Thread Tomas Vondra

Hi,

I've pushed this after some minor cleanup and improvements.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: SLRU statistics

2020-03-27 Thread Tomas Vondra

Hi,

here is a bit improved version of the patch - I've been annoyed by how
the resetting works (per-entry timestamp, but resetting all entries) so
I've added a new function pg_stat_reset_slru() that allows resetting
either all entries or just one entry (identified by name). So

SELECT pg_stat_reset_slru('clog');

resets just "clog" SLRU counters, while

SELECT pg_stat_reset_slru(NULL);

resets all entries.

I've also done a bit of benchmarking, to see if this has measurable
impact (in which case it might deserve a new GUC), and I think it's not
measurable. I've used a tiny unlogged table (single row).

CREATE UNLOGGED TABLE t (a int);
INSERT INTO t VALUES (1);

and then short pgbench runs with a single client, updating the row. I've
been unable to measure any regression, it's all well within 1% so noise.
But perhaps there's some other benchmark that I should do?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 1a065c21a9a791909cd1ca752db5aaf1f814fe37 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Thu, 26 Mar 2020 20:52:26 +0100
Subject: [PATCH] Collect SLRU statistics

Adds a new system view pg_stat_slru with stats about SLRU caches, and
a function pg_stat_reset_slru() to reset either all counters or just
counters for a single SLRU.

There is no SLRU registry this patch could use, so it simply uses the
SLRU name (which is also used for LWLock tranche name) as an identifier,
and a predefined list of SLRU names, and an extra "others" entry for
SLRUs without a dedicated entry. Presumably, the number of extensions
defining their own SLRU is very small.

Author: Tomas Vondra
Reviewed-by: Alvaro Herrera
Discussion: https://www.postgresql.org/message-id/flat/20200119143707.gyinppnigokesjok@development
---
 doc/src/sgml/monitoring.sgml |  97 +
 src/backend/access/transam/slru.c|  23 ++
 src/backend/catalog/system_views.sql |  14 ++
 src/backend/postmaster/pgstat.c  | 300 +++
 src/backend/utils/adt/pgstatfuncs.c  |  91 
 src/include/catalog/pg_proc.dat  |  15 ++
 src/include/pgstat.h |  65 ++
 src/test/regress/expected/rules.out  |  10 +
 8 files changed, 615 insertions(+)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 270178d57e..7ba0dbee6a 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -575,6 +575,13 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   yet included in pg_stat_user_functions).
  
 
+ 
+  
pg_stat_slrupg_stat_slru
+  One row per SLRU, showing statistics of operations. See
+for details.
+  
+ 
+
 

   
@@ -3254,6 +3261,76 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i

   
 
+  
+   The pg_stat_slru view will contain
+   one row for each tracked SLRU cache, showing statistics about access
+   to cached pages.
+  
+
+  
+   pg_stat_slru View
+   
+
+
+  Column
+  Type
+  Description
+ 
+
+
+   
+
+ name
+ name
+ name of the SLRU
+
+
+ blks_zeroed
+ bigint
+ Number of blocks zeroed during initializations
+
+
+ blks_hit
+ biging
+ Number of times disk blocks were found already in the SLRU,
+  so that a read was not necessary (this only includes hits in the
+  SLRU, not the operating system's file system cache)
+ 
+
+
+ blks_read
+ bigint
+ Number of disk blocks read for this SLRU
+
+
+ blks_written
+ bigint
+ Number of disk blocks written for this SLRU
+
+
+ blks_exists
+ bigint
+ Number of blocks checked for existence for this SLRU
+
+
+ flushes
+ bigint
+ Number of flushes of dirty data for this SLRU
+
+
+ truncates
+ bigint
+ Number of truncates for this SLRU
+
+
+ stats_reset
+ timestamp with time zone
+ Time at which these statistics were last reset
+
+   
+   
+  
+
   
The pg_stat_user_functions view will contain
one row for each tracked function, showing statistics about executions of
@@ -3378,6 +3455,26 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
function can be granted to others)
   
  
+
+ 
+  
pg_stat_reset_slru(text)pg_stat_reset_slru
+  void
+  
+   Reset statistics either for a single SLRU or all SLRUs in the cluster
+   to zero (requires superuser privileges by default, but EXECUTE for this
+   function can be granted to others).
+   Calling pg_stat_reset_slru(NULL) will zero all the
+   counters shown in the pg_stat_slru view for
+   all SLRU caches.
+   Calling pg_stat_reset_slru(name) with names from a
+   predefined list (async, clo

Re: SLRU statistics

2020-03-26 Thread Tomas Vondra
nt_page_read(ctl);
+
return slotno;
}
 }
@@ -596,6 +607,9 @@ SimpleLruDoesPhysicalPageExist(SlruCtl ctl, int pageno)
boolresult;
off_t   endpos;
 
+   /* update the stats counter of checked pages */
+   pgstat_slru_count_page_exists(ctl);
+
SlruFileName(ctl, path, segno);
 
fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
@@ -730,6 +744,9 @@ SlruPhysicalWritePage(SlruCtl ctl, int pageno, int slotno, SlruFlush fdata)
charpath[MAXPGPATH];
int fd = -1;
 
+   /* update the stats counter of written pages */
+   pgstat_slru_count_page_written(ctl);
+
	/*
	 * Honor the write-WAL-before-data rule, if appropriate, so that we do not
	 * write out data before associated WAL records.  This is the same action
@@ -1125,6 +1142,9 @@ SimpleLruFlush(SlruCtl ctl, bool allow_redirtied)
int i;
boolok;
 
+   /* update the stats counter of flushes */
+   pgstat_slru_count_flush(ctl);
+
/*
 * Find and write dirty pages
 */
@@ -1186,6 +1206,9 @@ SimpleLruTruncate(SlruCtl ctl, int cutoffPage)
SlruShared  shared = ctl->shared;
int slotno;
 
+   /* update the stats counter of truncates */
+   pgstat_slru_count_truncate(ctl);
+
/*
 * The cutoff point is the start of the segment containing cutoffPage.
 */
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5a6dc61630..7dba85dd07 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -792,6 +792,19 @@ CREATE VIEW pg_stat_replication AS
 JOIN pg_stat_get_wal_senders() AS W ON (S.pid = W.pid)
 LEFT JOIN pg_authid AS U ON (S.usesysid = U.oid);
 
+CREATE VIEW pg_stat_slru AS
+SELECT
+s.name,
+s.blks_zeroed,
+s.blks_hit,
+s.blks_read,
+s.blks_written,
+s.blks_exists,
+s.flushes,
+s.truncates,
+s.stats_reset
+FROM pg_stat_get_slru() s;
+
 CREATE VIEW pg_stat_wal_receiver AS
 SELECT
 s.pid,
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 4763c24be9..895efb4cd2 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -141,6 +141,25 @@ char  *pgstat_stat_tmpname = NULL;
  */
 PgStat_MsgBgWriter BgWriterStats;
 
+/*
+ * SLRU statistics counters (unused in other processes) stored directly in
+ * stats structure so it can be sent without needing to copy things around.
+ * We assume this inits to zeroes.
+ *
+ * There's a separate entry for each SLRU we have. The "other" entry is used
+ * for all SLRUs without an explicit entry (e.g. SLRUs in extensions).
+ */
+static char *slru_names[] = {"async", "clog", "commit_timestamp",
+ "multixact_offset", 
"multixact_member",
+ "oldserxid", 
"pg_xact", "subtrans",
+ "other" /* has to be 
last */};
+
+#define SLRU_NUM_ELEMENTS  (sizeof(slru_names) / sizeof(char *))
+static int slru_index(char *name);
+
+/* entries in the same order as slru_names */
+PgStat_MsgSLRU SLRUStats[SLRU_NUM_ELEMENTS];
+
 /* --
  * Local data
  * --
@@ -255,6 +274,7 @@ static int  localNumBackends = 0;
  */
 static PgStat_ArchiverStats archiverStats;
 static PgStat_GlobalStats globalStats;
+static PgStat_SLRUStats slruStats[SLRU_NUM_ELEMENTS];
 
 /*
  * List of OIDs of databases we need to write out.  If an entry is InvalidOid,
@@ -297,6 +317,7 @@ static bool pgstat_db_requested(Oid databaseid);
 
 static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
 static void pgstat_send_funcstats(void);
+static void pgstat_send_slru(void);
 static HTAB *pgstat_collect_oids(Oid catalogid, AttrNumber anum_oid);
 
 static PgStat_TableStatus *get_tabstat_entry(Oid rel_id, bool isshared);
@@ -324,6 +345,7 @@ static void pgstat_recv_vacuum(PgStat_MsgVacuum *msg, int 
len);
 static void pgstat_recv_analyze(PgStat_MsgAnalyze *msg, int len);
 static void pgstat_recv_archiver(PgStat_MsgArchiver *msg, int len);
 static void pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len);
+static void pgstat_recv_slru(PgStat_MsgSLRU *msg, int len);
 static void pgstat_recv_funcstat(PgStat_MsgFuncstat *msg, int len);
 static void pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len);
 static void pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int 
len);
@@ -907,6 +929,9 @@ pgstat_report_stat(bool force)
 
/* Now, send function statistics */
pgstat_send_funcstats();

Re: SLRU statistics

2020-02-29 Thread Tomas Vondra
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
@@ -193,7 +193,7 @@ SUBTRANSShmemInit(void)
SubTransCtl->PagePrecedes = SubTransPagePrecedes;
SimpleLruInit(SubTransCtl, "subtrans", NUM_SUBTRANS_BUFFERS, 0,
  SubtransControlLock, "pg_subtrans",
- LWTRANCHE_SUBTRANS_BUFFERS);
+ LWTRANCHE_SUBTRANS_BUFFERS, SLRU_SUBTRANS);
/* Override default assumption that writes should be fsync'd */
SubTransCtl->do_fsync = false;
 }
diff --git a/src/backend/catalog/system_views.sql 
b/src/backend/catalog/system_views.sql
index f681aafcf9..10677ff778 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -792,6 +792,20 @@ CREATE VIEW pg_stat_replication AS
 JOIN pg_stat_get_wal_senders() AS W ON (S.pid = W.pid)
 LEFT JOIN pg_authid AS U ON (S.usesysid = U.oid);
 
+CREATE VIEW pg_stat_slru AS
+SELECT
+s.name,
+s.pages_zero,
+s.pages_hit,
+s.pages_read,
+s.pages_write,
+s.pages_exists,
+s.io_error,
+s.flushes,
+s.truncates,
+s.stat_reset
+FROM pg_stat_get_slru() s;
+
 CREATE VIEW pg_stat_wal_receiver AS
 SELECT
 s.pid,
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index dae939a4ab..f442125ead 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -552,7 +552,8 @@ AsyncShmemInit(void)
 */
AsyncCtl->PagePrecedes = asyncQueuePagePrecedes;
SimpleLruInit(AsyncCtl, "async", NUM_ASYNC_BUFFERS, 0,
- AsyncCtlLock, "pg_notify", 
LWTRANCHE_ASYNC_BUFFERS);
+ AsyncCtlLock, "pg_notify", 
LWTRANCHE_ASYNC_BUFFERS,
+ SLRU_ASYNC);
/* Override default assumption that writes should be fsync'd */
AsyncCtl->do_fsync = false;
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 462b4d7e06..37e1312fa2 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -141,6 +141,9 @@ char   *pgstat_stat_tmpname = NULL;
  */
 PgStat_MsgBgWriter BgWriterStats;
 
+/* TODO */
+PgStat_MsgSlru SlruStats[SLRU_OTHER + 1];
+
 /* --
  * Local data
  * --
@@ -255,6 +258,7 @@ static int  localNumBackends = 0;
  */
 static PgStat_ArchiverStats archiverStats;
 static PgStat_GlobalStats globalStats;
+static PgStat_SlruStats slruStats[SLRU_OTHER + 1];
 
 /*
  * List of OIDs of databases we need to write out.  If an entry is InvalidOid,
@@ -297,6 +301,7 @@ static bool pgstat_db_requested(Oid databaseid);
 
 static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
 static void pgstat_send_funcstats(void);
+static void pgstat_send_slru(void);
 static HTAB *pgstat_collect_oids(Oid catalogid, AttrNumber anum_oid);
 
 static PgStat_TableStatus *get_tabstat_entry(Oid rel_id, bool isshared);
@@ -324,6 +329,7 @@ static void pgstat_recv_vacuum(PgStat_MsgVacuum *msg, int 
len);
 static void pgstat_recv_analyze(PgStat_MsgAnalyze *msg, int len);
 static void pgstat_recv_archiver(PgStat_MsgArchiver *msg, int len);
 static void pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len);
+static void pgstat_recv_slru(PgStat_MsgSlru *msg, int len);
 static void pgstat_recv_funcstat(PgStat_MsgFuncstat *msg, int len);
 static void pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len);
 static void pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int 
len);
@@ -907,6 +913,9 @@ pgstat_report_stat(bool force)
 
/* Now, send function statistics */
pgstat_send_funcstats();
+
+   /* Finally send SLRU statistics */
+   pgstat_send_slru();
 }
 
 /*
@@ -1337,6 +1346,8 @@ pgstat_reset_shared_counters(const char *target)
msg.m_resettarget = RESET_ARCHIVER;
else if (strcmp(target, "bgwriter") == 0)
msg.m_resettarget = RESET_BGWRITER;
+   else if (strcmp(target, "slru") == 0)
+   msg.m_resettarget = RESET_SLRU;
else
ereport(ERROR,
    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -2622,6 +2633,23 @@ pgstat_fetch_global(void)
 }
 
 
+/*
+ * -
+ * pgstat_fetch_slru() -
+ *
+ * Support function for the SQL-callable pgstat* functions. Returns
+ * a pointer to the slru statistics struct.
+ * -
+ */
+PgStat_SlruStats *
+pgstat_fetch_slru(void)
+{
+   backend_read_statsfile();
+
+   return slruStats;
+}
+
+
 /* 
  * Functions for management of the shared-memory PgBackendStatus array
  * 
@@ -4413,6 +4441,46 @@ pgstat_send_bgwriter(void)
MemSet(&BgWriterStats, 0, sizeof(BgWriterStats));
 }
 
+/* -
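
For completeness: the pgstat_reset_shared_counters() hunk above also wires
up "slru" as a new reset target, so under this revision all SLRU counters
could presumably be zeroed in one call:

SELECT pg_stat_reset_shared('slru');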

Re: SLRU statistics

2020-02-29 Thread Alvaro Herrera
On 2020-Feb-29, Tomas Vondra wrote:

> Did we actually remove track-enabling GUCs? I think we still have
> 
>  - track_activities
>  - track_counts
>  - track_io_timing
>  - track_functions
> 
> But maybe I'm missing something?

Hm I remembered we removed the one for row-level stats
(track_row_stats), but what we really did is merge it with block-level
stats (track_block_stats) into track_counts -- commit 48f7e6439568. 
Funnily enough, if you disable that, autovacuum won't work, so I'm not
sure it's a very useful tunable.  And it definitely has more overhead
than what this new GUC would have.

> > I find SlruType pretty odd, and the accompanying "if" list in
> > pg_stat_get_slru() correspondingly so.  Would it be possible to have
> > each SLRU enumerate itself somehow?  Maybe add the name in SlruCtlData
> > and query that, somehow.  (I don't think we have an array of SlruCtlData
> > anywhere though, so this might be a useless idea.)
> 
> Well, maybe. We could have a system to register SLRUs dynamically, but
> the trick here is that by having a fixed predefined number of SLRUs
> simplifies serialization in pgstat.c and so on. I don't think the "if"
> branches in pg_stat_get_slru() are particularly ugly, but maybe we could
> replace the enum with a registry of structs, something like rmgrlist.h.
> It seems like an overkill to me, though.

Yeah, maybe we don't have to fix that now.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: SLRU statistics

2020-02-29 Thread Tomas Vondra

On Fri, Feb 28, 2020 at 08:19:18PM -0300, Alvaro Herrera wrote:

On 2020-Jan-21, Tomas Vondra wrote:


On Tue, Jan 21, 2020 at 05:09:33PM +0900, Masahiko Sawada wrote:



> I've not tested the performance impact but perhaps we might want to
> disable these counters by default and control them with a GUC. And,
> similar to buffer statistics, it might be better to inline the
> pgstat_count_slru_page_xxx functions for better performance.

Hmmm, yeah. Inlining seems like a good idea, and maybe we should have
something like a track_slru GUC.


I disagree with adding a GUC.  If a performance impact can be measured
let's turn the functions to static inline, as already proposed.  My
guess is that pgstat_count_slru_page_hit() is the only candidate for
that; all the other paths involve I/O or lock acquisition or even WAL
generation, so the impact won't be measurable anyhow.  We removed
track-enabling GUCs years ago.



Did we actually remove track-enabling GUCs? I think we still have

 - track_activities
 - track_counts
 - track_io_timing
 - track_functions

But maybe I'm missing something?

That being said, I'm not sure we need to add a GUC. I'll do some
measurements and we'll see. Maybe the static inline will be enough.


BTW, this comment:
/* update the stats counter of pages found in shared 
buffers */

is not strictly true, because we don't use what we normally call "shared
buffers"  for SLRUs.



Oh, right. Will fix.


Patch applies cleanly.  I suggest to move the page_miss() call until
after SlruRecentlyUsed(), for consistency with the other case.



OK.


I find SlruType pretty odd, and the accompanying "if" list in
pg_stat_get_slru() correspondingly so.  Would it be possible to have
each SLRU enumerate itself somehow?  Maybe add the name in SlruCtlData
and query that, somehow.  (I don't think we have an array of SlruCtlData
anywhere though, so this might be a useless idea.)



Well, maybe. We could have a system to register SLRUs dynamically, but
the trick here is that by having a fixed predefined number of SLRUs
simplifies serialization in pgstat.c and so on. I don't think the "if"
branches in pg_stat_get_slru() are particularly ugly, but maybe we could
replace the enum with a registry of structs, something like rmgrlist.h.
It seems like an overkill to me, though.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: SLRU statistics

2020-02-28 Thread Alvaro Herrera
On 2020-Jan-21, Tomas Vondra wrote:

> On Tue, Jan 21, 2020 at 05:09:33PM +0900, Masahiko Sawada wrote:

> > I've not tested the performance impact but perhaps we might want to
> > disable these counters by default and control them with a GUC. And,
> > similar to buffer statistics, it might be better to inline the
> > pgstat_count_slru_page_xxx functions for better performance.
> 
> Hmmm, yeah. Inlining seems like a good idea, and maybe we should have
> something like a track_slru GUC.

I disagree with adding a GUC.  If a performance impact can be measured
let's turn the functions to static inline, as already proposed.  My
guess is that pgstat_count_slru_page_hit() is the only candidate for
that; all the other paths involve I/O or lock acquisition or even WAL
generation, so the impact won't be measurable anyhow.  We removed
track-enabling GUCs years ago.

BTW, this comment:
/* update the stats counter of pages found in shared 
buffers */

is not strictly true, because we don't use what we normally call "shared
buffers"  for SLRUs.

Patch applies cleanly.  I suggest to move the page_miss() call until
after SlruRecentlyUsed(), for consistency with the other case.

I find SlruType pretty odd, and the accompanying "if" list in
pg_stat_get_slru() correspondingly so.  Would it be possible to have
each SLRU enumerate itself somehow?  Maybe add the name in SlruCtlData
and query that, somehow.  (I don't think we have an array of SlruCtlData
anywhere though, so this might be a useless idea.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: SLRU statistics

2020-01-21 Thread Tomas Vondra

On Tue, Jan 21, 2020 at 06:24:29AM +0000, tsunakawa.ta...@fujitsu.com
wrote:

From: Tomas Vondra 

You're right the users can't really take advantage of this - my
primary motivation was providing a feedback for devs, benchmarking
etc. That might have been done with DEBUG messages or something, but
this seems more convenient.


Understood.  I'm in favor of adding performance information even if it
doesn't make sense for users (like other DBMSs sometimes do.)  One
concern is that all the PostgreSQL performance statistics have been
useful so far for tuning in some way, and this may become the first
exception.  Do we describe the SLRU stats view in the manual, or hide
it only for PG devs and support staff?



Yes, the pg_stat_slru view should be described in the manual. That's
missing from the patch.




I think it's unclear how desirable / necessary it is to allow users
to tweak those caches. I don't think we should have a GUC for
everything, but maybe there's some sort of heuristics to determine
the size. The assumption is we actually find practical workloads
where the size of these SLRUs is a performance issue.


I understood that the new performance statistics are expected to reveal
what SLRUs need to be tunable and/or implemented with a different
mechanism like shared buffers.



Right. It's certainly meant to provide information for further tuning.
I'm just saying it's targeted more at developers, at least initially.
Maybe we'll end up with GUCs, maybe we'll choose other approaches for
some SLRUs. I don't have an opinion on that yet.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: SLRU statistics

2020-01-21 Thread Tomas Vondra

On Tue, Jan 21, 2020 at 05:09:33PM +0900, Masahiko Sawada wrote:

On Tue, 21 Jan 2020 at 01:38, Tomas Vondra  wrote:


On Mon, Jan 20, 2020 at 01:04:33AM +0000, tsunakawa.ta...@fujitsu.com wrote:
>From: Tomas Vondra 
>> One of the stats I occasionally wanted to know are stats for the SLRU
> stats (we have a couple of those - clog, subtrans, ...). So here is a WIP
>> version of a patch adding that.


+1


>
>How can users take advantage of this information?  I think we also need
>the ability to set the size of SLRU buffers.  (I want to be freed from
>the concern about the buffer shortage by setting the buffer size to its
>maximum.  For example, CLOG would be only 1 GB.)
>

You're right the users can't really take advantage of this - my primary
motivation was providing a feedback for devs, benchmarking etc. That
might have been done with DEBUG messages or something, but this seems
more convenient.


I've not tested the performance impact but perhaps we might want to
disable these counters by default and control them with a GUC. And,
similar to buffer statistics, it might be better to inline the
pgstat_count_slru_page_xxx functions for better performance.



Hmmm, yeah. Inlining seems like a good idea, and maybe we should have
something like a track_slru GUC.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: SLRU statistics

2020-01-21 Thread Masahiko Sawada
On Tue, 21 Jan 2020 at 01:38, Tomas Vondra  wrote:
>
> On Mon, Jan 20, 2020 at 01:04:33AM +0000, tsunakawa.ta...@fujitsu.com wrote:
> >From: Tomas Vondra 
> >> One of the stats I occasionally wanted to know are stats for the SLRU
> >> stats (we have a couple of those - clog, subtrans, ...). So here is a WIP
> >> version of a patch adding that.

+1

> >
> >How can users take advantage of this information?  I think we also need
> >the ability to set the size of SLRU buffers.  (I want to be freed from
> >the concern about the buffer shortage by setting the buffer size to its
> >maximum.  For example, CLOG would be only 1 GB.)
> >
>
> You're right the users can't really take advantage of this - my primary
> motivation was providing a feedback for devs, benchmarking etc. That
> might have been done with DEBUG messages or something, but this seems
> more convenient.

I've not tested the performance impact but perhaps we might want to
disable these counters by default and control them with a GUC. And,
similar to buffer statistics, it might be better to inline the
pgstat_count_slru_page_xxx functions for better performance.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




RE: SLRU statistics

2020-01-20 Thread tsunakawa.ta...@fujitsu.com
From: Tomas Vondra 
> You're right the users can't really take advantage of this - my primary
> motivation was providing a feedback for devs, benchmarking etc. That
> might have been done with DEBUG messages or something, but this seems
> more convenient.

Understood.  I'm in favor of adding performance information even if it doesn't 
make sense for users (like other DBMSs sometimes do.)  One concern is that all 
the PostgreSQL performance statistics have been useful so far for tuning in 
some way, and this may become the first exception.  Do we describe the SLRU 
stats view in the manual, or hide it only for PG devs and support staff?


> I think it's unclear how desirable / necessary it is to allow users to
> tweak those caches. I don't think we should have a GUC for everything,
> but maybe there's some sort of heuristics to determine the size. The
> assumption is we actually find practical workloads where the size of
> these SLRUs is a performance issue.

I understood that the new performance statistics are expected to reveal what 
SLRUs need to be tunable and/or implemented with a different mechanism like 
shared buffers.


Regards
Takayuki Tsunakawa







Re: SLRU statistics

2020-01-20 Thread Tomas Vondra

On Mon, Jan 20, 2020 at 03:01:36PM -0300, Alvaro Herrera wrote:

On 2020-Jan-20, Tomas Vondra wrote:


On Mon, Jan 20, 2020 at 01:04:33AM +0000, tsunakawa.ta...@fujitsu.com wrote:
> From: Tomas Vondra 
> > One of the stats I occasionally wanted to know are stats for the SLRU
> > stats (we have a couple of those - clog, subtrans, ...). So here is a WIP
> > version of a patch adding that.
>
> How can users take advantage of this information?  I think we also need
> the ability to set the size of SLRU buffers.  (I want to be freed from
> the concern about the buffer shortage by setting the buffer size to its
> maximum.  For example, CLOG would be only 1 GB.)

You're right the users can't really take advantage of this - my primary
motivation was providing a feedback for devs, benchmarking etc. That
might have been done with DEBUG messages or something, but this seems
more convenient.


I think the stats are definitely needed if we keep the current code.
I've researched some specific problems in this code, such as the need
for more subtrans SLRU buffers; IIRC it was pretty painful to figure out
what the problem was without counters, and it'd have been trivial with
them.



Right. Improving our ability to monitor/measure things is the goal of
this patch.
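
For instance, the subtrans case quoted above would show up directly as a
poor hit ratio in the new view. A diagnostic query might look like this
(a sketch against the blks_* column names of a later patch revision;
earlier revisions used pages_* instead):

SELECT name,
       blks_hit,
       blks_read,
       round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2)
           AS hit_ratio_pct
FROM pg_stat_slru
ORDER BY blks_read DESC;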


I think it's unclear how desirable / necessary it is to allow users to
tweak those caches. I don't think we should have a GUC for everything,
but maybe there's some sort of heuristics to determine the size. The
assumption is we actually find practical workloads where the size of
these SLRUs is a performance issue.


I expect we'll eventually realize the need for changes in this area.
Either configurability in the buffer pool sizes, or moving them to be
part of shared_buffers (IIRC Thomas Munro had a patch for this.)
Example: SLRUs like pg_commit and pg_subtrans have higher buffer
consumption as the range of open transactions increases; for many users
this is not a concern and they can live with the default values.

(I think when pg_commit (née pg_clog) buffers were increased, we should
have increased pg_subtrans buffers to match.)



Quite possibly, yes. All I'm saying is that it's not something I intend
to address with this patch. It's quite possible the solutions will be
different for each SLRU, and that will require more research.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: SLRU statistics

2020-01-20 Thread Alvaro Herrera
On 2020-Jan-20, Tomas Vondra wrote:

> On Mon, Jan 20, 2020 at 01:04:33AM +0000, tsunakawa.ta...@fujitsu.com wrote:
> > From: Tomas Vondra 
> > > One of the stats I occasionally wanted to know are stats for the SLRU
> > > stats (we have a couple of those - clog, subtrans, ...). So here is a WIP
> > > version of a patch adding that.
> > 
> > How can users take advantage of this information?  I think we also need
> > the ability to set the size of SLRU buffers.  (I want to be freed from
> > the concern about the buffer shortage by setting the buffer size to its
> > maximum.  For example, CLOG would be only 1 GB.)
> 
> You're right the users can't really take advantage of this - my primary
> motivation was providing a feedback for devs, benchmarking etc. That
> might have been done with DEBUG messages or something, but this seems
> more convenient.

I think the stats are definitely needed if we keep the current code.
I've researched some specific problems in this code, such as the need
for more subtrans SLRU buffers; IIRC it was pretty painful to figure out
what the problem was without counters, and it'd have been trivial with
them.

> I think it's unclear how desirable / necessary it is to allow users to
> tweak those caches. I don't think we should have a GUC for everything,
> but maybe there's some sort of heuristics to determine the size. The
> assumption is we actually find practical workloads where the size of
> these SLRUs is a performance issue.

I expect we'll eventually realize the need for changes in this area.
Either configurability in the buffer pool sizes, or moving them to be
part of shared_buffers (IIRC Thomas Munro had a patch for this.)
Example: SLRUs like pg_commit and pg_subtrans have higher buffer
consumption as the range of open transactions increases; for many users
this is not a concern and they can live with the default values.

(I think when pg_commit (née pg_clog) buffers were increased, we should
have increased pg_subtrans buffers to match.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: SLRU statistics

2020-01-20 Thread Tomas Vondra

On Mon, Jan 20, 2020 at 01:04:33AM +0000, tsunakawa.ta...@fujitsu.com wrote:

From: Tomas Vondra 

One of the stats I occasionally wanted to know are stats for the SLRU
stats (we have a couple of those - clog, subtrans, ...). So here is a WIP
version of a patch adding that.


How can users take advantage of this information?  I think we also need
the ability to set the size of SLRU buffers.  (I want to be freed from
the concern about the buffer shortage by setting the buffer size to its
maximum.  For example, CLOG would be only 1 GB.)



You're right the users can't really take advantage of this - my primary
motivation was providing a feedback for devs, benchmarking etc. That
might have been done with DEBUG messages or something, but this seems
more convenient.

I think it's unclear how desirable / necessary it is to allow users to
tweak those caches. I don't think we should have a GUC for everything,
but maybe there's some sort of heuristics to determine the size. The
assumption is we actually find practical workloads where the size of
these SLRUs is a performance issue.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




RE: SLRU statistics

2020-01-19 Thread tsunakawa.ta...@fujitsu.com
From: Tomas Vondra 
> One of the stats I occasionally wanted to know are stats for the SLRU
> stats (we have a couple of those - clog, subtrans, ...). So here is a WIP
> version of a patch adding that.

How can users take advantage of this information?  I think we also need the 
ability to set the size of SLRU buffers.  (I want to be freed from the concern 
about the buffer shortage by setting the buffer size to its maximum.  For 
example, CLOG would be only 1 GB: two status bits for each of the 2^32 
possible transaction IDs.)


Regards
Takayuki Tsunakawa







SLRU statistics

2020-01-19 Thread Tomas Vondra
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 9aa2b61600..0cc92a 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -552,7 +552,8 @@ AsyncShmemInit(void)
 */
AsyncCtl->PagePrecedes = asyncQueuePagePrecedes;
SimpleLruInit(AsyncCtl, "async", NUM_ASYNC_BUFFERS, 0,
- AsyncCtlLock, "pg_notify", 
LWTRANCHE_ASYNC_BUFFERS);
+ AsyncCtlLock, "pg_notify", 
LWTRANCHE_ASYNC_BUFFERS,
+ SLRU_ASYNC);
/* Override default assumption that writes should be fsync'd */
AsyncCtl->do_fsync = false;
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 51c486bebd..aa825029b1 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -141,6 +141,9 @@ char   *pgstat_stat_tmpname = NULL;
  */
 PgStat_MsgBgWriter BgWriterStats;
 
+/* TODO */
+PgStat_MsgSlru SlruStats[SLRU_OTHER + 1];
+
 /* --
  * Local data
  * --
@@ -255,6 +258,7 @@ static int  localNumBackends = 0;
  */
 static PgStat_ArchiverStats archiverStats;
 static PgStat_GlobalStats globalStats;
+static PgStat_SlruStats slruStats[SLRU_OTHER + 1];
 
 /*
  * List of OIDs of databases we need to write out.  If an entry is InvalidOid,
@@ -297,6 +301,7 @@ static bool pgstat_db_requested(Oid databaseid);
 
 static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
 static void pgstat_send_funcstats(void);
+static void pgstat_send_slru(void);
 static HTAB *pgstat_collect_oids(Oid catalogid, AttrNumber anum_oid);
 
 static PgStat_TableStatus *get_tabstat_entry(Oid rel_id, bool isshared);
@@ -324,6 +329,7 @@ static void pgstat_recv_vacuum(PgStat_MsgVacuum *msg, int 
len);
 static void pgstat_recv_analyze(PgStat_MsgAnalyze *msg, int len);
 static void pgstat_recv_archiver(PgStat_MsgArchiver *msg, int len);
 static void pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len);
+static void pgstat_recv_slru(PgStat_MsgSlru *msg, int len);
 static void pgstat_recv_funcstat(PgStat_MsgFuncstat *msg, int len);
 static void pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len);
 static void pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int 
len);
@@ -904,6 +910,9 @@ pgstat_report_stat(bool force)
 
/* Now, send function statistics */
pgstat_send_funcstats();
+
+   /* Finally send SLRU statistics */
+   pgstat_send_slru();
 }
 
 /*
@@ -2619,6 +2628,23 @@ pgstat_fetch_global(void)
 }
 
 
+/*
+ * -
+ * pgstat_fetch_slru() -
+ *
+ * Support function for the SQL-callable pgstat* functions. Returns
+ * a pointer to the slru statistics struct.
+ * -
+ */
+PgStat_SlruStats *
+pgstat_fetch_slru(void)
+{
+   backend_read_statsfile();
+
+   return slruStats;
+}
+
+
 /* 
  * Functions for management of the shared-memory PgBackendStatus array
  * ----
@@ -4410,6 +4436,46 @@ pgstat_send_bgwriter(void)
MemSet(&BgWriterStats, 0, sizeof(BgWriterStats));
 }
 
+/* --
+ * pgstat_send_slru() -
+ *
+ * Send slru statistics to the collector
+ * --
+ */
+static void
+pgstat_send_slru(void)
+{
+   int i;
+
+   /* We assume this initializes to zeroes */
+   static const PgStat_MsgSlru all_zeroes;
+
+   for (i = 0; i <= SLRU_OTHER; i++)
+   {
+   /*
+* This function can be called even if nothing at all has 
happened. In
+* this case, avoid sending a completely empty message to the 
stats
+* collector.
+*/
+   if (memcmp(&SlruStats[i], &all_zeroes, sizeof(PgStat_MsgSlru)) == 0)
+   continue;
+
+   /* set the SLRU type before each send */
+   SlruStats[i].m_type = i;
+
+   /*
+* Prepare and send the message
+*/
+   pgstat_setheader(&SlruStats[i].m_hdr, PGSTAT_MTYPE_SLRU);
+   pgstat_send(&SlruStats[i], sizeof(PgStat_MsgSlru));
+
+   /*
+* Clear out the statistics buffer, so it can be re-used.
+*/
+   MemSet(&SlruStats[i], 0, sizeof(PgStat_MsgSlru));
+   }
+}
+
 
 /* --
  * PgstatCollectorMain() -
@@ -4602,6 +4668,10 @@ PgstatCollectorMain(int argc, char *argv[])
pgstat_recv_bgwriter(&msg.msg_bgwriter, len);
break;
 
+   case PGSTAT_MTYPE_SLRU:
+   pgstat_recv_slru(&msg.msg_slru, len);
+   break;
+
case PGSTAT_MTYPE_FUNCSTAT:
pgstat_recv_funcstat(&msg.msg_funcstat, len);
break;
@@ -4832,6 +4902,7 @@ pg