Re: [HACKERS] More stats about skipped vacuums

2018-02-09 Thread Kyotaro HORIGUCHI
Hello.

At Thu, 08 Feb 2018 18:21:56 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20180208.182156.96551245.horiguchi.kyot...@lab.ntt.co.jp>
> I suppose that the problem has not been resolved yet..

I found several bugs while studying this, but my conclusion is
that the decision required here is whether we regard the
unavailability of DSM as a fatal error, as we do for out of
memory. Maybe we can go in that direction, but just doing it
would certainly be the end of some buildfarm animals (at least
anole? smew seems to be gone already).

I've not found the exact cause of the problem that the regression
tests on the buildfarm animals always succeeded using SysV shmem
while "postgres --boot" run by initdb alone can fail using the same
mechanism. But the regression tests seem to keep working if initdb
sets max_connections to 20 or more.  At least they succeed for me
with max_connections = 20 and shared_buffers = 50MB on CentOS.

Finally, I'd like to propose the following.

 - Kill dynamic_shared_memory_type = none right now.

 * The server stops at startup if DSM is not available.

 - Let initdb set max_connections = 20 as the fallback value in
   that case (another proposed patch), and the regression tests
   should succeed with that. A rough sketch of the probing
   fallback follows below.
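
To illustrate, here is a minimal sketch of the probing fallback idea.
This is not the actual initdb code; the function and variable names are
invented for illustration only. The point is simply that initdb would try
candidate max_connections values and, if every probe fails (for example
because DSM cannot be set up during the probe), settle on a floor of 20
rather than refusing to initialize the cluster.

#include <stdbool.h>
#include <stdio.h>

/* Candidate values, highest first; 20 is the proposed floor. */
static const int trial_conns[] = {100, 50, 40, 30, 20};

/*
 * "probe" stands in for initdb's test run of "postgres --boot" with a
 * given max_connections setting; it returns true if the server starts.
 */
static int
choose_max_connections(bool (*probe)(int nconns))
{
    int     i;

    for (i = 0; i < (int) (sizeof(trial_conns) / sizeof(trial_conns[0])); i++)
    {
        if (probe(trial_conns[i]))
            return trial_conns[i];
    }

    /* Every probe failed: fall back to the proposed floor of 20. */
    return 20;
}

/* Dummy probe for demonstration: pretend that only small settings work. */
static bool
dummy_probe(int nconns)
{
    return nconns <= 20;
}

int
main(void)
{
    printf("max_connections = %d\n", choose_max_connections(dummy_probe));
    return 0;
}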

If we are agreed on this, I will be able to go forward.


I'd like to have opinions on this from experienced people.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] More stats about skipped vacuums

2018-02-08 Thread Kyotaro HORIGUCHI
At Thu, 08 Feb 2018 18:04:15 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20180208.180415.112312013.horiguchi.kyot...@lab.ntt.co.jp>
> > > I suggest we remove support for dynamic_shared_memory_type = none first,
> > > and see if we get any complaints.  If we don't, then future patches can
> > > rely on it being present.
> > 
> > If we remove it in v11, it'd still be maybe a year from now before we'd
> > have much confidence from that alone that nobody cares.  I think the lack
> > of complaints about it in 9.6 and 10 is a more useful data point.
> 
> So does that mean we can rely on the existence of DSM at
> present, since for over a year we have had no complaints despite
> the fact that DSM is silently turned on? And apart from that, are
> we ready to remove 'none' from the options of
> dynamic_shared_memory_type right now?

I found the following commit related to this.

| commit d41ab71712a4457ed39d5471b23949872ac91def
| Author: Robert Haas 
| Date:   Wed Oct 16 09:41:03 2013 -0400
| 
| initdb: Suppress dynamic shared memory when probing for max_connections.
| 
| This might not be the right long-term solution here, but it will
| hopefully turn the buildfarm green again.
| 
| Oversight noted by Andres Freund

The discussion is found here.

https://www.postgresql.org/message-id/ca+tgmoyhiigrcvssjhmbsebmof2zx_9_9rwd75cwvu99yrd...@mail.gmail.com

I suppose that the problem has not been resolved yet..

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] More stats about skipped vacuums

2018-02-08 Thread Kyotaro HORIGUCHI
Hello,

At Wed, 07 Feb 2018 16:59:20 -0500, Tom Lane  wrote in 
<3246.1518040...@sss.pgh.pa.us>
> Robert Haas  writes:
> > It seems to me that there was a thread where Tom proposed removing
> > support for dynamic_shared_memory_type = none.
> 
> I think you're recalling <32138.1502675...@sss.pgh.pa.us>, wherein
> I pointed out that
> 
> >>> Whether that's worth the trouble is debatable.  The current code
> >>> in initdb believes that every platform has some type of DSM support
> >>> (see choose_dsm_implementation).  Nobody's complained about that,
> >>> and it certainly works on every buildfarm animal.  So for all we know,
> >>> dynamic_shared_memory_type = none is broken already.
> 
> (That was in fact in the same thread Kyotaro-san just linked to about
> reimplementing the stats collector.)
> 
> It's still true that we've no reason to believe there are any supported
> platforms that haven't got some sort of DSM.  Performance might be a
> different question, of course ... but it's hard to believe that
> transferring stats through DSM wouldn't be better than writing them
> out to files.

Good to hear. Thanks.

> > I suggest we remove support for dynamic_shared_memory_type = none first,
> > and see if we get any complaints.  If we don't, then future patches can
> > rely on it being present.
> 
> If we remove it in v11, it'd still be maybe a year from now before we'd
> have much confidence from that alone that nobody cares.  I think the lack
> of complaints about it in 9.6 and 10 is a more useful data point.

So does that mean we can rely on the existence of DSM at
present, since for over a year we have had no complaints despite
the fact that DSM is silently turned on? And apart from that, are
we ready to remove 'none' from the options of
dynamic_shared_memory_type right now?

If I can rely on DSM, the fallback stuff will not be required.


>   regards, tom lane

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] More stats about skipped vacuums

2018-02-07 Thread Tom Lane
Robert Haas  writes:
> It seems to me that there was a thread where Tom proposed removing
> support for dynamic_shared_memory_type = none.

I think you're recalling <32138.1502675...@sss.pgh.pa.us>, wherein
I pointed out that

>>> Whether that's worth the trouble is debatable.  The current code
>>> in initdb believes that every platform has some type of DSM support
>>> (see choose_dsm_implementation).  Nobody's complained about that,
>>> and it certainly works on every buildfarm animal.  So for all we know,
>>> dynamic_shared_memory_type = none is broken already.

(That was in fact in the same thread Kyotaro-san just linked to about
reimplementing the stats collector.)

It's still true that we've no reason to believe there are any supported
platforms that haven't got some sort of DSM.  Performance might be a
different question, of course ... but it's hard to believe that
transferring stats through DSM wouldn't be better than writing them
out to files.

> I suggest we remove support for dynamic_shared_memory_type = none first,
> and see if we get any complaints.  If we don't, then future patches can
> rely on it being present.

If we remove it in v11, it'd still be maybe a year from now before we'd
have much confidence from that alone that nobody cares.  I think the lack
of complaints about it in 9.6 and 10 is a more useful data point.

regards, tom lane



Re: [HACKERS] More stats about skipped vacuums

2018-02-07 Thread Robert Haas
On Tue, Feb 6, 2018 at 8:34 PM, Kyotaro HORIGUCHI
 wrote:
> For that reason, it fails to run when
> dynamic_shared_memory_type = none, and it is accompanied by
> several cleanup complexities. The decision there was that we should
> just use static shared memory rather than add complexity for
> nothing, and that if it really needs to be expandable in the
> future, that is the time to use DSA (but we would still maintain a
> fallback mechanism).

It seems to me that there was a thread where Tom proposed removing
support for dynamic_shared_memory_type = none.  The main reason that I
included that option initially was because it seemed silly to risk
causing problems for users whose dynamic shared memory facilities
didn't work for the sake of a feature that, at the time (9.4), had no
in-core users.

But things have shifted a bit since then.  We have had few complaints
about dynamic shared memory causing portability problems (except for
performance: apparently some implementations perform better than
others on some systems, and we need support for huge pages, but
neither of those things are a reason to disable it) and we now have
in-core use that is enabled by default.  I suggest we remove support
for dynamic_shared_memory_type = none first, and see if we get any
complaints.  If we don't, then future patches can rely on it being
present.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] More stats about skipped vacuums

2018-02-06 Thread Kyotaro HORIGUCHI
At Tue, 06 Feb 2018 19:24:37 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20180206.192437.229464841.horiguchi.kyot...@lab.ntt.co.jp>
> At Tue, 6 Feb 2018 14:50:01 +0900, Masahiko Sawada  
> wrote in 
> > The implementation of autovacuum work-items has been changed (by commit
> > 31ae1638) because dynamic shared memory is not portable enough. IIUC
> > this patch is going to do a similar thing. Since the stats collector is
> > also a critical part of the server, should we consider whether we can
> > change it? Or is the portability problem not relevant to this patch?
> 
> Thank you for the pointer. I dug out the following thread from
> this, and the patch seems to be the consequence of that
> discussion. I'll study it and think about what to do here.
> 
> https://www.postgresql.org/message-id/20170814005656.d5tvz464qkmz6...@alap3.anarazel.de

Done. The dominant reason for ripping it out was that the
work-items array was allocated in a fixed-size DSA segment at
process startup time and could not be resized.

For that reason, it fails to run when
dynamic_shared_memory_type = none, and it is accompanied by
several cleanup complexities. The decision there was that we should
just use static shared memory rather than add complexity for
nothing, and that if it really needs to be expandable in the
future, that is the time to use DSA (but we would still maintain a
fallback mechanism).


Considering this, I think that this patch has a good reason to use
DSA but needs some additional work.

- Fallback mechanism when dynamic_shared_memory_type = none

  This means that the old file-based machinery lives alongside the
  DSA machinery, and is used when '= none' is set and when the DSA
  mechanism fails. (A rough sketch follows below, after the list of
  known problems.)

- Out-of-transaction cleanup stuff

  Something like the following patch.

  
https://www.postgresql.org/message-id/20170814231638.x6vgnzlr7eww4bui@alvherre.pgsql

And as known problems:

- Make it use fewer LWLocks.

- A dshash shrink mechanism. We may need to find a way to detect
  when shrinking is necessary.
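
As a rough sketch of the fallback point above (all names here are
invented for illustration and are not existing pgstat APIs): the legacy
file-based code and the DSA/dshash-based code would sit behind one small
dispatch table, chosen at startup depending on whether DSM is usable.

#include <stdbool.h>
#include <stdio.h>

typedef struct PgStatStorage
{
    const char *name;
    void        (*report_tabstat) (unsigned int relid, long tuples_changed);
} PgStatStorage;

static void
file_report_tabstat(unsigned int relid, long tuples_changed)
{
    printf("file path: rel %u changed by %ld\n", relid, tuples_changed);
}

static void
dsa_report_tabstat(unsigned int relid, long tuples_changed)
{
    printf("dshash path: rel %u changed by %ld\n", relid, tuples_changed);
}

static const PgStatStorage file_storage = {"file", file_report_tabstat};
static const PgStatStorage dsa_storage = {"dsa", dsa_report_tabstat};

static const PgStatStorage *pgstat_storage;

/* Fall back to the file-based machinery when DSM is unavailable. */
static void
pgstat_choose_storage(bool dsm_available)
{
    pgstat_storage = dsm_available ? &dsa_storage : &file_storage;
}

int
main(void)
{
    pgstat_choose_storage(false);   /* e.g. dynamic_shared_memory_type = none */
    pgstat_storage->report_tabstat(16384, 42);
    return 0;
}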

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] More stats about skipped vacuums

2018-02-06 Thread Kyotaro HORIGUCHI
At Tue, 6 Feb 2018 14:50:01 +0900, Masahiko Sawada  
wrote in 
> On Mon, Dec 11, 2017 at 8:15 PM, Kyotaro HORIGUCHI
>  wrote:
> > I considered dshash for pgstat.c and the attached is a *PoC*
> > patch, which is not full-fledged and only works under a not so
> > concurrent situation.
> >
> > - Made the stats collector an auxiliary process. A crash of the
> >   stats collector leads to a whole-server restart.
> >
> > - dshash lacks the capability of sequential scan, so I added it.
> >
> > - Also added a snapshot function to dshash. It just copies the
> >   underlying DSA segments into local memory, but currently it
> >   doesn't acquire dshash-level locks at all. I tried the same
> >   thing with resize but it leads to very quick exhaustion of
> >   LWLocks. An LWLock for the whole dshash would be required (and
> >   it would also be useful for resize() and sequential scan).
> >
> > - The current dshash doesn't shrink at all. Such a feature will
> >   also be required. (A server restart causes a shrink of the
> >   hashes in the same way as before, but a bloated dshash requires
> >   copying more memory than necessary when taking a snapshot.)
> >
> > The size of a DSA is about 1MB at minimum. Copying entry-by-entry
> > into a (non-ds) hash might be better than copying the underlying
> > DSA as a whole, and the DSA/DSHASH snapshot feels kind of dirty.
> >
> >
> > Does anyone have opinions or suggestions for me?
> >
> 
> The implementation of autovacuum work-items has been changed (by commit
> 31ae1638) because dynamic shared memory is not portable enough. IIUC
> this patch is going to do a similar thing. Since the stats collector is
> also a critical part of the server, should we consider whether we can
> change it? Or is the portability problem not relevant to this patch?

Thank you for the pointer. I dug out the following thread from
this, and the patch seems to be the consequence of that
discussion. I'll study it and think about what to do here.

https://www.postgresql.org/message-id/20170814005656.d5tvz464qkmz6...@alap3.anarazel.de

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] More stats about skipped vacuums

2018-02-05 Thread Masahiko Sawada
On Mon, Dec 11, 2017 at 8:15 PM, Kyotaro HORIGUCHI
 wrote:
> At Mon, 27 Nov 2017 13:51:22 -0500, Robert Haas  wrote 
> in 
>> On Mon, Nov 27, 2017 at 1:49 AM, Kyotaro HORIGUCHI
>>  wrote:
>> > Hmmm. Okay, we must make the stats collector more efficient if we
>> > want to have additional counters with smaller significance in the
>> > table stats. Currently sizeof(PgStat_StatTabEntry) is 168
>> > bytes. The whole of the patchset increases it to 232 bytes. Thus
>> > the size of a stats file for a database with 10,000 tables
>> > increases from about 1.7MB to 2.4MB.  DSM and shared dynahash are
>> > not dynamically expandable, so placing stats in a shared hash
>> > doesn't seem effective. Stats as a regular table could work but
>> > it seems too much.
>>
>> dshash, which is already committed, is both DSM-based and dynamically
>> expandable.
>
> Yes, I forgot about that. We can just copy memory blocks to take
> a snapshot of stats.
>
>> > Is it acceptable to add a new section containing these new
>> > counters, which is just loaded as a byte sequence, with parsing
>> > (and filling of the corresponding hash) postponed until a counter
>> > in the section is really requested?  The new counters would need
>> > to be shown in a separate stats view (maybe named pg_stat_vacuum).
>>
>> Still makes the stats file bigger.
>
> I considered dshash for pgstat.c and the attached is a *PoC*
> patch, which is not full-fledged and only works under a not so
> concurrent situation.
>
> - Made the stats collector an auxiliary process. A crash of the
>   stats collector leads to a whole-server restart.
>
> - dshash lacks the capability of sequential scan, so I added it.
>
> - Also added a snapshot function to dshash. It just copies the
>   underlying DSA segments into local memory, but currently it
>   doesn't acquire dshash-level locks at all. I tried the same
>   thing with resize but it leads to very quick exhaustion of
>   LWLocks. An LWLock for the whole dshash would be required (and
>   it would also be useful for resize() and sequential scan).
>
> - The current dshash doesn't shrink at all. Such a feature will
>   also be required. (A server restart causes a shrink of the
>   hashes in the same way as before, but a bloated dshash requires
>   copying more memory than necessary when taking a snapshot.)
>
> The size of a DSA is about 1MB at minimum. Copying entry-by-entry
> into a (non-ds) hash might be better than copying the underlying
> DSA as a whole, and the DSA/DSHASH snapshot feels kind of dirty.
>
>
> Does anyone have opinions or suggestions for me?
>

The implementation of autovacuum work-items has been changed (by commit
31ae1638) because dynamic shared memory is not portable enough. IIUC
this patch is going to do a similar thing. Since the stats collector is
also a critical part of the server, should we consider whether we can
change it? Or is the portability problem not relevant to this patch?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] More stats about skipped vacuums

2017-12-11 Thread Kyotaro HORIGUCHI
At Mon, 27 Nov 2017 13:51:22 -0500, Robert Haas  wrote 
in 
> On Mon, Nov 27, 2017 at 1:49 AM, Kyotaro HORIGUCHI
>  wrote:
> > Hmmm. Okay, we must make the stats collector more efficient if we
> > want to have additional counters with smaller significance in the
> > table stats. Currently sizeof(PgStat_StatTabEntry) is 168
> > bytes. The whole of the patchset increases it to 232 bytes. Thus
> > the size of a stats file for a database with 10,000 tables
> > increases from about 1.7MB to 2.4MB.  DSM and shared dynahash are
> > not dynamically expandable, so placing stats in a shared hash
> > doesn't seem effective. Stats as a regular table could work but
> > it seems too much.
> 
> dshash, which is already committed, is both DSM-based and dynamically
> expandable.

Yes, I forgot about that. We can just copy memory blocks to take
a snapshot of stats.

> > Is it acceptable to add a new section containing these new
> > counters, which is just loaded as a byte sequence, with parsing
> > (and filling of the corresponding hash) postponed until a counter
> > in the section is really requested?  The new counters would need
> > to be shown in a separate stats view (maybe named pg_stat_vacuum).
> 
> Still makes the stats file bigger.

I considered dshash for pgstat.c and the attached is a *PoC*
patch, which is not full-fledged and only works under a not so
concurrent situation.

- Made the stats collector an auxiliary process. A crash of the
  stats collector leads to a whole-server restart.

- dshash lacks the capability of sequential scan, so I added it.

- Also added a snapshot function to dshash. It just copies the
  underlying DSA segments into local memory, but currently it
  doesn't acquire dshash-level locks at all. I tried the same
  thing with resize but it leads to very quick exhaustion of
  LWLocks. An LWLock for the whole dshash would be required (and
  it would also be useful for resize() and sequential scan).

- The current dshash doesn't shrink at all. Such a feature will
  also be required. (A server restart causes a shrink of the
  hashes in the same way as before, but a bloated dshash requires
  copying more memory than necessary when taking a snapshot.)

The size of a DSA is about 1MB at minimum. Copying entry-by-entry
into a (non-ds) hash might be better than copying the underlying
DSA as a whole, and the DSA/DSHASH snapshot feels kind of dirty.


Does anyone have opinions or suggestions for me?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From f5c9c45384ec43734c0890dd875101defe6590bc Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 1 Dec 2017 14:34:47 +0900
Subject: [PATCH 1/4] Simple implement of local shapshot of dshash.

Add snapshot feature to DSHASH. This makes palloc'ed copy of
underlying DSA and returns unmodifiable DSHASH using the copied DSA.
---
 src/backend/lib/dshash.c | 74 +++-
 src/backend/utils/mmgr/dsa.c | 57 +-
 src/include/lib/dshash.h |  1 +
 src/include/utils/dsa.h  |  1 +
 4 files changed, 124 insertions(+), 9 deletions(-)

diff --git a/src/backend/lib/dshash.c b/src/backend/lib/dshash.c
index dd87573..973a826 100644
--- a/src/backend/lib/dshash.c
+++ b/src/backend/lib/dshash.c
@@ -112,6 +112,7 @@ struct dshash_table
 	size_t		size_log2;		/* log2(number of buckets) */
 	bool		find_locked;	/* Is any partition lock held by 'find'? */
 	bool		find_exclusively_locked;	/* ... exclusively? */
+	bool		is_snapshot;	/* Is this hash is a local snapshot?*/
 };
 
 /* Given a pointer to an item, find the entry (user data) it holds. */
@@ -228,6 +229,7 @@ dshash_create(dsa_area *area, const dshash_parameters *params, void *arg)
 
 	hash_table->find_locked = false;
 	hash_table->find_exclusively_locked = false;
+	hash_table->is_snapshot = false;
 
 	/*
 	 * Set up the initial array of buckets.  Our initial size is the same as
@@ -279,6 +281,7 @@ dshash_attach(dsa_area *area, const dshash_parameters *params,
 	hash_table->control = dsa_get_address(area, control);
 	hash_table->find_locked = false;
 	hash_table->find_exclusively_locked = false;
+	hash_table->is_snapshot = false;
 	Assert(hash_table->control->magic == DSHASH_MAGIC);
 
 	/*
@@ -321,6 +324,15 @@ dshash_destroy(dshash_table *hash_table)
 	size_t		i;
 
 	Assert(hash_table->control->magic == DSHASH_MAGIC);
+
+	/* this is a local copy */
+	if (hash_table->is_snapshot)
+	{
+		pfree(hash_table->area);
+		pfree(hash_table);
+		return;
+	}
+
 	ensure_valid_bucket_pointers(hash_table);
 
 	/* Free all the entries. */
@@ -355,6 +367,29 @@ dshash_destroy(dshash_table *hash_table)
 }
 
 /*
+ * take a local snapshot of a dshash table
+ */
+dshash_table *
+dshash_take_snapshot(dshash_table *org_table, dsa_area *new_area)
+{
+	dshash_table *new_table;
+
+	if (org_table->is_snapshot)
+		elog(ERROR, "cannot make local copy of 

Re: [HACKERS] More stats about skipped vacuums

2017-11-28 Thread Magnus Hagander
On Tue, Nov 28, 2017 at 12:16 AM, Tom Lane  wrote:

> Magnus Hagander  writes:
> > What I've been thinking about for that one before is if we could just
> > invent a protocol (shmq based maybe) whereby autovacuum can ask the stats
> > collector for a single table or index stat. If autovacuum never needs to
> > see a consistent view between multiple tables, I would think that's going
> > to be a win in a lot of cases.
>
> Perhaps.  Autovac might run through quite a few tables before it finds
> one in need of processing, though, so I'm not quite certain this would
> yield such great benefits in isolation.
>

Hmm. Good point.



> > However, when it comes to the stats system, I'd say that on any busy
> system
> > (which would be the ones to care about), the stats structures are still
> > going to be *written* a lot more than they are read.
>
> Uh, what?  The stats collector doesn't write those files at all except
> on-demand.
>

Oops. Missing one important word. They're going to be *written to* a lot
more than they are read. Meaning that each individual value is likely to be
updated many times before it's ever read. In memory, in the stats
collector.  So not talking about the files at all -- just the numbers,
independent of implementation.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [HACKERS] More stats about skipped vacuums

2017-11-27 Thread Tom Lane
Magnus Hagander  writes:
> What I've been thinking about for that one before is if we could just
> invent a protocol (shmq based maybe) whereby autovacuum can ask the stats
> collector for a single table or index stat. If autovacuum never needs to
> see a consistent view between multiple tables, I would think that's going
> to be a win in a lot of cases.

Perhaps.  Autovac might run through quite a few tables before it finds
one in need of processing, though, so I'm not quite certain this would
yield such great benefits in isolation.

> However, when it comes to the stats system, I'd say that on any busy system
> (which would be the ones to care about), the stats structures are still
> going to be *written* a lot more than they are read.

Uh, what?  The stats collector doesn't write those files at all except
on-demand.

regards, tom lane



Re: [HACKERS] More stats about skipped vacuums

2017-11-27 Thread Magnus Hagander
On Mon, Nov 27, 2017 at 7:53 PM, Robert Haas  wrote:

> On Sun, Nov 26, 2017 at 3:19 PM, Tom Lane  wrote:
> > Robert Haas  writes:
> >> On Sat, Nov 25, 2017 at 12:09 PM, Tom Lane  wrote:
> >>> Mumble.  It's a property I'm pretty hesitant to give up, especially
> >>> since the stats views have worked like that since day one.  It's
> >>> inevitable that weakening that guarantee would break peoples' queries,
> >>> probably subtly.
> >
> >> You mean, queries against the stats views, or queries in general?  If
> >> the latter, by what mechanism would the breakage happen?
> >
> > Queries against the stats views, of course.
>
> Hmm.  Those are probably rare.  If we only took a snapshot of the
> statistics for the backends that explicitly access those views, that
> probably wouldn't be too crazy.
>
> Sorry if this is a stupid question, but how often and for what purpose
> do regular backends need the stats collector data for purposes other
> than querying the stats views?  I thought that the data was only used
> to decide whether to VACUUM/ANALYZE, and therefore would be accessed
> mostly by autovacuum, and for that you'd actually want the most
> up-to-date view of the stats for a particular table that is available,
> not any older snapshot.
>
>
Autovacuum resets the stats to make sure. Autovacuum in particular can
probably be made a lot more efficient, because it only ever looks at one
relation at a time, I think.

What I've been thinking about for that one before is if we could just
invent a protocol (shmq based maybe) whereby autovacuum can ask the stats
collector for a single table or index stat. If autovacuum never needs to
see a consistent view between multiple tables, I would think that's going
to be a win in a lot of cases.

I don't think regular backends use them at all. But anybody looking at the
stats do, and it is pretty important there.

However, when it comes to the stats system, I'd say that on any busy system
(which would be the ones to care about), the stats structures are still
going to be *written* a lot more than they are read. We certainly don't
read them at the rate of once per transaction. A lot of the reads are also
limited to one database of course.

I wonder if we want to implement some sort of copy-on-read snapshot in
the stats collector itself. So instead of unconditionally publishing
everything, have the backends ask for it. When a backend asks for it, it
gets a "snapshot counter" or something from the stats collector, and on
the next write after that we do a copy-on-write if the snapshot is still
available. (No, I have not thought this through in detail.)
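
A toy sketch of that snapshot-counter idea, purely illustrative (none of
these names exist in pgstat, and all locking is omitted): the collector
hands out a generation number when a snapshot is requested, and the first
write to an entry after that copies the old value aside. Only the most
recent snapshot is preserved, and only per-entry consistency, but it
shows the shape of the copy-on-write.

#include <stdio.h>

/* Hypothetical, trimmed-down per-table counters. */
typedef struct StatsSlot
{
    long            n_dead_tuples;      /* always up to date */
    long            frozen_dead_tuples; /* value preserved for a snapshot */
    unsigned long   frozen_as_of;       /* snapshot generation of the copy */
} StatsSlot;

static unsigned long snapshot_counter;  /* bumped on each snapshot request */

/* A backend asks for a snapshot: hand out a new generation number. */
static unsigned long
request_snapshot(void)
{
    return ++snapshot_counter;
}

/* Collector-side update: copy the entry aside on the first write that
 * follows an outstanding snapshot request, then modify in place. */
static void
add_dead_tuples(StatsSlot *slot, long n)
{
    if (slot->frozen_as_of != snapshot_counter)
    {
        slot->frozen_dead_tuples = slot->n_dead_tuples;
        slot->frozen_as_of = snapshot_counter;
    }
    slot->n_dead_tuples += n;
}

/* Reader side: if a write already copied the entry aside for our
 * generation, use the frozen value; otherwise nothing has changed since
 * the snapshot and the live value is still correct. */
static long
read_dead_tuples(const StatsSlot *slot, unsigned long snap)
{
    return (slot->frozen_as_of >= snap) ? slot->frozen_dead_tuples
                                        : slot->n_dead_tuples;
}

int
main(void)
{
    StatsSlot       slot = {0, 0, 0};
    unsigned long   snap;

    add_dead_tuples(&slot, 10);         /* before the snapshot */
    snap = request_snapshot();
    add_dead_tuples(&slot, 5);          /* after it: triggers the copy */
    printf("snapshot sees %ld, live value is %ld\n",
           read_dead_tuples(&slot, snap), slot.n_dead_tuples);
    return 0;
}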

Or -- if we keep a per-database hashtable in dynamic shared memory (which
we can now). Can we copy it into local memory in the backend fast enough
that we can hold a lock and just queue up the stats updates during the
copy? If we can copy the complete structure, that would fix one of the
bigger bottlenecks with it today which is that we dump and rebuild the
hashtables as we go through the tempfiles.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [HACKERS] More stats about skipped vacuums

2017-11-27 Thread Robert Haas
On Sun, Nov 26, 2017 at 3:19 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Sat, Nov 25, 2017 at 12:09 PM, Tom Lane  wrote:
>>> Mumble.  It's a property I'm pretty hesitant to give up, especially
>>> since the stats views have worked like that since day one.  It's
>>> inevitable that weakening that guarantee would break peoples' queries,
>>> probably subtly.
>
>> You mean, queries against the stats views, or queries in general?  If
>> the latter, by what mechanism would the breakage happen?
>
> Queries against the stats views, of course.

Hmm.  Those are probably rare.  If we only took a snapshot of the
statistics for the backends that explicitly access those views, that
probably wouldn't be too crazy.

Sorry if this is a stupid question, but how often and for what purpose
do regular backends need the stats collector data for purposes other
than querying the stats views?  I thought that the data was only used
to decide whether to VACUUM/ANALYZE, and therefore would be accessed
mostly by autovacuum, and for that you'd actually want the most
up-to-date view of the stats for a particular table that is available,
not any older snapshot.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] More stats about skipped vacuums

2017-11-27 Thread Robert Haas
On Mon, Nov 27, 2017 at 1:49 AM, Kyotaro HORIGUCHI
 wrote:
> Hmmm. Okay, we must make the stats collector more efficient if we
> want to have additional counters with smaller significance in the
> table stats. Currently sizeof(PgStat_StatTabEntry) is 168
> bytes. The whole of the patchset increases it to 232 bytes. Thus
> the size of a stats file for a database with 10,000 tables
> increases from about 1.7MB to 2.4MB.  DSM and shared dynahash are
> not dynamically expandable, so placing stats in a shared hash
> doesn't seem effective. Stats as a regular table could work but
> it seems too much.

dshash, which is already committed, is both DSM-based and dynamically
expandable.

> Is it acceptable to add a new section containing these new
> counters, which is just loaded as a byte sequence, with parsing
> (and filling of the corresponding hash) postponed until a counter
> in the section is really requested?  The new counters would need
> to be shown in a separate stats view (maybe named pg_stat_vacuum).

Still makes the stats file bigger.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] More stats about skipped vacuums

2017-11-26 Thread Michael Paquier
On Mon, Nov 27, 2017 at 5:19 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Sat, Nov 25, 2017 at 12:09 PM, Tom Lane  wrote:
>>> Mumble.  It's a property I'm pretty hesitant to give up, especially
>>> since the stats views have worked like that since day one.  It's
>>> inevitable that weakening that guarantee would break peoples' queries,
>>> probably subtly.
>
>> You mean, queries against the stats views, or queries in general?  If
>> the latter, by what mechanism would the breakage happen?
>
> Queries against the stats views, of course.

There has been much discussion on this thread, and the set of patches
as presented may hurt performance for users with a large number of
tables, so I am marking it as returned with feedback.
-- 
Michael



Re: [HACKERS] More stats about skipped vacuums

2017-11-26 Thread Michael Paquier
On Mon, Nov 27, 2017 at 2:49 AM, Tom Lane  wrote:
> Michael Paquier  writes:
>> ... Would you think
>> that it is acceptable to add the number of index scans that happened
>> with the verbose output then?
>
> I don't have an objection to it, but can't you tell that from VACUUM
> VERBOSE already?  There should be a "INFO:  scanned index" line for
> each scan.

Of course, sorry for the noise.
-- 
Michael



Re: [HACKERS] More stats about skipped vacuums

2017-11-25 Thread Michael Paquier
On Sun, Nov 26, 2017 at 12:34 AM, Tom Lane  wrote:
> Michael Paquier  writes:
>> On Sat, Nov 25, 2017 at 12:55 AM, Robert Haas  wrote:
>>> I think that's a good thing to worry about.   In the past, Tom has
>>> expressed reluctance to make stats tables that have a row per table
>>> any wider at all, IIRC.
>
>> Tom, any opinions to offer here? pg_stat_all_tables is currently at 22
>> columns (this takes two full lines on my terminal with a font size at
>> 13). With the first patch of what's proposed on this thread there
>> would be 24 columns. Perhaps it would be time to split the vacuum
>> statistics into a new view like pg_stat_tables_vacuum or similar?
>
> My concern is not with the width of any view --- you can always select a
> subset of columns if a view is too wide for your screen.  The issue is the
> number of counters in the stats collector's state.  We know, without any
> doubt, that widening PgStat_StatTabEntry causes serious pain to people
> with large numbers of tables; and they have no recourse when we do it.
> So the bar to adding more counters is very high IMO.  I won't say never,
> but I do doubt that something like skipped vacuums should make the cut.

I am not arguing about skipped-vacuum data here (I don't think much of
it, by the way), but about the number of index scans done by the last
vacuum or autovacuum. This helps in tuning autovacuum_work_mem and
maintenance_work_mem. Is the bar still too high for that?
-- 
Michael



Re: [HACKERS] More stats about skipped vacuums

2017-11-25 Thread Tom Lane
Robert Haas  writes:
> Of course, the other obvious question is whether we really need a
> consistent snapshot, because that's bound to be pretty expensive even
> if you eliminate the I/O cost.  Taking a consistent snapshot across
> all 100,000 tables in the database even if we're only ever going to
> access 5 of those tables doesn't seem like a good or scalable design.

Mumble.  It's a property I'm pretty hesitant to give up, especially
since the stats views have worked like that since day one.  It's
inevitable that weakening that guarantee would break peoples' queries,
probably subtly.

What we need is a way to have a consistent snapshot without implementing
it by copying 100,000 tables' worth of data for every query.  Hmm, I heard
of a technique called MVCC once ...

regards, tom lane



Re: [HACKERS] More stats about skipped vacuums

2017-11-25 Thread Robert Haas
On Sat, Nov 25, 2017 at 10:34 AM, Tom Lane  wrote:
> If we could get rid of the copy-to-a-temporary-file technology for
> transferring the stats collector's data to backends, then this problem
> would probably vanish or at least get a lot less severe.  But that seems
> like a nontrivial project.  With the infrastructure we have today, we
> could probably keep the stats tables in a DSM segment; but how would
> a backend get a consistent snapshot of them?

I suppose the obvious approach is to have a big lock around the
statistics data proper; this could be taken in shared mode to take a
snapshot or in exclusive mode to update statistics.  In addition,
create one or more queues where statistics messages can be enqueued in
lieu of updating the main statistics data directly.  If that doesn't
perform well enough, you could keep two copies of the statistics, A
and B.  At any given time, one copy is quiescent and the other copy is
being updated.  Periodically, at a time when we know that nobody is
taking a snapshot of the statistics, they reverse roles.
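
A rough sketch of that two-copy scheme (names invented for illustration,
and all locking omitted): writers always hit the active copy, readers
snapshot from the quiescent one, and the swap both reverses the roles and
catches the newly active copy up.

#include <stdio.h>

/* Hypothetical, trimmed-down statistics block. */
typedef struct StatsData
{
    long    xact_commit;
    long    xact_rollback;
} StatsData;

static StatsData stats_copy[2];
static int  active = 0;         /* copy being written; the other is quiescent */

/* Writers always touch the active copy. */
static void
count_commit(void)
{
    stats_copy[active].xact_commit++;
}

/* Readers take their snapshot from the quiescent copy. */
static StatsData
take_snapshot(void)
{
    return stats_copy[1 - active];
}

/*
 * Called periodically, at a time when we know nobody is taking a
 * snapshot: reverse the roles and bring the newly active copy up to
 * date with everything accumulated while the other copy was active.
 */
static void
swap_copies(void)
{
    active = 1 - active;
    stats_copy[active] = stats_copy[1 - active];
}

int
main(void)
{
    count_commit();
    count_commit();
    swap_copies();
    count_commit();             /* not yet visible to readers */
    printf("snapshot sees %ld commits, live copy has %ld\n",
           take_snapshot().xact_commit, stats_copy[active].xact_commit);
    return 0;
}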

Of course, the other obvious question is whether we really need a
consistent snapshot, because that's bound to be pretty expensive even
if you eliminate the I/O cost.  Taking a consistent snapshot across
all 100,000 tables in the database even if we're only ever going to
access 5 of those tables doesn't seem like a good or scalable design.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] More stats about skipped vacuums

2017-11-25 Thread Tom Lane
Michael Paquier  writes:
> On Sat, Nov 25, 2017 at 12:55 AM, Robert Haas  wrote:
>> I think that's a good thing to worry about.   In the past, Tom has
>> expressed reluctance to make stats tables that have a row per table
>> any wider at all, IIRC.

> Tom, any opinions to offer here? pg_stat_all_tables is currently at 22
> columns (this takes two full lines on my terminal with a font size at
> 13). With the first patch of what's proposed on this thread there
> would be 24 columns. Perhaps it would be time to split the vacuum
> statistics into a new view like pg_stat_tables_vacuum or similar?

My concern is not with the width of any view --- you can always select a
subset of columns if a view is too wide for your screen.  The issue is the
number of counters in the stats collector's state.  We know, without any
doubt, that widening PgStat_StatTabEntry causes serious pain to people
with large numbers of tables; and they have no recourse when we do it.
So the bar to adding more counters is very high IMO.  I won't say never,
but I do doubt that something like skipped vacuums should make the cut.

If we could get rid of the copy-to-a-temporary-file technology for
transferring the stats collector's data to backends, then this problem
would probably vanish or at least get a lot less severe.  But that seems
like a nontrivial project.  With the infrastructure we have today, we
could probably keep the stats tables in a DSM segment; but how would
a backend get a consistent snapshot of them?

regards, tom lane



Re: [HACKERS] More stats about skipped vacuums

2017-11-25 Thread Michael Paquier
On Sat, Nov 25, 2017 at 12:55 AM, Robert Haas  wrote:
> On Tue, Nov 21, 2017 at 2:09 AM, Kyotaro HORIGUCHI
>  wrote:
>> Yes, my concern here is how many columns we can allow in a stats
>> view. I think I'm a bit too worried about that.
>
> I think that's a good thing to worry about.   In the past, Tom has
> expressed reluctance to make stats tables that have a row per table
> any wider at all, IIRC.

Tom, any opinions to offer here? pg_stat_all_tables is currently at 22
columns (this takes two full lines on my terminal with a font size at
13). With the first patch of what's proposed on this thread there
would be 24 columns. Perhaps it would be time to split the vacuum
statistics into a new view like pg_stat_tables_vacuum or similar?
-- 
Michael



Re: [HACKERS] More stats about skipped vacuums

2017-11-24 Thread Robert Haas
On Tue, Nov 21, 2017 at 2:09 AM, Kyotaro HORIGUCHI
 wrote:
> Yes, my concern here is how many columns we can allow in a stats
> view. I think I'm a bit too worried about that.

I think that's a good thing to worry about.   In the past, Tom has
expressed reluctance to make stats tables that have a row per table
any wider at all, IIRC.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] More stats about skipped vacuums

2017-11-21 Thread Michael Paquier
On Wed, Nov 22, 2017 at 1:08 PM, Kyotaro HORIGUCHI
 wrote:
> At Wed, 22 Nov 2017 08:20:22 +0900, Michael Paquier 
>  wrote in 
> 
>> On Tue, Nov 21, 2017 at 4:09 PM, Kyotaro HORIGUCHI
>>  wrote:
>> > By the way, I'm uneasy that 'last_vacuum_index_scans' (and
>> > vacuum_fail_count in 0002 and others in 0003, 0004) covers
>> > both the VACUUM command and autovacuum, while last_vacuum and
>> > vacuum_count mention only the command. Splitting it into
>> > vacuum/autovacuum seems like nonsense, but the name is confusing.
>> > Do you have any idea?
>>
>> Hm. I think that you should actually have two fields, one for manual
>> vacuum and one for autovacuum, because each is tied respectively to
>> maintenance_work_mem and autovacuum_work_mem. This way admins are able
>
> It's very convincing for me. Thanks for the suggestion.
>
>> to tune each one of those parameters depending on a look at
>> pg_stat_all_tables. So those should be named perhaps
>> last_vacuum_index_scans and last_autovacuum_index_scans?
>
> Agreed. I'll do so in the next version.

Thanks for considering the suggestion.

> # I forgot to add the version to the patch files...

Don't worry about that. That's not a problem for me; I'll just keep
track of the last entry. With the room I have I'll keep focused on
0001 by the way. Others are of course welcome to look at 0002 and
onwards.
-- 
Michael



Re: [HACKERS] More stats about skipped vacuums

2017-11-21 Thread Kyotaro HORIGUCHI
Hello,

At Wed, 22 Nov 2017 08:20:22 +0900, Michael Paquier  
wrote in 
> On Tue, Nov 21, 2017 at 4:09 PM, Kyotaro HORIGUCHI
>  wrote:
> > By the way, I'm uneasy that 'last_vacuum_index_scans' (and
> > vacuum_fail_count in 0002 and others in 0003, 0004) covers
> > both the VACUUM command and autovacuum, while last_vacuum and
> > vacuum_count mention only the command. Splitting it into
> > vacuum/autovacuum seems like nonsense, but the name is confusing.
> > Do you have any idea?
> 
> Hm. I think that you should actually have two fields, one for manual
> vacuum and one for autovacuum, because each is tied respectively to
> maintenance_work_mem and autovacuum_work_mem. This way admins are able

It's very convincing for me. Thanks for the suggestion.

> to tune each one of those parameters depending on a look at
> pg_stat_all_tables. So those should be named perhaps
> last_vacuum_index_scans and last_autovacuum_index_scans?

Agreed. I'll do so in the next version.
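
For what it's worth, a sketch of what the split might look like on the C
side. The *_index_scans field names follow the suggestion above;
everything else (the _Sketch struct name, the stand-in typedefs) is
invented only to keep the fragment self-contained and compilable.

#include <stdint.h>
#include <time.h>

typedef int64_t PgStat_Counter; /* stand-in for the pgstat typedef */
typedef time_t  TimestampTz;    /* stand-in; the real type differs */

/* Only the fields relevant to this discussion are shown. */
typedef struct PgStat_StatTabEntry_Sketch
{
    TimestampTz     last_vacuum_ts;     /* last manual VACUUM */
    PgStat_Counter  vacuum_count;
    PgStat_Counter  last_vacuum_index_scans;        /* maintenance_work_mem */

    TimestampTz     last_autovacuum_ts; /* last autovacuum */
    PgStat_Counter  autovacuum_count;
    PgStat_Counter  last_autovacuum_index_scans;    /* autovacuum_work_mem */
} PgStat_StatTabEntry_Sketch;

A value above 1 in either *_index_scans column would then be the hint
that the corresponding work_mem setting was too small to hold the
dead-tuple TIDs for that table in a single pass.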

# I forgot to add the version to the patch files...

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center