date:20200729

Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-07-29 Thread Ajin Cherian

On Wed, Jul 29, 2020 at 3:16 PM Dilip Kumar  wrote:

>
>
> Thanks, please find the rebased patch set.
>
> --
> Regards,
> Dilip Kumar
> EnterpriseDB: http://www.enterprisedb.com

I was running some tests on this patch. I was generally trying to see how
the patch affects logical replication when doing bulk inserts. This issue
has been raised in the past, for eg: this [1].
My test setup is:
1. Two postgres servers running - A and B
2. Create a pgbench setup on A. (pgbench -i -s 5 postgres)
3. replicate the 3 tables (schema only) on B.
4. Three publishers on A for the 3 tables of pgbench; pgbench_accounts,
pgbench_branches and pgbench_tellers;
5. Three subscribers on B for the same tables. (streaming on and off based
on the scenarios described below)

run pgbench with : pgbench -c 4 -T 100 postgres
While pgbench is running, Do a bulk insert on some other table not in the
publication list (say t1); INSERT INTO t1 (select i FROM
generate_series(1,1000) i);

Four scenarios:
1. Pgbench with logical replication enabled without bulk insert
Avg TPS (out of 10 runs): 641 TPS
2.Pgbench without logical replication enabled with bulk insert (no pub/sub)
Avg TPS (out of 10 runs): 665 TPS
3, Pgbench with logical replication enabled with bulk insert
Avg TPS (out of 10 runs): 278 TPS
4. Pgbench with logical replication streaming on with bulk insert
Avg TPS (out of 10 runs): 440 TPS

As you can see, the bulk inserts, although on a totally unaffected table,
does impact the TPS. But what is good is that, enabling streaming improves
the TPS (about 58% improvement)

[1] -
https://www.postgresql.org/message-id/flat/CAMsr%2BYE6aE6Re6smrMr-xCabRmCr%3DyzXEf2Yuv5upEDY5nMX8g%40mail.gmail.com#dbe51a181dd735eec8bb36f8a07bacf5

regards,
Ajin Cherian
Fujitsu Australia

Re: Doc patch: mention indexes in pg_inherits docs

2020-07-29 Thread Michael Paquier

On Wed, Jul 29, 2020 at 03:06:58PM +0900, Michael Paquier wrote:
> This is actually a bug fix, because we include in the docs some
> incorrect information, so it should be backpatched.  If there are no
> objections, I'll take care of that.

And done.
--
Michael


signature.asc
Description: PGP signature

Re: INSERT INTO SELECT, Why Parallelism is not selected?

2020-07-29 Thread Amit Kapila

On Wed, Jul 29, 2020 at 7:18 PM Robert Haas  wrote:
>
> I still don't agree with this as proposed.
>
> + * For now, we don't allow parallel inserts of any form not even where the
> + * leader can perform the insert.  This restriction can be uplifted once
> + * we allow the planner to generate parallel plans for inserts.  We can
>
> If I'm understanding this correctly, this logic is completely
> backwards. We don't prohibit inserts here because we know the planner
> can't generate them. We prohibit inserts here because, if the planner
> somehow did generate them, it wouldn't be safe. You're saying that
> it's not allowed because we don't try to do it yet, but actually it's
> not allowed because we want to make sure that we don't accidentally
> try to do it. That's very different.
>

Right, so how about something like: "To allow parallel inserts, we
need to ensure that they are safe to be performed in workers.  We have
the infrastructure to allow parallel inserts in general except for the
case where inserts generate a new commandid (eg. inserts into a table
having a foreign key column)."  We can extend this for tuple locking
if required as per the below discussion.  Kindly suggest if you prefer
a different wording here.

>
> + * We should be able to parallelize
> + * the later case if we can ensure that no two parallel processes can ever
> + * operate on the same page.
>
> I don't know whether this is talking about two processes operating on
> the same page at the same time, or ever within a single query
> execution. If it's the former, perhaps we need to explain why that's a
> concern for parallel query but not otherwise;
>

I am talking about the former case and I know that as per current
design it is not possible that two worker processes try to operate on
the same page but I was trying to be pessimistic so that we can ensure
that via some form of Assert.  I don't know whether it is important to
mention this case or not but for the sake of extra safety, I have
mentioned it.

-- 
With Regards,
Amit Kapila.

Re: Creating a function for exposing memory usage of backend process

2020-07-29 Thread Kasahara Tatsuhito

Hi,

On Fri, Jul 10, 2020 at 5:32 PM torikoshia  wrote:
> - whether information for identifying parent-child relation is necessary
> or it's an overkill
I think it's important to understand the parent-child relationship of
the context.
Personally, I often want to know the following two things ..

- In which life cycle is the target context? (Remaining as long as the
process is living? per query?)
- Does the target context belong to the correct (parent) context?

> - if this information is necessary, memory address is suitable or other
> means like assigning unique numbers are required
IMO, If each context can be uniquely identified (or easily guessed) by
"name" and "ident",
then I don't think the address information is necessary.
Instead, I like the way that directly shows the context name of the
parent, as in the 0005 patch.

Best regards

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com

Re: Reducing WaitEventSet syscall churn

2020-07-29 Thread Thomas Munro

On Tue, Jul 14, 2020 at 6:51 PM Thomas Munro  wrote:
> In the meantime, here's a rebase of the more straightforward patches
> in the stack.  These are the ones that deal only with fixed sets of
> file descriptors, and they survive check-world on Linux,
> Linux+EXEC_BACKEND (with ASLR disabled) and FreeBSD, and at least
> check on macOS and Windows (my CI recipes need more work to get
> check-world working on those two).  There's one user-visible change
> that I'd appreciate feedback on: I propose to drop the FATAL error
> when the postmaster goes away, to make things more consistent.  See
> below for more on that.

Here's the effect of patches 0001-0003 on the number of relevant
system calls generate by "make check" on Linux and FreeBSD, according
to strace/truss -f -c:

epoll_create1: 4,825 -> 865
epoll_ctl: 12,454 -> 2,721
epoll_wait:~45k -> ~45k
close: ~81k -> ~77k

kqueue:4,618 -> 866
kevent:~54k -> ~46k
close: ~65k -> ~61k

I pushed those three patches, but will wait for more discussion on the rest.

Re: Is it useful to record whether plans are generic or custom?

2020-07-29 Thread Fujii Masao





On 2020/07/22 16:49, torikoshia wrote:

On 2020-07-20 13:57, torikoshia wrote:


As I proposed earlier in this thread, I'm now trying to add information
about generic/cudstom plan to pg_stat_statements.
I'll share the idea and the poc patch soon.


Attached a poc patch.


Thanks for the POC patch!

With the patch, when I ran "CREATE EXTENSION pg_stat_statements",
I got the following error.

ERROR:  function pg_stat_statements_reset(oid, oid, bigint) does not exist




Main purpose is to decide (1) the user interface and (2) the
way to get the plan type from pg_stat_statements.

(1) the user interface
I added a new boolean column 'generic_plan' to both
pg_stat_statements view and the member of the hash key of
pg_stat_statements.

This is because as Legrand pointed out the feature seems
useful under the condition of differentiating all the
counters for a queryid using a generic plan and the one
using a custom one.


I don't like this because this may double the number of entries in pgss.
Which means that the number of entries can more easily reach
pg_stat_statements.max and some entries will be discarded.

 

I thought it might be preferable to make a GUC to enable
or disable this feature, but changing the hash key makes
it harder.


What happens if the server was running with this option enabled and then
restarted with the option disabled? Firstly two entries for the same query
were stored in pgss because the option was enabled. But when it's disabled
and the server is restarted, those two entries should be merged into one
at the startup of server? If so, that's problematic because it may take
a long time.

Therefore I think that it's better and simple to just expose the number of
times generic/custom plan was chosen for each query.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: [PATCH] Tab completion for VACUUM of partitioned tables

2020-07-29 Thread Masahiko Sawada

On Thu, 30 Jul 2020 at 12:24, Fujii Masao  wrote:
>
>
>
> On 2020/07/30 10:46, Michael Paquier wrote:
> > On Thu, Jul 30, 2020 at 08:44:26AM +0900, Fujii Masao wrote:
> >> Isn't it better to add the comment explaining why toast tables are
> >> excluded from the tab completion for vacuum while they are vacuumable?
> >
> > Sounds sensible, still it does not apply only to vacuum.  I would go
> > as far as just adding a comment at the beginning of the block for
> > schema queries:
>
> Yes, that seems better.

Agreed.

> BTW, one thing I think a bit strange is that indexes for toast tables
> are included in tab-completion for REINDEX, for example. That is,
> "REINDEX INDEX" displays "pg_toast.", and "REINDEX INDEX pg_toast."
> displays indexes for toast tables. Maybe it's better to exclude them,
> too. But there seems no simple way to do that.
> So I'm ok with this current situation.

Yeah, that's the reason why I mentioned about toast tables. "VACUUM
" displays "pg_toast." but, "VACUUM pg_to" doesn't
complement anything.

Regards,

-- 
Masahiko Sawadahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: IDEA: pg_stat_statements tracking utility statements by tag?

2020-07-29 Thread Julien Rouhaud

On Thu, Jul 30, 2020 at 3:54 AM Michael Paquier  wrote:
>
> On Wed, Jul 29, 2020 at 06:35:41PM +0200, Julien Rouhaud wrote:
> > There's at least PREPARE TRANSACTION / COMMIT PREPARED / ROLLBACK
> > PREPARED that should be normalized too.  I also don't think that we
> > really want to have different entries for begin / Begin / BEGIN /
> > bEgin and similar for many other commands, as the hash is computed
> > based on the query text.
>
> Hmm.  Do we really want to those commands fully normalized all the
> time?  There may be applications that care about the stats of some
> commands that are for example prefixed the same way and would prefer
> group those things together.  By fully normalizing those commands all
> the time, we would lose this option.
>
> An example.  The ODBC driver uses its own grammar for internal
> savepoint names, aka _EXEC_SVP_%p.  If you mix that with a second
> application that has its own naming policy for savepoints it would not
> be possible anymore to make the difference in the stats between what
> one or the other do.

But if you have an OLTP application that uses ODBC, won't you already
have 80+% of pgss entries being savepoint orders, which is really not
helpful at all?  We'd technically lose the ability to group such
commands together, but in most cases the current behavior is quite
harmful.

Re: HashAgg's batching counter starts at 0, but Hash's starts at 1.

2020-07-29 Thread Peter Geoghegan

On Wed, Jul 29, 2020 at 9:05 PM Justin Pryzby  wrote:
> So my 2ndary suggestion is to conditionalize based on the method rather than
> value of space used.

What's the advantage of doing it that way?

-- 
Peter Geoghegan

Re: HashAgg's batching counter starts at 0, but Hash's starts at 1.

2020-07-29 Thread Justin Pryzby

On Wed, Jul 29, 2020 at 08:35:08PM -0700, Peter Geoghegan wrote:
> AFAICT sort (and IncrSort) don't fail to show a field value if it is
> zero. Rather, they consistently show "space used" (in non-text
> format), which can be either memory or disk space. Technically they
> don't violate the letter of the law that you cite. (Though admittedly
> this is a legalistic loophole -- the "space" value presumably has to
> be interpreted according to different rules for programs that consume
> non-text EXPLAIN output.)

Sort shows this:
 Sort Method: "external merge"   +
 Sort Space Used: 19520  +
 Sort Space Type: "Disk" +

Incremental sort shows this:
   Sort Methods Used:+
 - "external merge"  +
   Sort Space Disk:  +
 Average Sort Space Used: 128+
 Peak Sort Space Used: 128   +

So my 2ndary suggestion is to conditionalize based on the method rather than
value of space used.

--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -2830 +2830 @@ show_incremental_sort_group_info(IncrementalSortGroupInfo 
*groupInfo,
-   if (groupInfo->maxMemorySpaceUsed > 0)
+   if (groupInfo->sortMethods & SORT_TYPE_QUICKSORT)
@@ -2847 +2847 @@ show_incremental_sort_group_info(IncrementalSortGroupInfo 
*groupInfo,
-   if (groupInfo->maxDiskSpaceUsed > 0)
+   if (groupInfo->sortMethods & SORT_TYPE_EXTERNAL_SORT)

-- 
Justin

Re: PG 13 release notes, first draft

2020-07-29 Thread David G. Johnston

On Wednesday, July 29, 2020, Peter Geoghegan  wrote:

> On Wed, Jul 29, 2020 at 6:30 PM Bruce Momjian  wrote:
> > > There should be a note about this in the Postgres 13 release notes,
> > > for the usual reasons. More importantly, the "Allow hash aggregation
> > > to use disk storage for large aggregation result sets" feature should
> > > reference the new GUC directly. Users should be advised that the GUC
> > > may be useful in cases where they upgrade and experience a performance
> > > regression linked to slower hash aggregation. Just including a
> > > documentation link for the GUC would be very helpful.
> >
> > I came up with the attached patch.
>
> I was thinking something along like the following (after the existing
> sentence about avoiding hash aggs in the planner):
>
> If you find that hash aggregation is slower than in previous major
> releases of PostgreSQL, it may be useful to increase the value of
> hash_mem_multiplier. This allows hash aggregation to use more memory
> without affecting competing query operations that are generally less
> likely to put any additional memory to good use.
>
>
How about adding wording for GROUP BY as well to cater to users who are
more comfortable thinking in terms of SQL statements instead of execution
plans?

David J.

Re: HashAgg's batching counter starts at 0, but Hash's starts at 1.

2020-07-29 Thread Peter Geoghegan

On Mon, Jul 27, 2020 at 8:36 PM Justin Pryzby  wrote:
> I don't know of a guideline for text format, but the issues I've raised for
> HashAgg and IncrSort are based on previous commits which change to "show field
> even if its value is zero" for non-text format:

But the non-text format for IncrSort shows about the same information
as sort, broken out by group. What's missing if you assume that sort
is the gold standard?

The objection to your argument from James (which could just as easily
apply to regular sort from earlier releases) is that accurate
information just isn't available as a practical matter. This is due to
tuplesort implementation limitations that cannot be fixed now. See the
comment block in tuplesort_get_stats() for an explanation. The hard
part is showing memory used by external sorts.

It's true that "Disk" is specifically shown by sort nodes output in
text explain format, but you're talking about non-text formats so
that's not really relevant

AFAICT sort (and IncrSort) don't fail to show a field value if it is
zero. Rather, they consistently show "space used" (in non-text
format), which can be either memory or disk space. Technically they
don't violate the letter of the law that you cite. (Though admittedly
this is a legalistic loophole -- the "space" value presumably has to
be interpreted according to different rules for programs that consume
non-text EXPLAIN output.)

Have I missed something?

-- 
Peter Geoghegan

Re: Any objections to implementing LogicalDecodeMessageCB for pgoutput?

2020-07-29 Thread David Pirotte

On Wed, Jul 29, 2020 at 9:41 PM Dave Cramer  wrote:

> For logical replication there is no need to implement this, but others are
> using the pgoutput plugin for Change Data Capture. The reason they are
> using pgoutput is because it is guaranteed to be available as it is in core
> postgres.
>
> Implementing LogicalDecodeMessageCB provides some synchronization facility
> that is not easily replicated.
>
> Thoughts ?
>

Attached is a draft patch that adds this functionality into the pgoutput
plugin. A slot consumer can pass 'messages' as an option to include
logical messages from pg_logical_emit_message in the replication flow.

FWIW, we have been using pg_logical_emit_message to send application-level
events alongside our change-data-capture for about two years, and we would
move this part of our stack to pgoutput if message support was available.

Looking forward to discussion and feedback.

Cheers,
Dave

0001-Add-logical-decoding-messages-to-pgoutput.patch.gz
Description: application/gzip

Re: [PATCH] Tab completion for VACUUM of partitioned tables

2020-07-29 Thread Fujii Masao





On 2020/07/30 10:46, Michael Paquier wrote:

On Thu, Jul 30, 2020 at 08:44:26AM +0900, Fujii Masao wrote:

Isn't it better to add the comment explaining why toast tables are
excluded from the tab completion for vacuum while they are vacuumable?


Sounds sensible, still it does not apply only to vacuum.  I would go
as far as just adding a comment at the beginning of the block for
schema queries:


Yes, that seems better.
BTW, one thing I think a bit strange is that indexes for toast tables
are included in tab-completion for REINDEX, for example. That is,
"REINDEX INDEX" displays "pg_toast.", and "REINDEX INDEX pg_toast."
displays indexes for toast tables. Maybe it's better to exclude them,
too. But there seems no simple way to do that.
So I'm ok with this current situation.



"Never include toast tables in any of those queries to avoid
unnecessary bloat in the completions."


The patch looks good to me except that.


Indeed.  FWIW, I would also adjust the comment on top of
Query_for_list_of_indexables to not say "index creation", but just
"supporting indexing" instead.

Fujii-san, perhaps you would prefer taking care of this patch?  I am
fine to do it if you wish.


Of course I'm fine if you work on this patch. So please feel free to do that!


Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: PG 13 release notes, first draft

2020-07-29 Thread Peter Geoghegan

On Wed, Jul 29, 2020 at 7:48 PM Bruce Momjian  wrote:
> Well, that seems to be repeating what is already in the docs for
> hash_mem_multiplier, which I try to avoid.  One other direction is to
> put something in the incompatibilities section.  Does that make sense?

I would prefer to put it next to the hash agg item itself. It's more
likely to be noticed there, and highlighting it a little seems
warranted.

OTOH, this may not be a problem at all for many individual users.
Framing it as a tip rather than a compatibility item gets that across.

-- 
Peter Geoghegan

Re: PG 13 release notes, first draft

2020-07-29 Thread Bruce Momjian

On Wed, Jul 29, 2020 at 07:00:43PM -0700, Peter Geoghegan wrote:
> On Wed, Jul 29, 2020 at 6:30 PM Bruce Momjian  wrote:
> > > There should be a note about this in the Postgres 13 release notes,
> > > for the usual reasons. More importantly, the "Allow hash aggregation
> > > to use disk storage for large aggregation result sets" feature should
> > > reference the new GUC directly. Users should be advised that the GUC
> > > may be useful in cases where they upgrade and experience a performance
> > > regression linked to slower hash aggregation. Just including a
> > > documentation link for the GUC would be very helpful.
> >
> > I came up with the attached patch.
> 
> I was thinking something along like the following (after the existing
> sentence about avoiding hash aggs in the planner):
> 
> If you find that hash aggregation is slower than in previous major
> releases of PostgreSQL, it may be useful to increase the value of
> hash_mem_multiplier. This allows hash aggregation to use more memory
> without affecting competing query operations that are generally less
> likely to put any additional memory to good use.

Well, that seems to be repeating what is already in the docs for
hash_mem_multiplier, which I try to avoid.  One other direction is to
put something in the incompatibilities section.  Does that make sense?

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee

Re: PG 13 release notes, first draft

2020-07-29 Thread Peter Geoghegan

On Wed, Jul 29, 2020 at 6:30 PM Bruce Momjian  wrote:
> > There should be a note about this in the Postgres 13 release notes,
> > for the usual reasons. More importantly, the "Allow hash aggregation
> > to use disk storage for large aggregation result sets" feature should
> > reference the new GUC directly. Users should be advised that the GUC
> > may be useful in cases where they upgrade and experience a performance
> > regression linked to slower hash aggregation. Just including a
> > documentation link for the GUC would be very helpful.
>
> I came up with the attached patch.

I was thinking something along like the following (after the existing
sentence about avoiding hash aggs in the planner):

If you find that hash aggregation is slower than in previous major
releases of PostgreSQL, it may be useful to increase the value of
hash_mem_multiplier. This allows hash aggregation to use more memory
without affecting competing query operations that are generally less
likely to put any additional memory to good use.

-- 
Peter Geoghegan

Re: IDEA: pg_stat_statements tracking utility statements by tag?

2020-07-29 Thread Michael Paquier

On Wed, Jul 29, 2020 at 06:35:41PM +0200, Julien Rouhaud wrote:
> There's at least PREPARE TRANSACTION / COMMIT PREPARED / ROLLBACK
> PREPARED that should be normalized too.  I also don't think that we
> really want to have different entries for begin / Begin / BEGIN /
> bEgin and similar for many other commands, as the hash is computed
> based on the query text.

Hmm.  Do we really want to those commands fully normalized all the
time?  There may be applications that care about the stats of some
commands that are for example prefixed the same way and would prefer
group those things together.  By fully normalizing those commands all
the time, we would lose this option.

An example.  The ODBC driver uses its own grammar for internal
savepoint names, aka _EXEC_SVP_%p.  If you mix that with a second
application that has its own naming policy for savepoints it would not
be possible anymore to make the difference in the stats between what
one or the other do.
--
Michael

signature.asc
Description: PGP signature

Re: [PATCH] Tab completion for VACUUM of partitioned tables

2020-07-29 Thread Michael Paquier

On Thu, Jul 30, 2020 at 08:44:26AM +0900, Fujii Masao wrote:
> Isn't it better to add the comment explaining why toast tables are
> excluded from the tab completion for vacuum while they are vacuumable?

Sounds sensible, still it does not apply only to vacuum.  I would go
as far as just adding a comment at the beginning of the block for
schema queries:
"Never include toast tables in any of those queries to avoid
unnecessary bloat in the completions."

> The patch looks good to me except that.

Indeed.  FWIW, I would also adjust the comment on top of
Query_for_list_of_indexables to not say "index creation", but just
"supporting indexing" instead.

Fujii-san, perhaps you would prefer taking care of this patch?  I am
fine to do it if you wish.
--
Michael

signature.asc
Description: PGP signature

Re: PG 13 release notes, first draft

2020-07-29 Thread Bruce Momjian

On Wed, Jul 29, 2020 at 03:34:22PM -0700, Peter Geoghegan wrote:
> Hi Bruce,
> 
> On Fri, Jun 26, 2020 at 3:24 PM Bruce Momjian  wrote:
> > Patch attached and applied to PG 13.
> 
> I committed the hash_mem_multiplier GUC to Postgres 13 just now.
> 
> There should be a note about this in the Postgres 13 release notes,
> for the usual reasons. More importantly, the "Allow hash aggregation
> to use disk storage for large aggregation result sets" feature should
> reference the new GUC directly. Users should be advised that the GUC
> may be useful in cases where they upgrade and experience a performance
> regression linked to slower hash aggregation. Just including a
> documentation link for the GUC would be very helpful.

I came up with the attached patch.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee

diff --git a/doc/src/sgml/release-13.sgml b/doc/src/sgml/release-13.sgml
new file mode 100644
index 18e6497..2586645
*** a/doc/src/sgml/release-13.sgml
--- b/doc/src/sgml/release-13.sgml
*** Author: Jeff Davis 
  Previously, hash aggregation was avoided if it was expected to use
! more than  memory.
 

  
--- 649,657 
  
 
  Previously, hash aggregation was avoided if it was expected to use
! more than  memory.  To use more
! memory for hash query operations, increase .

Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?

2020-07-29 Thread Michael Paquier

On Wed, Jul 29, 2020 at 11:34:07PM +0200, Daniel Gustafsson wrote:
> Extreme cases for sure, but more importantly, there should be no cases when
> this would cause an increase wrt the status quo.

Yep.

> Maybe it'd be worth pre-computing by a first pass which tracks pinned objects
> in a bitmap; with a second pass which then knows how many and which to insert
> into slots?

Or it could be possible to just rebuild a new list of dependencies
before insertion into the catalog.  No objections with a bitmap, any
approach would be fine here as long as there is a first pass on the
item list.

> Fair enough, let's break out pg_depend and I'll have another go at that.

Thanks.  Attached is a polished version of the patch that I intend to
commit for pg_attribute and pg_shdepend.  Let's look again at
pg_depend later, as there are also links with the handling of
dependencies for ALTER TABLE mainly.
--
Michael
diff --git a/src/include/catalog/heap.h b/src/include/catalog/heap.h
index cbfdfe2abe..d31141c1a2 100644
--- a/src/include/catalog/heap.h
+++ b/src/include/catalog/heap.h
@@ -93,10 +93,11 @@ extern void heap_truncate_check_FKs(List *relations, bool tempTables);
 
 extern List *heap_truncate_find_FKs(List *relationIds);
 
-extern void InsertPgAttributeTuple(Relation pg_attribute_rel,
-   Form_pg_attribute new_attribute,
-   Datum attoptions,
-   CatalogIndexState indstate);
+extern void InsertPgAttributeTuples(Relation pg_attribute_rel,
+	TupleDesc tupdesc,
+	Oid new_rel_oid,
+	Datum *attoptions,
+	CatalogIndexState indstate);
 
 extern void InsertPgClassTuple(Relation pg_class_desc,
 			   Relation new_rel_desc,
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index 8be303870f..a7e2a9b26b 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -19,6 +19,7 @@
 #define INDEXING_H
 
 #include "access/htup.h"
+#include "nodes/execnodes.h"
 #include "utils/relcache.h"
 
 /*
@@ -36,6 +37,10 @@ extern void CatalogCloseIndexes(CatalogIndexState indstate);
 extern void CatalogTupleInsert(Relation heapRel, HeapTuple tup);
 extern void CatalogTupleInsertWithInfo(Relation heapRel, HeapTuple tup,
 	   CatalogIndexState indstate);
+extern void CatalogTuplesMultiInsertWithInfo(Relation heapRel,
+			 TupleTableSlot **slot,
+			 int ntuples,
+			 CatalogIndexState indstate);
 extern void CatalogTupleUpdate(Relation heapRel, ItemPointer otid,
 			   HeapTuple tup);
 extern void CatalogTupleUpdateWithInfo(Relation heapRel,
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 8df2716de4..5eef225f5c 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2164,8 +2164,8 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		RelationPutHeapTuple(relation, buffer, heaptuples[ndone], false);
 
 		/*
-		 * Note that heap_multi_insert is not used for catalog tuples yet, but
-		 * this will cover the gap once that is the case.
+		 * For logical decoding we need combocids to properly decode the
+		 * catalog.
 		 */
 		if (needwal && need_cids)
 			log_heap_new_cid(relation, heaptuples[ndone]);
@@ -2180,8 +2180,8 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 			RelationPutHeapTuple(relation, buffer, heaptup, false);
 
 			/*
-			 * We don't use heap_multi_insert for catalog tuples yet, but
-			 * better be prepared...
+			 * For logical decoding we need combocids to properly decode the
+			 * catalog.
 			 */
 			if (needwal && need_cids)
 log_heap_new_cid(relation, heaptup);
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 3985326df6..2a18dca34d 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -710,70 +710,122 @@ CheckAttributeType(const char *attname,
 }
 
 /*
- * InsertPgAttributeTuple
- *		Construct and insert a new tuple in pg_attribute.
+ * Cap the maximum amount of bytes allocated for InsertPgAttributeTuples()
+ * slots.
+ */
+#define MAX_PGATTRIBUTE_INSERT_BYTES 65535
+
+/*
+ * InsertPgAttributeTuples
+ *		Construct and insert a set of tuples in pg_attribute.
  *
- * Caller has already opened and locked pg_attribute.  new_attribute is the
- * attribute to insert.  attcacheoff is always initialized to -1, attacl,
- * attfdwoptions and attmissingval are always initialized to NULL.
+ * Caller has already opened and locked pg_attribute. tupdesc contains the
+ * attributes to insert.  attcacheoff is always initialized to -1, attacl,
+ * attfdwoptions and attmissingval are always initialized to NULL.  attoptions
+ * must contain the same number of elements as tupdesc, or be NULL.
  *
  * indstate is the index state for CatalogTupleInsertWithInfo.  It can be
  * passed as NULL, in which case we'll fetch the necessary info.  (Don't do
  * this when inserting multiple attributes, because it's a tad more
  * expensi

Re: Yet another fast GiST build (typo)

2020-07-29 Thread Thomas Munro

On Fri, Jul 10, 2020 at 6:55 PM Andrey M. Borodin  wrote:
> Thanks! Fixed.

It's not a bug, but I think those 64 bit constants should be wrapped
in UINT64CONST(), following our convention.

I'm confused about these two patches: 0001 introduces
gist_point_fastcmp(), but then 0002 changes it to gist_bbox_fastcmp().
Maybe you intended to keep both of them?  Also 0002 seems to have
fixups for 0001 squashed into it.

Re: Default setting for enable_hashagg_disk

2020-07-29 Thread Peter Geoghegan

On Mon, Jul 27, 2020 at 5:55 PM Peter Geoghegan  wrote:
> Pushed the hashagg_avoid_disk_plan patch -- thanks!

Pushed the hash_mem_multiplier patch as well -- thanks again!

As I've said before, I am not totally opposed to adding a true escape
hatch. That has not proven truly necessary just yet. For now, my
working assumption is that the problem on the table has been
addressed.

-- 
Peter Geoghegan

Re: HyperLogLog.c and pg_leftmost_one_pos32()

2020-07-29 Thread Peter Geoghegan

On Wed, Jul 29, 2020 at 10:08 AM Jeff Davis  wrote:
> Is there a reason that HyperLogLog doesn't use pg_leftmost_one_pos32()?

Yes: HyperLogLog predates pg_leftmost_one_pos32().

> I tried the following patch and some brief performance tests seem to
> show an improvement.

Makes sense.

How did you test this? What kind of difference are we talking about? I
ported this code from the upstream C++ as part of the original
abbreviated keys commit. Note that the cardinality of abbreviated keys
are displayed when you set "trace_sort = on".

> This came up because my recent commit 9878b643 uses HLL for estimating
> the cardinality of spill files, which solves a few annoyances with
> overpartitioning[1].

I think that you should change back the rhs() variable names to match
HyperLogLog upstream (as well as the existing rhs() comments).

> I think it's overall an improvement, but
> addHyperLogLog() itself seemed to show up as a cost, so it can hurt
> spilled-but-still-in-memory cases. I'd also like to backpatch this to
> 13 (as I already did for 9878b643), if that's acceptable.

I still wonder if it was ever necessary to add HLL to abbreviated
keys. It only served to avoid some pretty narrow worse cases, at the
expense of typical cases. Given that the existing users of HLL are
pretty narrow, and given the importance of preserving the favorable
performance characteristics of hash aggregate, I'm inclined to agree
that it's worth backpatching to 13 now. Assuming it is a really
measurable cost in practice.

-- 
Peter Geoghegan

Re: [PATCH] Initial progress reporting for COPY command

2020-07-29 Thread Fujii Masao





On 2020/07/29 3:30, Pavel Stehule wrote:



út 28. 7. 2020 v 20:25 odesílatel Josef Šimánek mailto:josef.sima...@gmail.com>> napsal:

Thanks for the info. I am waiting for review. Is there any summary of 
requested changes needed?


Maybe it is just noise - you wrote so you will resend a patch to different 
thread




I tried to reattach the thread there. I'll prepare a new patch soon, what 
should I do? Just attach it again?


Yeah, since I read this message, I was thinking that new patch will be
posted. But, Josef, if the situation was changed, could you correct me?
Anyway the patch seems not to be applied cleanly, so at least it needs to
be updated to address that.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: Missing CFI in hlCover()?

2020-07-29 Thread Tom Lane

I wrote:
> (I wonder if we need to try to make it faster.  I'd supposed that the
> loop was cheap enough to be a non-problem, but with large enough
> documents maybe not?  It seems like converting to a hash table could
> be worthwhile for a large doc.)

OK, I dug into Stephen's test case off-list.  While some CFIs would
be a good idea, that's just band-aid'ing the symptom.  The actual
problem is that hlCover() is taking way too much time.  The test case
boils down to "common_word & rare_word" matched to a very long document,
wherein the rare_word appears only near the front.  Once we're past
that match, hlCover() tries all the remaining matches for common_word,
of which there are plenty ... and for each one, it scans clear to the
end of the document, looking vainly for a substring that will satisfy
the "common_word & rare_word" query.  So obviously, this is O(N^2)
in the length of the document :-(.

I have to suppose that I introduced this problem in c9b0c678d, since
AFAIR we weren't getting ts_headline-takes-forever complaints before
that.  It's not immediately obvious why the preceding algorithm doesn't
have a similar issue, but then there's not anything at all that was
obvious about the preceding algorithm.

The most obvious way to fix the O(N^2) hazard is to put a limit on the
length of "cover" (matching substring) that we'll consider.  For the
mark_hl_words caller, we won't highlight more than max_words words
anyway, so it would be reasonable to bound covers to that length or
some small multiple of it.  The other caller mark_hl_fragments is
willing to highlight up to max_fragments of up to max_words each, and
there can be some daylight between the fragments, so it's not quite
clear what the longest reasonable match is.  Still, I doubt it's
useful to show a headline consisting of a few words from the start of
the document and a few words from thousands of words later, so a limit
of max_fragments times max_words times something would probably be
reasonable.

We could hard-code a rule like that, or we could introduce a new
explicit parameter for the maximum cover length.  The latter would be
more flexible, but we need something back-patchable and I'm concerned
about the compatibility hazards of adding a new parameter in minor
releases.  So on the whole I propose hard-wiring a multiplier of,
say, 10 for both these cases.

regards, tom lane

Re: [PATCH] Tab completion for VACUUM of partitioned tables

2020-07-29 Thread Fujii Masao





On 2020/07/30 3:33, Justin Pryzby wrote:

On Wed, Jul 29, 2020 at 03:21:19PM +0900, Michael Paquier wrote:

On Wed, Jul 29, 2020 at 01:27:07PM +0900, Masahiko Sawada wrote:

Good catch. The patch looks good to me.


While this patch is logically correct.  I think that we should try to
not increase more the number of queries used to scan pg_class
depending on a list of relkinds.  For example, did you notice that
your new Query_for_list_of_vacuumables becomes the same query as
Query_for_list_of_indexables?  You can make your patch more simple.


I didn't notice.  There's an argument for keeping them separate, but as long as
there's a #define in between, this is fine, too.

On Wed, Jul 29, 2020 at 08:05:57PM +0900, Michael Paquier wrote:

On Wed, Jul 29, 2020 at 06:41:16PM +0900, Masahiko Sawada wrote:

whereas Query_for_list_of_vacuumables should search for:

RELKIND_RELATION
RELKIND_PARTITIONED_TABLE
RELKIND_MATVIEW
RELKIND_TOASTVALUE


FWIW, I don't think that we should make toast relations suggested to
the user at all for any command.  This comes down to the same point
that we don't have pg_toast in search_path, and going down to this
level of operations is an expert-level mode, not something we should
recommend to the average user in psql IMO.


Right.  Tom's response to that suggestion a couple years ago I thought was
pretty funny (I picture Dr. Claw at his desk using psql tab completion being
presented with a list of pg_toast.pg_toast_NN OIDs: "which TOAST table
should I vacuum next..")

https://www.postgresql.org/message-id/14255.1536781...@sss.pgh.pa.us
|I don't actually think that's a good idea.  It's more likely to clutter
|peoples' completion lists than offer anything they want.  Even if someone
|actually does want to vacuum a toast table, they are not likely to be
|entering its name via tab completion; they're going to have identified
|which table they want via some query, and then they'll be doing something
|like copy-and-paste out of a query result.


Isn't it better to add the comment explaining why toast tables are
excluded from the tab completion for vacuum while they are vacuumable?
The patch looks good to me except that.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: PG 13 release notes, first draft

2020-07-29 Thread Peter Geoghegan

Hi Bruce,

On Fri, Jun 26, 2020 at 3:24 PM Bruce Momjian  wrote:
> Patch attached and applied to PG 13.

I committed the hash_mem_multiplier GUC to Postgres 13 just now.

There should be a note about this in the Postgres 13 release notes,
for the usual reasons. More importantly, the "Allow hash aggregation
to use disk storage for large aggregation result sets" feature should
reference the new GUC directly. Users should be advised that the GUC
may be useful in cases where they upgrade and experience a performance
regression linked to slower hash aggregation. Just including a
documentation link for the GUC would be very helpful.

Thanks
-- 
Peter Geoghegan

Re: recovering from "found xmin ... from before relfrozenxid ..."

2020-07-29 Thread Thomas Munro

On Thu, Jul 30, 2020 at 1:36 AM Robert Haas  wrote:
> On Wed, Jul 29, 2020 at 3:23 AM Thomas Munro  wrote:
> > On Tue, Jul 14, 2020 at 9:12 AM Robert Haas  wrote:
> > > A number of EDB customers have had this error crop on their tables for
> > > reasons that we have usually not been able to determine. In many
> >
> > Do you happen to know if they ever used the
> > snapshot-too-old feature?
>
> I don't have any reason to believe that they did. Why?

Nothing specific, I was just contemplating the problems with that
feature and the patches[1] proposed so far to fix some of them, and
what types of corruption might be possible due to that stuff, and it
occurred to me to ask if you'd thought about that in connection to
these reports.

[1] 
https://www.postgresql.org/message-id/flat/CA%2BTgmoY%3Daqf0zjTD%2B3dUWYkgMiNDegDLFjo%2B6ze%3DWtpik%2B3XqA%40mail.gmail.com

Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?

2020-07-29 Thread Daniel Gustafsson

> On 29 Jul 2020, at 10:32, Michael Paquier  wrote:
> 
> On Wed, Jul 01, 2020 at 06:24:18PM +0900, Michael Paquier wrote:
>> I am not sure either, still it looks worth thinking about it.
>> Attached is a rebased version of the last patch.  What this currently
>> holds is just the switch to heap_multi_insert() for the three catalogs
>> pg_attribute, pg_depend and pg_shdepend.  One point that looks worth
>> debating about is to how much to cap the data inserted at once.  This
>> uses 64kB for all three, with a number of slots chosen based on the
>> size of each record, similarly to what we do for COPY.
> 
> I got an extra round of review done for this patch.  

Thanks!

> While on it, I have done some measurements to see the difference in
> WAL produced and get an idea of the gain.

> On HEAD, with a table that has 1300 attributes, this leads to 563kB of 
> WAL produced.  With the patch, we get down to 505kB.  That's an
> extreme case of course, but that's nice a nice gain.
> 
> A second test, after creating a database from a template that has
> roughly 10k entries in pg_shdepend (10k empty tables actually), showed
> a reduction from 2158kB to 1762kB in WAL.

Extreme cases for sure, but more importantly, there should be no cases when
this would cause an increase wrt the status quo.

> Finally comes the third catalog, pg_depend, and there is one thing
> that makes me itching about this part.  We do a lot of useless work
> for the allocation and destruction of the slots when there are pinned
> dependencies, and there can be a lot of them.  Just by running the
> main regression tests, it is easy to see that in 0003 we still do a
> lot of calls of recordMultipleDependencies() for one single
> dependency, and that most of these are actually pinned.  So we finish
> by doing a lot of slot manipulation to insert nothing at the end,
> contrary to the counterparts with pg_shdepend and pg_attribute.  

Maybe it'd be worth pre-computing by a first pass which tracks pinned objects
in a bitmap; with a second pass which then knows how many and which to insert
into slots?

> In
> short, I think that for now it would be fine to commit a patch that
> does the multi-INSERT optimization for pg_attribute and pg_shdepend,
> but that we need more careful work for pg_depend.

Fair enough, let's break out pg_depend and I'll have another go at that.

cheers ./daniel

Re: PATCH: Add uri percent-encoding for binary data

2020-07-29 Thread Daniel Gustafsson

> On 1 Jul 2020, at 16:58, Alvaro Herrera  wrote:
> 
> On 2020-Jul-01, Daniel Gustafsson wrote:
> 
>>> On 19 Mar 2020, at 08:55, Daniel Gustafsson  wrote:
>> 
>>> With no response for 2 weeks during the commitfest, I propose to move this 
>>> to
>>> the next CF to allow time for discussions.
>> 
>> This patch no longer applies, the failing hunk being in the docs part.  As
>> stated in my review earlier in the thread I don't think this feature is
>> complete enough in its current form; having hacked on it a bit, what are your
>> thoughts Alvaro?
> 
> If the author (or some other person interested in the feature) submits a
> version addressing the feedback, by all means let's consider it further;
> but if nothing happens during this commitfest, I'd say we close as RwF
> at end of July.

As per discussion, this entry is closed as "Returned with Feedback".

cheers ./daniel

Re: new heapcheck contrib module

2020-07-29 Thread Robert Haas

On Mon, Jul 20, 2020 at 5:02 PM Mark Dilger
 wrote:
> I've made the options 'all-visible', 'all-frozen', and 'none'.  It defaults 
> to 'none'.

That looks nice.

> > I guess that
> > could still be expensive if there's a lot of them, but needing
> > ShareUpdateExclusiveLock rather than only AccessShareLock is a little
> > unfortunate.
>
> I welcome strategies that would allow for taking a lesser lock.

I guess I'm not seeing why you need any particular strategy here. Say
that at the beginning you note the starting relfrozenxid of the table
-- I think I would lean toward just ignoring datfrozenxid and the
cluster-wide value completely. You also note the current value of the
transaction ID counter. Those are the two ends of the acceptable
range.

Let's first consider the oldest acceptable XID, bounded by
relfrozenxid. If you see a value that is older than the relfrozenxid
value that you noted at the start, it is definitely invalid. If you
see a newer value, it could still be older than the table's current
relfrozenxid, but that doesn't seem very worrisome. If the user
vacuumed the table while they were running this tool, they can always
run the tool again afterward if they wish. Forcing the vacuum to wait
by taking ShareUpdateExclusiveLock doesn't actually solve anything
anyway: you STILL won't notice any problems the vacuum introduces, and
in fact you are now GUARANTEED not to notice them, plus now the vacuum
happens later.

Now let's consider the newest acceptable XID, bounded by the value of
the transaction ID counter. Any time you see a newer XID than the last
value of the transaction ID counter that you observed, you go observe
it again. If the value from the table still looks invalid, then you
complain about it. Either way, you remember the new observation and
check future tuples against that value. I think the patch is already
doing this anyway; if it weren't, you'd need an even stronger lock,
one sufficient to prevent any insert/update/delete activity on the
table altogether.

Maybe I'm just being dense here -- exactly what problem are you worried about?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [PATCH] Tab completion for VACUUM of partitioned tables

2020-07-29 Thread Justin Pryzby

On Wed, Jul 29, 2020 at 03:21:19PM +0900, Michael Paquier wrote:
> On Wed, Jul 29, 2020 at 01:27:07PM +0900, Masahiko Sawada wrote:
> > Good catch. The patch looks good to me.
> 
> While this patch is logically correct.  I think that we should try to
> not increase more the number of queries used to scan pg_class
> depending on a list of relkinds.  For example, did you notice that
> your new Query_for_list_of_vacuumables becomes the same query as
> Query_for_list_of_indexables?  You can make your patch more simple.

I didn't notice.  There's an argument for keeping them separate, but as long as
there's a #define in between, this is fine, too.

On Wed, Jul 29, 2020 at 08:05:57PM +0900, Michael Paquier wrote:
> On Wed, Jul 29, 2020 at 06:41:16PM +0900, Masahiko Sawada wrote:
> > whereas Query_for_list_of_vacuumables should search for:
> > 
> > RELKIND_RELATION
> > RELKIND_PARTITIONED_TABLE
> > RELKIND_MATVIEW
> > RELKIND_TOASTVALUE
> 
> FWIW, I don't think that we should make toast relations suggested to
> the user at all for any command.  This comes down to the same point
> that we don't have pg_toast in search_path, and going down to this
> level of operations is an expert-level mode, not something we should
> recommend to the average user in psql IMO.

Right.  Tom's response to that suggestion a couple years ago I thought was
pretty funny (I picture Dr. Claw at his desk using psql tab completion being
presented with a list of pg_toast.pg_toast_NN OIDs: "which TOAST table
should I vacuum next..")

https://www.postgresql.org/message-id/14255.1536781...@sss.pgh.pa.us
|I don't actually think that's a good idea.  It's more likely to clutter
|peoples' completion lists than offer anything they want.  Even if someone
|actually does want to vacuum a toast table, they are not likely to be
|entering its name via tab completion; they're going to have identified
|which table they want via some query, and then they'll be doing something
|like copy-and-paste out of a query result.

-- 
Justin
>From c5b356797fc092aec076cca9faa83ea086516350 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Tue, 28 Jul 2020 11:56:26 -0500
Subject: [PATCH v2] Tab completion for VACUUM of partitioned tables

---
 src/bin/psql/tab-complete.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 272f799c24..b464aceea4 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -573,8 +573,11 @@ static const SchemaQuery Query_for_list_of_indexables = {
 	.result = "pg_catalog.quote_ident(c.relname)",
 };
 
-/* Relations supporting VACUUM */
-static const SchemaQuery Query_for_list_of_vacuumables = {
+/* Relations supporting VACUUM are currently same as those supporting indexing */
+#define Query_for_list_of_vacuumables Query_for_list_of_indexables
+
+/* Relations supporting CLUSTER */
+static const SchemaQuery Query_for_list_of_clusterables = {
 	.catname = "pg_catalog.pg_class c",
 	.selcondition =
 	"c.relkind IN (" CppAsString2(RELKIND_RELATION) ", "
@@ -584,9 +587,6 @@ static const SchemaQuery Query_for_list_of_vacuumables = {
 	.result = "pg_catalog.quote_ident(c.relname)",
 };
 
-/* Relations supporting CLUSTER are currently same as those supporting VACUUM */
-#define Query_for_list_of_clusterables Query_for_list_of_vacuumables
-
 static const SchemaQuery Query_for_list_of_constraints_with_schema = {
 	.catname = "pg_catalog.pg_constraint c",
 	.selcondition = "c.conrelid <> 0",
-- 
2.17.0

Re: proposal: unescape_text function

2020-07-29 Thread Asif Rehman

The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

Hi,

The patch looks good to me.

The new status of this patch is: Ready for Committer

Re: Online checksums patch - once again

2020-07-29 Thread Robert Haas

On Mon, Jun 22, 2020 at 8:27 AM Daniel Gustafsson  wrote:
> Attached is a new version of the online checksums patch which, I hope, address
> most of the concerns raised in previous reviews.  There has been a fair amount
> of fiddling done, so below is a summary of what has been done.

Here are a bunch of comments based on a partial read-through of this
patch. The most serious concerns, around synchronization, are down
toward at the bottom. Sorry this is a bit eclectic as a review, but I
wrote things down as I read through the patch more or less in the
order I ran across them.

Regarding disable_data_checksums(), I disagree with ereport(LOG, ...)
here. If you want to indicate to the caller whether or not a state
change occurred, you could consider returning a Boolean instead of
void. If you want to do it with log messages, I vote for NOTICE, not
LOG. Maybe NOTICE is better, because enable_data_checksums() seems to
want to convey more information that you can represent in a Boolean,
but then it should use NOTICE consistently, not a mix of NOTICE and
LOG.

Formatting needs work for project style: typically no braces around
single statements, "ereport(WHATEVER," should always have a line break
at that point.

+ * cluster, which was not initialized with checksums, this worker will ensure

"which was not initialized with checksums" => "that does not running
with checksums enabled"?

+ * turned on. In the case of disabling checksums, the state transition is
+ * recorded in the catalog and controlfile, no changes are performed
+ * on the data pages or in the catalog.

Comma splice. Either write "controlfile; no" or "controlfile, and no".

My spell-checker complains that controfile, clusterwide, inprogress,
and endstate are not words. I think you should think about inserting
spaces or, in the case of cluster-wide, a dash, unless they are being
used as literals, in which case perhaps those instances should be
quoted. "havent" needs an apostrophe.

+ * DataChecksumsWorker will compile a list of databases which exists at the

which exist

+ * For each database, all relations which have storage are read and every data
+ * page is marked dirty to force a write with the checksum, this will generate

Comma splice. Split into two sentences.

+ * In case checksums have been enabled and later disabled, when re-enabling
+ * pg_class.relhaschecksums will be reset to false before entering inprogress
+ * mode to ensure that all relations are re-processed.

"If checksums are enabled, then disabled, and then re-enabled, every
relation's pg_class.relhaschecksums field will be reset to false
before entering the in-progress mode."

+ * Disabling checksums is done as an immediate operation as it only updates

s/done as //

+ * to pg_class.relhaschecksums is performed as it only tracks state during

is performed -> are necessary

+ * Access to other members can be done without a lock, as while they are
+ * in shared memory, they are never concurrently accessed. When a worker
+ * is running, the launcher is only waiting for that worker to finish.

The way this is written, it sounds like you're saying that concurrent
access might be possible when this structure isn't in shared memory.
But since it's called DatachecksumsWorkerShmemStruct that's not likely
a correct conclusion, so I think it needs rephrasing.

+ if (DatachecksumsWorkerShmem->launcher_started &&
!DatachecksumsWorkerShmem->abort)
+ started = true;

Why not started = a && b instead of started = false; if (a && b) started = true?

+ {
+ LWLockRelease(DatachecksumsWorkerLock);
+ ereport(ERROR,
+ (errmsg("data checksums worker has been aborted")));
+ }

Errors always release LWLocks, so this seems unnecessary. Also, the
error message looks confusing from a user perspective. What does it
mean if I ask you to make me a cheeseburger and you tell me the
cheeseburger has been eaten? I'm asking for a *new* cheeseburger (or
in this case, a new worker).

I wonder why this thing is inventing a brand new way of aborting a
worker, anyway. Why not just keep track of the PID and send it SIGINT
and have it use CHECK_FOR_INTERRUPTS()? That's already sprinkled all
over the code, so it's likely to work better than some brand-new
mechanism that will probably have checks in a lot fewer places.

+ vacuum_delay_point();

Huh? Why?

+ elog(DEBUG2,
+ "background worker \"datachecksumsworker\" starting to process relation %u",
+ relationId);

This and similar messages seem likely they refer needlessly to
internals, e.g. this could be "adding checksums to relation with OID
%u" without needing to reference background workers or
datachecksumworker. It would be even better if we could find a way to
report relation names.

+ * so when the cluster comes back up processing will habe to be resumed.

habe -> have

+ ereport(FATAL,
+ (errmsg("cannot enable checksums without the postmaster process"),
+ errhint("Restart the database and restart the checksumming process
by calling pg_enable_data_checksums().")));

Re: logical replication empty transactions

2020-07-29 Thread Rahila Syed


Hi,

Please see below review of the 
0001-Skip-empty-transactions-for-logical-replication.patch


The make check passes.


 +               /* output BEGIN if we haven't yet */
 +               if (!data->xact_wrote_changes)
 +                       pgoutput_begin(ctx, txn);
 +
 +               data->xact_wrote_changes = true;
 +
IMO, xact_wrote_changes flag is better set inside the if condition as it 
does not need to

be set repeatedly in subsequent calls to the same function.



* Stash BEGIN data in plugin's 
LogicalDecodingContext.output_plugin_private when plugin's begin 
callback called, don't write anything to the outstream
* Write out BEGIN message lazily when any other callback generates a 
message that does need to be written out
* If no BEGIN written by the time COMMIT callback called, discard the 
COMMIT too. Check if sync rep enabled. if it is, 
call LogicalDecodingContext.update_progress
from within the output plugin commit handler, otherwise just ignore 
the commit totally. Probably by calling OutputPluginUpdateProgress().




I think the code in the patch is similar to what has been described by 
Craig in the above snippet,
except instead of stashing the BEGIN message and sending the message 
lazily, it simply maintains a flag
in LogicalDecodingContext.output_plugin_private which defers calling 
output plugin's begin callback,

until any other callback actually generates a remote write.

Also, the patch does not contain the last part where he describes 
having OutputPluginUpdateProgress()

for synchronous replication enabled transactions.
However, some basic testing suggests that the patch does not have any 
notable adverse effect on

either the replication lag or the sync_rep performance.

I performed tests by setting up publisher and subscriber on the same 
machine with synchronous_commit = on and

ran pgbench -c 12 -j 6 -T 300 on unpublished pgbench tables.

I see that  confirmed_flush_lsn is catching up just fine without any 
notable delay as compared to the test results without

the patch.

Also, the TPS for synchronous replication of empty txns with and without 
the patch remains similar.


Having said that, these are initial findings and I understand better 
performance tests are required to measure
reduction in consumption of network bandwidth and impact on synchronous 
replication and replication lag.


Thank you,
Rahila Syed

Re: Default setting for enable_hashagg_disk

2020-07-29 Thread Jeff Davis

On Sat, 2020-07-25 at 17:52 -0700, Peter Geoghegan wrote:
> BTW, your HLL patch ameliorates the problem with my extreme "sorted
> vs
> random input" test case from this morning [1] (the thing that I just
> discussed with Tomas). Without the HLL patch the sorted case had 2424
> batches. With the HLL patch it has 20. That at least seems like a
> notable improvement.

Committed.

Though I did notice some overhead for spilled-but-still-in-memory cases
due to addHyperLogLog() itself. It seems that it can be mostly
eliminated with [1], though I'll wait to see if there's an objection
because that would affect other users of HLL.

Regards,
Jeff Davis

[1] 
https://www.postgresql.org/message-id/17068336d300fab76dd6131cbe1996df450dde38.ca...@j-davis.com

HyperLogLog.c and pg_leftmost_one_pos32()

2020-07-29 Thread Jeff Davis

Is there a reason that HyperLogLog doesn't use pg_leftmost_one_pos32()?

I tried the following patch and some brief performance tests seem to
show an improvement.

This came up because my recent commit 9878b643 uses HLL for estimating
the cardinality of spill files, which solves a few annoyances with
overpartitioning[1]. I think it's overall an improvement, but
addHyperLogLog() itself seemed to show up as a cost, so it can hurt
spilled-but-still-in-memory cases. I'd also like to backpatch this to
13 (as I already did for 9878b643), if that's acceptable.

Regards,
Jeff Davis

[1] 
https://www.postgresql.org/message-id/cah2-wznidojad-zbobnfozda5rtcs0jlsqczkdnu+ou1ngy...@mail.gmail.com
diff --git a/src/backend/lib/hyperloglog.c b/src/backend/lib/hyperloglog.c
index a5cc1f8b83b..5ef0cc9c449 100644
--- a/src/backend/lib/hyperloglog.c
+++ b/src/backend/lib/hyperloglog.c
@@ -49,6 +49,7 @@
 #include 
 
 #include "lib/hyperloglog.h"
+#include "port/pg_bitutils.h"
 
 #define POW_2_32			(4294967296.0)
 #define NEG_POW_2_32		(-4294967296.0)
@@ -240,13 +241,15 @@ estimateHyperLogLog(hyperLogLogState *cState)
 static inline uint8
 rho(uint32 x, uint8 b)
 {
-	uint8		j = 1;
+	int p;
 
-	while (j <= b && !(x & 0x8000))
-	{
-		j++;
-		x <<= 1;
-	}
+	if (x == 0)
+		return b + 1;
+
+	p = 32 - pg_leftmost_one_pos32(x);
+
+	if (p > b)
+		return b + 1;
 
-	return j;
+	return p;
 }

Re: IDEA: pg_stat_statements tracking utility statements by tag?

2020-07-29 Thread Julien Rouhaud

On Wed, Jul 29, 2020 at 5:29 PM Martijn van Oosterhout
 wrote:
>
>
> On Wed, 29 Jul 2020 at 15:40, Julien Rouhaud  wrote:
>>
>> On Wed, Jul 29, 2020 at 2:42 PM Fujii Masao  
>> wrote:
>> >
>> >
>> > Or, we should extend the existing query normalization to handle also DDL?
>>
>> +1, introducing DDL normalization seems like a better way to go in the
>> long run.  Defining what should and shouldn't be normalized can be
>> tricky though.
>
>
> In principle, the only thing that really needs to be normalised is 
> SAVEPOINT/CURSOR names which are essentially random strings which have no 
> effect on the result. Most other stuff is material to the query.
>
> That said, I think "aggregate by tag" has value all by itself. Being able to 
> collapse all CREATE TABLES into a single line can be useful in some 
> situations.

There's at least PREPARE TRANSACTION / COMMIT PREPARED / ROLLBACK
PREPARED that should be normalized too.  I also don't think that we
really want to have different entries for begin / Begin / BEGIN /
bEgin and similar for many other commands, as the hash is computed
based on the query text.

Re: [Patch] ALTER SYSTEM READ ONLY

2020-07-29 Thread Robert Haas

On Fri, Jul 24, 2020 at 3:12 PM Soumyadeep Chakraborty
 wrote:
> Ah yes. We should then have ALTER SYSTEM WAL {PERMIT|PROHIBIT}. I don't
> think we should say "READ ONLY" if we still allow on-disk file changes
> after the ALTER SYSTEM command returns (courtesy dirty buffer flushes)
> because it does introduce confusion, especially to an audience not privy
> to this thread. When people hear "read-only" they may think of static on-disk
> files immediately.

They might think of a variety of things that are not a correct
interpretation of what the feature does, but I think the way to handle
that is to document it properly. I don't think making WAL a grammar
keyword just for this is a good idea. I'm not totally stuck on this
particular syntax if there's consensus on something else, but I
seriously doubt that there will be consensus around adding parser
keywords for this.

> I don't have enough context to enumerate use cases for the advantages or
> opportunities that would come with an assurance that the cluster's files
> are frozen (and not covered by any existing utilities), but surely there
> are some? Like the possibility of pg_upgrade on a running server while
> it can entertain read-only queries? Surely, that's a nice one!

I think that this feature is plenty complicated enough already, and we
shouldn't make it more complicated to cater to additional use cases,
especially when those use cases are somewhat uncertain and would
probably require additional work in other parts of the system.

For instance, I think it would be great to have an option to start the
postmaster in a strictly "don't write ANYTHING" mode where regardless
of the cluster state it won't write any data files or any WAL or even
the control file. It would be useful for poking around on damaged
clusters without making things worse. And it's somewhat related to the
topic of this thread, but it's not THAT closely related. It's better
to add features one at a time; you can always add more later, but if
you make the individual ones too big and hard they don't get done.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: IDEA: pg_stat_statements tracking utility statements by tag?

2020-07-29 Thread Martijn van Oosterhout

On Wed, 29 Jul 2020 at 15:40, Julien Rouhaud  wrote:

> On Wed, Jul 29, 2020 at 2:42 PM Fujii Masao 
> wrote:
> >
> >
> > Or, we should extend the existing query normalization to handle also DDL?
>
> +1, introducing DDL normalization seems like a better way to go in the
> long run.  Defining what should and shouldn't be normalized can be
> tricky though.
>

In principle, the only thing that really needs to be normalised is
SAVEPOINT/CURSOR names which are essentially random strings which have no
effect on the result. Most other stuff is material to the query.

That said, I think "aggregate by tag" has value all by itself. Being able
to collapse all CREATE TABLES into a single line can be useful in some
situations.

Ideally the results of FETCH "cursor" should be combined with the DECLARE,
but I really don't know how to go about that.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/

Re: Nicer error when connecting to standby with hot_standby=off

2020-07-29 Thread Fujii Masao





On 2020/04/03 22:49, James Coleman wrote:

On Thu, Apr 2, 2020 at 5:53 PM David Zhang  wrote:


The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:not tested

I applied the patch to the latest master branch and run a test below. The error 
messages have been separated. Below is the test steps.

### setup primary server
initdb -D /tmp/primary/data
mkdir /tmp/archive_dir
echo "archive_mode='on'" >> /tmp/primary/data/postgresql.conf
echo "archive_command='cp %p /tmp/archive_dir/%f'" >> 
/tmp/primary/data/postgresql.conf
pg_ctl -D /tmp/primary/data -l /tmp/primary-logs start

### setup host standby server
pg_basebackup -p 5432 -w -R -D /tmp/hotstandby
echo "primary_conninfo='host=127.0.0.1 port=5432 user=pgdev'" >> 
/tmp/hotstandby/postgresql.conf
echo "restore_command='cp /tmp/archive_dir/%f %p'" >> 
/tmp/hotstandby/postgresql.conf
echo "hot_standby = off" >> /tmp/hotstandby/postgresql.conf
pg_ctl -D /tmp/hotstandby -l /tmp/hotstandby-logs -o "-p 5433" start

### keep trying to connect to hot standby server in order to get the error 
messages in different stages.
while true; do echo "`date`"; psql postgres -p 5433 -c "SELECT 
txid_current_snapshot();" sleep 0.2; done

### before the patch
psql: error: could not connect to server: FATAL:  the database system is 
starting up
...

### after the patch, got different messages, one message indicates hot_standby 
is off
psql: error: could not connect to server: FATAL:  the database system is 
starting up
...
psql: error: could not connect to server: FATAL:  the database system is up, 
but hot_standby is off
...


Thanks for the review and testing!


Thanks for the patch! Here is the comment from me.

+   else if (!FatalError && pmState == PM_RECOVERY)
+   return CAC_STANDBY; /* connection disallowed on non-hot 
standby */

Even if hot_standby is enabled, pmState seems to indicate PM_RECOVERY
until recovery has reached a consistent state. No? So, if my understanding
is right, "FATAL:  the database system is up, but hot_standby is off" can
be logged even when hot_standby is on. Which sounds very confusing.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: pg_stat_statements: rows not updated for CREATE TABLE AS SELECT statements

2020-07-29 Thread Fujii Masao





On 2020/05/06 22:49, Asif Rehman wrote:

The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:not tested

The patch applies cleanly and works as expected.


Thanks for the review and test!
Since this patch was marked as Ready for Committer, I pushed it.

Regards,


--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: Include access method in listTables output

2020-07-29 Thread Georgios




‐‐‐ Original Message ‐‐‐
On Saturday, July 25, 2020 2:41 AM, vignesh C  wrote:

> On Thu, Jul 16, 2020 at 7:35 PM Georgios gkokola...@protonmail.com wrote:
>
> > ‐‐‐ Original Message ‐‐‐
> > On Saturday, July 11, 2020 3:16 PM, vignesh C vignes...@gmail.com wrote:
> >
> > > On Mon, Jul 6, 2020 at 1:24 PM Georgios gkokola...@protonmail.com wrote:
> > >
> > > > ‐‐‐ Original Message ‐‐‐
> > > > On Monday, July 6, 2020 3:12 AM, Michael Paquier mich...@paquier.xyz 
> > > > wrote:
> > > >
> > > > > On Sun, Jul 05, 2020 at 07:13:10AM +0530, vignesh C wrote:
> > > > >
> > > > > > I'm not sure if we should include showViews, I had seen that the
> > > > > > access method was not getting selected for view.
> > > > >
> > > > > +1. These have no physical storage, so you are looking here for
> > > > > relkinds that satisfy RELKIND_HAS_STORAGE().
> > > >
> > > > Thank you for the review.
> > > > Find attached v4 of the patch.
> > >
> > > Thanks for fixing the comments.
> > > Patch applies cleanly, make check & make check-world passes.
> > > I was not sure if any documentation change is required or not for this
> > > in doc/src/sgml/ref/psql-ref.sgml. Thoughts?
> >
> > Thank you for the comment. I added a line in docs. Admittedly, might need 
> > tweaking.
> > Please find version 5 of the patch attached.
>
> Most changes looks fine, minor comments:

Thank you for your comments. Please find the individual answers inline for 
completeness.

I'm having issues understanding where you are going with the reviews, can you 
fully describe
what is it that you wish to be done?

>
> \pset tuples_only false
> -- check conditional tableam display
> --- Create a heap2 table am handler with heapam handler
> +\pset expanded off
> +CREATE SCHEMA conditional_tableam_display;
> +CREATE ROLE conditional_tableam_display_role;
> +ALTER SCHEMA conditional_tableam_display OWNER TO
> conditional_tableam_display_role;
> +SET search_path TO conditional_tableam_display;
> CREATE ACCESS METHOD heap_psql TYPE TABLE HANDLER heap_tableam_handler;
>
> This comment might have been removed unintentionally, do you want to
> add it back?


It was intentional as heap2 table am handler is not getting created. With
the code additions the comment does seem out of place and thus removed.

>
> +-- access method column should not be displayed for sequences
> +\ds+
>
> -  List of relations
>
>
> -   Schema | Name | Type | Owner | Persistence | Size | Description
> ++--+--+---+-+--+-
> +(0 rows)
>
> We can include one test for view.

The list of cases in the test for both including and excluding storage is by no 
means
exhaustive. I thought that this is acceptable. Adding a test for the view, will 
still
not make it the test exhaustive. Are you sure you only suggest the view to be 
included
or you would suggest an exhaustive list of tests.

>
> -   if (pset.sversion >= 12 && !pset.hide_tableam &&
>
> -   (showTables || showMatViews || showIndexes))
>
> -   appendPQExpBuffer(&buf,
>
> -   ",\n am.amname as \"%s\"",
>
> -   gettext_noop("Access Method"));
>
> -   /*
> -   As of PostgreSQL 9.0, use pg_table_size() to show a more accurate
> -   size of a table, including FSM, VM and TOAST tables.
> @@ -3772,6 +3778,12 @@ listTables(const char *tabtypes, const char
> *pattern, bool verbose, bool showSys
> appendPQExpBufferStr(&buf,
> "\nFROM pg_catalog.pg_class c"
> "\n LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace");
>
> -
> -   if (pset.sversion >= 12 && !pset.hide_tableam &&
>
> -   (showTables || showMatViews || showIndexes))
>
> If conditions in both the places are the same, do you want to make it a 
> macro?

No, I would rather not if you may. I think that a macro would not add to the 
readability
rather it would remove from it. Those are two simple conditionals used in the 
same function
very close to each other.


>
> Regards,
> Vignesh
> EnterpriseDB: http://www.enterprisedb.com
>

Re: Making CASE error handling less surprising

2020-07-29 Thread Robert Haas

On Fri, Jul 24, 2020 at 1:18 PM Tom Lane  wrote:
> The real bottom line is: if you don't want to do this, how else do
> you want to fix the problem?  I'm no longer willing to deny that
> there is a problem.

That's the wrong question. The right question is whether we're
sufficiently certain that a particular proposal is an improvement over
the status quo to justify changing something. It's better to do
nothing than to do something that makes some cases better and other
cases worse, because then instead of users having a problem with this,
they have a variety of different problems depending on which release
they are running.  IMHO, changing the semantics of something like this
is really scary and should be approached with great caution.

You don't have to deny that something is a problem in order to admit
that you might not have a perfect solution.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Making CASE error handling less surprising

2020-07-29 Thread Robert Haas

On Sun, Jul 26, 2020 at 1:27 PM Chris Travers  wrote:
> The first (probably the best) would be a solution along the lines of yours 
> along with a session-level GUC variable which could determine whether case 
> branches can fold constants.

Bluntly, that seems like a terrible idea. It's great if you are an
expert DBA, because then you can adjust the behavior on your own
system according to what works best for you. But if you are trying to
write portable code that will work on any PostgreSQL instance, you now
have to remember to test it with every possible value of the GUC and
make sure it behaves the same way under all of them. That is a major
burden on authors of tools and extensions, and if we add even three or
four such GUCs with three or four possible values each, there are
suddenly dozens or even hundreds of possible combinations to test. I
think that adding GUCs for this kind of thing is a complete
non-starter for that reason.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: INSERT INTO SELECT, Why Parallelism is not selected?

2020-07-29 Thread Robert Haas

On Sun, Jul 26, 2020 at 7:24 AM Amit Kapila  wrote:
> No, "git diff --check" doesn't help.  I have tried pgindent but that
> also doesn't help neither was I expecting it to help.  I am still not
> able to figure out how I goofed up this but will spend some more time
> on this.  In the meantime, I have updated the patch to improve the
> comments as suggested by Robert.  Do let me know if you want to
> edit/add something more?

I still don't agree with this as proposed.

+ * For now, we don't allow parallel inserts of any form not even where the
+ * leader can perform the insert.  This restriction can be uplifted once
+ * we allow the planner to generate parallel plans for inserts.  We can

If I'm understanding this correctly, this logic is completely
backwards. We don't prohibit inserts here because we know the planner
can't generate them. We prohibit inserts here because, if the planner
somehow did generate them, it wouldn't be safe. You're saying that
it's not allowed because we don't try to do it yet, but actually it's
not allowed because we want to make sure that we don't accidentally
try to do it. That's very different.

+ * parallelize inserts unless they generate a new commandid (ex. inserts
+ * into a table having foreign key column) or lock tuples (ex. statements
+ * like Insert .. Select For Update).

I understand the part about generating new command IDs, but not the
part about locking tuples. Why would that be a problem? Can it better
explained here?

Examples in comments are typically introduced with e.g., not ex.

+ * We should be able to parallelize
+ * the later case if we can ensure that no two parallel processes can ever
+ * operate on the same page.

I don't know whether this is talking about two processes operating on
the same page at the same time, or ever within a single query
execution. If it's the former, perhaps we need to explain why that's a
concern for parallel query but not otherwise; if it's the latter, that
seems impossible to guarantee and imagining that we'll ever be able to
do so seems like wishful thinking.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: IDEA: pg_stat_statements tracking utility statements by tag?

2020-07-29 Thread Julien Rouhaud

On Wed, Jul 29, 2020 at 2:42 PM Fujii Masao  wrote:
>
> On 2020/07/29 18:24, Martijn van Oosterhout wrote:
> > Hoi hackers,
> >
> > We've been using the pg_stat_statements extension to get an idea of the 
> > queries used in the database, but the table is being filled with entries 
> > like:
> >
> > SAVEPOINT sa_savepoint_NNN;
> > RELEASE SAVEPOINT sa_savepoint_NNN;
> > DECLARE "c_7f9451c4dcc0_5" CURSOR WITHOUT HOLD ...
> > FETCH FORWARD 250 FROM "c_7f9451b03908_5"
> >
> > Since the unique id is different for each query, the aggregation does 
> > nothing and there are quite a lot of these drowning out the normal queries 
> > (yes, I'm aware this is an issue of itself). The only way to deal with this 
> > is "pg_stat_statements.track_utility=off". However, it occurs to me that if 
> > you just tracked the tags rather than the full query text you could at 
> > least track the number of such queries and how much time they take. So the 
> > above queries would be tracked under SAVEPOINT, RELEASE, DECLARE CURSOR and 
> > (I guess) FETCH respectively. But it would also catch DDL.
> >
> > Does this sound like something for which a patch would be accepted?
>
> Or, we should extend the existing query normalization to handle also DDL?

+1, introducing DDL normalization seems like a better way to go in the
long run.  Defining what should and shouldn't be normalized can be
tricky though.

Re: recovering from "found xmin ... from before relfrozenxid ..."

2020-07-29 Thread Robert Haas

On Wed, Jul 29, 2020 at 3:23 AM Thomas Munro  wrote:
> On Tue, Jul 14, 2020 at 9:12 AM Robert Haas  wrote:
> > A number of EDB customers have had this error crop on their tables for
> > reasons that we have usually not been able to determine. In many
>
> Do you happen to know if they ever used the
> snapshot-too-old feature?

I don't have any reason to believe that they did. Why?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: IDEA: pg_stat_statements tracking utility statements by tag?

2020-07-29 Thread Fujii Masao





On 2020/07/29 18:24, Martijn van Oosterhout wrote:

Hoi hackers,

We've been using the pg_stat_statements extension to get an idea of the queries 
used in the database, but the table is being filled with entries like:

SAVEPOINT sa_savepoint_NNN;
RELEASE SAVEPOINT sa_savepoint_NNN;
DECLARE "c_7f9451c4dcc0_5" CURSOR WITHOUT HOLD ...
FETCH FORWARD 250 FROM "c_7f9451b03908_5"

Since the unique id is different for each query, the aggregation does nothing and there 
are quite a lot of these drowning out the normal queries (yes, I'm aware this is an issue 
of itself). The only way to deal with this is 
"pg_stat_statements.track_utility=off". However, it occurs to me that if you 
just tracked the tags rather than the full query text you could at least track the number 
of such queries and how much time they take. So the above queries would be tracked under 
SAVEPOINT, RELEASE, DECLARE CURSOR and (I guess) FETCH respectively. But it would also 
catch DDL.

Does this sound like something for which a patch would be accepted?


Or, we should extend the existing query normalization to handle also DDL?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: Should we remove a fallback promotion? take 2

2020-07-29 Thread Fujii Masao





On 2020/07/28 20:35, Hamid Akhtar wrote:

There have been no real objections on this patch and it seems to work.


Thanks! So I pushed the patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: factorial function/phase out postfix operators?

2020-07-29 Thread John Naylor

On Wed, Jul 22, 2020 at 8:47 AM Mark Dilger
 wrote:
>
>
>
> > On Jul 18, 2020, at 1:00 AM, John Naylor  
> > wrote:
> >
> > pg_get_keywords() should probably have a column to display ability to
> > act as a bare col label. Perhaps a boolean? If so, what do you think
> > of using true/false for the new field in kwlist.h as well?

Hi Mark, sorry for the delay.

> I have broken this into its own patch.  I like using a BARE_LABEL / 
> EXPLICIT_LABEL in kwlist.h because it is self-documenting.  I don't care 
> about the *exact* strings that we choose for that, but using TRUE/FALSE in 
> kwlist.h makes it harder for a person adding a new keyword to know what to 
> place there.  If they guess "FALSE", and also don't know about adding the new 
> keyword to the bare_label_keyword rule in gram.y, then those two mistakes 
> will agree with each other and the person adding the keyword won't likely 
> know they did it wrong.  It is simple enough for gen_keywordlist.pl to 
> convert between what we use in kwlist.h and a boolean value for kwlist_d.h, 
> so I did it that way.

Sounds fine to me.

> > In the bikeshedding department, it seems "implicit" was chosen because
> > it was distinct from "bare". I think "bare" as a descriptor should be
> > kept throughout for readability's sake. Maybe BareColLabel could be
> > "IDENT or bare_label_keyword" for example. Same for the $status var.
>
> The category "bare_label_keyword" is used in v3.  As for the $status var, I 
> don't want to name that $bare, as I didn't go with your idea about using a 
> boolean.  $status = "BARE_LABEL" vs "EXPLICIT_LABEL" makes sense to me, more 
> than $bare = "BARE_LABEL" vs "EXPLICIT_LABEL" does.  "status" is still a bit 
> vague, so more bikeshedding is welcome.

Yeah, it's very generic, but it's hard to find a short word for
"can-be-used-as-a-bare-column-label-ness".

> This patch does not attempt to remove pre-existing postfix operators from 
> existing databases, so users upgrading to the new major version who have 
> custom postfix operators will find that pg_upgrade chokes trying to recreate 
> the postfix operator.  That's not great, but perhaps there is nothing 
> automated that we could do for them that would be any better.

I'm thinking it would be good to have something like

select oid from pg_operator where oprright = 0 and oid >= FirstNormalObjectId;

in the pre-upgrade check.

Other comments:

0001:

+ errhint("postfix operator support has been discontinued")));

This language seems more appropriate for release notes -- I would word
the hint in the present, as in "postfix operators are not supported".
Ditto the words "discontinuation", "has been removed", and "no longer
works" elsewhere in the patch.

+SELECT -5!;
+SELECT -0!;
+SELECT 0!;
+SELECT 100!;

I think one negative and one non-negative case is enough to confirm
the syntax error.

- gram.y still contains "POSTFIXOP" and "postfix-operator".

- parse_expr.c looks like it has some now-unreachable code.


0002:

+ * All keywords can be used explicitly as a column label in expressions
+ * like 'SELECT 1234 AS keyword', but only some keywords can be used
+ * implicitly as column labels in expressions like 'SELECT 1234 keyword'.
+ * Those that can be used implicitly should be listed here.

In my mind, "AS" is the thing that's implied when not present, so we
should reword this to use the "bare" designation when talking about
the labels. I think there are contexts elsewhere where the implicit
column label is "col1, col2, col3...". I can't remember offhand where
that is though.

- * kwlist.h's table from one source of truth.)
+ * kwlist.h's table from a common master list.)

Off topic.


0003:

First off, I get a crash when trying

select * from pg_get_keywords();

and haven't investigated further. I don't think the returned types
match, though.

Continuing on, I think 2 and 3 can be squashed together. If anything,
it should make revisiting cosmetic decisions easier.

+ TupleDescInitEntry(tupdesc, (AttrNumber) 4, "bare",
+BOOLOID, -1, 0);

Perhaps something a bit meatier for the user-visible field name. I
don't have a great suggestion.

-  proname => 'pg_get_keywords', procost => '10', prorows => '400',
+  proname => 'pg_get_keywords', procost => '10', prorows => '450',

Off topic for this patch. Not sure it matters much, either.

"EXPLICIT_LABEL" -- continuing my line of thought above, all labels
are explicit, that's why they're called labels. Brainstorm:

EXPLICIT_AS_LABEL
EXPLICIT_AS
NON_BARE_LABEL
*shrug*

+ # parser/kwlist.h lists each keyword as either bare or
+ # explicit, but ecpg neither needs nor has any such

PL/pgSQL also uses this script, so maybe just phrase it to exclude the
core keyword list.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [Patch] ALTER SYSTEM READ ONLY

2020-07-29 Thread Amul Sul

On Fri, Jul 24, 2020 at 10:40 PM Soumyadeep Chakraborty <
soumyadeep2...@gmail.com> wrote:

> On Thu, Jul 23, 2020 at 10:14 PM Amul Sul  wrote:
> >
> > On Fri, Jul 24, 2020 at 6:28 AM Soumyadeep Chakraborty <
> soumyadeep2...@gmail.com> wrote:
> > > In case it is necessary, the patch set does not wait for the
> checkpoint to
> > > complete before marking the system as read-write. Refer:
> > >
> > > /* Set final state by clearing in-progress flag bit */
> > > if (SetWALProhibitState(wal_state &
> > ~(WALPROHIBIT_TRANSITION_IN_PROGRESS)))
> > > {
> > >   if ((wal_state & WALPROHIBIT_STATE_READ_ONLY) != 0)
> > > ereport(LOG, (errmsg("system is now read only")));
> > >   else
> > >   {
> > > /* Request checkpoint */
> > > RequestCheckpoint(CHECKPOINT_IMMEDIATE);
> > > ereport(LOG, (errmsg("system is now read write")));
> > >   }
> > > }
> > >
> > > We should RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_WAIT)
> before
> > > we SetWALProhibitState() and do the ereport(), if we have a read-write
> > > state change request.
> > >
> > +1, I too have the same question.
> >
> >
> >
> > FWIW, I don't we can request CHECKPOINT_WAIT for this place, otherwise,
> it
> > think
> > it will be deadlock case -- checkpointer process waiting for itself.
>
> We should really just call CreateCheckPoint() here instead of
> RequestCheckpoint().
>
>
The only setting flag would have been enough for now, the next loop of
CheckpointerMain() will anyway be going to call CreateCheckPoint() without
waiting.  I used RequestCheckpoint() to avoid duplicate flag setting code.
Also, I think RequestCheckpoint() will be better so that we don't need to
deal
will the standalone backend, the only imperfection is it will unnecessary
signal
itself, that would be fine I guess.

> > 3. Some of the functions that were added such as GetWALProhibitState(),
> > > IsWALProhibited() etc could be declared static inline.
> > >
> > IsWALProhibited() can be static but not GetWALProhibitState() since it
> > needed to
> > be accessible from other files.
>
> If you place a static inline function in a header file, it will be
> accessible from other files. E.g. pg_atomic_* functions.
>

Well, the current patch set also has few inline functions in the header
file.
But, I don't think we can do the same for GetWALProhibitState() without
changing
the XLogCtl structure scope which is local to xlog.c file and the changing
XLogCtl
scope would be a bad idea.

Regards,
Amul

Re: [PATCH] Tab completion for VACUUM of partitioned tables

2020-07-29 Thread Michael Paquier

On Wed, Jul 29, 2020 at 06:41:16PM +0900, Masahiko Sawada wrote:
> whereas Query_for_list_of_vacuumables should search for:
> 
> RELKIND_RELATION
> RELKIND_PARTITIONED_TABLE
> RELKIND_MATVIEW
> RELKIND_TOASTVALUE

FWIW, I don't think that we should make toast relations suggested to
the user at all for any command.  This comes down to the same point
that we don't have pg_toast in search_path, and going down to this
level of operations is an expert-level mode, not something we should
recommend to the average user in psql IMO.
--
Michael


signature.asc
Description: PGP signature

Re: [PATCH] Tab completion for VACUUM of partitioned tables

2020-07-29 Thread Masahiko Sawada

On Wed, 29 Jul 2020 at 15:21, Michael Paquier  wrote:
>
> On Wed, Jul 29, 2020 at 01:27:07PM +0900, Masahiko Sawada wrote:
> > Good catch. The patch looks good to me.
>
> While this patch is logically correct.  I think that we should try to
> not increase more the number of queries used to scan pg_class
> depending on a list of relkinds.  For example, did you notice that
> your new Query_for_list_of_vacuumables becomes the same query as
> Query_for_list_of_indexables?

Oh, I didn't realize that.

Looking at target relation kinds for operations in-depth, I think that
the relation list for index creation and the relation list for vacuum
is different.

Query_for_list_of_indexables should search for:

RELKIND_RELATION
RELKIND_PARTITIONED_TABLE
RELKIND_MATVIEW

whereas Query_for_list_of_vacuumables should search for:

RELKIND_RELATION
RELKIND_PARTITIONED_TABLE
RELKIND_MATVIEW
RELKIND_TOASTVALUE

Also, Query_for_list_of_clusterables is further different from the
above two lists. It should search for:

RELKIND_RELATION
RELKIND_MATVIEW
RELKIND_TOASTVALUE

Regards,

-- 
Masahiko Sawadahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [PATCH]Fix ja.po error

2020-07-29 Thread Michael Paquier

On Wed, Jul 29, 2020 at 08:42:27AM +, Lu, Chenyang wrote:
> When I was using PostgreSQL, I noticed that the output of the
> Japanese messages was inconsistent with the English messages. 
> The Japanese message needs to be modified,so I made the patch.

Indeed, good catch.  This needs to be applied to the translation
repository first though:
https://git.postgresql.org/gitweb/?p=pgtranslation/messages.git;a=summary

I am adding Alvaro and Peter in CC as they take care of that usually
(I don't think I have an access to this repo).
--
Michael


signature.asc
Description: PGP signature

IDEA: pg_stat_statements tracking utility statements by tag?

2020-07-29 Thread Martijn van Oosterhout

Hoi hackers,

We've been using the pg_stat_statements extension to get an idea of the
queries used in the database, but the table is being filled with entries
like:

SAVEPOINT sa_savepoint_NNN;
RELEASE SAVEPOINT sa_savepoint_NNN;
DECLARE "c_7f9451c4dcc0_5" CURSOR WITHOUT HOLD ...
FETCH FORWARD 250 FROM "c_7f9451b03908_5"

Since the unique id is different for each query, the aggregation does
nothing and there are quite a lot of these drowning out the normal queries
(yes, I'm aware this is an issue of itself). The only way to deal with this
is "pg_stat_statements.track_utility=off". However, it occurs to me that if
you just tracked the tags rather than the full query text you could at
least track the number of such queries and how much time they take. So the
above queries would be tracked under SAVEPOINT, RELEASE, DECLARE CURSOR and
(I guess) FETCH respectively. But it would also catch DDL.

Does this sound like something for which a patch would be accepted?

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/

[PATCH]Fix ja.po error

2020-07-29 Thread Lu, Chenyang

Hi,hackers

When I was using PostgresQL, I noticed that the output of the Japanese messages 
was inconsistent with the English messages.
The Japanese message needs to be modified,so I made the patch.


See the attachment for the patch.


Best regards





ja_po.patch
Description: ja_po.patch

Re: [POC] Fast COPY FROM command for the table with foreign partitions

2020-07-29 Thread Andrey V. Lepikhov


On 7/29/20 1:03 PM, Amit Langote wrote:

Hi Andrey,

Thanks for updating the patch.  I will try to take a look later.

On Wed, Jul 22, 2020 at 6:09 PM Andrey V. Lepikhov
 wrote:

On 7/16/20 2:14 PM, Amit Langote wrote:

* Why the "In" in these API names?

+   /* COPY a bulk of tuples into a foreign relation */
+   BeginForeignCopyIn_function BeginForeignCopyIn;
+   EndForeignCopyIn_function EndForeignCopyIn;
+   ExecForeignCopyIn_function ExecForeignCopyIn;


I used an analogy from copy.c.


Hmm, if we were going to also need *ForeignCopyOut APIs, maybe it
makes sense to have "In" here, but maybe we don't, so how about
leaving out the "In" for clarity?

Ok, sounds good.



* I see that the remote copy is performed from scratch on every call
of postgresExecForeignCopyIn(), but wouldn't it be more efficient to
send the `COPY remote_table FROM STDIN` in
postgresBeginForeignCopyIn() and end it in postgresEndForeignCopyIn()
when there are no errors during the copy?


It is not possible. FDW share one connection between all foreign
relations from a server. If two or more partitions will be placed at one
foreign server you will have problems with concurrent COPY command.


Ah, you're right.  I didn't consider multiple foreign partitions
pointing to the same server.  Indeed, we would need separate
connections to a given server to COPY to multiple remote relations on
that server in parallel.


May be we can create new connection for each partition?


Yeah, perhaps, although it sounds like something that might be more
generally useful and so we should work on that separately if at all.

I will try to prepare a separate patch.



I tried implementing these two changes -- pgfdw_copy_data_dest_cb()
and sending `COPY remote_table FROM STDIN` only once instead of on
every flush -- and I see significant speedup.  Please check the
attached patch that applies on top of yours.


I integrated first change and rejected the second by the reason as above.


Thanks.

Will send more comments after reading the v5 patch.


Ok. I'll be waiting for the end of your review.

--
regards,
Andrey Lepikhov
Postgres Professional

Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?

2020-07-29 Thread Michael Paquier

On Wed, Jul 01, 2020 at 06:24:18PM +0900, Michael Paquier wrote:
> I am not sure either, still it looks worth thinking about it.
> Attached is a rebased version of the last patch.  What this currently
> holds is just the switch to heap_multi_insert() for the three catalogs
> pg_attribute, pg_depend and pg_shdepend.  One point that looks worth
> debating about is to how much to cap the data inserted at once.  This
> uses 64kB for all three, with a number of slots chosen based on the
> size of each record, similarly to what we do for COPY.

I got an extra round of review done for this patch.  One spot was
missed in heap_multi_insert() for a comment telling catalogs not using
multi inserts.  After some consideration, I think that using 64kB as a
base number to calculate the number of slots should be fine, similarly
to COPY.

While on it, I have done some measurements to see the difference in
WAL produced and get an idea of the gain.  For example, this function
would create one table with a wanted number of attributes:
CREATE OR REPLACE FUNCTION create_cols(tabname text, num_cols int)
RETURNS VOID AS
$func$
DECLARE
  query text;
BEGIN
  query := 'CREATE TABLE ' || tabname || ' (';
  FOR i IN 1..num_cols LOOP
query := query || 'a_' || i::text || ' int';
IF i != num_cols THEN
  query := query || ', ';
END IF;
  END LOOP;
  query := query || ')';
  EXECUTE format(query);
END
$func$ LANGUAGE plpgsql;

On HEAD, with a table that has 1300 attributes, this leads to 563kB of 
WAL produced.  With the patch, we get down to 505kB.  That's an
extreme case of course, but that's nice a nice gain.

A second test, after creating a database from a template that has
roughly 10k entries in pg_shdepend (10k empty tables actually), showed
a reduction from 2158kB to 1762kB in WAL.

Finally comes the third catalog, pg_depend, and there is one thing
that makes me itching about this part.  We do a lot of useless work
for the allocation and destruction of the slots when there are pinned
dependencies, and there can be a lot of them.  Just by running the
main regression tests, it is easy to see that in 0003 we still do a
lot of calls of recordMultipleDependencies() for one single
dependency, and that most of these are actually pinned.  So we finish
by doing a lot of slot manipulation to insert nothing at the end,
contrary to the counterparts with pg_shdepend and pg_attribute.  In
short, I think that for now it would be fine to commit a patch that
does the multi-INSERT optimization for pg_attribute and pg_shdepend,
but that we need more careful work for pg_depend.  For example we
could go through all the dependencies first and recalculate the number
of slots to use depending on what is pinned or not, but this would
make sense actually when more dependencies are inserted at once in
more code paths, mostly for ALTER TABLE.  So this needs more
consideration IMO.

Thoughts?
--
Michael

signature.asc
Description: PGP signature

Re: [POC] Fast COPY FROM command for the table with foreign partitions

2020-07-29 Thread Amit Langote

Hi Andrey,

Thanks for updating the patch.  I will try to take a look later.

On Wed, Jul 22, 2020 at 6:09 PM Andrey V. Lepikhov
 wrote:
> On 7/16/20 2:14 PM, Amit Langote wrote:
> > * Why the "In" in these API names?
> >
> > +   /* COPY a bulk of tuples into a foreign relation */
> > +   BeginForeignCopyIn_function BeginForeignCopyIn;
> > +   EndForeignCopyIn_function EndForeignCopyIn;
> > +   ExecForeignCopyIn_function ExecForeignCopyIn;
>
> I used an analogy from copy.c.

Hmm, if we were going to also need *ForeignCopyOut APIs, maybe it
makes sense to have "In" here, but maybe we don't, so how about
leaving out the "In" for clarity?

> > * I see that the remote copy is performed from scratch on every call
> > of postgresExecForeignCopyIn(), but wouldn't it be more efficient to
> > send the `COPY remote_table FROM STDIN` in
> > postgresBeginForeignCopyIn() and end it in postgresEndForeignCopyIn()
> > when there are no errors during the copy?
>
> It is not possible. FDW share one connection between all foreign
> relations from a server. If two or more partitions will be placed at one
> foreign server you will have problems with concurrent COPY command.

Ah, you're right.  I didn't consider multiple foreign partitions
pointing to the same server.  Indeed, we would need separate
connections to a given server to COPY to multiple remote relations on
that server in parallel.

> May be we can create new connection for each partition?

Yeah, perhaps, although it sounds like something that might be more
generally useful and so we should work on that separately if at all.

> > I tried implementing these two changes -- pgfdw_copy_data_dest_cb()
> > and sending `COPY remote_table FROM STDIN` only once instead of on
> > every flush -- and I see significant speedup.  Please check the
> > attached patch that applies on top of yours.
>
> I integrated first change and rejected the second by the reason as above.

Thanks.

Will send more comments after reading the v5 patch.

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

Re: [Patch] Optimize dropping of relation buffers using dlist

2020-07-29 Thread Konstantin Knizhnik





On 17.06.2020 09:14, k.jami...@fujitsu.com wrote:

Hi,

Since the last posted version of the patch fails, attached is a rebased version.
Written upthread were performance results and some benefits and challenges.
I'd appreciate your feedback/comments.

Regards,
Kirk Jamison
As far as i understand this patch can provide significant improvement of 
performance only in case of
recovery  of truncates of large number of tables. You have added shared 
hash of relation buffers and certainly if adds some
extra overhead. According to your latest results this overhead is quite 
small. But it will be hard to prove that there will be no noticeable 
regression

at some workloads.

I wonder if you have considered case of local hash (maintained only 
during recovery)?
If there is after-crash recovery, then there will be no concurrent 
access to shared buffers and this hash will be up-to-date.
in case of hot-standby replica we can use some simple invalidation (just 
one flag or counter which indicates that buffer cache was updated).
This hash also can be constructed on demand when DropRelFileNodeBuffers 
is called first time (so w have to scan all buffers once, but subsequent 
drop operation will be fast.


i have not thought much about it, but it seems to me that as far as this 
problem only affects recovery, we do not need shared hash for it.

Re: recovering from "found xmin ... from before relfrozenxid ..."

2020-07-29 Thread Thomas Munro

On Tue, Jul 14, 2020 at 9:12 AM Robert Haas  wrote:
> A number of EDB customers have had this error crop on their tables for
> reasons that we have usually not been able to determine. In many

Do you happen to know if they ever used the
snapshot-too-old feature?

Re: Improving connection scalability: GetSnapshotData()

2020-07-29 Thread Thomas Munro

On Wed, Jul 29, 2020 at 6:15 PM Thomas Munro  wrote:
> +static inline FullTransactionId
> +FullXidViaRelative(FullTransactionId rel, TransactionId xid)
>
> I'm struggling to find a better word for this than "relative".

The best I've got is "anchor" xid.  It is an xid that is known to
limit nextFullXid's range while the receiving function runs.

64 matches

Mail list logo