Re: Strange coding in _mdfd_openseg()

2019-04-04 Thread Thomas Munro
On Thu, Apr 4, 2019 at 4:16 PM Kyotaro HORIGUCHI
 wrote:
> At Wed, 3 Apr 2019 13:47:46 -0700, Andres Freund  wrote 
> in <20190403204746.2yumq7c2mirmo...@alap3.anarazel.de>
> > Yea, I totally agree it's weird. I'm not sure if I'd go for an assertion
> > of equality, or just invert the >= (which I agree I probably just
> > screwed up and didn't notice when reviewing the patch because it looked
> > close enough to correct and it didn't have a measurable effect).
>
> I looked there and agreed. _mdfd_openseg is always called by the two
> callers just to add one new segment after the last opened segment. So I
> think == is better.

Thanks.  Some other little things I noticed while poking around in this area:

Callers of _mdfd_getseg(EXTENSION_RETURN_NULL) expect errno to be set if
it returns NULL, and it expects the same of
mdopen(EXTENSION_RETURN_NULL), and yet the latter does:

fd = PathNameOpenFile(path, O_RDWR | PG_BINARY);

if (fd < 0)
{
if ((behavior & EXTENSION_RETURN_NULL) &&
FILE_POSSIBLY_DELETED(errno))
{
pfree(path);
return NULL;
}

1.  I guess that needs save_errno treatment to protect it from being
clobbered by pfree()?  (A sketch of that fix follows this list.)
2.  It'd be nice if function documentation explicitly said which
functions set errno on error (and perhaps which syscalls).
3.  Why does some code use "file < 0" and other code "file <= 0" to
detect errors from fd.c functions that return File?
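
For point 1, the usual save/restore dance would look something like this (a
sketch of the suggested fix, on the assumption that pfree() can clobber
errno on some paths):

fd = PathNameOpenFile(path, O_RDWR | PG_BINARY);

if (fd < 0)
{
    if ((behavior & EXTENSION_RETURN_NULL) &&
        FILE_POSSIBLY_DELETED(errno))
    {
        int     save_errno = errno;     /* keep pfree() from clobbering it */

        pfree(path);
        errno = save_errno;             /* callers inspect errno on NULL */
        return NULL;
    }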

-- 
Thomas Munro
https://enterprisedb.com




Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits

2019-04-04 Thread Andres Freund
Hi,

On 2019-04-05 08:38:34 +0300, Darafei "Komяpa" Praliaskouski wrote:
> On Fri, Apr 5, 2019 at 6:58 AM Tom Lane  wrote:
> 
> > Andres Freund  writes:
> > > I think the right approach would be to do all of this in heap_insert and
> > > heap_multi_insert. Whenever starting to work on a page, if INSERT_FROZEN
> > > is specified, remember whether it is either currently empty, or is
> > > already marked as all-visible. If previously empty, mark it as all
> > > visible at the end. If already all visible, there's no need to change
> > > that. If not yet all-visible, no need to do anything, since it can't
> > > have been inserted with COPY FREEZE.  Do you see any problem doing it
> > > that way?
> >
> > Do we want to add overhead to these hot-spot routines for this purpose?
> >
> 
> Sizing the overhead: workflows right now don't end with COPY FREEZE - you
> need another VACUUM to set maps.
> Anything that lets you skip that VACUUM (and faster than that VACUUM
> itself) is helping. You specifically asked for it to be skippable with
> FREEZE anyway.

Tom's point was that the routines I was suggesting to adapt above aren't
just used for COPY FREEZE.

Greetings,

Andres Freund




Re: GiST VACUUM

2019-04-04 Thread Andrey Borodin
Hi!

> On 4 Apr 2019, at 20:15, Heikki Linnakangas  wrote:
> 
> On 25/03/2019 15:20, Heikki Linnakangas wrote:
>> On 24/03/2019 18:50, Andrey Borodin wrote:
>>> I was working on a new version of the gist check in amcheck and understood
>>> one more thing:
>>> 
>>> /* Can this page be recycled yet? */
>>> bool
>>> gistPageRecyclable(Page page)
>>> {
>>>  return PageIsNew(page) ||
>>>  (GistPageIsDeleted(page) &&
>>>   TransactionIdPrecedes(GistPageGetDeleteXid(page), RecentGlobalXmin));
>>> }
>>> 
>>> Here RecentGlobalXmin can wrap around, and the page will become
>>> unrecyclable for half of the xid cycle. Can we prevent that by resetting
>>> PageDeleteXid to InvalidTransactionId before doing RecordFreeIndexPage()?
>>> (Seems like same applies to GIN...)
>> True, and B-tree has the same issue. I thought I saw a comment somewhere
>> in the B-tree code about that earlier, but now I can't find it. I
>> must've imagined it.
>> We could reset it, but that would require dirtying the page. That would
>> be just extra I/O overhead, if the page gets reused before XID
>> wraparound. We could avoid that if we stored the full XID+epoch, not
>> just XID. I think we should do that in GiST, at least, where this is
>> new. In the B-tree, it would require some extra code to deal with
>> backwards-compatibility, but maybe it would be worth it even there.
> 
> I suggest that we do the attached. It fixes this for GiST. The patch
> expands the "deletion XID" to 64 bits, and changes where it's stored. Instead
> of storing it in pd_prune_xid, it's stored in the page contents. Luckily, a
> deleted page has no real content.

So, we store the full xid right after the page header?
+static inline void
+GistPageSetDeleteXid(Page page, FullTransactionId deletexid)
+{
+   Assert(PageIsEmpty(page));
+   ((PageHeader) page)->pd_lower = MAXALIGN(SizeOfPageHeaderData) + sizeof(FullTransactionId);
+
+   *((FullTransactionId *) PageGetContents(page)) = deletexid;
+}

Usually we leave one ItemId (located at an invalid offset number) untouched. I
do not know whether that is done for a reason or not.
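
For reference, the matching reader is presumably just the inverse cast (a
sketch inferred from the setter above, not copied from the patch):

static inline FullTransactionId
GistPageGetDeleteXid(Page page)
{
    /* the deletion XID sits at the start of the page contents */
    return *((FullTransactionId *) PageGetContents(page));
}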


Also, I did not understand this optimization:
+   /*
+    * We can skip this if the page was deleted so long ago, that no scan can
+    * possibly still see it, even in a standby. One measure might be anything
+    * older than the table's frozen-xid, but we don't have that at hand here.
+    * But anything older than 2 billion, from the next XID, is surely old
+    * enough, because you would hit XID wraparound at that point.
+    */
+   nextxid = ReadNextFullTransactionId();
+   diff = U64FromFullTransactionId(nextxid) -
+       U64FromFullTransactionId(latestRemovedXid);
+   if (diff < 0x7fffffff)
+       return;

A standby can lag months behind the primary and, theoretically, close the gap
in one sudden WAL leap... Also, I think the comparison sign should be >, not
<: a page deleted long ago yields a *large* diff, so that is when the early
return should fire.


Best regards, Andrey Borodin.



Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits

2019-04-04 Thread Komяpa
On Fri, Apr 5, 2019 at 6:58 AM Tom Lane  wrote:

> Andres Freund  writes:
> > I think the right approach would be to do all of this in heap_insert and
> > heap_multi_insert. Whenever starting to work on a page, if INSERT_FROZEN
> > is specified, remember whether it is either currently empty, or is
> > already marked as all-visible. If previously empty, mark it as all
> > visible at the end. If already all visible, there's no need to change
> > that. If not yet all-visible, no need to do anything, since it can't
> > have been inserted with COPY FREEZE.  Do you see any problem doing it
> > that way?
>
> Do we want to add overhead to these hot-spot routines for this purpose?
>

Sizing the overhead: workflows right now don't end with COPY FREEZE - you
need another VACUUM to set maps.
Anything that lets you skip that VACUUM (and faster than that VACUUM
itself) is helping. You specifically asked for it to be skippable with
FREEZE anyway.


-- 
Darafei Praliaskouski
Support me: http://patreon.com/komzpa


Re: Why does "toast_tuple_target" allow values below TOAST_TUPLE_TARGET?

2019-04-04 Thread David Rowley
On Fri, 5 Apr 2019 at 18:25, Amit Langote  wrote:
>
> On 2019/04/05 14:09, David Rowley wrote:
> > Or rather, why does the reloption allow values below the compile-time 
> > constant?
>
> Maybe there is already a discussion in progress on the topic?
>
> * Caveats from reloption toast_tuple_target *
> https://www.postgresql.org/message-id/flat/CABOikdMt%3DmOtzW_ax_8pa9syEPo5Lji%3DLJrN2dunht8K-SLWzg%40mail.gmail.com

Doh. Okay, thanks. I'll drop this thread then. Seems what I wanted to
ask and say has been asked and said already.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: Why does "toast_tuple_target" allow values below TOAST_TUPLE_TARGET?

2019-04-04 Thread Amit Langote
On 2019/04/05 14:09, David Rowley wrote:
> Or rather, why does the reloption allow values below the compile-time 
> constant?

Maybe there is already a discussion in progress on the topic?

* Caveats from reloption toast_tuple_target *
https://www.postgresql.org/message-id/flat/CABOikdMt%3DmOtzW_ax_8pa9syEPo5Lji%3DLJrN2dunht8K-SLWzg%40mail.gmail.com

Thanks,
Amit





Re: [GSoC] application ideas

2019-04-04 Thread Andrey Borodin
Hi!

We are discussing GSoC details off-list, but I'll put some recommendations on
your proposal here on the list.

> On 3 Apr 2019, at 2:53, pantilimonov misha  wrote:
> 
> Andrey, thank you for your reply.
> 
>> 24.03.2019, 12:12, "Andrey Borodin" :
>> 
>> I like the idea of more efficient BufferTag->Page data structure. I'm not 
>> sure cache locality is a real problem there, but I believe this idea 
>> deserves giving it a shot.
>> I'd happily review your proposal and co-mentor project, if it will be chosen 
>> for GSoC.
> 
> Here it is:
> 
> https://docs.google.com/document/d/1HmhOs07zE8Q1TX1pOdtjxHSjjAUjaO2tp9NSmry8muY/edit?usp=sharing

While your project is clearly research-oriented, it is planned within the
PostgreSQL development process. Can you please add some information about
which patches you are going to put on the commitfest?
Also, please plan to review one or more patches. This is important for
integrating into the community.

BTW, there is a somewhat related IntegerSet data structure that was added
recently [0]. In my version it was implemented as a radix tree. I think it is
a good example of how a generic data structure can be presented and then
reused by the buffer manager.

Best regards, Andrey Borodin.

[0] 
https://github.com/postgres/postgres/commit/df816f6ad532ad685a3897869a2e64d3a53fe312



Why does "toast_tuple_target" allow values below TOAST_TUPLE_TARGET?

2019-04-04 Thread David Rowley
Or rather, why does the reloption allow values below the compile-time constant?

It looks like we currently allow values as low as 128.  The problem
there is that heap_update() and heap_prepare_insert() both only bother
calling toast_insert_or_update() when the tuple's length is above
TOAST_TUPLE_TARGET, so it seems to have no effect when set to a lower
value.
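
The gate in question looks roughly like this (paraphrased from
heap_prepare_insert(); the macro tested at the call site is
TOAST_TUPLE_THRESHOLD, which is defined to the same value as
TOAST_TUPLE_TARGET, and the reloption is never consulted here):

    /* in heap_prepare_insert(); heap_update() has a similar test */
    else if (HeapTupleHasExternal(tup) || tup->t_len > TOAST_TUPLE_THRESHOLD)
        return toast_insert_or_update(relation, tup, NULL, options);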

I don't think we can change heap_update() and heap_prepare_insert() to
do "tup->t_len > RelationGetToastTupleTarget(relation,
TOAST_TUPLE_TARGET)" instead as such a table might not even have a
TOAST relation since needs_toast_table() will return false if it
thinks the tuple length can't be above TOAST_TUPLE_TARGET.

It does not seem possible to add/remove the toast table when the
reloption is changed either, as we're only obtaining a
ShareUpdateExclusiveLock to set it. We'd likely need to upgrade that
to an AccessExclusiveLock to do that.

I understand from reading [1] that Simon was mostly interested in
keeping values inline for longer. I saw no mention of moving them out
of line sooner.

[1] 
https://postgr.es/m/CANP8+jKsVmw6CX6YP9z7zqkTzcKV1+Uzr3XjKcZW=2ya00o...@mail.gmail.com

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Block level parallel vacuum

2019-04-04 Thread Masahiko Sawada
On Fri, Apr 5, 2019 at 4:51 AM Robert Haas  wrote:
>
> On Thu, Apr 4, 2019 at 6:28 AM Masahiko Sawada  wrote:
> > These patches conflict with the current HEAD. Attached the updated patches.
>
> They'll need another rebase.
>

Thank you for the notice. Rebased.


Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
From 87061bbc5b0c2d7c47b820ed97e6d738fbd1781a Mon Sep 17 00:00:00 2001
From: Masahiko Sawada 
Date: Thu, 4 Apr 2019 11:42:25 +0900
Subject: [PATCH v23 1/2] Add parallel option to VACUUM command

This change adds a PARALLEL option to the VACUUM command that enables us
to perform index vacuuming and index cleanup with parallel
workers. Individual indexes are processed by one vacuum
process. Therefore parallel vacuum can be used when the table has at
least two indexes, and we cannot specify a parallel degree larger than
the number of indexes that the table has.

The parallel degree is either specified by the user or determined based
on the number of indexes that the table has, and is further limited by
max_parallel_maintenance_workers. The table size and index size don't
affect it.
---
 doc/src/sgml/config.sgml  |  14 +-
 doc/src/sgml/ref/vacuum.sgml  |  31 ++
 src/backend/access/heap/vacuumlazy.c  | 890 ++
 src/backend/access/transam/parallel.c |   4 +
 src/backend/commands/vacuum.c |  27 ++
 src/backend/parser/gram.y |   1 +
 src/backend/postmaster/autovacuum.c   |   1 +
 src/bin/psql/tab-complete.c   |   2 +-
 src/include/access/heapam.h   |   2 +
 src/include/commands/vacuum.h |   5 +
 src/test/regress/expected/vacuum.out  |  12 +-
 src/test/regress/sql/vacuum.sql   |   6 +
 12 files changed, 889 insertions(+), 106 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index bc1d0f7..0b65d9b 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2226,13 +2226,13 @@ include_dir 'conf.d'

 
  Sets the maximum number of parallel workers that can be
- started by a single utility command.  Currently, the only
- parallel utility command that supports the use of parallel
- workers is CREATE INDEX, and only when
- building a B-tree index.  Parallel workers are taken from the
- pool of processes established by , limited by .  Note that the requested
+ started by a single utility command.  Currently, the parallel
+ utility commands that support the use of parallel workers are
+ CREATE INDEX only when building a B-tree index,
+ and VACUUM without FULL
+ option. Parallel workers are taken from the pool of processes
+ established by , limited
+ by .  Note that the requested
  number of workers may not actually be available at run time.
  If this occurs, the utility operation will run with fewer
  workers than expected.  The default value is 2.  Setting this
diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index fdd8151..a0dd997 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -33,6 +33,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ boolean ]
 SKIP_LOCKED [ boolean ]
 INDEX_CLEANUP [ boolean ]
+PARALLEL [ integer ]
 
 and table_and_columns is:
 
@@ -144,6 +145,22 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ integer background
+  workers (for the detail of each vacuum phases, please refer to
+  ). Only one worker can be used per index. So
+  parallel workers are launched only when there are at least 2
+  indexes in the table. Workers for vacuum launches before starting each phases
+  and exit at the end of the phase. These behaviors might change in a future release.
+  This option can not use with FULL option.
+ 
+
+   
+
+   
 DISABLE_PAGE_SKIPPING
 
  
@@ -219,6 +236,20 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ integer
+
+ 
+  Specifies parallel degree for PARALLEL option. The
+  value must be at least 1. If the parallel degree
+  integer is omitted, then
+  VACUUM decides the number of workers based on number of
+  indexes on the relation which further limited by
+  .
+ 
+
+   
+
+   
 table_name
 
  
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c9d8312..ae077ab 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -22,6 +22,20 @@
  * of index scans performed.  So we don't use maintenance_work_mem memory for
  * the TID array, just enough to hold as many heap tuples as fit on one page.
  *
+ * Lazy vacuum supports parallel execution with parallel worker processes. In
+ * parallel lazy vacuum, we perform both index vacuuming and index cleanup with
+ * parallel worker processes. Individual indexes is 

RE: Timeout parameters

2019-04-04 Thread Jamison, Kirk
On Thursday, April 4, 2019 5:20PM (GMT+9), Ryohei Nagaura wrote:

> > From: Fabien COELHO 
> > * About socket_timeout v12 patch, I'm not sure there is a consensus.
> I think so too. I just made the patch ready to be committed anytime.
> 
> Finally, I rebased all the patches because they no longer applied to
> the updated PG.

I just checked and confirmed that the TCP USER TIMEOUT patch set v20 works,
although you should capitalize "linux" to "Linux", as already mentioned
before. The committer can also just fix that very minor part if the patch
is deemed committable.

Note to committer: The "Ready for Committer" status is mainly intended for
tcp user timeout parameter.

OTOH, unless there is consensus with the socket_timeout,
for now the socket_timeout patch status still remains as "Needs Review".

Regards,
Kirk Jamison





Re: Caveats from reloption toast_tuple_target

2019-04-04 Thread Pavan Deolasee
On Thu, Apr 4, 2019 at 11:36 AM Michael Paquier  wrote:

>
>
> I mean that toast_tuple_target is broken as-is, because it should be
> used on the new tuples of a relation as a threshold to decide if this
> tuple should be toasted or not, but we don't actually use the
> reloption value for that decision-making: the default threshold
> TOAST_TUPLE_THRESHOLD gets used instead all the time!  The code does
> not even create a toast table in some cases.
>

I think the problem with the existing code is that while it allows users to
set toast_tuple_target to be less than TOAST_TUPLE_THRESHOLD, the same is
not honoured while deciding whether to toast a row or not. AFAICS it works
ok when toast_tuple_target is greater than or equal to
TOAST_TUPLE_THRESHOLD i.e. it won't toast the rows unless they are larger
than toast_tuple_target.

IMV it makes sense to simply cap the lower limit of toast_tuple_target at
the compile-time default and update the docs to reflect that. Otherwise, we
need to deal with the possibility of dynamically creating the toast table
if the relation option is lowered after creating the table. Your proposed
patch handles the case where toast_tuple_target is specified at CREATE
TABLE, but we would still have a problem with ALTER TABLE, no? There might
also be side effects of changing the lower limit for pg_dump/pg_restore, so
we would need to think about that too.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits

2019-04-04 Thread Andres Freund
Hi,

On 2019-04-04 23:57:58 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > I think the right approach would be to do all of this in heap_insert and
> > heap_multi_insert. Whenever starting to work on a page, if INSERT_FROZEN
> > is specified, remember whether it is either currently empty, or is
> > already marked as all-visible. If previously empty, mark it as all
> > visible at the end. If already all visible, there's no need to change
> > that. If not yet all-visible, no need to do anything, since it can't
> > have been inserted with COPY FREEZE.  Do you see any problem doing it
> > that way?
> 
> Do we want to add overhead to these hot-spot routines for this purpose?

For heap_multi_insert I can't see it being a problem - it's only used
from copy.c, and the cost should be "smeared" over many tuples.  I'd
assume that compared with locking a page, WAL logging, etc, it'd not
even meaningfully show up for heap_insert. Especially because we already
have codepaths for options & HEAP_INSERT_FROZEN in
heap_prepare_insert(), and I'd assume those could be combined.
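
For context, the existing HEAP_INSERT_FROZEN handling in
heap_prepare_insert() is tiny, roughly:

    /* paraphrased from heap_prepare_insert() */
    HeapTupleHeaderSetXmin(tup->t_data, xid);
    if (options & HEAP_INSERT_FROZEN)
        HeapTupleHeaderSetXminFrozen(tup->t_data);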

I think we should measure it, but I don't think that one or two
additional, very well predictd, branches are going to be measurable in
in those routines.

The patch, as implemented, has modifications in
RelationGetBufferForTuple(), that seems like they'd be more expensive.

Greetings,

Andres Freund




Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits

2019-04-04 Thread Andres Freund
Hi,

On 2019-04-05 09:20:36 +0530, Pavan Deolasee wrote:
> On Fri, Apr 5, 2019 at 9:05 AM Andres Freund  wrote:
> > I think the right approach would be to do all of this in heap_insert and
> > heap_multi_insert. Whenever starting to work on a page, if INSERT_FROZEN
> > is specified, remember whether it is either currently empty, or is
> > already marked as all-visible. If previously empty, mark it as all
> > visible at the end. If already all visible, there's no need to change
> > that. If not yet all-visible, no need to do anything, since it can't
> > have been inserted with COPY FREEZE.
> 
> 
> We're doing roughly the same. If we are running INSERT_FROZEN, whenever
> we're about to switch to a new page, we check if the previous page should
> be marked all-frozen and do it that way. The special code in copy.c is
> necessary to take care of the last page which we don't get to handle in the
> regular code path.

Well, it's not the same, because you need extra code from copy.c, extra
lock cycles, and extra WAL logging.


> Or are you suggesting that we don't even rescan the page for all-frozen
> tuples at the end and just simply mark it all-frozen at the start, when the
> first tuple is inserted and then don't touch the PD_ALL_VISIBLE/visibility
> map bit as we go on inserting more tuples in the page?

Correct. If done right that should be cheaper (no extra scans, less WAL
logging), without requiring some new dispatch logic from copy.c.


> Anyways, if major architectural changes are required then it's probably too
> late to consider this for PG12, even though it's more of a bug fix and a
> candidate for back-patching too.

Let's just see how bad it looks? I don't feel like we'd need to be super
strict about it. If it looks simple enough I'd e.g. be ok to merge this
soon after freeze, and backpatch around maybe 12.1 or such.

Greetings,

Andres Freund




Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits

2019-04-04 Thread Tom Lane
Andres Freund  writes:
> I think the right approach would be to do all of this in heap_insert and
> heap_multi_insert. Whenever starting to work on a page, if INSERT_FROZEN
> is specified, remember whether it is either currently empty, or is
> already marked as all-visible. If previously empty, mark it as all
> visible at the end. If already all visible, there's no need to change
> that. If not yet all-visible, no need to do anything, since it can't
> have been inserted with COPY FREEZE.  Do you see any problem doing it
> that way?

Do we want to add overhead to these hot-spot routines for this purpose?

regards, tom lane




Re: fix the spelling mistakes of comments

2019-04-04 Thread Tom Lane
Michael Paquier  writes:
> On Thu, Apr 04, 2019 at 07:57:02AM -0400, Robert Haas wrote:
>> On Wed, Apr 3, 2019 at 6:55 AM Liu, Huailing  
>> wrote:
 * This module contains the code for waiting and release★ of backends.

>>> I think the word marked★ should be 'releasing'.

>> It could be changed, but I don't think it's really wrong as written.

> Indeed.

Well, the problem is the lack of grammatical agreement between "waiting"
and "release".  It's poor English at least.

regards, tom lane




Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits

2019-04-04 Thread Pavan Deolasee
On Fri, Apr 5, 2019 at 8:37 AM Andres Freund  wrote:

> Hi,
>
> On 2019-04-05 00:06:04 -0300, Alvaro Herrera wrote:
>
> >
> > Hmm, isn't there already a critical section in visibilitymap_set itself?
>
> There is, but the proposed code sets all visible on the page, and marks
> the buffer dirty, before calling visibilitymap_set.
>

How's it any different than what we're doing at vacuumlazy.c:1322? We set
the page-level bit, mark the buffer dirty and then call
visibilitymap_set(), all outside a critical section.

1300 /* mark page all-visible, if appropriate */
1301 if (all_visible && !all_visible_according_to_vm)
1302 {
1303     uint8   flags = VISIBILITYMAP_ALL_VISIBLE;
1304
1305     if (all_frozen)
1306         flags |= VISIBILITYMAP_ALL_FROZEN;
1307
1308     /*
1309      * It should never be the case that the visibility map page is set
1310      * while the page-level bit is clear, but the reverse is allowed
1311      * (if checksums are not enabled).  Regardless, set the both bits
1312      * so that we get back in sync.
1313      *
1314      * NB: If the heap page is all-visible but the VM bit is not set,
1315      * we don't need to dirty the heap page.  However, if checksums
1316      * are enabled, we do need to make sure that the heap page is
1317      * dirtied before passing it to visibilitymap_set(), because it
1318      * may be logged.  Given that this situation should only happen in
1319      * rare cases after a crash, it is not worth optimizing.
1320      */
1321     PageSetAllVisible(page);
1322     MarkBufferDirty(buf);
1323     visibilitymap_set(onerel, blkno, buf, InvalidXLogRecPtr,
1324                       vmbuffer, visibility_cutoff_xid, flags);
1325 }

As the first para in that comment says, I thought it was OK for the
page-level bit to be set while the visibility map bit is clear, but not
vice versa. The proposed code does not introduce any new behaviour AFAICS.
But I might be missing something.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits

2019-04-04 Thread Pavan Deolasee
Hi,

On Fri, Apr 5, 2019 at 9:05 AM Andres Freund  wrote:

>
>
> I think the right approach would be to do all of this in heap_insert and
> heap_multi_insert. Whenever starting to work on a page, if INSERT_FROZEN
> is specified, remember whether it is either currently empty, or is
> already marked as all-visible. If previously empty, mark it as all
> visible at the end. If already all visible, there's no need to change
> that. If not yet all-visible, no need to do anything, since it can't
> have been inserted with COPY FREEZE.


We're doing roughly the same. If we are running INSERT_FROZEN, whenever
we're about to switch to a new page, we check if the previous page should
be marked all-frozen and do it that way. The special code in copy.c is
necessary to take care of the last page which we don't get to handle in the
regular code path.

Or are you suggesting that we don't even rescan the page for all-frozen
tuples at the end and just simply mark it all-frozen at the start, when the
first tuple is inserted and then don't touch the PD_ALL_VISIBLE/visibility
map bit as we go on inserting more tuples in the page?

Anyways, if major architectural changes are required then it's probably too
late to consider this for PG12, even though it's more of a bug fix and a
candidate for back-patching too.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits

2019-04-04 Thread Andres Freund
Hi,

On 2019-04-05 00:06:04 -0300, Alvaro Herrera wrote:
> On 2019-Apr-04, Andres Freund wrote:
> > I still think this is the wrong architecture.
> 
> Hmm.

I think the right approach would be to do all of this in heap_insert and
heap_multi_insert. Whenever starting to work on a page, if INSERT_FROZEN
is specified, remember whether it is either currently empty, or is
already marked as all-visible. If previously empty, mark it as all
visible at the end. If already all visible, there's no need to change
that. If not yet all-visible, no need to do anything, since it can't
have been inserted with COPY FREEZE.  Do you see any problem doing it
that way?
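
A minimal sketch of that shape (the idea as described, with assumed local
variable names; not a committed implementation):

    /* in heap_multi_insert(), once the target buffer is locked */
    bool    was_empty = (PageGetMaxOffsetNumber(page) == 0);
    bool    was_all_visible = PageIsAllVisible(page);

    /* ... place the tuples on the page, WAL-log as usual ... */

    if ((options & HEAP_INSERT_FROZEN) && was_empty && !was_all_visible)
    {
        PageSetAllVisible(page);
        /* plus setting the VM bit via visibilitymap_set() */
    }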

Greetings,

Andres Freund




Re: speeding up planning with partitions

2019-04-04 Thread Amit Langote
On 2019/04/05 12:18, David Rowley wrote:
> On Fri, 5 Apr 2019 at 16:09, Amit Langote  
> wrote:
>> Although, we still have ways
>> to go in terms of scaling generic plan execution to larger partition
>> counts, solution(s) for which have been proposed by David but haven't made
>> it into master yet.
> 
> Is that a reference to the last paragraph in [1]?  That idea has not
> gone beyond me writing that text yet! :-(   It was more of a passing
> comment on the only way I could think of to solve the problem.
> 
> [1] 
> https://www.postgresql.org/message-id/CAKJS1f-y1HQK+VjG7=C==vGcLnzxjN8ysD5NmaN8Wh4=vsy...@mail.gmail.com

Actually, I meant to refer to the following:

https://commitfest.postgresql.org/22/1897/

Of course, we should pursue all available options. :)

Thanks,
Amit





Re: Refactoring the checkpointer's fsync request queue

2019-04-04 Thread Thomas Munro
On Thu, Apr 4, 2019 at 11:47 PM Thomas Munro  wrote:
> Pushed.  Thanks for all the work on this!

I managed to break this today while testing with RELSEG_SIZE set to 1
block (= squillions of 8kb files).  The problem is incorrect arguments
to _mdfd_getseg(), in code added recently by me.  Without the
EXTENSION_DONT_CHECK_SIZE flag, it refuses to open segments following
segments that have been truncated, leading to a checkpointer fsync
panic.  It's also passing segnum where a blocknum is wanted.  It
should have used exactly the same arguments as in the old code, but
didn't.  I will push a fix shortly.
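
The corrected call has to convert the segment number into a block number and
pass EXTENSION_DONT_CHECK_SIZE, roughly (a sketch; see the actual fix for
the exact form):

    seg = _mdfd_getseg(reln, forknum,
                       (BlockNumber) segno * (BlockNumber) RELSEG_SIZE,
                       false,   /* skipFsync */
                       EXTENSION_RETURN_NULL | EXTENSION_DONT_CHECK_SIZE);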

-- 
Thomas Munro
https://enterprisedb.com




Re: speeding up planning with partitions

2019-04-04 Thread David Rowley
On Fri, 5 Apr 2019 at 16:09, Amit Langote  wrote:
> Although, we still have ways
> to go in terms of scaling generic plan execution to larger partition
> counts, solution(s) for which have been proposed by David but haven't made
> it into master yet.

Is that a reference to the last paragraph in [1]?  That idea has not
gone beyond me writing that text yet! :-(   It was more of a passing
comment on the only way I could think of to solve the problem.

[1] 
https://www.postgresql.org/message-id/CAKJS1f-y1HQK+VjG7=C==vGcLnzxjN8ysD5NmaN8Wh4=vsy...@mail.gmail.com

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




RE: Problem during Windows service start

2019-04-04 Thread Higuchi, Daisuke
Hi, 

Thank you for picking this up, and I'm sorry for the delayed reply.

>> If wait_for_postmaster returns POSTMASTER_STILL_STARTING will it 
>> be correct to set the status of windows service to SERVICE_START_PENDING ?

Yes, I think this is the best. Currently, I do not have a good solution for
changing the status from SERVICE_START_PENDING to SERVICE_RUNNING after
recovery is completed.

>> I would like to take this up if no one is working on this.

Thank you for offering! I would like to ask you to take it, if possible.
I do not have much time to think about this topic now...

Regards, 
Higuchi 



Re: speeding up planning with partitions

2019-04-04 Thread Amit Langote
On 2019/04/05 6:59, David Rowley wrote:
> On Fri, 5 Apr 2019 at 07:33, Floris Van Nee  wrote:
>> I had a question about the performance of pruning of functions like now() 
>> and current_date. I know these are handled differently, as they cannot be 
>> excluded during the first phases of planning. However, currently, this new 
>> patch makes the performance difference between the static timestamp variant 
>> and now() very obvious (even more than before). Consider
>> select * from partitioned_table where ts >= now()
>> or
>> select * from partitioned_table where ts >= '2019-04-04'
>>
>> The second plans in less than a millisecond, whereas the first takes +- 
>> 180ms for a table with 1000 partitions. Both end up with the same plan.
> 
> The patch here only aims to improve the performance of queries to
> partitioned tables when partitions can be pruned during planning. The
> now() version of the query is unable to do that since we don't know
> what that value will be during the execution of the query. In that
> version, you're most likely seeing "Subplans Removed: ". This means
> run-time pruning did some pruning and the planner generated subplans
> for what you see plus  others. Since planning for all partitions is
> still slow, you're getting a larger performance difference than
> before, but only due to the fact that the other plan is now faster to
> generate.

Yeah, the time for generating a plan for a query that *can* use pruning,
but not during planning, is still very much dependent on the number of
partitions, because access plans must be created for all partitions, even
if only one of those plans will actually be used and the rest pruned away
during execution.

> If you're never using prepared statements,

Or if using prepared statements is an option, the huge planning cost
mentioned above need not be paid repeatedly.  Although, we still have ways
to go in terms of scaling generic plan execution to larger partition
counts, solution(s) for which have been proposed by David but haven't made
it into master yet.

Thanks,
Amit





Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits

2019-04-04 Thread Andres Freund
Hi,

On 2019-04-05 00:06:04 -0300, Alvaro Herrera wrote:
> On 2019-Apr-04, Andres Freund wrote:
> 
> > On 2019-04-04 12:23:08 -0700, Andres Freund wrote:
> > > Also, how is this code even close to correct?
> > > CheckAndSetPageAllVisible() modifies the buffer in a crucial way, and
> > > there's no WAL logging? Without even a comment arguing why that's OK (I
> > > don't think it is)?
> > 
> > Peter Geoghegan just reminded me over IM that there's actually logging
> > inside log_heap_visible(), called from visibilitymap_set(). Still lacks
> > a critical section though.
> 
> Hmm, isn't there already a critical section in visibilitymap_set itself?

There is, but the proposed code sets all visible on the page, and marks
the buffer dirty, before calling visibilitymap_set.

Greetings,

Andres Freund




Re: [PATCH v20] GSSAPI encryption support

2019-04-04 Thread Stephen Frost
Greetings,

* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> Kerberos tests are now failing for me (macOS).  I'm seeing
> 
> psql: error: could not connect to server: Over-size error packet sent by
> the server.
> not ok 3 - GSS encryption without auth
> 
> #   Failed test 'GSS encryption without auth'
> #   at t/002_enc.pl line 170.
> #  got: '2'
> # expected: '0'
> 
> (and repeated for several other tests).

Alright, that over-size error was a bug in the error-handling code,
which I've just pushed a fix for.  That said...

* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> On 2019-04-04 17:35, Stephen Frost wrote:
> > Ok, it looks like there's a server-side error happening here, and it
> > would be good to see what that is, so can you send the server logs?
> 
> These errors appear several times in the server logs:
> 
> FATAL:  GSSAPI context error
> DETAIL:   Miscellaneous failure (see text): Decrypt integrity check
> failed for checksum type hmac-sha1-96-aes256, key type
> aes256-cts-hmac-sha1-96
> 
> FATAL:  accepting GSS security context failed
> DETAIL:   Miscellaneous failure (see text): Decrypt integrity check
> failed for checksum type hmac-sha1-96-aes256, key type
> aes256-cts-hmac-sha1-96

This looks like it's a real issue and it's unclear what's going on here.
I wonder- are you certain that you're using all the same Kerberos
libraries for the KDC, the server, and psql?

If you go back to before the GSSAPI encryption patch, does it work..?

I've certainly seen interesting issues on MacOS, in particular, due to
different Kerberos libraries/tools being installed and I wonder if
that's what is going on here.  Maybe you could check klist vs. psql wrt
what libraries are linked in?

Thanks,

Stephen


signature.asc
Description: PGP signature


RE: libpq debug log

2019-04-04 Thread Iwata, Aya
Hi, 

> The basic idea being:
> 
> - Each line is a whole message.
> - The line begins with <<< for a message received and >>> for a message sent.
> - Strings in single quotes are those sent/received as a fixed number of bytes.
> - Strings in double quotes are those sent/received as a string.
> - 4-byte integers are printed unadorned.
> - 2-byte integers are prefixed by #.
> - I guess 1-byte integers would need some other prefix, maybe @ or ##.

I created a v1 patch to improve PQtrace() so that it outputs each log message
on one line. Please find my attached patch.


Log output examples:

In the current PQtrace log:

To backend> Msg Q

To backend> "SELECT pg_catalog.set_config('search_path', '', false)"
To backend> Msg complete, length 60

I changed it like this:

2019-04-04 02:39:51.488 UTC  > Query 59 "SELECT pg_catalog.set_config('search_path', '', false)"

In the current PQtrace log:

From backend> T
From backend (#4)> 35
From backend (#2)> 1
From backend> "set_config"
From backend (#4)> 0
From backend (#2)> 0
From backend (#4)> 25
From backend (#2)> 65535
From backend (#4)> -1
From backend (#2)> 0

I changed it like this:

2019-04-04 02:39:51.489 UTC  < RowDescription 35 #1 "set_config" 0 #0 25 #65535 -1 #0


> I would like to add a timestamp per line, and log when a command
> processing function starts/ends.
> I think it is useful to know when application processing starts/ends, for
> diagnosis. So I will implement it like this:
> 
> 2019-03-03 07:24:54.142 UTC   PQgetResult start
> 2019-03-03 07:24:54.143 UTC   < 'T' 35 1 "set_config" 0 #0 25 #65535 -1 #0
> 2019-03-03 07:24:54.144 UTC   PQgetResult end
I would like to add this in the next patch if there is no disagreement.

Regards,
Aya Iwata


v1-0001-libpq-PQtrace-output_one_line.patch
Description: v1-0001-libpq-PQtrace-output_one_line.patch


RE: Speed up transaction completion faster after many relations are accessed in a transaction

2019-04-04 Thread Imai, Yoshikazu
On Fri, Apr 5, 2019 at 0:05 AM, Tsunakawa, Takayuki wrote:
> From: Peter Eisentraut [mailto:peter.eisentr...@2ndquadrant.com]
> > I can't detect any performance improvement with the patch applied to
> > current master, using the test case from Yoshikazu Imai (2019-03-19).
> 
> That's strange...  Peter, Imai-san, can you compare your test procedures?

Just to make sure, here are my test procedures in detail.

I installed and set up HEAD and the patched build as follows.

[HEAD(413ccaa)]
(git pull)
./configure --prefix=/usr/local/pgsql-dev --enable-depend
make clean
make

make install

su postgres
export PATH=/usr/local/pgsql-dev/bin:$PATH
initdb -D /var/lib/pgsql/data-dev
vi /var/lib/pgsql/data-dev/postgresql.conf

port = 44201
plan_cache_mode = 'auto' or 'force_custom_plan'
max_parallel_workers = 0
max_parallel_workers_per_gather = 0
max_locks_per_transaction = 4096

pg_ctl -D /var/lib/pgsql/data-dev start



[HEAD(413ccaa) + patch]
(git pull)
patch -u -p1 < 0002.patch
./configure --prefix=/usr/local/pgsql-locallock --enable-depend
make clean
make

make install

su postgres
export PATH=/usr/local/pgsql-locallock/bin:$PATH
initdb -D /var/lib/pgsql/data-locallock
vi /var/lib/pgsql/data-locallock/postgresql.conf

port = 44301
plan_cache_mode = 'auto' or 'force_custom_plan'
max_parallel_workers = 0
max_parallel_workers_per_gather = 0
max_locks_per_transaction = 4096

pg_ctl -D /var/lib/pgsql/data-locallock start


And I tested as follows.

(creating partitioned table for port 44201)
(creating partitioned table for port 44301)
(creating select4096.sql)
for i in `seq 1 5`; do
  pgbench -n -f select4096.sql -T 60 -M prepared -p 44201 | grep including;
  pgbench -n -f select4096.sql -T 60 -M prepared -p 44301 | grep including;
done
tps = 8146.039546 (including connections establishing)
tps = 9021.340872 (including connections establishing)
tps = 8011.186017 (including connections establishing)
tps = 8926.191054 (including connections establishing)
tps = 8006.769690 (including connections establishing)
tps = 9028.716806 (including connections establishing)
tps = 8057.709961 (including connections establishing)
tps = 9017.713714 (including connections establishing)
tps = 7956.332863 (including connections establishing)
tps = 9126.650533 (including connections establishing)


Thanks
--
Yoshikazu Imai


RE: Speed up transaction completion faster after many relations are accessed in a transaction

2019-04-04 Thread Tsunakawa, Takayuki
Hi Amit-san, Imai-san,

From: Amit Langote [mailto:langote_amit...@lab.ntt.co.jp]
> I was able to detect it as follows.
> plan_cache_mode = auto
> 
>HEAD: 1915 tps
> Patched: 2394 tps
> 
> plan_cache_mode = custom (non-problematic: generic plan is never created)
> 
>HEAD: 2402 tps
> Patched: 2393 tps

Thanks a lot for the very quick confirmation.  I'm relieved to still see the
good results.


Regards
Takayuki Tsunakawa



Re: fix the spelling mistakes of comments

2019-04-04 Thread Michael Paquier
On Thu, Apr 04, 2019 at 07:57:02AM -0400, Robert Haas wrote:
> On Wed, Apr 3, 2019 at 6:55 AM Liu, Huailing  
> wrote:
>> * This module contains the code for waiting and release★ of backends.
>> * All code in this module executes on the primary. The core streaming
>> * replication transport remains within WALreceiver/WALsender modules.
>>
>> I think the word marked★ should be 'releasing'.
> 
> It could be changed, but I don't think it's really wrong as written.

Indeed.

> > * Walk★ the specified queue from head.  Set the state of any backends that
> >
> > I think the word marked★ should be 'Wake'.
> 
> I think it's correct as written.

Yes, it's correct as-is.  The code *walks* through the shmem queue to
*wake* some of its elements.
--
Michael


signature.asc
Description: PGP signature


Re: pg_rewind vs superuser

2019-04-04 Thread Michael Paquier
On Thu, Apr 04, 2019 at 01:19:44PM +0200, Magnus Hagander wrote:
> All of it, or just the checkpoint part? I assume just the checkpoint part?
> AFAIK it does require superuser in those earlier versions?

I meant of course the checkpoint part down to 9.5, and the rest down
to 11, so done this way.
--
Michael


signature.asc
Description: PGP signature


RE: Speed up transaction completion faster after many relations are accessed in a transaction

2019-04-04 Thread Imai, Yoshikazu
On Fri, Apr 5, 2019 at 1:31 AM, Amit Langote wrote:
> On 2019/04/05 5:42, Peter Eisentraut wrote:
> > On 2019-04-04 06:58, Amit Langote wrote:
> >> Also, since the "speed up partition planning" patch went in
> >> (428b260f8), it might be possible to see the performance boost even
> >> with the partitioning example you cited upthread.
> >
> > I can't detect any performance improvement with the patch applied to
> > current master, using the test case from Yoshikazu Imai (2019-03-19).
> 
> I was able to detect it as follows.
> 
> * partitioned table setup:
> 
> $ cat ht.sql
> drop table ht cascade;
> create table ht (a int primary key, b int, c int) partition by hash (a);
> select 'create table ht' || x::text || ' partition of ht for values with
> (modulus 8192, remainder ' || (x)::text || ');' from generate_series(0,
> 8191) x;
> \gexec
> 
> * pgbench script:
> 
> $ cat select.sql
> \set param random(1, 8192)
> select * from ht where a = :param
> 
> * pgbench (5 minute run with -M prepared)
> 
> pgbench -n -M prepared -T 300 -f select.sql
> 
> * tps:
> 
> plan_cache_mode = auto
> 
>HEAD: 1915 tps
> Patched: 2394 tps
> 
> plan_cache_mode = custom (non-problematic: generic plan is never created)
> 
>HEAD: 2402 tps
> Patched: 2393 tps

Amit-san, thanks for testing this.

I also re-ran my tests (3/19) with HEAD (413ccaa) and HEAD (413ccaa) + patch,
and I can still detect the performance difference with plan_cache_mode = auto.

Thanks
--
Yoshikazu Imai 



Re: Server Crash due to assertion failure in _bt_check_unique()

2019-04-04 Thread Ashutosh Sharma
On Thu, Apr 4, 2019 at 10:12 PM Peter Geoghegan  wrote:

> On Thu, Apr 4, 2019 at 4:06 AM Ashutosh Sharma 
> wrote:
> > Attached is the patch with above changes. Please let me know if my
> understanding is wrong. Thanks.
>
> You have it right. This bug slipped in towards the end of development,
> when the insertstate struct was introduced.
>
> I have pushed something very close to the patch you posted. (I added
> an additional assertion, and tweaked the comments.)
>
>
Thanks Peter :)


> Thanks for the report and patch!
> --
> Peter Geoghegan
>

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


Re: Speed up transaction completion faster after many relations are accessed in a transaction

2019-04-04 Thread Amit Langote
On 2019/04/05 5:42, Peter Eisentraut wrote:
> On 2019-04-04 06:58, Amit Langote wrote:
>> Also, since the "speed up partition planning" patch went in (428b260f8),
>> it might be possible to see the performance boost even with the
>> partitioning example you cited upthread.
> 
> I can't detect any performance improvement with the patch applied to
> current master, using the test case from Yoshikazu Imai (2019-03-19).

I was able to detect it as follows.

* partitioned table setup:

$ cat ht.sql
drop table ht cascade;
create table ht (a int primary key, b int, c int) partition by hash (a);
select 'create table ht' || x::text || ' partition of ht for values with
(modulus 8192, remainder ' || (x)::text || ');' from generate_series(0,
8191) x;
\gexec

* pgbench script:

$ cat select.sql
\set param random(1, 8192)
select * from ht where a = :param

* pgbench (5 minute run with -M prepared)

pgbench -n -M prepared -T 300 -f select.sql

* tps:

plan_cache_mode = auto

   HEAD: 1915 tps
Patched: 2394 tps

plan_cache_mode = custom (non-problematic: generic plan is never created)

   HEAD: 2402 tps
Patched: 2393 tps

Thanks,
Amit





Re: New vacuum option to do only freezing

2019-04-04 Thread Masahiko Sawada
On Fri, Apr 5, 2019 at 4:06 AM Robert Haas  wrote:
>
> On Wed, Apr 3, 2019 at 10:32 PM Masahiko Sawada  wrote:
> > Attached the updated version patch.
>
> Committed with a little bit of documentation tweaking.
>

Thank you for committing them!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: Re: reloption to prevent VACUUM from truncating empty pages at the end of relation

2019-04-04 Thread Masahiko Sawada
On Thu, Apr 4, 2019 at 10:07 PM Julien Rouhaud  wrote:
>
> On Thu, Apr 4, 2019 at 1:23 PM Masahiko Sawada  wrote:
> >
> > On Thu, Apr 4, 2019 at 1:26 PM Tsunakawa, Takayuki
> >  wrote:
> > >
> > > From: Fujii Masao [mailto:masao.fu...@gmail.com]
> > > > reloption for TOAST is also required?
> > >
> > > # I've come back to the office earlier than planned...
> > >
> > > Hm, there's no reason to not provide toast.vacuum_shrink_enabled.  Done 
> > > with the attached patch.
> > >
> >
> > Thank you for updating the patch!
>
> +1!
>
> > +vacuum_shrink_enabled,
> > toast.vacuum_shrink_enabled
> > (boolean)
> > +
> > + 
> > + Enables or disables shrinking the table when it's vacuumed.
> > + This also applies to autovacuum.
> > + The default is true.  If true, VACUUM frees empty pages at the
> > end of the table.
> >
> > "VACUUM" needs  or "vacuum" is more appropriate here?
>
> also, the documentation should point out that freeing is not
> guaranteed.  Something like
>
>  + The default is true.  If true, VACUUM will try to free empty
> pages at the end of the table.

+1

>
> > I'm not sure what consensus we reached here, but shouldn't we make a
> > vacuum command option for this?
>
I don't think there's a clear consensus, but my personal vote is to add
it, with SHRINK_TABLE = [ force_on | force_off | default ] (unless a
better proposal has been made already)

As the INDEX_CLEANUP option has been added by commit a96c41f, the new
option for this feature could also accept zero or one boolean
argument, that is, SHRINK_TABLE [true|false], true by default.
Explicit options on the VACUUM command override options set by
reloptions, and if the boolean argument is omitted the option depends
on the reloption.
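
On the reloption side, the patch presumably registers the new boolean in
boolRelOpts[] in reloptions.c, something like this (a sketch of the standard
boilerplate, not copied from the patch):

    {
        {
            "vacuum_shrink_enabled",
            "Enables vacuum to truncate empty pages at the end of this table",
            RELOPT_KIND_HEAP | RELOPT_KIND_TOAST,
            ShareUpdateExclusiveLock
        },
        true
    },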

FWIW, I would also like to defer to the committer on naming the new
option, but another possible comment is that the term
'truncate' might be more suitable than 'shrink' in the context
of lazy vacuum. As Tsunakawa-san mentioned, the term 'shrink' is used
in the PostgreSQL documentation, but we use it mostly in the context of
VACUUM FULL. I found two paragraphs that use the term 'shrink'.

vacuum.sgml:
   
The FULL option is not recommended for routine use,
but might be useful in special cases.  An example is when you have deleted
or updated most of the rows in a table and would like the table to
physically shrink to occupy less disk space and allow faster table
scans. VACUUM FULL will usually shrink the table
more than a plain VACUUM would.
   

maintenance.sgml
Although VACUUM FULL can be used to shrink a table back
to its minimum size and return the disk space to the operating system,
there is not much point in this if the table will just grow again in the
future.  Thus, moderately-frequent standard
VACUUM runs are a
better approach than infrequent VACUUM FULL runs for
maintaining heavily-updated tables.

On the other hand, we use the term 'truncate' in the progress
reporting of lazy vacuum (see documentation of
pg_stat_progress_vacuum). So I'm concerned that if we use the term
'shrink' users will think that this option prevents VACUUM FULL from
working.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




RE: Re: reloption to prevent VACUUM from truncating empty pages at the end of relation

2019-04-04 Thread Tsunakawa, Takayuki
From: Masahiko Sawada [mailto:sawada.m...@gmail.com]
> "VACUUM" needs  or "vacuum" is more appropriate here?

Looking at the same file and some other files, "vacuum" looks appropriate 
because it represents the vacuum action, not the specific VACUUM command.


> The format of the documentation of new option is a bit weird. Could it
> be possible to adjust it around 80 characters per line like other
> description?

Ah, fixed.  It's easy to overlook the style with screen reader software...
I've been wondering if there are good settings for editing .sgml in Emacs
that, for example, put the appropriate number of spaces at the beginning of
each line when Enter is pressed, automatically break long lines and insert
the spaces, etc.


From: Julien Rouhaud [mailto:rjuju...@gmail.com]
> also, the documentation should point out that freeing is not
> guaranteed.  Something like
> 
>  + The default is true.  If true, VACUUM will try to free empty
> pages at the end of the table.

That's nice.  Done.


> > I'm not sure what consensus we reached here, but shouldn't we make a
> > vacuum command option for this?
> 
> I don't think there's a clear consensus, but my personal vote is to add
> it, with SHRINK_TABLE = [ force_on | force_off | default ] (unless a
> better proposal has been made already)

As I mentioned earlier, I don't think the VACUUM option is necessary
because:
(1) this is a table property which is determined based on the expected
workload, not one that people want to change per VACUUM operation;
(2) if someone wants to change the behavior for a particular VACUUM
operation, he can do it using ALTER TABLE SET.
Anyway, we can add the VACUUM option separately if we want it by all means.
I don't want it to be the blocker for this feature's inclusion in PG 12,
because vacuum truncation has been bothering us, as others said in this and
other threads...


Regards
Takayuki Tsunakawa



disable-vacuum-truncation_v6.patch
Description: disable-vacuum-truncation_v6.patch


Re: Protect syscache from bloating with negative cache entries

2019-04-04 Thread Kyotaro HORIGUCHI
Thank you for the comment.

At Thu, 4 Apr 2019 15:44:35 -0400, Robert Haas  wrote in 

> On Thu, Apr 4, 2019 at 8:53 AM Kyotaro HORIGUCHI
>  wrote:
> > So it seems to me that the simplest "Full" version wins. The
> > attached is the rebased version. dlist_move_head(entry) is removed as
> > mentioned above in that patch.
> 
> 1. I really don't think this patch has any business changing the
> existing logic.  You can't just assume that the dlist_move_head()
> operation is unimportant for performance.

Ok, it doesn't show a significant performance gain, so I removed that.

> 2. This patch still seems to add a new LRU list that has to be
> maintained.  That's fairly puzzling.  You seem to have concluded that
> the version without the additional LRU wins, but the sent a new copy
> of the version with the LRU version.

Sorry, I attached the wrong one. The attached is the right one, which
doesn't add the new dlist.

> 3. I don't think adding an additional call to GetCurrentTimestamp() in
> start_xact_command() is likely to be acceptable.  There has got to be
> a way to set this up so that the maximum number of new
> GetCurrentTimestamp() is limited to once per N seconds, vs. the
> current implementation that could do it many many many times per
> second.

GetCurrentTimestamp() is called only once, very early in
the backend's life, in InitPostgres, not in
start_xact_command. What I do in that function is just copy
stmtStartTimestamp, not call GetCurrentTimestamp().
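
In other words, the new call amounts to a plain assignment; the
SetCatCacheClock() added by the patch is presumably just:

    /* the clock is advanced from the statement start timestamp, so
     * pruning never needs to call GetCurrentTimestamp() itself */
    static inline void
    SetCatCacheClock(TimestampTz ts)
    {
        catcacheclock = ts;
    }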

> 4. The code in CatalogCacheCreateEntry seems clearly unacceptable.  In
> a pathological case where CatCacheCleanupOldEntries removes exactly
> one element per cycle, it could be called on every new catcache
> allocation.

It may be a problem if just one entry is created over a period
longer than catalog_cache_prune_min_age and the resize
interval, or if all candidate entries except one are actually in use
at the pruning moment. Is that realistic?

> I think we need to punt this patch to next release.  We're not
> converging on anything committable very fast.

Yeah, maybe you're right. This patch has gone silent for several months at a
time, received comments and was modified to address them over more than two
cycles, and finally got a death sentence (not literally; actually a
postponement) very close to the end of this third cycle. I anticipate the
same will continue in the next cycle.

By the way, I found the reason for the wrong result of the
previous benchmark: test 3_0/1 needs to update catcacheclock in the
middle of the loop. I'm going to fix it and rerun it.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 596d6b018e1b7ddd5828298bfaba3ee405eb2604 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 1 Mar 2019 13:32:51 +0900
Subject: [PATCH] Remove entries that haven't been used for a certain time

Catcache entries can be left alone for a long time for several reasons, and
it is not desirable that such useless entries eat up memory. The catcache
pruning feature removes entries that haven't been accessed for a certain
time before enlarging the hash array.
---
 doc/src/sgml/config.sgml  |  19 +
 src/backend/tcop/postgres.c   |   2 +
 src/backend/utils/cache/catcache.c| 103 +-
 src/backend/utils/misc/guc.c  |  12 +++
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/utils/catcache.h  |  16 
 6 files changed, 150 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index bc1d0f7bfa..819b252029 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1677,6 +1677,25 @@ include_dir 'conf.d'
   
  
 
+ 
+  catalog_cache_prune_min_age (integer)
+  
+   catalog_cache_prune_min_age configuration
+   parameter
+  
+  
+  
+   
+		 Specifies the minimum amount of unused time in seconds at which a
+		 system catalog cache entry is removed. -1 indicates that this feature
+		 is disabled at all. The value defaults to 300 seconds (5
+		 minutes). The entries that are not used for the duration
+		 can be removed to prevent catalog cache from bloating with useless
+		 entries.
+   
+  
+ 
+
  
   max_stack_depth (integer)
   
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 44a59e1d4f..a0efac86bc 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -71,6 +71,7 @@
 #include "tcop/pquery.h"
 #include "tcop/tcopprot.h"
 #include "tcop/utility.h"
+#include "utils/catcache.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
@@ -2577,6 +2578,7 @@ start_xact_command(void)
 	 * not desired, the timeout has to be disabled explicitly.
 	 */
 	enable_statement_timeout();
+	SetCatCacheClock(GetCurrentStatementStartTimestamp());
 }
 
 static void
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index d05930bc4c..03c2d8524c 100644
--- 

Re: COPY FROM WHEN condition

2019-04-04 Thread Andres Freund
Hi,

On 2019-04-05 12:59:06 +1300, David Rowley wrote:
> I read through the final commit to see what had changed and I noticed
> that I had forgotten to remove nbuffers from CopyMultiInsertInfo when
> changing from a hash table to a List.
> 
> Patch to fix is attached.

Pushed, thanks.




RE: Speed up transaction completion faster after many relations are accessed in a transaction

2019-04-04 Thread Tsunakawa, Takayuki
Hi Peter, Imai-san,

From: Peter Eisentraut [mailto:peter.eisentr...@2ndquadrant.com]
> I can't detect any performance improvement with the patch applied to
> current master, using the test case from Yoshikazu Imai (2019-03-19).

That's strange...  Peter, Imai-san, can you compare your test procedures?

Peter, can you check and see the performance improvement with David's method on 
March 26 instead?


Regards
Takayuki Tsunakawa



Re: COPY FROM WHEN condition

2019-04-04 Thread David Rowley
On Fri, 5 Apr 2019 at 12:32, Andres Freund  wrote:
> I've pushed this now. Besides those naming changes, I had to re-add the
> zero initialization (I did end up seeing compiler warnings when compiling
> with optimizations).  Also some comment fixes.

Thanks for pushing.

> I added one more flush location, when inserting into a partition that
> cannot use batching - there might e.g. be triggers looking at rows in
> other partitions or such.

Good catch.

I read through the final commit to see what had changed and I noticed
that I had forgotten to remove nbuffers from CopyMultiInsertInfo when
changing from a hash table to a List.

Patch to fix is attached.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


remove_unused_variable_in_copy_buffers.patch
Description: Binary data


Re: Changes to pg_dump/psql following collation "C" in the catalog

2019-04-04 Thread Tom Lane
"Daniel Verite"  writes:
> I think psql and pg_dump need to adjust, just like the 3rd party tools
> will, at least those that support collation-aware search in the catalog.
> PFA a patch that implements the slight changes needed.
> I'll add an entry for it in the next CF.

Hm, if that's as much as we have to touch, I think there's a good
argument for squeezing it into v12 rather than waiting.  The point
here is mostly to avoid a behavior change from pre-v12, but if we
allow v12 to have a different behavior, it's questionable whether
we'd want v13 to change it back.

Just looking at the patch, I wonder whether it doesn't need some
server-version checks.  At the very least this would break with
pre-9.1 servers, which lack COLLATE altogether.

regards, tom lane




Re: COPY FROM WHEN condition

2019-04-04 Thread Andres Freund
Hi,

On 2019-04-04 12:04:28 -0700, Andres Freund wrote:
> On 2019-04-03 22:20:00 -0700, Andres Freund wrote:
> > On 2019-04-03 20:00:09 +1300, David Rowley wrote:
> > > Oops, I forgot about that one. v4 attached.
> > 
> > I'm pretty happy with this. I'm doing some minor changes (e.g. don't
> > like the function comment formatting that much, the tableam callback
> > needs docs, stuff like that), and then I'm going to push it tomorrow.
> 
> After another read, I also think I'm going to rename the functions a
> bit. CopyMultiInsertInfo_SetupBuffer, with its mix of camel-case and
> underscores, just doesn't seem to mix well with copy.c and also just
> generally other postgres code. I feel we already have too many different
> naming conventions...

I've pushed this now. Besides those naming changes, I had to re-add the
zero initialization (I did end up seeing compiler warnings when compiling
with optimizations).  Also some comment fixes.

I added one more flush location, when inserting into a partition that
cannot use batching - there might e.g. be triggers looking at rows in
other partitions or such.

I found a pretty minor slowdown for COPYing narrow rows into an
unlogged unpartitioned table. That got better by combining the loops in
CopyMultiInsertBufferFlush() (previously there were stalls due to the
indirect function calls for ExecClearTuple()). I think there's still a
tiny slowdown left in that scenario, but given that it's unlogged and
very narrow rows, I think that's ok.

Greetings,

Andres Freund




Re: Changes to pg_dump/psql following collation "C" in the catalog

2019-04-04 Thread Chapman Flack
>> "Daniel Verite"  writes:
>>> One consequence of using the "C" collation in the catalog versus
>>> the db collation

As an intrigued Person Following At Home, I was happy when I found
this little three-message thread had more context in [1] and [2]. :)

-Chap

[1] https://postgr.es/m/15938.1544377...@sss.pgh.pa.us
[2] https://postgr.es/m/5978.1544030...@sss.pgh.pa.us




Re: PostgreSQL Buildfarm Client Release 10

2019-04-04 Thread Andrew Dunstan
On Thu, Apr 4, 2019 at 6:18 PM Tom Lane  wrote:
>
> Andrew Dunstan  writes:
> > The release can be downloaded from
> > https://github.com/PGBuildFarm/client-code/archive/REL_10.tar.gz or
> > https://buildfarm.postgresql.org/downloads/latest-client.tgz
>
> I don't actually see it on the buildfarm.postgresql.org server?
>
>

It's there. I checked the link with wget. But I think the web cache
might be having an issue.

cheers

andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: PostgreSQL Buildfarm Client Release 10

2019-04-04 Thread Tom Lane
Andrew Dunstan  writes:
> The release can be downloaded from
> https://github.com/PGBuildFarm/client-code/archive/REL_10.tar.gz or
> https://buildfarm.postgresql.org/downloads/latest-client.tgz

I don't actually see it on the buildfarm.postgresql.org server?

regards, tom lane




Re: Inadequate executor locking of indexes

2019-04-04 Thread David Rowley
On Fri, 5 Apr 2019 at 08:26, Tom Lane  wrote:
> Pushed with some minor tweaking, mostly comments.

Thanks for tweaking and pushing this.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: speeding up planning with partitions

2019-04-04 Thread David Rowley
On Fri, 5 Apr 2019 at 07:33, Floris Van Nee  wrote:
> I had a question about the performance of pruning of functions like now() and 
> current_date. I know these are handled differently, as they cannot be 
> excluded during the first phases of planning. However, curerntly, this new 
> patch makes the performance difference between the static timestamp variant 
> and now() very obvious (even more than before). Consider
> select * from partitioned_table where ts >= now()
> or
> select * from partitioned_table where ts >= '2019-04-04'
>
> The second plans in less than a millisecond, whereas the first takes +- 180ms 
> for a table with 1000 partitions. Both end up with the same plan.

The patch here only aims to improve the performance of queries to
partitioned tables when partitions can be pruned during planning. The
now() version of the query is unable to do that since we don't know
what that value will be during the execution of the query. In that
version, you're most likely seeing "Subplans Removed: ". This means
run-time pruning did some pruning and the planner generated subplans
for what you see plus  others. Since planning for all partitions is
still slow, you're getting a larger performance difference than
before, but only due to the fact that the other plan is now faster to
generate.

If you're never using prepared statements, i.e., always planning right
before execution, then you might want to consider using "where ts >=
'today'::timestamp".  This will evaluate to the current date during
parse and make the value available to the planner. You'll need to be
pretty careful with that though, as if you do prepare queries or
change to do that in the future then the bugs in your application
could be very subtle and only do the wrong thing just after midnight
on some day when the current time progresses over your partition
boundary.
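
To make the contrast concrete, a minimal sketch (assuming a hypothetical
table "measurement" range-partitioned on ts):

-- 'today' is evaluated during parse, so plan-time pruning applies and
-- non-matching partitions never make it into the plan:
select * from measurement where ts >= 'today'::timestamp;

-- now() is only evaluated later, so the planner keeps all partitions
-- and EXPLAIN shows "Subplans Removed: N" from run-time pruning:
select * from measurement where ts >= now();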

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: Refactoring the checkpointer's fsync request queue

2019-04-04 Thread Thomas Munro
On Fri, Apr 5, 2019 at 10:53 AM Thomas Munro  wrote:
> Ok, here is a patch that adds a one-typedef header and uses
> SegmentIndex to replace all cases of BlockNumber and int holding a
> segment number (where as an "index" or a "count").

(sorry, I meant "SegmentNumber", not "SegmentIndex")

-- 
Thomas Munro
https://enterprisedb.com




PostgreSQL Buildfarm Client Release 10

2019-04-04 Thread Andrew Dunstan
Announcing Release 10 of the PostgreSQL Buildfarm client

Principal feature: support for non-standard repositories:

. support multi-element branch names, such as “dev/featurename” or
“bug/ticket_number/branchname”
. provide a get_branches() method in SCM module
. support regular expression branches of interest. This is matched
against the list of available branches
. prune branches when doing git fetch.

This feature and some server side changes will be explored in detail
in my presentation at pgCon in Ottawa next month. The feature doesn’t
affect owners of animals in our normal public Build Farm. However, the
items below are of use to them.

Other features/ behaviour changes:

. support for testing cross version upgrade extended back to 9.2
. support for core Postgres changes:
  . extended support for USE_MODULE_DB
  . new extra_float_digits regime
  . removal of user table oid support
  . removal of abstime and friends
  . changed log file locations
. don’t search for valgrind messages unless valgrind is configured.
. make detection of when NO_TEMP_INSTALL is allowed more bulletproof

There are also various minor bug fixes and code improvements.

The release can be downloaded from
https://github.com/PGBuildFarm/client-code/archive/REL_10.tar.gz or
https://buildfarm.postgresql.org/downloads/latest-client.tgz



-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Refactoring the checkpointer's fsync request queue

2019-04-04 Thread Thomas Munro
On Fri, Apr 5, 2019 at 2:03 AM Alvaro Herrera  wrote:
> On 2019-Apr-04, Thomas Munro wrote:
> > I don't think it's project policy to put a single typedef into its own
> > header like that, and I'm not sure where else to put it.
>
> shrug.  Looks fine to me.  I suppose if we don't have it anywhere, it's
> just because we haven't needed that particular trick yet.  Creating a
> file with a lone typedef seems better than using uint32 to me.

It was commit 9fac5fd7 that gave me that idea.

Ok, here is a patch that adds a one-typedef header and uses
SegmentIndex to replace all cases of BlockNumber and int holding a
segment number (where as an "index" or a "count").

-- 
Thomas Munro
https://enterprisedb.com


0001-Introduce-SegmentNumber-typedef-for-relation-segment.patch
Description: Binary data


Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits

2019-04-04 Thread Andres Freund
Hi,

On 2019-04-04 12:23:08 -0700, Andres Freund wrote:
> Also, how is this code even close to correct?
> CheckAndSetPageAllVisible() modifies the buffer in a crucial way, and
> there's no WAL logging? Without even a comment arguing why that's OK (I
> don't think it is)?

Peter Geoghegan just reminded me over IM that there's actually logging
inside log_heap_visible(), called from visibilitymap_set(). Still lacks
a critical section though.

I still think this is the wrong architecture.

Greetings,

Andres Freund

PS: We're going to have to revamp visibilitymap_set() soon-ish - the
fact that it directly calls heap routines inside is bad, means that
additional AMs e.g. zheap has to reimplement that routine.




Re: New vacuum option to do only freezing

2019-04-04 Thread Bossart, Nathan
On 4/4/19, 12:06 PM, "Robert Haas"  wrote:
> Committed with a little bit of documentation tweaking.

Thanks!  I noticed a very small typo in the new documentation.

diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index fdd8151220..c652f8b0bc 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -199,7 +199,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ 

Re: [proposal] Add an option for returning SQLSTATE in psql error message

2019-04-04 Thread Tom Lane
didier  writes:
> [ 0001-Add-sqlstate-output-mode-to-VERBOSITY_v1.patch ]

Pushed with some mostly-cosmetic adjustments.  The main non-cosmetic
thing I did was to adjust the logic so that if no SQLSTATE is available,
it acts like TERSE mode.  Otherwise, we'd print nothing at all except
"ERROR:", which seems both quite useless and contrary to our message
style guidelines.  Moreover, documenting it this way makes the behavior
not inconsistent with what happens for libpq-generated errors and
errors from protocol-version-2 servers.
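
For reference, the new mode is selected the same way as the existing
ones; a quick psql session sketch (hypothetical query, output
abbreviated):

\set VERBOSITY sqlstate
select 1/0;
ERROR:  22012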

regards, tom lane




Re: query logging of prepared statements

2019-04-04 Thread Justin Pryzby
On Thu, Apr 04, 2019 at 03:07:04PM -0300, Alvaro Herrera wrote:
> which does not look good -- the statement is nowhere to be seen.  The commit
> message quotes this as desirable output:

Good catch.

Unnamed statements sent behind the scenes by PQexecParams weren't being logged.

I specifically handled unnamed statements in my v1 patch, and tested that in
20190215145704.gw30...@telsasoft.com, but for some reason dropped that logic in
v2, which was intended to only remove behavior conditional on
log_error_verbosity.

Previous patches also never logged PQprepare with named prepared statements
(unnamed prepared statements were handled in v1, and SQL PREPARE was handled as
a simple statement).

On Thu, Apr 04, 2019 at 03:26:30PM -0300, Alvaro Herrera wrote:
> With this patch (pretty much equivalent to reinstating the
> errdetail_execute for that line),

That means the text of the prepared statement is duplicated for each execute,
which is what we're trying to avoid, no?

The attached patch promotes the message to LOG in exec_parse_message().  Parse
is a protocol-layer message, and I think it's used by (only) PQprepare and
PQexecParams.

testlibpq3 now shows:

|LOG:  parse : SELECT * FROM test1 WHERE t = $1
|LOG:  execute 
|DETAIL:  parameters: $1 = 'joe''s place'

|LOG:  parse : SELECT * FROM test1 WHERE i = $1::int4
|LOG:  execute 
|DETAIL:  parameters: $1 = '2'

Compare unpatched v11.2 , the text of the prepared statement was shown in
"parse" phase rather than in each execute:

|LOG:  execute : SELECT * FROM test1 WHERE t = $1
|DETAIL:  parameters: $1 = 'joe''s place'

|LOG:  execute : SELECT * FROM test1 WHERE i = $1::int4
|DETAIL:  parameters: $1 = '2'

Justin
From 0826f7f90eb8f169244140a22db0bbbcb0d2b269 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sat, 9 Feb 2019 19:20:43 -0500
Subject: [PATCH v4] Avoid repetitive log of PREPARE during EXECUTE of prepared
 statements

---
 src/backend/tcop/postgres.c | 70 -
 1 file changed, 18 insertions(+), 52 deletions(-)

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 44a59e1..348343c 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -182,7 +182,6 @@ static int	ReadCommand(StringInfo inBuf);
 static void forbidden_in_wal_sender(char firstchar);
 static List *pg_rewrite_query(Query *query);
 static bool check_log_statement(List *stmt_list);
-static int	errdetail_execute(List *raw_parsetree_list);
 static int	errdetail_params(ParamListInfo params);
 static int	errdetail_abort(void);
 static int	errdetail_recovery_conflict(void);
@@ -1041,8 +1040,7 @@ exec_simple_query(const char *query_string)
 	{
 		ereport(LOG,
 (errmsg("statement: %s", query_string),
- errhidestmt(true),
- errdetail_execute(parsetree_list)));
+ errhidestmt(true)));
 		was_logged = true;
 	}
 
@@ -1292,8 +1290,7 @@ exec_simple_query(const char *query_string)
 			ereport(LOG,
 	(errmsg("duration: %s ms  statement: %s",
 			msec_str, query_string),
-	 errhidestmt(true),
-	 errdetail_execute(parsetree_list)));
+	 errhidestmt(true)));
 			break;
 	}
 
@@ -1326,6 +1323,7 @@ exec_parse_message(const char *query_string,	/* string to execute */
 	bool		is_named;
 	bool		save_log_statement_stats = log_statement_stats;
 	char		msec_str[32];
+	bool		was_logged = false;
 
 	/*
 	 * Report query to various monitoring facilities.
@@ -1339,11 +1337,6 @@ exec_parse_message(const char *query_string,	/* string to execute */
 	if (save_log_statement_stats)
 		ResetUsage();
 
-	ereport(DEBUG2,
-			(errmsg("parse %s: %s",
-	*stmt_name ? stmt_name : "",
-	query_string)));
-
 	/*
 	 * Start up a transaction command so we can run parse analysis etc. (Note
 	 * that this will normally change current memory context.) Nothing happens
@@ -1404,6 +1397,14 @@ exec_parse_message(const char *query_string,	/* string to execute */
 		Query	   *query;
 		bool		snapshot_set = false;
 
+		if (check_log_statement(parsetree_list)) {
+			ereport( LOG, // else log as DEBUG2 ?
+(errmsg("parse %s: %s",
+		*stmt_name ? stmt_name : "",
+		query_string)));
+			was_logged = true;
+		}
+
 		raw_parse_tree = linitial_node(RawStmt, parsetree_list);
 
 		/*
@@ -1544,7 +1545,7 @@ exec_parse_message(const char *query_string,	/* string to execute */
 	/*
 	 * Emit duration logging if appropriate.
 	 */
-	switch (check_log_duration(msec_str, false))
+	switch (check_log_duration(msec_str, was_logged))
 	{
 		case 1:
 			ereport(LOG,
@@ -1920,12 +1921,11 @@ exec_bind_message(StringInfo input_message)
 			break;
 		case 2:
 			ereport(LOG,
-	(errmsg("duration: %s ms  bind %s%s%s: %s",
+	(errmsg("duration: %s ms  bind %s%s%s",
 			msec_str,
 			*stmt_name ? stmt_name : "",
 			*portal_name ? "/" : "",
-			*portal_name ? portal_name : "",
-			psrc->query_string),
+			*portal_name ? portal_name : ""),
 	 errhidestmt(true),
 	 errdetail_params(params)));
 			break;

Re: propagating replica identity to partitions

2019-04-04 Thread Alvaro Herrera
On 2019-Mar-29, Peter Eisentraut wrote:

> On 2019-03-28 18:16, Simon Riggs wrote:
> > SET TABLESPACE should not recurse because it copies the data, while
> > holding long locks. If that was ever fixed so it happened concurrently,
> > I would agree this could recurse by default.
> 
> Since a partitioned table has no storage, what is the meaning of moving
> a partitioned table to a different tablespace without recursing?

Changing the tablespace of a partitioned table currently means to set a
default tablespace for partitions created after the change.

Robert proposed to change it so that it means to move every partition to
that tablespace, unless ONLY is specified.  Simon objects to that change.
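
Concretely, under Robert's proposal the two forms would behave like this
(a sketch with hypothetical names):

-- moves every existing partition, and sets the default for new ones:
ALTER TABLE events SET TABLESPACE fast_ssd;

-- only sets the default tablespace for partitions created later:
ALTER TABLE ONLY events SET TABLESPACE fast_ssd;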

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Speed up transaction completion faster after many relations are accessed in a transaction

2019-04-04 Thread Peter Eisentraut
On 2019-04-04 06:58, Amit Langote wrote:
> Also, since the "speed up partition planning" patch went in (428b260f8),
> it might be possible to see the performance boost even with the
> partitioning example you cited upthread.

I can't detect any performance improvement with the patch applied to
current master, using the test case from Yoshikazu Imai (2019-03-19).

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: proposal: pg_restore --convert-to-text

2019-04-04 Thread Alvaro Herrera
I just pushed it with trivial changes:

* rebased for cc8d41511721

* changed wording in the error message

* added a new test for the condition in t/001_basic.pl

* Added the "-" in the --help line of -f.


Andrew G. never confirmed that this change is sufficient to appease
users confused by the previous behavior.  I hope it is ...

Thanks!

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Block level parallel vacuum

2019-04-04 Thread Robert Haas
On Thu, Apr 4, 2019 at 6:28 AM Masahiko Sawada  wrote:
> These patches conflict with the current HEAD. Attached the updated patches.

They'll need another rebase.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

2019-04-04 Thread Robert Haas
On Thu, Apr 4, 2019 at 11:52 AM Antonin Houska  wrote:
> Robert Haas  wrote:
> > I'm willing to put some effort into trying to get this into v13 if
> > you're willing to keep hacking on it, but there's probably a fair
> > amount to do and a year can go by in a hurry.
>
> That'd be appreciated, especially by my boss. It doesn't seem likely that in
> this situation I'll be assigned many other tasks of higher priority. So yes, I
> expect to have quite some time for this patch. Thanks.

Great.  I think it will be appreciated by my boss, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Protect syscache from bloating with negative cache entries

2019-04-04 Thread Robert Haas
On Thu, Apr 4, 2019 at 8:53 AM Kyotaro HORIGUCHI
 wrote:
> So it seems to me that the simplest "Full" version wins. The
> attached is rebsaed version. dlist_move_head(entry) is removed as
> mentioned above in that patch.

1. I really don't think this patch has any business changing the
existing logic.  You can't just assume that the dlist_move_head()
operation is unimportant for performance.

2. This patch still seems to add a new LRU list that has to be
maintained.  That's fairly puzzling.  You seem to have concluded that
the version without the additional LRU wins, but then sent a new copy
of the version that includes the LRU.

3. I don't think adding an additional call to GetCurrentTimestamp() in
start_xact_command() is likely to be acceptable.  There has got to be
a way to set this up so that new GetCurrentTimestamp() calls are
limited to once per N seconds, vs. the current implementation, which
could make them many, many times per second.

4. The code in CatalogCacheCreateEntry seems clearly unacceptable.  In
a pathological case where CatCacheCleanupOldEntries removes exactly
one element per cycle, it could be called on every new catcache
allocation.

I think we need to punt this patch to next release.  We're not
converging on anything committable very fast.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Inadequate executor locking of indexes

2019-04-04 Thread Tom Lane
David Rowley  writes:
> Wrong patch.  Here's what I meant to send.

Pushed with some minor tweaking, mostly comments.

Some notes:

* We now have a general convention that queries always take the same lock
type on indexes as on their parent tables, but that convention is not
respected by index DDL.  I'm not sure if there is any additional cleanup
that should be done there.  The DDL code intends to block concurrent
execution of other DDL on the same index, in most cases, so it may be
fine.  At the very least, the different lock levels for those cases are
clearly intentional, while I don't think that was true for DML.

* I dropped the extra assertion you'd added to infer_arbiter_indexes,
as it didn't seem necessary or appropriate.  That function's just
interested in inspecting the index, so it doesn't need to assume anything
about how strong the lock is.  I think the comment that was there was
just trying to justify the shortcut of hard-wiring the lock level as
RowExclusiveLock.

* I went ahead and changed gin_clean_pending_list() to take
RowExclusiveLock not AccessShareLock on its target index.  I'm not quite
sure that AccessShareLock is an actual bug there; it may be that GIN's
internal conventions are such that that's safe.  But it sure seemed darn
peculiar, and at risk of causing future problems.

regards, tom lane




Re: Changes to pg_dump/psql following collation "C" in the catalog

2019-04-04 Thread Daniel Verite
Tom Lane wrote:

> "Daniel Verite"  writes:
> > One consequence of using the "C" collation in the catalog versus
> > the db collation is that pg_dump -t with a regexp may not find
> > the same tables as before. It happens when these conditions are
> > all met:
> > - the collation of the database is not "C"
> > - the regexp has locale-dependent parts
> > - the names to match include characters that are sensitive to
> > locale-dependent matching
> 
> Hm, interesting.
> 
> > It seems that to fix that, we could qualify the references to columns such
> > as "relname" and "schema_name" with COLLATE "default" clauses in the
> > queries that use pattern-matching in client-side tools, AFAICS
> > pg_dump and psql.
> 
> Seems reasonable.  I was initially worried that this might interfere with
> query optimization, but some experimentation says that the planner
> successfully derives prefix index clauses anyway (which is correct,
> because matching a fixed regex prefix doesn't depend on locale).
> 
> It might be better to attach the COLLATE clause to the pattern constant
> instead of the column name; that'd be less likely to break if sent to
> an older server.
> 
> > Before going any further with this idea, is there agreement that it's an
> > issue to address and does this look like the best way to do that?
> 
> That is a question worth asking.  We're going to be forcing people to get
> used to this when working directly in SQL, so I don't know if masking it
> in a subset of tools is really a win or not.

I think psql and pg_dump need to adjust, just like the 3rd party tools
will, at least those that support collation-aware search in the catalog.
PFA a patch that implements the slight changes needed.

I'll add an entry for it in the next CF.
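
For illustration, with the patch a name filter emitted by
processSQLNamePattern comes out roughly like this (hypothetical pattern):

  c.relname OPERATOR(pg_catalog.~) '^(série)$' COLLATE "default"

so the regexp is matched under the database's default collation rather
than the "C" collation now used for the underlying catalog column.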

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
diff --git a/src/fe_utils/string_utils.c b/src/fe_utils/string_utils.c
index 5c1732a..69ac6f9 100644
--- a/src/fe_utils/string_utils.c
+++ b/src/fe_utils/string_utils.c
@@ -808,6 +808,8 @@ appendReloptionsArray(PQExpBuffer buffer, const char *reloptions,
  * to limit the set of objects returned.  The WHERE clauses are appended
  * to the already-partially-constructed query in buf.  Returns whether
  * any clause was added.
+ * The pattern matching uses the collation of the database through explicit
+ * COLLATE "default" clauses.
  *
  * conn: connection query will be sent to (consulted for escaping rules).
  * buf: output parameter.
@@ -971,17 +973,18 @@ processSQLNamePattern(PGconn *conn, PQExpBuffer buf, const char *pattern,
 appendPQExpBuffer(buf,
   "(%s OPERATOR(pg_catalog.~) ", namevar);
 appendStringLiteralConn(buf, namebuf.data, conn);
+appendPQExpBufferStr(buf, " COLLATE \"default\"");
 appendPQExpBuffer(buf,
   "\nOR %s OPERATOR(pg_catalog.~) ",
   altnamevar);
 appendStringLiteralConn(buf, namebuf.data, conn);
-appendPQExpBufferStr(buf, ")\n");
+appendPQExpBufferStr(buf, " COLLATE \"default\" )\n");
 			}
 			else
 			{
 appendPQExpBuffer(buf, "%s OPERATOR(pg_catalog.~) ", namevar);
 appendStringLiteralConn(buf, namebuf.data, conn);
-appendPQExpBufferChar(buf, '\n');
+appendPQExpBufferStr(buf, " COLLATE \"default\"\n");
 			}
 		}
 	}
@@ -997,7 +1000,7 @@ processSQLNamePattern(PGconn *conn, PQExpBuffer buf, const char *pattern,
 			WHEREAND();
 			appendPQExpBuffer(buf, "%s OPERATOR(pg_catalog.~) ", schemavar);
 			appendStringLiteralConn(buf, schemabuf.data, conn);
-			appendPQExpBufferChar(buf, '\n');
+			appendPQExpBufferStr(buf, " COLLATE \"default\"\n");
 		}
 	}
 	else


Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits

2019-04-04 Thread Andres Freund
Hi,

On 2019-04-04 16:15:54 -0300, Alvaro Herrera wrote:
> On 2019-Apr-04, Andres Freund wrote:
> 
> > I'm totally not OK with this from a layering
> > POV. CheckAndSetAllVisibleBulkInsertState is entirely heap specific
> > (without being named such), whereas all the heap specific bits are
> > getting moved below tableam.
> 
> This is a fair complaint, but on the other hand the COPY changes for
> table AM are still being developed, so there's no ground on which to
> rebase this patch.  Do you have a timeline on getting the COPY one
> committed?

~2h. Just pondering the naming of some functions etc.  Don't think
there's a large interdependency though.

But even if tableam weren't committed, I'd still argue that it's
structurally done wrong in the patch right now.  FWIW, I actually think
this whole approach isn't quite right - this shouldn't be done as a
secondary action after we'd already inserted, with a separate
lock-unlock cycle etc.

Also, how is this code even close to correct?
CheckAndSetPageAllVisible() modifies the buffer in a crucial way, and
there's no WAL logging? Without even a comment arguing why that's OK (I
don't think it is)?

Greetings,

Andres Freund




Re: Row Level Security − leakproof-ness and performance implications

2019-04-04 Thread Peter Eisentraut
On 2019-03-18 20:08, Peter Eisentraut wrote:
> I agree that it would be useful to document and discuss better which
> built-in operators are leak-proof and which are not.  But I don't think
> the CREATE POLICY reference page is the place to do it.  Note that the
> leak-proofness mechanism was originally introduced for security-barrier
> views (an early form of RLS if you will), so someone could also
> reasonably expect a discussion there.

It sounds like this will need significant additional work, so setting as
returned with feedback for now.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: New vacuum option to do only freezing

2019-04-04 Thread Robert Haas
On Wed, Apr 3, 2019 at 10:32 PM Masahiko Sawada  wrote:
> Attached the updated version patch.

Committed with a little bit of documentation tweaking.
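
For the archives, the committed syntax looks like this (a sketch with a
hypothetical table name):

VACUUM (INDEX_CLEANUP FALSE) some_table;

This skips index vacuuming and the associated heap cleanup, which is
what makes the freeze-only use case fast.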

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: query logging of prepared statements

2019-04-04 Thread Alvaro Herrera
On 2019-Apr-04, Alvaro Herrera wrote:

> I think we could improve on this by setting a "logged" flag in the
> portal; if the Parse logs the statement, then don't include the
> statement in further lines, otherwise do.

Also: I think such a flag could help the case of a query that takes
long enough to execute to exceed the log_min_duration_statement, but not
long enough to parse.  Right now, log_min_duration_statement=500 shows

2019-04-04 15:59:39 -03 [6353-1] LOG:  duration: 2002.298 ms  execute 
2019-04-04 15:59:39 -03 [6353-2] DETAIL:  parameters: $1 = 'joe''s place'

if I change the testlibpq3 query to be
"SELECT * FROM test1 WHERE t = $1 and pg_sleep(1) is not null".

Also, if you parse once and bind/execute many times, IMO the statement
should be logged exactly once.  I think you could do that with the flag I
propose.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: speeding up planning with partitions

2019-04-04 Thread Floris Van Nee
Hi all,

First of all I would like to thank everyone involved in this patch for their 
hard work on this. This is a big step forward. I've done some performance and 
functionality testing with the patch that was committed to master and it looks 
very good.
I had a question about the performance of pruning of functions like now() and 
current_date. I know these are handled differently, as they cannot be excluded 
during the first phases of planning. However, curerntly, this new patch makes 
the performance difference between the static timestamp variant and now() very 
obvious (even more than before). Consider
select * from partitioned_table where ts >= now()
or
select * from partitioned_table where ts >= '2019-04-04'

The second plans in less than a millisecond, whereas the first takes +- 180ms 
for a table with 1000 partitions. Both end up with the same plan.

I'm not too familiar with the code that handles this, but is there a 
possibility for improvement in this area? Or is the stage at which exclusion 
for now()/current_date occurs already too far in the process to make any good 
improvements to this? My apologies if this is considered off-topic for this 
patch, but I ran into this issue specifically when I was testing this patch, so 
I thought I'd ask here about it. I do think a large number of use-cases for 
tables with a large number of partitions involve a timestamp as the partition key, 
and naturally people will start writing queries for this that use functions 
such as now() and current_date.

Thanks again for your work on this patch!

-Floris


From: Amit Langote 
Sent: Tuesday, April 2, 2019 7:50 AM
To: Tom Lane
Cc: David Rowley; Imai Yoshikazu; jesper.peder...@redhat.com; Imai, Yoshikazu; 
Amit Langote; Alvaro Herrera; Robert Haas; Justin Pryzby; Pg Hackers
Subject: Re: speeding up planning with partitions [External]

Thanks for taking a look.

On 2019/04/02 2:34, Tom Lane wrote:
> Amit Langote  writes:
>> On 2019/03/30 0:29, Tom Lane wrote:
>>> That seems like probably an independent patch --- do you want to write it?
>
>> Here is that patch.
>> It revises get_relation_constraints() such that the partition constraint
>> is loaded in only the intended cases.
>
> So I see the problem you're trying to solve here, but I don't like this
> patch a bit, because it depends on root->inhTargetKind which IMO is a
> broken bit of junk that we need to get rid of.  Here is an example of
> why, with this patch applied:
>
> regression=# create table p (a int) partition by list (a);
> CREATE TABLE
> regression=# create table p1 partition of p for values in (1);
> CREATE TABLE
> regression=# set constraint_exclusion to on;
> SET
> regression=# explain select * from p1 where a = 2;
> QUERY PLAN
> --
>  Result  (cost=0.00..0.00 rows=0 width=0)
>One-Time Filter: false
> (2 rows)
>
> So far so good, but watch what happens when we include the same case
> in an UPDATE on some other partitioned table:
>
> regression=# create table prtab (a int, b int) partition by list (a);
> CREATE TABLE
> regression=# create table prtab2 partition of prtab for values in (2);
> CREATE TABLE
> regression=# explain update prtab set b=b+1 from p1 where prtab.a=p1.a and 
> p1.a=2;
> QUERY PLAN
> ---
>  Update on prtab  (cost=0.00..82.30 rows=143 width=20)
>Update on prtab2
>->  Nested Loop  (cost=0.00..82.30 rows=143 width=20)
>  ->  Seq Scan on p1  (cost=0.00..41.88 rows=13 width=10)
>Filter: (a = 2)
>  ->  Materialize  (cost=0.00..38.30 rows=11 width=14)
>->  Seq Scan on prtab2  (cost=0.00..38.25 rows=11 width=14)
>  Filter: (a = 2)
> (8 rows)
>
> No constraint exclusion, while in v10 you get
>
>  Update on prtab  (cost=0.00..0.00 rows=0 width=0)
>->  Result  (cost=0.00..0.00 rows=0 width=0)
>  One-Time Filter: false
>
> The reason is that this logic supposes that root->inhTargetKind describes
> *all* partitioned tables in the query, which is obviously wrong.
>
> Now maybe we could make it work by doing something like
>
>   if (rel->reloptkind == RELOPT_BASEREL &&
> (root->inhTargetKind == INHKIND_NONE ||
>  rel->relid != root->parse->resultRelation))

Ah, you're right.  inhTargetKind has to be checked in conjunction with
checking whether the relation is the target relation.

> but I find that pretty messy, plus it's violating the concept that we
> shouldn't be allowing messiness from inheritance_planner to leak into
> other places.

I'm afraid that we'll have to live with this particular hack as long as we
have inheritance_planner(), but we maybe could somewhat reduce the extent
to which the hack is spread into other planner files.

How about we move the part of get_relation_constraints() that loads the
partition 

Re: query logging of prepared statements

2019-04-04 Thread Alvaro Herrera
On 2019-Apr-04, Alvaro Herrera wrote:

> However, turning duration logging off and using log_statement=all, this is 
> what
> I get:
> 
> 2019-04-04 14:58:42.564 -03 [31685] LOG:  statement: SET search_path = 
> testlibpq3
> 2019-04-04 14:58:42.565 -03 [31685] LOG:  execute 
> 2019-04-04 14:58:42.565 -03 [31685] DETAIL:  parameters: $1 = 'joe''s place'
> 2019-04-04 14:58:42.565 -03 [31685] LOG:  execute 
> 2019-04-04 14:58:42.565 -03 [31685] DETAIL:  parameters: $1 = '2'

With this patch (pretty much equivalent to reinstating the
errdetail_execute for that line),

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index dbc7b797c6e..fd73d5e9951 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -2056,7 +2056,6 @@ exec_execute_message(const char *portal_name, long max_rows)
 							prepStmtName,
 							*portal_name ? "/" : "",
 							*portal_name ? portal_name : ""),
-					 errhidestmt(true),
 					 errdetail_params(portalParams)));
 		was_logged = true;
 	}

I get what seems to be pretty much what is wanted for this case:

2019-04-04 15:18:16.817 -03 [4559] LOG:  statement: SET search_path = testlibpq3
2019-04-04 15:18:16.819 -03 [4559] LOG:  execute 
2019-04-04 15:18:16.819 -03 [4559] DETAIL:  parameters: $1 = 'joe''s place'
2019-04-04 15:18:16.819 -03 [4559] STATEMENT:  SELECT * FROM test1 WHERE t = $1
2019-04-04 15:18:16.820 -03 [4559] LOG:  execute 
2019-04-04 15:18:16.820 -03 [4559] DETAIL:  parameters: $1 = '2'
2019-04-04 15:18:16.820 -03 [4559] STATEMENT:  SELECT * FROM test1 WHERE i = 
$1::int4

However, by setting both log_statement=all and
log_min_duration_statement=0 and that patch (I also added %l to
log_line_prefix), I get this:

2019-04-04 15:23:45 -03 [5208-1] LOG:  statement: SET search_path = testlibpq3
2019-04-04 15:23:45 -03 [5208-2] LOG:  duration: 0.441 ms
2019-04-04 15:23:45 -03 [5208-3] LOG:  duration: 1.127 ms  parse : 
SELECT * FROM test1 WHERE t = $1
2019-04-04 15:23:45 -03 [5208-4] LOG:  duration: 0.789 ms  bind 
2019-04-04 15:23:45 -03 [5208-5] DETAIL:  parameters: $1 = 'joe''s place'
2019-04-04 15:23:45 -03 [5208-6] LOG:  execute 
2019-04-04 15:23:45 -03 [5208-7] DETAIL:  parameters: $1 = 'joe''s place'
2019-04-04 15:23:45 -03 [5208-8] STATEMENT:  SELECT * FROM test1 WHERE t = $1
2019-04-04 15:23:45 -03 [5208-9] LOG:  duration: 0.088 ms
2019-04-04 15:23:45 -03 [5208-10] LOG:  duration: 0.363 ms  parse : 
SELECT * FROM test1 WHERE i = $1::int4
2019-04-04 15:23:45 -03 [5208-11] LOG:  duration: 0.206 ms  bind 
2019-04-04 15:23:45 -03 [5208-12] DETAIL:  parameters: $1 = '2'
2019-04-04 15:23:45 -03 [5208-13] LOG:  execute 
2019-04-04 15:23:45 -03 [5208-14] DETAIL:  parameters: $1 = '2'
2019-04-04 15:23:45 -03 [5208-15] STATEMENT:  SELECT * FROM test1 WHERE i = 
$1::int4
2019-04-04 15:23:45 -03 [5208-16] LOG:  duration: 0.053 ms

Note that line 5208-8 is duplicative of 5208-3.

I think we could improve on this by setting a "logged" flag in the
portal; if the Parse logs the statement, then don't include the
statement in further lines, otherwise do.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: query logging of prepared statements

2019-04-04 Thread Alvaro Herrera
On 2019-Mar-04, Justin Pryzby wrote:

> Thanks for reviewing.  I'm also interested in discussion about whether this
> change is undesirable for someone else for some reason ?  For me, the existing
> output seems duplicative and "denormalized".  :)

Some digging turned up that the function you're removing was added by
commit 893632be4e17.  The commit message mentions output for testlibpq3,
so I ran that against a patched server.  With log_min_duration_statement=0
I get this, which looks good:

2019-04-04 14:59:15.529 -03 [31723] LOG:  duration: 0.108 ms  statement: SET 
search_path = testlibpq3
2019-04-04 14:59:15.530 -03 [31723] LOG:  duration: 0.303 ms  parse : 
SELECT * FROM test1 WHERE t = $1
2019-04-04 14:59:15.530 -03 [31723] LOG:  duration: 0.231 ms  bind 
2019-04-04 14:59:15.530 -03 [31723] DETAIL:  parameters: $1 = 'joe''s place'
2019-04-04 14:59:15.530 -03 [31723] LOG:  duration: 0.016 ms  execute 
2019-04-04 14:59:15.530 -03 [31723] DETAIL:  parameters: $1 = 'joe''s place'
2019-04-04 14:59:15.530 -03 [31723] LOG:  duration: 0.096 ms  parse : 
SELECT * FROM test1 WHERE i = $1::int4
2019-04-04 14:59:15.530 -03 [31723] LOG:  duration: 0.053 ms  bind 
2019-04-04 14:59:15.530 -03 [31723] DETAIL:  parameters: $1 = '2'
2019-04-04 14:59:15.530 -03 [31723] LOG:  duration: 0.007 ms  execute 
2019-04-04 14:59:15.530 -03 [31723] DETAIL:  parameters: $1 = '2'

An unpatched server emits this:

2019-04-04 15:03:01.176 -03 [1165] LOG:  duration: 0.163 ms  statement: SET 
search_path = testlibpq3
2019-04-04 15:03:01.176 -03 [1165] LOG:  duration: 0.475 ms  parse : 
SELECT * FROM test1 WHERE t = $1
2019-04-04 15:03:01.177 -03 [1165] LOG:  duration: 0.403 ms  bind : 
SELECT * FROM test1 WHERE t = $1
2019-04-04 15:03:01.177 -03 [1165] DETAIL:  parameters: $1 = 'joe''s place'
2019-04-04 15:03:01.177 -03 [1165] LOG:  duration: 0.028 ms  execute : 
SELECT * FROM test1 WHERE t = $1
2019-04-04 15:03:01.177 -03 [1165] DETAIL:  parameters: $1 = 'joe''s place'
2019-04-04 15:03:01.177 -03 [1165] LOG:  duration: 0.177 ms  parse : 
SELECT * FROM test1 WHERE i = $1::int4
2019-04-04 15:03:01.177 -03 [1165] LOG:  duration: 0.096 ms  bind : 
SELECT * FROM test1 WHERE i = $1::int4
2019-04-04 15:03:01.177 -03 [1165] DETAIL:  parameters: $1 = '2'
2019-04-04 15:03:01.177 -03 [1165] LOG:  duration: 0.014 ms  execute : 
SELECT * FROM test1 WHERE i = $1::int4
2019-04-04 15:03:01.177 -03 [1165] DETAIL:  parameters: $1 = '2'

Note that with your patch we no longer get the statement in the "bind" and
"execute" lines, which seems good; that was far too noisy.


However, turning duration logging off and using log_statement=all, this is what
I get:

2019-04-04 14:58:42.564 -03 [31685] LOG:  statement: SET search_path = 
testlibpq3
2019-04-04 14:58:42.565 -03 [31685] LOG:  execute 
2019-04-04 14:58:42.565 -03 [31685] DETAIL:  parameters: $1 = 'joe''s place'
2019-04-04 14:58:42.565 -03 [31685] LOG:  execute 
2019-04-04 14:58:42.565 -03 [31685] DETAIL:  parameters: $1 = '2'

which does not look good -- the statement is nowhere to be seen.  The commit
message quotes this as desirable output:

LOG:  statement: execute : SELECT * FROM test1 WHERE t = $1
DETAIL:  parameters: $1 = 'joe''s place'
LOG:  statement: execute : SELECT * FROM test1 WHERE i = $1::int4
DETAIL:  parameters: $1 = '2'

which is approximately what I get with an unpatched server:

2019-04-04 15:04:25.718 -03 [1235] LOG:  statement: SET search_path = testlibpq3
2019-04-04 15:04:25.719 -03 [1235] LOG:  execute : SELECT * FROM test1 
WHERE t = $1
2019-04-04 15:04:25.719 -03 [1235] DETAIL:  parameters: $1 = 'joe''s place'
2019-04-04 15:04:25.720 -03 [1235] LOG:  execute : SELECT * FROM test1 
WHERE i = $1::int4
2019-04-04 15:04:25.720 -03 [1235] DETAIL:  parameters: $1 = '2'

So I think this needs a bit more work.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH v20] GSSAPI encryption support

2019-04-04 Thread Robbie Harwood
Tom Lane  writes:

> I wrote:
>> Stephen Frost  writes:
>>> So I'm a bit surprised that it's taking 4 minutes for you.  I wonder if
>>> there might be an issue related to the KDC wanting to get some amount of
>>> random data and the system you're on isn't producing random bytes very
>>> fast..?
>
>> Not sure.  This is my usual development box and it also does mail, DNS,
>> etc for my household, so I'd expect it to have plenty of entropy.
>> But it's running a pretty old kernel, and old Kerberos too, so maybe
>> the explanation is in there somewhere.
>
> Same test on a laptop running Fedora 28 takes a shade under 5 seconds.
> The laptop has a somewhat better geekbench rating than my workstation,
> but certainly not 50x better.  And I really doubt it's got more entropy
> sources than the workstation.  Gotta be something about the kernel.
>
> Watching the test logs, I see that essentially all the time on the RHEL6
> machine is consumed by the two
>
> # Running: /usr/sbin/kdb5_util create -s -P secret0
>
> steps.  Is there a case for merging the two scripts so we only have to
> do that once?  Maybe not, if nobody else sees this.

I think that would be a good idea!  Unfortunately I don't speak perl
well enough to do that, so I'd just copied-and-modified.

Thanks,
--Robbie


signature.asc
Description: PGP signature


Re: [PATCH v20] GSSAPI encryption support

2019-04-04 Thread Robbie Harwood
Tom Lane  writes:

> Stephen Frost  writes:
>> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>>> Well, if the caller thinks what is being passed back is an int,
>>> it will do a 32-to-64-bit widening, which is almost certainly
>>> going to result in a corrupted pointer.
>
>> Oh, good point.  Interesting that it still works then.
>
> There must be something about the x86_64 ABI that allows this to
> accidentally work -- maybe integers are presumed to be sign-extended
> to 64 bits by callee not caller?  I added some logging and verified
> that pgstat.c is seeing the correct string value, so it's working
> somehow.
>
>> I've got a fix for the missing prototypes, I hadn't noticed the issue
>> previously due to always building with SSL enabled as well.
>
> Yeah, I'd just come to the conclusion that it's because I didn't
> include --with-openssl, and libpq-be.h's #ifdef nest doesn't expect
> that.
>
> BTW, the kerberos test suite takes nearly 4 minutes for me, is
> it supposed to be so slow?

My guess is entropy problems as well.  If available, configuring
/dev/urandom passthrough from the host is a generally helpful thing to
do.

My (Fedora, Centos/RHEL 7+) krb5 builds use getrandom() for entropy, so
they shouldn't be slow; I believe Debian has started doing so recently
as well.  I don't know what other distros/OSs do for this.

Thanks,
--Robbie


signature.asc
Description: PGP signature


Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits

2019-04-04 Thread Tomas Vondra

Hi,

I've been looking at this patch for a while, and it seems pretty much RFC,
so barring objections I'll take care of that once I do a bit more testing
and review. Unless someone else wants to take care of that.

FWIW I wonder if we should add the code for partitioned tables to
CopyFrom, considering that's unsupported and so can't be tested etc. It's
not a huge amount of code, of course.
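
For context, the workflow being optimized is the usual COPY FREEZE
pattern, where the table must be created or truncated in the same
transaction (a sketch with a hypothetical table and file):

BEGIN;
TRUNCATE tab;
COPY tab FROM '/path/to/data.csv' WITH (FORMAT csv, FREEZE);
COMMIT;

With the patch, pages written by such a COPY can be marked all-visible
up front, instead of needing a later VACUUM just to set PD_ALL_VISIBLE
and the visibility map bits.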

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [PATCH v20] GSSAPI encryption support

2019-04-04 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> Watching the test logs, I see that essentially all the time on the RHEL6
>> machine is consumed by the two
>> # Running: /usr/sbin/kdb5_util create -s -P secret0
>> steps.  Is there a case for merging the two scripts so we only have to
>> do that once?  Maybe not, if nobody else sees this.

> I do think that mergeing them would be a good idea and I can look into
> that, though at least locally that step takes less than a second..  I
> wonder if you might strace (or whatever is appropriate) that kdb5_util
> and see what's taking so long.  I seriously doubt it's the actual
> kdb5_util code and strongly suspect it's some kernel call.

"strace -r" pins the blame pretty firmly on /dev/random:

 0.76 open("/dev/random", O_RDONLY) = 3
 0.000227 fcntl(3, F_SETFD, FD_CLOEXEC) = 0
 0.61 fstat(3, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 8), ...}) = 0
 0.68 read(3, "\336&\301\310V\344q\217\264-\262\320w-", 64) = 14
 0.91 read(3, "\326\353I\371$\361", 50) = 6
15.328306 read(3, "\214\301\313]I\325", 44) = 6
17.418929 read(3, "z\251\37\275\365\24", 38) = 6
13.366997 read(3, "6\257I\315f\3", 32) = 6
11.457994 read(3, "\370\275\2765\31(", 26) = 6
23.472194 read(3, "\226\r\314\373\2014", 20) = 6
11.746848 read(3, "\335\336BR\30\322", 14) = 6
20.823940 read(3, "\366\214\r\211\0267", 8) = 6
14.429214 read(3, ",g", 2)  = 2
15.494835 close(3)  = 0

There's no other part of the trace that takes more than ~ 0.1s.
So this boils down to the old bugaboo about how much entropy
there really is in /dev/random.

regards, tom lane




Re: [GSoC 2019] Proposal: Develop Performance Farm Database and Website

2019-04-04 Thread Mark Wong
Hi Ilaria,

Edited for bottom posting. :)

On Fri, Mar 29, 2019 at 03:01:05PM +0100, Ilaria wrote:
> > Am 29.03.2019 um 13:52 schrieb Peter Eisentraut 
> > :
> > 
> >> On 2019-03-29 13:04, Robert Haas wrote:
> >>> On Tue, Mar 26, 2019 at 9:10 AM Ila B.  wrote:
> >>> I am Ilaria Battiston, an aspiring GSoC student, and I would love to have 
> >>> a feedback on the first draft of my Google Summer of Code proposal. The 
> >>> project is "Develop Performance Farm Database and Website”. You can find 
> >>> any other detail in the attached PDF file :)
> >> 
> >> I think there's probably a very large amount of work to be done in
> >> this area.  Nobody is going to finish it in a summer.  Still, there's
> >> probably some useful things you could get done in a summer.  I think a
> >> lot will depend on finding a good mentor who is familiar with these
> >> areas (which I am not).  Has anyone expressed an interest?
> > 
> > Moreover, I have a feeling that have been hearing about work on a
> > performance farm for many years.  Perhaps it should be investigated what
> > became of that work and what the problems were getting it to a working
> > state.

> Hello,
> 
> Thanks for the answer. This project is on the official PostgreSQL project 
> list of GSoC 2019, and potential mentors are stated there. 
> 
> I trust mentors’ judgement on outlining the work and the tasks to be done in 
> three months, and there is the previous student’s work to use as example if 
> needed. The project consists in building a database and a website on top of 
> it for users to browse performance data. 
> 
> Let me know whether there are any specific issues you’re concerned about. 

Hongyuan, our student last summer, put together a summary of his
progress in a GitHub issue:

https://github.com/PGPerfFarm/pgperffarm/issues/22


We have systems for proofing (from OSUOSL) and you can also see the
prototype here:

http://140.211.168.111/


For Phase 1, I'd recommend getting familiar with the database schema in
place now.  Perhaps it can use some tweaking, but I just mean to suggest
that it might not be necessary to rebuild it from scratch.

In Phase 2, we had some difficulty last year about getting the
authentication/authorization completely integrated.  I think the main
issue was how to integrate this app while using resources outside of the
community infrastructure.  We may have to continue working around that.

Otherwise, I think the rest make sense.  Let us know if you have any
more questions.

Regards,
Mark

-- 
Mark Wong
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/




Re: [PATCH v20] GSSAPI encryption support

2019-04-04 Thread Tom Lane
I wrote:
> Stephen Frost  writes:
>> So I'm a bit surprised that it's taking 4 minutes for you.  I wonder if
>> there might be an issue related to the KDC wanting to get some amount of
>> random data and the system you're on isn't producing random bytes very
>> fast..?

> Not sure.  This is my usual development box and it also does mail, DNS,
> etc for my household, so I'd expect it to have plenty of entropy.
> But it's running a pretty old kernel, and old Kerberos too, so maybe
> the explanation is in there somewhere.

Same test on a laptop running Fedora 28 takes a shade under 5 seconds.
The laptop has a somewhat better geekbench rating than my workstation,
but certainly not 50x better.  And I really doubt it's got more entropy
sources than the workstation.  Gotta be something about the kernel.

Watching the test logs, I see that essentially all the time on the RHEL6
machine is consumed by the two

# Running: /usr/sbin/kdb5_util create -s -P secret0

steps.  Is there a case for merging the two scripts so we only have to
do that once?  Maybe not, if nobody else sees this.

regards, tom lane




Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

2019-04-04 Thread Antonin Houska
Robert Haas  wrote:

> I'm willing to put some effort into trying to get this into v13 if
> you're willing to keep hacking on it, but there's probably a fair
> amount to do and a year can go by in a hurry.

That'd be appreciated, especially by my boss. It doesn't seem likely that in
this situation I'll be assigned many other tasks of higher priority. So yes, I
expect to have quite some time for this patch. Thanks.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Re: [PATCH v20] GSSAPI encryption support

2019-04-04 Thread Stephen Frost
Greetings,

* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> Kerberos tests are now failing for me (macOS).  I'm seeing
> 
> psql: error: could not connect to server: Over-size error packet sent by
> the server.
> not ok 3 - GSS encryption without auth
> 
> #   Failed test 'GSS encryption without auth'
> #   at t/002_enc.pl line 170.
> #  got: '2'
> # expected: '0'
> 
> (and repeated for several other tests).

Ok, it looks like there's a server-side error happening here, and it
would be good to see what that is, so can you send the server logs?

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: [PATCH v20] GSSAPI encryption support

2019-04-04 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> There must be something about the x86_64 ABI that allows this to
>> accidentally work -- maybe integers are presumed to be sign-extended
>> to 64 bits by callee not caller?  I added some logging and verified
>> that pgstat.c is seeing the correct string value, so it's working
>> somehow.

> Huh, I'm not sure.  That's certainly interesting though.

Oh, no, it's simpler than that: the pointer values that
be_gssapi_get_princ() is returning just happen to be less than 2^31
on my system.  I'd dismissed that as being unlikely, but it's the truth.

> So I'm a bit surprised that it's taking 4 minutes for you.  I wonder if
> there might be an issue related to the KDC wanting to get some amount of
> random data and the system you're on isn't producing random bytes very
> fast..?

Not sure.  This is my usual development box and it also does mail, DNS,
etc for my household, so I'd expect it to have plenty of entropy.
But it's running a pretty old kernel, and old Kerberos too, so maybe
the explanation is in there somewhere.

regards, tom lane




Re: [PATCH v20] GSSAPI encryption support

2019-04-04 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> >> Well, if the caller thinks what is being passed back is an int,
> >> it will do a 32-to-64-bit widening, which is almost certainly
> >> going to result in a corrupted pointer.
> 
> > Oh, good point.  Interesting that it still works then.
> 
> There must be something about the x86_64 ABI that allows this to
> accidentally work -- maybe integers are presumed to be sign-extended
> to 64 bits by callee not caller?  I added some logging and verified
> that pgstat.c is seeing the correct string value, so it's working
> somehow.

Huh, I'm not sure.  That's certainly interesting though.

> > I've got a fix for the missing prototypes, I hadn't noticed the issue
> > previously due to always building with SSL enabled as well.
> 
> Yeah, I'd just come to the conclusion that it's because I didn't
> include --with-openssl, and libpq-be.h's #ifdef nest doesn't expect
> that.

Right, that should be fixed now with the commit I just pushed.

> BTW, the kerberos test suite takes nearly 4 minutes for me, is
> it supposed to be so slow?

Unfortunately, the kerberos test suite requires building a KDC to get
tickets from and that takes a bit of time.  On my laptop it takes about
8s:

make -s check  4.67s user 0.85s system 70% cpu 7.819 total

So I'm a bit surprised that it's taking 4 minutes for you.  I wonder if
there might be an issue related to the KDC wanting to get some amount of
random data and the system you're on isn't producing random bytes very
fast..?

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: [PATCH v20] GSSAPI encryption support

2019-04-04 Thread Peter Eisentraut
On 2019-04-04 17:16, Tom Lane wrote:
> BTW, the kerberos test suite takes nearly 4 minutes for me, is
> it supposed to be so slow?

I've seen this on some virtualized machines that didn't have a lot of
entropy.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH v20] GSSAPI encryption support

2019-04-04 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> Well, if the caller thinks what is being passed back is an int,
>> it will do a 32-to-64-bit widening, which is almost certainly
>> going to result in a corrupted pointer.

> Oh, good point.  Interesting that it still works then.

There must be something about the x86_64 ABI that allows this to
accidentally work -- maybe integers are presumed to be sign-extended
to 64 bits by callee not caller?  I added some logging and verified
that pgstat.c is seeing the correct string value, so it's working
somehow.

> I've got a fix for the missing prototypes, I hadn't noticed the issue
> previously due to always building with SSL enabled as well.

Yeah, I'd just come to the conclusion that it's because I didn't
include --with-openssl, and libpq-be.h's #ifdef nest doesn't expect
that.

BTW, the kerberos test suite takes nearly 4 minutes for me, is
it supposed to be so slow?

regards, tom lane




Re: GiST VACUUM

2019-04-04 Thread Heikki Linnakangas

On 25/03/2019 15:20, Heikki Linnakangas wrote:

> On 24/03/2019 18:50, Andrey Borodin wrote:
>> I was working on new version of gist check in amcheck and understand one
>> more thing:
>>
>> /* Can this page be recycled yet? */
>> bool
>> gistPageRecyclable(Page page)
>> {
>>   return PageIsNew(page) ||
>>   (GistPageIsDeleted(page) &&
>>    TransactionIdPrecedes(GistPageGetDeleteXid(page), RecentGlobalXmin));
>> }
>>
>> Here RecentGlobalXmin can wraparound and page will become unrecyclable for
>> half of xid cycle. Can we prevent it by resetting PageDeleteXid to
>> InvalidTransactionId before doing RecordFreeIndexPage()?
>> (Seems like same applies to GIN...)
>
> True, and B-tree has the same issue. I thought I saw a comment somewhere
> in the B-tree code about that earlier, but now I can't find it. I
> must've imagined it.
>
> We could reset it, but that would require dirtying the page. That would
> be just extra I/O overhead, if the page gets reused before XID
> wraparound. We could avoid that if we stored the full XID+epoch, not
> just XID. I think we should do that in GiST, at least, where this is
> new. In the B-tree, it would require some extra code to deal with
> backwards-compatibility, but maybe it would be worth it even there.


I suggest that we do the attached. It fixes this for GiST. The patch 
expands the "deletion XID" to 64 bits and changes where it's stored: 
instead of storing it in pd_prune_xid, it's stored in the page contents. 
Luckily, a deleted page has no real content.
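
To see why the extra width helps, here is a stand-alone sketch
(simplified: epoch 0, and the real TransactionIdPrecedes() in transam.c
also special-cases permanent xids) contrasting the circular 32-bit
comparison with a flat 64-bit one.  After a bit more than 2^31 xids the
32-bit test flips and the deleted page looks unrecyclable:

#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;

/* circular comparison, as in TransactionIdPrecedes() */
static int
xid_precedes(TransactionId id1, TransactionId id2)
{
    return (int32_t) (id1 - id2) < 0;
}

int
main(void)
{
    TransactionId deleteXid = 100;      /* stamped on the deleted page */
    uint64_t    fullDeleteXid = 100;    /* same xid, with epoch */

    /* shortly after deletion, both schemes allow recycling */
    printf("soon:  32-bit=%d 64-bit=%d\n",
           xid_precedes(deleteXid, 200), fullDeleteXid < (uint64_t) 200);

    /* a bit more than 2^31 xids later, the 32-bit test flips */
    TransactionId xmin32 = deleteXid + (1U << 31) + 1;
    uint64_t    xmin64 = 100 + ((uint64_t) 1 << 31) + 1;

    printf("later: 32-bit=%d 64-bit=%d\n",
           xid_precedes(deleteXid, xmin32), fullDeleteXid < xmin64);
    return 0;
}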


I think we should fix this in a similar manner in B-tree, too, but that 
can be done separately. For B-tree, we need to worry about 
backwards-compatibility, but that seems simple enough; we just need to 
continue to understand old deleted pages, where the deletion XID is 
stored in the page opaque field.


- Heikki
>From b7897577c83a81ec04394ce7113d1d8a47804086 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Thu, 4 Apr 2019 18:06:48 +0300
Subject: [PATCH 1/2] Refactor checks for deleted GiST pages.

The explicit check in gistScanPage() isn't currently really necessary, as
a deleted page is always empty, so the loop would fall through without
doing anything, anyway. But it's a marginal optimization, and it gives a
nice place to attach a comment to explain how it works.
---
 src/backend/access/gist/gist.c| 40 ---
 src/backend/access/gist/gistget.c | 14 +++
 2 files changed, 29 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index 2db790c840..028b06b264 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -693,14 +693,15 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace,
 			continue;
 		}
 
-		if (stack->blkno != GIST_ROOT_BLKNO &&
-			stack->parent->lsn < GistPageGetNSN(stack->page))
+		if ((stack->blkno != GIST_ROOT_BLKNO &&
+			 stack->parent->lsn < GistPageGetNSN(stack->page)) ||
+			GistPageIsDeleted(stack->page))
 		{
 			/*
-			 * Concurrent split detected. There's no guarantee that the
-			 * downlink for this page is consistent with the tuple we're
-			 * inserting anymore, so go back to parent and rechoose the best
-			 * child.
+			 * Concurrent split or page deletion detected. There's no
+			 * guarantee that the downlink for this page is consistent with
+			 * the tuple we're inserting anymore, so go back to parent and
+			 * rechoose the best child.
 			 */
 			UnlockReleaseBuffer(stack->buffer);
 			xlocked = false;
@@ -719,9 +720,6 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace,
 			GISTInsertStack *item;
 			OffsetNumber downlinkoffnum;
 
-			/* currently, internal pages are never deleted */
-			Assert(!GistPageIsDeleted(stack->page));
-
 			downlinkoffnum = gistchoose(state.r, stack->page, itup, giststate);
 			iid = PageGetItemId(stack->page, downlinkoffnum);
 			idxtuple = (IndexTuple) PageGetItem(stack->page, iid);
@@ -842,12 +840,13 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace,
 	 * leaf/inner is enough to recognize split for root
 	 */
 }
-else if (GistFollowRight(stack->page) ||
-		 stack->parent->lsn < GistPageGetNSN(stack->page))
+else if (GistFollowRight(stack->page) ||
+		 stack->parent->lsn < GistPageGetNSN(stack->page) ||
+		 GistPageIsDeleted(stack->page))
 {
 	/*
-	 * The page was split while we momentarily unlocked the
-	 * page. Go back to parent.
+	 * The page was split or deleted while we momentarily
+	 * unlocked the page. Go back to parent.
 	 */
 	UnlockReleaseBuffer(stack->buffer);
 	xlocked = false;
@@ -856,18 +855,6 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace,
 }
 			}
 
-			/*
-			 * The page might have been deleted after we scanned the parent
-			 * and saw the downlink.
-			 */
-			if (GistPageIsDeleted(stack->page))
-			{
-UnlockReleaseBuffer(stack->buffer);
-xlocked = false;
-state.stack = stack = stack->parent;
-continue;
-	

Re: POC: GROUP BY optimization

2019-04-04 Thread Dmitry Dolgov
> On Thu, Jan 31, 2019 at 12:24 PM Andres Freund  wrote:
>
> As nothing has happened since, I'm marking this as returned with
> feedback.

This patch was on my radar for some time in the past, and we've seen use cases
where it could be pretty useful (probably even without the incremental sort
patch). I would like to make some progress here and see if it's possible to
continue its development. I've attached the rebased version with small
changes; e.g. I've created a separate patch with the group-by reordering tests
to make it easy to see what changes were introduced, and after some
experiments removed the part that seems to duplicate "group by" reordering to
follow "order by". It also looks like it's possible to make these patches
independent by having a base patch with the isolated
group_keys_reorder_by_pathkeys (they're connected via n_preordered), but I
haven't done this yet.

I went through the thread to summarize the objections that were mentioned so
far. Most of them are related to the third patch in the series, where
reordering based on "ndistinct" is implemented, and are about cost_sort (all
the possible problems that could happen without proper cost estimation due to
non-uniform distribution, different comparison costs and so on) and figuring
out how to limit the number of possible combinations of pathkeys to compare. I
haven't looked at the proposed backtracking approach, but taking into account
that the suggested patch for cost_sort [1] is RWF, I wonder what would be the
best strategy to proceed?

[1]: https://commitfest.postgresql.org/21/1706/


v11-0001-Add-tests-for-group-by-optimization.patch

v11-0002-Reorder-to-match-ORDER-BY-or-index.patch

v11-0003-Reorder-by-values-distribution.patch


Re: [PATCH v20] GSSAPI encryption support

2019-04-04 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> >> I'm not very sure why the integer/pointer confusion in pgstat_bestart
> >> doesn't cause hard crashes when using gss auth --- or does
> >> this suite not actually test that?
> 
> > Isn't it just saying that because of the implicit declaration..?
> > Once that's fixed, the integer/pointer warning will go away, but
> > it's actually a pointer in either case, hence why it isn't crashing.
> 
> Well, if the caller thinks what is being passed back is an int,
> it will do a 32-to-64-bit widening, which is almost certainly
> going to result in a corrupted pointer.

Oh, good point.  Interesting that it still works then.

I've got a fix for the missing prototypes, I hadn't noticed the issue
previously due to always building with SSL enabled as well.  I'm testing
with a non-SSL build and will push the fix shortly.

Thanks!

Stephen




Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

2019-04-04 Thread Robert Haas
On Thu, Apr 4, 2019 at 9:57 AM Antonin Houska  wrote:
> I think I finally understand. Originally I thought the question is how to
> compute correct page checksum while the hint bits can be changed w/o exclusive
> lock on the buffer. Now I realize that it's more about *recovery*: if the hint
> bit change is followed by a torn page write, the hint bit can get changed on
> disk but the checksum might not get updated. The wrong checksum is detected
> during recovery, but if XLOG does not contain the corresponding full page
> image, we're not able to recover.
>
> And with encryption, the consequence is even worse because torn page write
> causes not only wrong checksum of otherwise useful page, but really damaged
> page.

Correct.

> I'll enforce the FPW in the next version of the patch.

Cool.

I'm willing to put some effort into trying to get this into v13 if
you're willing to keep hacking on it, but there's probably a fair
amount to do and a year can go by in a hurry.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PATCH v20] GSSAPI encryption support

2019-04-04 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> I'm not very sure why the integer/pointer confusion in pgstat_bestart
>> doesn't cause hard crashes when using gss auth --- or does
>> this suite not actually test that?

> Isn't it just saying that because of the implicit declaration..?
> Once that's fixed, the integer/pointer warning will go away, but
> it's actually a pointer in either case, hence why it isn't crashing.

Well, if the caller thinks what is being passed back is an int,
it will do a 32-to-64-bit widening, which is almost certainly
going to result in a corrupted pointer.

> The test suite does test GSS authentication and GSS encryption.

Hm.  I'll poke at this more closely.

regards, tom lane




Re: [HACKERS] Weaker shmem interlock w/o postmaster.pid

2019-04-04 Thread Noah Misch
On Wed, Apr 03, 2019 at 07:05:43PM -0700, Noah Misch wrote:
> On Mon, Apr 01, 2019 at 08:19:56AM +, Daniel Gustafsson wrote:
> > On Monday, April 1, 2019 12:42 AM, Noah Misch  wrote:
> > > On Fri, Mar 29, 2019 at 09:53:51AM +, Daniel Gustafsson wrote:
> > 
> > > > This seems like a case where it would be useful to log a shmdt() error 
> > > > or do
> > > > an Assert() around the success of the operation perhaps?
> > >
> > > I'll add the same elog(LOG) we have at other shmdt() sites. I can't think 
> > > of
> > > a site where we Assert() about the results of a system call. While shmdt()
> > > might be a justified exception, elog(LOG) seems reasonable.
> > 
> > Agreed, seems reasonable.
> 
> Pushed, but that broke two buildfarm members:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=idiacanthus=2019-04-04%2000%3A33%3A14
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=komodoensis=2019-04-04%2000%3A33%3A13
> 
> I think the problem arose because these animals run on the same machine, and
> their test execution was synchronized to the second.  Two copies of the new
> test ran concurrently.  It doesn't tolerate that, owing to expectations about
> which shared memory keys are in use.  My initial thought is to fix this by
> having a third postmaster that runs throughout the test and represents
> ownership of a given port.  If that postmaster gets something other than the
> first shm key pertaining to its port, switch ports and try again.
> 
> I'll also include fixes for the warnings Andres reported on the
> pgsql-committers thread.

This thread's 2019-04-03 patches still break buildfarm members in multiple
ways.  I plan to revert them.  I'll wait a day or two before doing that, in
case more failure types show up.
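
For what it's worth, the collision detection the quoted plan relies on
can be sketched stand-alone.  The key derivation below (roughly
port * 1000, probed upward) is my reading of sysv_shmem.c and should be
treated as an assumption; the real test would drive this through a
postmaster rather than calling shmget() directly:

#include <errno.h>
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Is the first candidate key for this port already owned by someone? */
static int
first_key_taken(int port)
{
    key_t       key = (key_t) (port * 1000 + 1);    /* assumed derivation */

    if (shmget(key, 0, 0) >= 0)
        return 1;               /* segment exists and is accessible */
    return errno != ENOENT;     /* EACCES etc. also means "taken" */
}

int
main(void)
{
    int         port = 58080;   /* hypothetical test port */

    printf("port %d: first shm key %s\n", port,
           first_key_taken(port) ? "taken" : "free");
    return 0;
}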




Re: pg_upgrade: Pass -j down to vacuumdb

2019-04-04 Thread Tom Lane
Peter Eisentraut  writes:
> I think the real fix is to have pg_upgrade copy over the statistics.
> They are right there after all, just take them.

Well, it's not "just take them", we need to develop some infrastructure
that will permit coping with cross-version differences in the stats
storage.  This isn't insoluble, but it will take some work.  We had
a prior discussion that trailed off without much result:

https://www.postgresql.org/message-id/flat/724322880.K8vzik8zPz%40abook

regards, tom lane




Re: [PATCH v20] GSSAPI encryption support

2019-04-04 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > On Thu, Apr 4, 2019 at 05:20 Peter Eisentraut <
> > peter.eisentr...@2ndquadrant.com> wrote:
> >> Kerberos tests are now failing for me (macOS).
> 
> > Interesting, they work locally for me on Ubuntu.  Unfortunately, I don’t
> > have macOS.  This only happens when encryption is being used, presumably?
> > GSS authentication is still working fine?
> 
> The kerberos test suite passes for me on RHEL6 (kerberos 1.10.3),
> but I observe some compiler warnings that need to be dealt with:

Interesting, I don't see those with my build.  I'll have to figure out
why not.  Will fix them in any case.

> $ ./configure --with-gssapi ...
> $ time make -j8 -s
> be-secure-gssapi.c:597: warning: no previous prototype for 'be_gssapi_get_auth'
> be-secure-gssapi.c:609: warning: no previous prototype for 'be_gssapi_get_enc'
> be-secure-gssapi.c:621: warning: no previous prototype for 'be_gssapi_get_princ'
> pgstat.c: In function 'pgstat_bestart':
> pgstat.c:2986: warning: implicit declaration of function 'be_gssapi_get_auth'
> pgstat.c:2987: warning: implicit declaration of function 'be_gssapi_get_enc'
> pgstat.c:2990: warning: implicit declaration of function 'be_gssapi_get_princ'
> pgstat.c:2990: warning: passing argument 2 of 'strlcpy' makes pointer from integer without a cast
> ../../../src/include/port.h:429: note: expected 'const char *' but argument is of type 'int'
> All of PostgreSQL successfully made. Ready to install.
> 
> I'm not very sure why the integer/pointer confusion in pgstat_bestart
> doesn't cause hard crashes when using gss auth --- or does
> this suite not actually test that?

Isn't it just saying that because of the implicit declaration..?
Once that's fixed, the integer/pointer warning will go away, but
it's actually a pointer in either case, hence why it isn't crashing.

The test suite does test GSS authentication and GSS encryption.

Thanks,

Stephen




Re: [PATCH v20] GSSAPI encryption support

2019-04-04 Thread Tom Lane
Stephen Frost  writes:
> On Thu, Apr 4, 2019 at 05:20 Peter Eisentraut <
> peter.eisentr...@2ndquadrant.com> wrote:
>> Kerberos tests are now failing for me (macOS).

> Interesting, they work locally for me on Ubuntu.  Unfortunately, I don’t
> have macOS.  This only happens when encryption is being used, presumably?
> GSS authentication is still working fine?

The kerberos test suite passes for me on RHEL6 (kerberos 1.10.3),
but I observe some compiler warnings that need to be dealt with:

$ ./configure --with-gssapi ...
$ time make -j8 -s
be-secure-gssapi.c:597: warning: no previous prototype for 'be_gssapi_get_auth'
be-secure-gssapi.c:609: warning: no previous prototype for 'be_gssapi_get_enc'
be-secure-gssapi.c:621: warning: no previous prototype for 'be_gssapi_get_princ'
pgstat.c: In function 'pgstat_bestart':
pgstat.c:2986: warning: implicit declaration of function 'be_gssapi_get_auth'
pgstat.c:2987: warning: implicit declaration of function 'be_gssapi_get_enc'
pgstat.c:2990: warning: implicit declaration of function 'be_gssapi_get_princ'
pgstat.c:2990: warning: passing argument 2 of 'strlcpy' makes pointer from integer without a cast
../../../src/include/port.h:429: note: expected 'const char *' but argument is of type 'int'
All of PostgreSQL successfully made. Ready to install.

I'm not very sure why the integer/pointer confusion in pgstat_bestart
doesn't cause hard crashes when using gss auth --- or does
this suite not actually test that?

regards, tom lane




Re: [HACKERS] generated columns

2019-04-04 Thread Amit Langote
On Thu, Apr 4, 2019 at 8:01 PM Peter Eisentraut
 wrote:
>
> On 2019-04-04 11:42, Amit Langote wrote:
> > Hmm, I'm afraid we might get bug reports if we go with this.  Why is it OK
> > to get null in this case when a user explicitly asked for 'foo'?
>
> The way stored generated columns work on foreign tables is that the
> to-be-stored value is computed, then given to the foreign table handler,
> which then has to store it, and then return it later when queried.
> However, since the backing store of a foreign table is typically
> modifiable directly by the user via other channels, it's possible to
> create situations where actually stored data does not satisfy the
> generation expression.  That is the case here.  I don't know of a
> principled way to prevent that.  It's just one of the problems that can
> happen when you store data in a foreign table:  You have very little
> control over what data actually ends up being stored.

OK, thanks for explaining.  We do allow DEFAULT to be specified on
foreign tables, although locally-defined defaults have little meaning
if the FDW doesn't allow inserts.  Maybe the same thing applies to
GENERATED AS columns.

Would it make sense to clarify this on CREATE FOREIGN TABLE page?

Thanks,
Amit




Re: pg_upgrade: Pass -j down to vacuumdb

2019-04-04 Thread Peter Eisentraut
On 2019-04-03 23:24, Justin Pryzby wrote:
> I just did a test on one of our large-but-not-huge customers.  With
> stats_target=1, analyzing a 145GB partitioned table looks like it'll take
> perhaps an hour; they have ~1TB data, so delaying services during ANALYZE 
> would
> nullify the utility of pg_upgrade.  I can restore the essential tables from
> backup in 15-30 minutes.

I think the real fix is to have pg_upgrade copy over the statistics.
They are right there after all, just take them.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

2019-04-04 Thread Antonin Houska
Robert Haas  wrote:

> On Thu, Mar 14, 2019 at 9:26 AM Antonin Houska  wrote:
> > > Hint bits also rely on this principle.  I thought there might be some
> > > interaction between this work and wal_log_hints for this reason, but I see
> > > nothing of that sort in the patch.
> >
> > I'm not sure if the hint bit is still a problem if we first copy the shared
> > buffer to backend-local memory and encrypt it there. That's what the patch
> > does.
> 
> That has the same problem that I described in the previous paragraph,
> except for the relation data files rather than the WAL itself.  See
> the manual's description of wal_log_hints:
> 
>
> When this parameter is on, the PostgreSQL server writes the entire
> content of each disk page to WAL during the first modification of
> that page after a checkpoint, even for non-critical modifications of
> so-called hint bits.
>
> 
> For encryption to work, you need that.  A hint bit write might convert
> the page to garbage, just as in the previous example, and this option
> make sure that if that happens, there will be an FPW, allowing you to
> recover...

I think I finally understand. Originally I thought the question was how to
compute a correct page checksum while the hint bits can be changed w/o an
exclusive lock on the buffer. Now I realize that it's more about *recovery*:
if the hint bit change is followed by a torn page write, the hint bit can get
changed on disk but the checksum might not get updated. The wrong checksum is
detected during recovery, but if the XLOG does not contain the corresponding
full page image, we're not able to recover.

And with encryption, the consequence is even worse, because a torn page write
causes not only a wrong checksum on an otherwise useful page, but a really
damaged page.

I'll enforce the FPW in the next version of the patch.
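
The failure sequence is easy to reproduce outside the server. A toy
simulation (plain C, nothing PostgreSQL-specific; the checksum here is a
stand-in for the real FNV-based one) of a hint-bit change followed by a
torn write:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PAGESIZE 8192

static uint32_t
page_checksum(const uint8_t *page)
{
    uint32_t    sum = 0;
    int         i;

    for (i = 4; i < PAGESIZE; i++)  /* skip the checksum field itself */
        sum = sum * 31 + page[i];
    return sum;
}

int
main(void)
{
    uint8_t     buffer[PAGESIZE] = {0}; /* shared buffer */
    uint8_t     disk[PAGESIZE] = {0};   /* what's on disk */
    uint32_t    sum;
    uint32_t    stored;

    buffer[6000] |= 0x01;       /* hint bit set, no WAL record written */
    sum = page_checksum(buffer);
    memcpy(buffer, &sum, 4);    /* checksum recomputed at write time */

    memcpy(disk, buffer, PAGESIZE / 2); /* torn write: second half lost */

    memcpy(&stored, disk, 4);
    printf("checksum %s\n",
           stored == page_checksum(disk) ? "ok" : "MISMATCH, torn page");
    return 0;
}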

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Re: COPY FROM WHEN condition

2019-04-04 Thread David Rowley
On Thu, 4 Apr 2019 at 18:20, Andres Freund  wrote:
> I'm pretty happy with this. I'm doing some minor changes (e.g. don't
> like the function comment formatting that much, the tableam callback
> needs docs, stuff like that), and then I'm going to push it tomorrow.
>
> I'm planning to attribute it to you, me, and Haribabu Kommi.

Sounds good.

>
> Oh, btw, is there a reason you're memset(0)'ing multiInsertInfo? Seems
> unnecessary, given that in all cases we're using it we're going to do
> CopyMultiInsertInfo_Init(). And IME valgrind and the compiler are more
> helpful when you don't just default initialize, because then they can
> tell when you forgot to initialize a field (say like
> CopyMultiInsertInfo_Init not initializing nbuffers).

At some point when I was hacking at it, I was getting a warning about
it being possibly uninitialised. I don't recall exactly how the code
was then, but I just tried removing it and I don't get any warning
now. So probably fine to remove.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: Retronym: s/magnetic disk/main data/

2019-04-04 Thread David Fetter
On Thu, Apr 04, 2019 at 05:49:36PM +1300, Thomas Munro wrote:
> Hello,
> 
> I propose that in v13 we redefine "MD" (from md.c) to mean "main data"
> (instead of "magnetic disk").  That's the standard storage layout for
> typical table and index AMs.  As opposed to the proposed undo and SLRU
> SMGRs that provide layouts specialised for different life cycles.

Maybe we could use dd for "diamagnetic data" ;)

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: pg_upgrade version checking questions

2019-04-04 Thread Daniel Gustafsson
On Wednesday, March 27, 2019 1:43 PM, Christoph Berg  wrote:

> Re: Daniel Gustafsson 2019-03-26 
> pC-NMmh4vQLQP76YTwY4AuoD4OdNw9egikekyQpXFpgqmTlGjIZXOTd2W5RDZPpRski5N3ADRrLYgLk6QUuvmuT5fWC9acPAYyDU1AVxJcU=@yesql.se
>
> > 0003 - Make -B default to CWD and remove the exec_path check
> > Defaulting to CWD for the new bindir has the side effect that the default
> > sockdir is in the bin/ directory which may be less optimal.
>
> Hmm, I would have thought that the default for the new bindir is the
> directory where pg_upgrade is located, not the CWD, which is likely to
> be ~postgres or the like?

Yes, thinking on it that's obviously better.  The attached v2 repurposes the
find_my_exec() check to make the current directory of pg_upgrade the default
for new_cluster.bindir (the other two patches are left as they were).

cheers ./daniel



0001-Only-allow-upgrades-by-the-same-exact-version-new-v2.patch

0002-Check-all-used-executables-v2.patch

0003-Default-new-bindir-to-exec_path-v2.patch


Re: Inadequate executor locking of indexes

2019-04-04 Thread David Rowley
On Fri, 5 Apr 2019 at 02:22, David Rowley  wrote:
> v2 attached.

Wrong patch.  Here's what I meant to send.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


use_rellockmode_for_indexes_too_v3.patch


Re: Inadequate executor locking of indexes

2019-04-04 Thread David Rowley
On Thu, 4 Apr 2019 at 15:35, Amit Langote  wrote:
> BTW, get_actual_variable_range is static to selfuncs.c and other public
> functions that are entry point to get_actual_variable_range's
> functionality appear to require having built a RelOptInfo first, which
> means (even for third party code) having gone through get_relation_info
> and opened the indexes.  That may be too much wishful thinking though.

I think we should do this. We have the CheckRelationLockedByMe Asserts
to alert us if this ever gets broken. I think even if something added
a get_relation_info_hook to alter the indexes somehow, say to remove
one, it should still be okay, as get_actual_variable_range() only
looks at indexes that are in that list, and the other two functions
deal with Paths, which couldn't exist for an index that was removed
from RelOptInfo->indexlist.
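
For reference, the guard in question is just an assertion at the point
of use; roughly this fragment (backend names; a sketch rather than the
committed hunk, so not runnable on its own):

    indexRel = index_open(index->indexoid, NoLock);

    /* The planner must already hold this lock, from get_relation_info(). */
    Assert(CheckRelationLockedByMe(indexRel, AccessShareLock, true));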

> >> Also, I noticed that there is infer_arbiter_indexes() too, which opens the
> >> target table's indexes with RowExclusiveLock.  I thought for a second
> >> that's a index-locking site in the planner that you may have missed, but
> >> turns out it might very well be the first time those indexes are locked in
> >> a given insert query's processing, because query_planner doesn't need to
> >> plan access to the result relation, so get_relation_info is not called.
> >
> > I skimmed over that and thought that the rellockmode would always be
> > RowExclusiveLock anyway, so didn't change it. However, it would make
> > sense to use rellockmode for consistency. We're already looking up the
> > RangeTblEntry to get the relid, so getting the rellockmode is about
> > free.
>
> Yeah, maybe it'd be a good idea, if only for consistency, to fetch the
> rellockmode of the resultRelation RTE and use it for index_open.

I've changed infer_arbiter_indexes() too, but decided to use
rellockmode rather than NoLock since we're not using
RelOptInfo->indexlist.  If anything uses get_relation_info_hook to
remove indexes and then closes the removed ones, releasing the lock,
we could end up with problems here.
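
In other words, roughly this shape inside infer_arbiter_indexes() (a
fragment sketching the described change, not the patch itself):

    rte = rt_fetch(root->parse->resultRelation, root->parse->rtable);

    foreach(l, indexoidlist)
    {
        Oid         indexoid = lfirst_oid(l);
        Relation    idxRel;

        /* was: index_open(indexoid, RowExclusiveLock) */
        idxRel = index_open(indexoid, rte->rellockmode);

        /* ... consider the index as an ON CONFLICT arbiter ... */

        index_close(idxRel, NoLock);    /* keep lock till end of xact */
    }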

v2 attached.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


use_rellockmode_for_indexes_too_v2.patch


Re: Re: reloption to prevent VACUUM from truncating empty pages at the end of relation

2019-04-04 Thread Julien Rouhaud
On Thu, Apr 4, 2019 at 1:23 PM Masahiko Sawada  wrote:
>
> On Thu, Apr 4, 2019 at 1:26 PM Tsunakawa, Takayuki
>  wrote:
> >
> > From: Fujii Masao [mailto:masao.fu...@gmail.com]
> > > reloption for TOAST is also required?
> >
> > # I've come back to the office earlier than planned...
> >
> > Hm, there's no reason to not provide toast.vacuum_shrink_enabled.  Done 
> > with the attached patch.
> >
>
> Thank you for updating the patch!

+1!

> +vacuum_shrink_enabled, toast.vacuum_shrink_enabled (boolean)
> +
> + Enables or disables shrinking the table when it's vacuumed.
> + This also applies to autovacuum.
> + The default is true.  If true, VACUUM frees empty pages at the
> + end of the table.
>
> "VACUUM" needs  or "vacuum" is more appropriate here?

also, the documentation should point out that freeing is not
guaranteed.  Something like

 + The default is true.  If true, VACUUM will try to free empty
 + pages at the end of the table.

> I'm not sure the consensus we got here but we don't make the vacuum
> command option for this?

I don't think there's a clear consensus, but my personal vote is to add
it, with SHRINK_TABLE = [ force_on | force_off | default ] (unless a
better proposal has been made already).




Re: Refactoring the checkpointer's fsync request queue

2019-04-04 Thread Alvaro Herrera
On 2019-Apr-04, Thomas Munro wrote:

> I don't think it's project policy to put a single typedef into its own
> header like that, and I'm not sure where else to put it.

shrug.  Looks fine to me.  I suppose if we don't have it anywhere, it's
just because we haven't needed that particular trick yet.  Creating a
file with a lone typedef seems better than using uint32 to me.
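
For concreteness, the pattern being debated is nothing more than this
(hypothetical file and type names; like all backend headers, it assumes
c.h's uint32):

/* src/include/storage/segment.h (hypothetical) */
#ifndef SEGMENT_H
#define SEGMENT_H

/* Number of a segment within a relation fork's chain of 1GB files. */
typedef uint32 SegmentNumber;

#endif							/* SEGMENT_H */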

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



