Re: Draft release notes are up

2018-05-04 Thread Tom Lane
Michael Paquier  writes:
> On Fri, May 04, 2018 at 07:00:18PM -0400, Tom Lane wrote:
> +
> +
> + 
> +  Support building with Microsoft Visual Studio 2015 (Michael Paquier)
> + 

> This will only be part of release-9.5.sgml in the final version, right?

Right, each item is only for the branch(es) indicated in its comment.
I'll reassemble them once people have vetted the text.

regards, tom lane



[doc fix] Trivial fix for PG 11 partitioning

2018-05-04 Thread Tsunakawa, Takayuki
Hello,

Attached is a trivial documentation fix for PG 11 partitioning, which includes:

* pg_partition fails to mention hash for the strategy.
* Partitioning key column values can now be modified, which results in row 
movement between partitions.

Regards
Takayuki Tsunakawa




partition_doc_fix.patch
Description: partition_doc_fix.patch


Re: Draft release notes are up

2018-05-04 Thread Michael Paquier
On Fri, May 04, 2018 at 07:00:18PM -0400, Tom Lane wrote:
> First cut at release notes for next week's minor releases is up at
> 
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=488ccfe40a865e3f3c6343e2de026c37ba5a7d50
> 
> If you prefer not to read XML markup, they should appear in the devel docs
> on the website after guaibasaurus's next run, a couple hours from now.

+
+
+ 
+  Support building with Microsoft Visual Studio 2015 (Michael
Paquier)
+ 
+
+ 
+  Various fixes needed for VS2015 compatibility were previously
+  back-patched into the 9.5 branch, but one was missed.
+ 
+

This will only be part of release-9.5.sgml in the final version, right?
--
Michael




Draft release notes are up

2018-05-04 Thread Tom Lane
First cut at release notes for next week's minor releases is up at

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=488ccfe40a865e3f3c6343e2de026c37ba5a7d50

If you prefer not to read XML markup, they should appear in the devel docs
on the website after guaibasaurus's next run, a couple hours from now.

regards, tom lane



Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?

2018-05-04 Thread Heikki Linnakangas

On 04/05/18 10:05, Heikki Linnakangas wrote:

On 24/04/18 13:57, Kyotaro HORIGUCHI wrote:

At Mon, 23 Apr 2018 03:41:47 -0400, Heikki Linnakangas  wrote in 
<89e33d4f-5c75-0738-3dcb-44c4df59e...@iki.fi>

Looking at the patch linked above
(https://www.postgresql.org/message-id/20171026.190551.208996945.horiguchi.kyotaro%40lab.ntt.co.jp):


--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11693,6 +11693,10 @@ retry:
Assert(reqLen <= readLen);
*readTLI = curFileTLI;
+
+ if (!XLogReaderValidatePageHeader(xlogreader, targetPagePtr,
readBuf))
+   goto next_record_is_invalid;
+
return readLen;
next_record_is_invalid:


What is the point of adding this XLogReaderValidatePageHeader() call?
The caller of this callback function, ReadPageInternal(), checks the
page header anyway. Earlier in this thread, you said:


Without those lines, the server actually fails to start replication.

(Let me try to recall the details...)

The difference is whether the function can cause a retry for the
same portion of a set of continued records (without changing the
plugin API). XLogPageRead can do that. On the other hand, all
ReadPageInternal can do is let its caller, ReadRecord, retry from the
beginning of the set of continued records, since the caller side
handles only complete records.

In the failure case, in XLogPageRead, when read() fails, it can
try the next source in the middle of a continued record. On the other
hand, if the segment was read but it was a recycled one, it passes
"success" to ReadPageInternal, which leads to retrying from the
beginning of the record. Infinite loop.


I see. You have the same problem if you have a WAL file that's corrupt
in some other way, but one of the sources would have a correct copy.
ValidXLogPageHeader() only checks the page header, after all. But unlike
a recycled WAL segment, that's not supposed to happen as part of normal
operation, so I guess we can live with that.


Pushed this now, after adding some comments. Thanks!


Calling XLogReaderValidatePageHeader in ReadPageInternal is now
redundant, but removing it would be an interface change for xlogreader
plugins, and I am not sure that is allowed.


We should avoid doing that in back-branches, at least. But in 'master',
I wouldn't mind redesigning the API. Dealing with all the retrying is
pretty complicated as it is, if we can simplify that somehow, I think
that'd be good.


I spent some time musing on what a better API might look like. We could 
remove the ReadPage callback, and instead have XLogReadRecord return a 
special return code to mean "give me more data". I'm thinking of 
something like:


/* return code of XLogReadRecord() */
typedef enum
{
XLREAD_SUCCESS,
XLREAD_INVALID_RECORD,  /* a record was read, but it was corrupt */
XLREAD_INVALID_PAGE,/* the page that was supplied looks invalid. */
XLREAD_NEED_DATA,   /* caller should place more data in buffer, 
and retry */

} XLogReadRecord_Result;


And the calls to XLogReadRecord() would look something like this:

for(;;)
{
rc = XLogReadRecord(reader, startptr, errormsg);

if (rc == XLREAD_SUCCESS)
{
/* great, got record */
}
if (rc == XLREAD_INVALID_PAGE || rc == XLREAD_INVALID_RECORD)
{
elog(ERROR, "invalid record");
}
if (rc == XLREAD_NEED_DATA)
{
/*
 * Read a page from disk, and place it into reader->readBuf
 */
XLogPageRead(reader->readPagePtr, /* page to read */
 reader->reqLen   /* # of bytes to read */ );
/*
 * Now that we have read the data that XLogReadRecord()
 * requested, call it again.
 */
continue;
}
}

So instead of having a callback, XLogReadRecord() would return 
XLREAD_NEED_DATA. The caller would then need to place that data into the 
buffer, and call it again. If a record spans multiple pages, 
XLogReadRecord() would return with XLREAD_NEED_DATA multiple times, to 
read each page.


The important difference for the bug we're discussing on this thread 
is that if you passed an invalid page to XLogReadRecord(), it would 
return with XLREAD_INVALID_PAGE. You could then try reading the same 
page from a different source, call XLogReadRecord() again, and it 
could continue where it left off, even if it was in the middle of a 
continuation record.
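
To make that concrete, here is a rough, untested sketch of how a caller 
with two WAL sources might use XLREAD_INVALID_PAGE under this 
hypothetical API (read_page_from_source(), cur_source and the 
XLOG_FROM_* source identifiers are stand-ins for whatever mechanism 
fills reader->readBuf):

for (;;)
{
    rc = XLogReadRecord(reader, startptr, errormsg);

    if (rc == XLREAD_SUCCESS)
        break;              /* got a complete record */

    if (rc == XLREAD_NEED_DATA)
    {
        /* fill reader->readBuf from the current source, then retry */
        read_page_from_source(cur_source, reader->readPagePtr,
                              reader->reqLen, reader->readBuf);
        continue;
    }

    if (rc == XLREAD_INVALID_PAGE && cur_source == XLOG_FROM_ARCHIVE)
    {
        /*
         * The page we supplied looked bad (e.g. a recycled segment).
         * Switch to the other source and retry; the reader continues
         * where it left off, even in the middle of a continuation
         * record.
         */
        cur_source = XLOG_FROM_STREAM;
        continue;
    }

    elog(ERROR, "invalid record");
}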


This is clearly not backpatchable, but maybe something to think about 
for v12.


- Heikki



Re: perlcritic (was Re: pgsql: Fix precedence problem in new Perl code.)

2018-05-04 Thread Mike Blackwell
Alvaro,

I didn't mean to imply otherwise.  Our settings here are probably
different.

Good point on the git log --grep.  I'll try to remember that in the future.

Mike

__
Mike Blackwell | Technical Analyst, Distribution Services/Rollout
Management | RRD
1750 Wallace Ave | St Charles, IL 60174-3401
Office: 630.313.7818
mike.blackw...@rrd.com
http://www.rrdonnelley.com




On Fri, May 4, 2018 at 4:36 PM, Alvaro Herrera 
wrote:

> Moving discussion to -hackers list.
>
> Mike Blackwell wrote:
> > I didn't see a .perlcriticrc file in the project, so ran with our local
> > settings.
> >
> > With those, perlcritic is pretty unhappy, even at -4, though I don't see
> > anything that pops out as potentially bug-inducing.
>
> Uh, we've certainly fixed things to appease perlcritic before (see git
> log --grep perlcritic).  Maybe we need to come up with some .rc file to
> our liking and try to adhere to it.
>
> --
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: --enable-ccache configure option

2018-05-04 Thread Tom Lane
Alvaro Herrera  writes:
> Andres Freund wrote:
>>> What exactly would it do? I use ccache all the time, have for years,
>>> but it's not clear to me what can usefully be done by configure.

>> Use ccache without having to set it up individually for CC, CLANG, CXX.

> Umm, I just add /usr/lib/ccache somewhere in PATH (ahead of the compiler
> binaries) which is enough to make it all work.

On my preferred distros (ie Red Hat), ccache is used automatically;
there is nothing that needs to be done in configure AFAIK.  (It looks
like this happens by dint of the same thing Alvaro mentions, ie the
standard PATH has /usr/lib(64)?/ccache in front of the actual compilers.)
It'd be important not to break such setups in trying to make it happen
elsewhere.

regards, tom lane



Re: pageinspect get_raw_page() option to return invalid pages

2018-05-04 Thread Peter Geoghegan
On Fri, May 4, 2018 at 11:56 AM, Andres Freund  wrote:
> Can you expand on what they want?
>
> - Avoid polluting caches? Why's the ringbuffer logic not good enough?
> - Continue after a checksum or similar failure? That seems a bit useless
>   for amcheck imo? You know there's corruption at that point after all.
> - Read on disk data, bypassing shared buffers? That'd present a lot of
>   coherency issues, no?

I think that "Read on-disk data" would be a compelling feature in
certain environments. It would present some concurrency issues, but
those seem solvable. The only thing that the "!readonly &&
!heapallindexed" checks need that complicates things is a snapshot
that prevents concurrent recycling by VACUUM. If we gave up on the
cross-page invariant check, then we wouldn't even need to worry about
concurrent recycling by VACUUM, while still offering almost the same
coverage as "!readonly && !heapallindexed" (I suppose we'd also have
to give up on the level cross-check that you suggested when v1 of
amcheck went in, too, but I think that that's it).

Maybe it would be a better use of my time to focus on making this
accessible to backup tools, which should ideally work without needing
to acquire any MVCC snapshot. Probably from a front-end utility. We'd
need to export at least some operator class functionality to make that
work, though.

-- 
Peter Geoghegan



Re: Built-in connection pooling

2018-05-04 Thread Merlin Moncure
On Fri, May 4, 2018 at 2:25 PM, Robert Haas  wrote:
> On Fri, May 4, 2018 at 11:22 AM, Merlin Moncure  wrote:
>> If we are breaking 1:1 backend:session relationship, what controls
>> would we have to manage resource consumption?
>
> I mean, if you have a large number of sessions open, it's going to
> take more memory in any design.  If there are multiple sessions per
> backend, there may be some possibility to save memory by allocating it
> per-backend rather than per-session; it shouldn't be any worse than if
> you didn't have pooling in the first place.

It is absolutely worse, or at least can be.  plpgsql plan caches can
be GUC dependent due to search_path; you might get a different plan
depending on which tables the function's references resolve to.  You
might rightfully regard this as an edge case, but there are other
'leakages'; for example, sessions with different planner settings
obviously ought not to share backend plans.  Point being, there are
many interdependent things in the session that will make it difficult
to share some portions but not others.

> However, I think that's probably worrying about the wrong end of the
> problem first.  IMHO, what we ought to start by doing is considering
> what a good architecture for this would be, and how to solve the
> general problem of per-backend session state.  If we figure that out,
> then we could worry about optimizing whatever needs optimizing, e.g.
> memory usage.

Exactly -- being able to manage down resource consumption by
controlling session count is a major feature that ought not to be
overlooked. So I'm kind of signalling that if given a choice between
that (funneling a large pool of connections down to a smaller number
of backends) and externalized shared sessions I'd rather have the
funnel; it solves a number of very important problems with respect to
server robustness.  So I'm challenging (in a friendly, curious way)
whether breaking the 1:1 session:backend relationship is really a good
idea.  Maybe a connection pooler implementation can do both of those
things, or maybe it's unfair to expect an implementation to do both.

merlin



perlcritic (was Re: pgsql: Fix precedence problem in new Perl code.)

2018-05-04 Thread Alvaro Herrera
Moving discussion to -hackers list.

Mike Blackwell wrote:
> I didn't see a .perlcriticrc file in the project, so ran with our local
> settings.
> 
> With those, perlcritic is pretty unhappy, even at -4, though I don't see
> anything that pops out as potentially bug-inducing.

Uh, we've certainly fixed things to appease perlcritic before (see git
log --grep perlcritic).  Maybe we need to come up with some .rc file to
our liking and try to adhere to it.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: --enable-ccache configure option

2018-05-04 Thread Alvaro Herrera
Andres Freund wrote:

> > On Fri, May 4, 2018 at 1:58 PM, Andres Freund  wrote:

> > > ISTM --enable-ccache would simplify setup for a number of people? And it
> > > shouldn't be hard to implement.
> > 
> > What exactly would it do? I use ccache all the time, have for years,
> > but it's not clear to me what can usefully be done by configure.
> 
> Use ccache without having to set it up individually for CC, CLANG, CXX.

Umm, I just add /usr/lib/ccache somewhere in PATH (ahead of the compiler
binaries) which is enough to make it all work.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: --enable-ccache configure option

2018-05-04 Thread Andrew Dunstan
On Fri, May 4, 2018 at 5:07 PM, Andres Freund  wrote:
> On 2018-05-04 17:04:38 -0400, Andrew Dunstan wrote:
>> On Fri, May 4, 2018 at 1:58 PM, Andres Freund  wrote:
>> > Hi,
>> >
>> > Now that there's several tools used during compilation that can benefit
>> > from ccache I wonder if it's time to add an option to configure for
>> > setting up ccache across the board.
>> >
>> > ISTM --enable-ccache would simplify setup for a number of people? And it
>> > shouldn't be hard to implement.
>> >
>>
>>
>> What exactly would it do? I use ccache all the time, have for years,
>> but it's not clear to me what can usefully be done by configure.
>
> Use ccache without having to set it up individually for CC, CLANG, CXX.
>

Ah, I see. Not a bad idea. The only thing is it should look to make
sure these aren't already set up with ccache - if they are it should
leave them alone. It's also possible to set up ccache in a transparent
mode, where you don't have to call ccache explicitly. Not sure what
the implications of that would be.

cheers

andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: --enable-ccache configure option

2018-05-04 Thread Andres Freund
On 2018-05-04 17:04:38 -0400, Andrew Dunstan wrote:
> On Fri, May 4, 2018 at 1:58 PM, Andres Freund  wrote:
> > Hi,
> >
> > Now that there's several tools used during compilation that can benefit
> > from ccache I wonder if it's time to add an option to configure for
> > setting up ccache across the board.
> >
> > ISTM --enable-ccache would simplify setup for a number of people? And it
> > shouldn't be hard to implement.
> >
> 
> 
> What exactly would it do? I use ccache all the time, have for years,
> but it's not clear to me what can usefully be done by configure.

Use ccache without having to set it up individually for CC, CLANG, CXX.

Greetings,

Andres Freund



Re: MSYS2 and pg_upgrade testing

2018-05-04 Thread Andrew Dunstan
On Fri, May 4, 2018 at 2:30 PM, Robert Haas  wrote:
> On Thu, May 3, 2018 at 5:25 PM, Andrew Dunstan
>  wrote:
>> I've been getting an Msys2 environment working, and will soon document
>> how to build with this environment. There are several nice things
>> about it, including a modern version of perl and proper support for
>> the mingw-w64 compilers.
>>
>> So far the only thing I have found that needs to be changed for us to
>> support it is this, which is so trivial and low risk that I propose to
>> backpatch it to all live releases:
>>
>> diff --git a/src/bin/pg_upgrade/test.sh b/src/bin/pg_upgrade/test.sh
>> index a805018..45ccd8f 100644
>> --- a/src/bin/pg_upgrade/test.sh
>> +++ b/src/bin/pg_upgrade/test.sh
>> @@ -34,7 +34,7 @@ standard_initdb() {
>>  testhost=`uname -s`
>>
>>  case $testhost in
>> -   MINGW*)
>> +   MINGW*|MSYS*)
>> LISTEN_ADDRESSES="localhost"
>> PGHOST=localhost
>> ;;
>
> It's astonishing that's the only thing that needs to change!
>


I might find more stuff as I go along, but this change was enough for
me to be able to do a full buildfarm client run.

cheers

andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: --enable-ccache configure option

2018-05-04 Thread Andrew Dunstan
On Fri, May 4, 2018 at 1:58 PM, Andres Freund  wrote:
> Hi,
>
> Now that there's several tools used during compilation that can benefit
> from ccache I wonder if it's time to add an option to configure for
> setting up ccache across the board.
>
> ISTM --enable-ccache would simplify setup for a number of people? And it
> shouldn't be hard to implement.
>


What exactly would it do? I use ccache all the time, have for years,
but it's not clear to me what can usefully be done by configure.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: VM map freeze corruption

2018-05-04 Thread Alvaro Herrera
Pavan Deolasee wrote:

> I wonder if we should just avoid initialising those variables at the top
> and take compiler's help to detect any missed branches. That way, we shall
> know exactly why and where we are making them true/false. I also think that
> we should differentiate between scenarios where xmin/xmax is already frozen
> and scenarios where we are freezing them now.

After staring at this code for a while, it strikes me that the xmin case
is different enough from the xmax case that it works better to let the
logic be different; namely for xmin a single bool suffices.  I kept your
logic for xmax, except that xmax_already_frozen requires initialization
(to 'false') or we risk having it end up as 'true' due to random garbage
in the stack and set totally_frozen_p to true spuriously because of this
(or spuriously fail the assertion in line 6942).  I noticed this when
running Dan's test case with your patch -- the assertion failed for the
xmin case:

TRAP: FailedAssertion(«!(!xmin_already_frozen)», Archivo: 
«/pgsql/source/master/src/backend/access/heap/heapam.c», Línea: 6845)

That led me to the rewrite of the xmin logic, and it also led me to
looking deeper at the xmax case (with which I didn't run into any
assertion failure, but it's clear that it could definitely happen if the
stack happens to have that bit set.)

I also chose to accept the suggestion in XXX to throw an error:

if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
...
else if (TransactionIdIsNormal(xid))
...
else if ((tuple->t_infomask & HEAP_XMAX_INVALID) ||
 !TransactionIdIsValid(HeapTupleHeaderGetRawXmax(tuple)))
else
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
 errmsg_internal("found xmax %u (infomask 0x%04x) not frozen, 
not multi, not normal",
 xid, tuple->t_infomask)));

There's no place else in the backend where we report an infomask, but in
this case it seems warranted (for forensics) if this elog ever fires.
The tests involved are:

which means that the only way this ereport could fire is if the transaction is
BootstrapTransactionId or FrozenTransactionId.  In either case VACUUM
should have removed the tuple as dead, so these cases shouldn't ever
occur.

(Looking at the other caller for this code, which is cluster.c for table
rewrites, it appears that dead tuples are not passed to
heap_freeze_tuple, so xmax=BootstrapXid/FrozenXid should not be a
problem there either.)

Patch attached.  I intend to push this soon, to have it in next week's
set.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 1a672150be..72395a50b8 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6803,9 +6803,10 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
  xl_heap_freeze_tuple *frz, 
bool *totally_frozen_p)
 {
boolchanged = false;
-   boolfreeze_xmax = false;
+   boolxmax_already_frozen = false;
+   boolxmin_frozen;
+   boolfreeze_xmax;
TransactionId xid;
-   booltotally_frozen = true;
 
frz->frzflags = 0;
frz->t_infomask2 = tuple->t_infomask2;
@@ -6814,6 +6815,8 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 
/* Process xmin */
xid = HeapTupleHeaderGetXmin(tuple);
+   xmin_frozen = ((xid == FrozenTransactionId) ||
+  HeapTupleHeaderXminFrozen(tuple));
if (TransactionIdIsNormal(xid))
{
if (TransactionIdPrecedes(xid, relfrozenxid))
@@ -6832,9 +6835,8 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 
frz->t_infomask |= HEAP_XMIN_FROZEN;
changed = true;
+   xmin_frozen = true;
}
-   else
-   totally_frozen = false;
}
 
/*
@@ -6857,9 +6859,9 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,

relfrozenxid, relminmxid,

cutoff_xid, cutoff_multi, );
 
-   if (flags & FRM_INVALIDATE_XMAX)
-   freeze_xmax = true;
-   else if (flags & FRM_RETURN_IS_XID)
+   freeze_xmax = (flags & FRM_INVALIDATE_XMAX);
+
+   if (flags & FRM_RETURN_IS_XID)
{
/*
 * NB -- some of these transformations are only valid 
because we
@@ -6873,7 +6875,6 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
if (flags & FRM_MARK_COMMITTED)
frz->t_infomask |= 

PostgreSQL 11 Beta 1 Release: 2018-05-24

2018-05-04 Thread Jonathan S. Katz
Hi,

The Release Management Team is pleased to announce that
the release date for PostgreSQL 11 Beta 1 is set to be 2018-05-24.

We’re excited to make the Beta available for testing and receive
some early feedback around the latest major release of PostgreSQL.

Please let us know if you have any questions.

Thanks,

Jonathan


Re: Built-in connection pooling

2018-05-04 Thread Robert Haas
On Fri, May 4, 2018 at 11:22 AM, Merlin Moncure  wrote:
> If we are breaking 1:1 backend:session relationship, what controls
> would we have to manage resource consumption?

I mean, if you have a large number of sessions open, it's going to
take more memory in any design.  If there are multiple sessions per
backend, there may be some possibility to save memory by allocating it
per-backend rather than per-session; it shouldn't be any worse than if
you didn't have pooling in the first place.

However, I think that's probably worrying about the wrong end of the
problem first.  IMHO, what we ought to start by doing is considering
what a good architecture for this would be, and how to solve the
general problem of per-backend session state.  If we figure that out,
then we could worry about optimizing whatever needs optimizing, e.g.
memory usage.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



genbki.pl not quoting keywords in postgres.bki output

2018-05-04 Thread Mark Dilger
Hackers,

There are not yet any examples in the postgres sources where this
oversight causes problems, but in genbki.pl, where it makes the
decision whether to quote a token:

# Quote value if needed.  We need not quote values that satisfy
# the "id" pattern in bootscanner.l, currently "[-A-Za-z0-9_]+".
$bki_value = sprintf(qq'"%s"', $bki_value)
  if length($bki_value) == 0
  or $bki_value =~ /[^-A-Za-z0-9_]/;

it should also quote anything that is a keyword, such as "open", as
otherwise you get a syntax error during initdb.

It might be nice to quote the keywords as a defense against this in
future catalog changes in the public sources. As an example, try adding
the following:

diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 66c6c224a8..d48aeb4fd3 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -10202,4 +10202,8 @@
   proisstrict => 'f', prorettype => 'bool', proargtypes => 'oid int4 int4 any',
   proargmodes => '{i,i,i,v}', prosrc => 'satisfies_hash_partition' },
 
+{ oid => '',
+  proname => 'open', prolang => '14', prorettype => 'bool',
+  proargtypes => 'text',
+  prosrc => 'select true' }, # prosrc doesn't matter for this example of 
genbki behavior
 ]


Perhaps you don't think that is a big enough problem to be worth
guarding against.

This patch is not that complicated, but it does create a new coding
requirement to keep bootparse.y and genbki.pl from getting out of sync.
It might be simpler to just change genbki.pl to quote everything rather
than applying this patch.  I don't have an opinion on that.

mark



genbki_bootparse_keywords.patch.1
Description: Binary data


Re: Global snapshots

2018-05-04 Thread Robert Haas
On Tue, May 1, 2018 at 5:02 PM, Stas Kelvich  wrote:
> Yes, that totally possible. On both systems you need:

Cool.

> * set track_global_snapshots='on' -- this will start writing each
>   transaction commit sequence number to SRLU.
> * set global_snapshot_defer_time to 30 seconds, for example -- this
>   will delay oldestXmin advancement for specified amount of time,
>   preserving old tuples.

So, is the idea that we'll definitely find out about any remote
transactions within 30 seconds, and then after we know about remote
transactions, we'll hold back OldestXmin some other way?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pageinspect get_raw_page() option to return invalid pages

2018-05-04 Thread Andres Freund
On 2018-05-04 11:53:25 -0700, Peter Geoghegan wrote:
> On Fri, May 4, 2018 at 11:46 AM, Andres Freund  wrote:
> > Could you expand on that? Are you envisioning an option to
> > ReadBufferExtended()? Because that's certainly not what I'm thinking of
> > - it seems dangerous to populate shared buffers with an invalid
> > page. Therefore I was more thinking to falling back to smgrread() or
> > such.
> 
> I'm not envisaging anything specific just yet. It would be nice if
> amcheck had an option that bypassed shared_buffers, because users want
> that. That's all.

Can you expand on what they want?

- Avoid polluting caches? Why's the ringbuffer logic not good enough?
- Continue after a checksum or similar failure? That seems a bit useless
  for amcheck imo? You know there's corruption at that point after all.
- Read on disk data, bypassing shared buffers? That'd present a lot of
  coherency issues, no?

Greetings,

Andres Freund



Re: pageinspect get_raw_page() option to return invalid pages

2018-05-04 Thread Peter Geoghegan
On Fri, May 4, 2018 at 11:46 AM, Andres Freund  wrote:
> Could you expand on that? Are you envisioning an option to
> ReadBufferExtended()? Because that's certainly not what I'm thinking of
> - it seems dangerous to populate shared buffers with an invalid
> page. Therefore I was more thinking to falling back to smgrread() or
> such.

I'm not envisaging anything specific just yet. It would be nice if
amcheck had an option that bypassed shared_buffers, because users want
that. That's all.

-- 
Peter Geoghegan



Re: pageinspect get_raw_page() option to return invalid pages

2018-05-04 Thread Andres Freund
On 2018-05-04 11:42:35 -0700, Peter Geoghegan wrote:
> On Fri, May 4, 2018 at 11:41 AM, Andres Freund  wrote:
> > Hi,
> >
> > Currently there's no convenient way to get a corrupted page (be it a
> > checksum failure, corrupted pd_upper/lower or whatnot) via pageinspect's
> > get_raw_page(). Given that pageinspect is kind of our tool for
> > inspecting issues at the data level, that's not so great.
> >
> > I therefore propose that we add an *option* that bypasses shared buffers
> > and reads the underlying data directly. And then skips over the
> > validation.  I'm not 100% sure if this should be done unconditionally,
> > or only if the item wasn't found in cache. There's some coherency
> > differences, obviously.
> 
> +1. This would be a good option for amcheck, too.

Could you expand on that? Are you envisioning an option to
ReadBufferExtended()? Because that's certainly not what I'm thinking of
- it seems dangerous to populate shared buffers with an invalid
page. Therefore I was more thinking to falling back to smgrread() or
such.
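
Roughly what I have in mind (completely untested, and glossing over
argument handling): pull the block straight off disk with smgrread(),
so it never goes through PageIsVerified() and never lands in shared
buffers.  relid/blkno are assumed to come from the function's
arguments, as in get_raw_page():

    Relation    rel = relation_open(relid, AccessShareLock);
    bytea      *raw = (bytea *) palloc(BLCKSZ + VARHDRSZ);

    SET_VARSIZE(raw, BLCKSZ + VARHDRSZ);

    /* read the block from disk, bypassing shared buffers entirely */
    RelationOpenSmgr(rel);
    smgrread(rel->rd_smgr, MAIN_FORKNUM, blkno, VARDATA(raw));

    relation_close(rel, AccessShareLock);
    /* hand 'raw' back to the caller without any page validation */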

Greetings,

Andres Freund



Re: pageinspect get_raw_page() option to return invalid pages

2018-05-04 Thread Peter Geoghegan
On Fri, May 4, 2018 at 11:41 AM, Andres Freund  wrote:
> Hi,
>
> Currently there's no convenient way to get a corrupted page (be it a
> checksum failure, corrupted pd_upper/lower or whatnot) via pageinspect's
> get_raw_page(). Given that pageinspect is kind of our tool for
> inspecting issues at the data level, that's not so great.
>
> I therefore propose that we add an *option* that bypasses shared buffers
> and reads the underlying data directly. And then skips over the
> validation.  I'm not 100% sure if this should be done unconditionally,
> or only if the item wasn't found in cache. There's some coherency
> differences, obviously.

+1. This would be a good option for amcheck, too.


-- 
Peter Geoghegan



pageinspect get_raw_page() option to return invalid pages

2018-05-04 Thread Andres Freund
Hi,

Currently there's no convenient way to get a corrupted page (be it a
checksum failure, corrupted pd_upper/lower or whatnot) via pageinspect's
get_raw_page(). Given that pageinspect is kind of our tool for
inspecting issues at the data level, that's not so great.

I therefore propose that we add an *option* that bypasses shared buffers
and reads the underlying data directly. And then skips over the
validation.  I'm not 100% sure if this should be done unconditionally,
or only if the item wasn't found in cache. There's some coherency
differences, obviously.

We could also just make it two functions, instead of a parameter.

It's unfortunately not entirely trivial to access specific blocks with
pg_read_binary_file() - one needs a query that deals with block sizes,
segment numbers, segment sizes etc...

Greetings,

Andres Freund



Re: MSYS2 and pg_upgrade testing

2018-05-04 Thread Robert Haas
On Thu, May 3, 2018 at 5:25 PM, Andrew Dunstan
 wrote:
> I've been getting an Msys2 environment working, and will soon document
> how to build with this environment. There are several nice things
> about it, including a modern version of perl and proper support for
> the mingw-w64 compilers.
>
> So far the only thing I have found that needs to be changed for us to
> support it is this, which is so trivial and low risk that I propose to
> backpatch it to all live releases:
>
> diff --git a/src/bin/pg_upgrade/test.sh b/src/bin/pg_upgrade/test.sh
> index a805018..45ccd8f 100644
> --- a/src/bin/pg_upgrade/test.sh
> +++ b/src/bin/pg_upgrade/test.sh
> @@ -34,7 +34,7 @@ standard_initdb() {
>  testhost=`uname -s`
>
>  case $testhost in
> -   MINGW*)
> +   MINGW*|MSYS*)
> LISTEN_ADDRESSES="localhost"
> PGHOST=localhost
> ;;

It's astonishing that's the only thing that needs to change!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



--enable-ccache configure option

2018-05-04 Thread Andres Freund
Hi,

Now that there's several tools used during compilation that can benefit
from ccache I wonder if it's time to add an option to configure for
setting up ccache across the board.

ISTM --enable-ccache would simplify setup for a number of people? And it
shouldn't be hard to implement.

Greetings,

Andres Freund



Re: nbtsort.c performs unneeded (though harmless) truncation

2018-05-04 Thread Peter Geoghegan
On Fri, May 4, 2018 at 2:39 AM, Teodor Sigaev  wrote:
> Thank you, pushed.

Thanks.

-- 
Peter Geoghegan



Re: Explain buffers wrong counter with parallel plans

2018-05-04 Thread Robert Haas
On Wed, May 2, 2018 at 11:37 AM, Adrien Nayrat
 wrote:
> In 9.6 gather node reports sum of buffers for main process + workers. In 10,
> gather node only reports buffers from the main process.

Oh, really?  Well, that sounds like a bug.  Amit seems to think it's
expected behavior, but I don't know why it should be.  The commit
message makes it sound like it's just refactoring, but in fact it
seems to have made a significant behavior change that doesn't look
very desirable.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Negative DST, or, should we delay using tzdata 2018e update?

2018-05-04 Thread Andres Freund
Hi,

On 2018-05-04 10:58:22 -0400, Tom Lane wrote:
> On balance I'm leaning towards option #2, ie go with the new data but
> wait till just after the minor releases.  Thoughts?

+1. Should have a beta in that timeframe as well, giving more people a
chance to notice.

Greetings,

Andres Freund



Re: GSoC 2018: thrift encoding format

2018-05-04 Thread Stephen Frost
Greetings,

* Aleksander Alekseev (a.aleks...@postgrespro.ru) wrote:
> > I understand that you're open to having it as a new data type or as a
> > bytea, but I don't agree.  This should be a new data type, just as json
> > is a distinct data type and so is jsonb.
> 
> Could you please explain in a little more detail why you believe so?

As mentioned elsewhere, there's multiple ways to encode thrift, no?  We
should pick which one makes sense and make that the interface to the
data type, and then we might actually store the data differently.  Not
to mention that we'll likely want to add things like indexing
capabilities to this data type, as we have for jsonb, and that's much
cleaner to do with a proper data type than if everyone has to use bytea
to store the data and then functional indexes (if we could even make
that happen...  I'm not thrilled with such an idea in any case).

Data validation is another thing- if it's a thrift data type then we can
validate that it's correct on the way in, and depend on that correctness
on the way out (to some extent- obviously we have to be wary of
corruption possibilities and such).
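
As a purely hypothetical illustration of the difference, an explicit
bytea -> thrift cast function could refuse malformed input up front.
thrift_validate() below is an invented placeholder for whatever the
actual decoder ends up being; none of this exists in any extension
today:

/* needs postgres.h and fmgr.h */
extern bool thrift_validate(const char *data, Size len);

PG_FUNCTION_INFO_V1(bytea_to_thrift);

Datum
bytea_to_thrift(PG_FUNCTION_ARGS)
{
    bytea   *data = PG_GETARG_BYTEA_PP(0);

    if (!thrift_validate(VARDATA_ANY(data), VARSIZE_ANY_EXHDR(data)))
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
                 errmsg("invalid Thrift data")));

    /* assuming the thrift type is a varlena with the same layout */
    PG_RETURN_POINTER(data);
}

Wiring that up with CREATE CAST ... WITH FUNCTION (and without AS
IMPLICIT) would keep the bytea cast explicit.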

We could toss out all of our data types and store everything as bytea's
if we wanted to, but we don't, and for quite a few good reasons, these
are just a couple that I'm thinking of off-hand.

> Also I wonder whether in your opinion the extension should provide
> implicit casts from/to bytea as well.

I wouldn't make them implicit...

Thanks!

Stephen




Re: GSoC 2018: thrift encoding format

2018-05-04 Thread Aleksander Alekseev
Hello Stephen,

> Perhaps the design decisions aren't all made beforehand, but they also
> shouldn't be made in a vacuum- there should be discussions on -hackers
> about what the right decision is for a given aspect and that's what
> should be worked towards.

+1, agree. 

> > Personally I would probably just write a Thrift<->JSONB converter. But
> > there are pros and cons of this approach. For instance, CPU and memory
> > overhead for creating and storing temporary JSONB object is an obvious
> > drawback. On the other hand there are time limits for this project and
> > thus it makes sense to implement a feature as fast and as simple as
> > possible, and optimize it later (if necessary). 
> 
> Just having such a convertor would reduce the usefulness of this
> extension dramatically, wouldn't it?  Considering the justification for
> the extension used on the GSoC project page, it certainly strikes me as
> losing most of the value if we just convert to JSONB.
> 
> > Maybe Charles likes to optimize everything. In this case he may choose
> > to implement all the getters and setters from scratch. This doesn't
> > exclude possibility of implementing the Thrift<->JSONB converter later.
> 
> Having a way to cast between the two is entirely reasonable, imv, but
> that's very different from having the data only able to be stored as
> JSONB..

Good point.

> I understand that you're open to having it as a new data type or as a
> bytea, but I don't agree.  This should be a new data type, just as json
> is a distinct data type and so is jsonb.

Could you please explain in a little more detail why you believe so?
Also I wonder whether in your opinion the extension should provide
implicit casts from/to bytea as well.

-- 
Best regards,
Aleksander Alekseev




Re: Built-in connection pooling

2018-05-04 Thread Merlin Moncure
On Thu, May 3, 2018 at 12:01 PM, Robert Haas  wrote:
> On Fri, Apr 27, 2018 at 4:43 PM, Merlin Moncure  wrote:
>> What _I_ (maybe not others) want is a
>> faster pgbouncer that is integrated into the database; IMO it does
>> everything exactly right.
>
> I have to admit that I find that an amazing statement.  Not that
> pgbouncer is bad technology, but saying that it does everything
> exactly right seems like a vast overstatement.  That's like saying
> that you don't want running water in your house, just a faster motor
> for the bucket you use to draw water from the well.

Well you certainly have a point there; I do have a strong tendency for
overstatement :-).

Let's put it like this: being able to have connections funnel down to
a smaller number of sessions is a nice feature.  Applications that are
large, complex, or super high volume have a tendency towards stateless
(with respect to the database session) architecture anyway, so I tend
not to mind the lack of session features when pooling (prepared
statements perhaps being the big outlier here).  It really opens up a
lot of scaling avenues.  So a better phrased statement might be, "I
like the way pgbouncer works, in particular transaction mode pooling
from the perspective of the applications using it".  The current main
pain points are the previously mentioned administrative headaches;
better performance from a different architecture (pthreads vs libev)
would also be nice.

I'm a little skeptical that we're on the right path if we are pushing
a lot of memory consumption into the session level where a session is
pinned all the way back to a client connection.  plpgsql function plan
caches can be particularly hungry on memory, and since sessions have
their own GUCs, ISTM each session has to have its own set of them,
because plans depend on the search_path GUC, which is session specific.
Previous discussions on centrally managing cache memory consumption (I
do dimly recall you making a proposal on that very thing) haven't gone
past the planning stages AFAIK.

If we are breaking 1:1 backend:session relationship, what controls
would we have to manage resource consumption?

merlin



Re: Proper way to reload config files in backend SIGHUP handler

2018-05-04 Thread Mike Palmiotto
On 05/04/2018 09:35 AM, Michael Paquier wrote:
> On Fri, May 04, 2018 at 12:47:09AM -0400, Mike Palmiotto wrote:
>> I don't seem to have any problem catching the SIGHUP in my extension
>> without > BackgroundWorkerUnblockSignals a la:
>>
>> pqsignal_sighup(SIGHUP, sighup_handler);
> 
> I am pretty sure you mean s/pqsignal_sighup/pqsignal
Yeah, sorry, that's what I meant. :)

> 
>> static void
>> sighup_handler(SIGNAL_ARGS)
>> {
>>   
>> }
> 
> Signal handlers should not do anything complicated, and should access
> only variables of the type volatile sig_atomic_t.  This is also in the
> documentation here (see Writing Signal Handlers):
> https://www.postgresql.org/docs/devel/static/source-conventions.html
> 
>> Perhaps the signal is already unblocked at this point. Unblocking signals
>> first doesn't appear to have any affect. Am I missing something here?
>>
>> I noticed that 6e1dd27
>> (https://github.com/postgres/postgres/commit/6e1dd2773eb60a6ab87b27b8d9391b756e904ac3)
>> introduced a ConfigReloadPending flag that can be set by calling
>> PostgresSigHupHandler inside the extension's handler, which seemingly allows
>> things to work as they usually would, and then we can go on to do other 
>> config
>> processing.
>>
>> Is this approach kosher, or is there another preferred route?
> 
> From the documentation of
> https://www.postgresql.org/docs/devel/static/bgworker.html:
> 
> Signals are initially blocked when control reaches the background
> worker's main function, and must be unblocked by it; this is to allow
> the process to customize its signal handlers, if necessary. Signals can
> be unblocked in the new process by calling
> BackgroundWorkerUnblockSignals and blocked by calling
> BackgroundWorkerBlockSignals.
> 
> I have for ages in one of my github repositories a small example of
> bgworker which covers signal handling, located here:
> https://github.com/michaelpq/pg_plugins/tree/master/hello_signal
> 
> If you quote the call to BackgroundWorkerUnblockSignals, you would see
> that the SIGHUP is not received by the process, which is what the
> documentation says, and what I would expect as well.

Ah, that explains it. I'm catching SIGHUP in my extension (backend) without
using a background worker. Would writing a background worker and registering
it in PG_init allow us to reload the extension-specific config changes in the
backend? It seems like that would only affect the memory of the worker process
itself and not propagate to the backend. Perhaps that's an inaccurate 
assumption.

The end goal is:
1) Update extension-specific configs
2) Issue a postgres reload (SIGHUP)
3) Update extension variables in connected backend according to configs

That all _appears_ to work with the following:
1) Call PostgresSigHupHandler (set ConfigReloadPending, SetLatch(MyLatch) and
let the internal code handle that normally)
2) Read in custom extension-specific config files and modify non-volatile,
non-sig_atomic_t variables that are specific to the extension

It's possible that issues could arise with different levels of optimization. I
haven't tested that out.

As you've indicated, it's unsafe to do anything other than SetLatch/modify
volatile sig_atomic_t variables. If this means modifying extension-specific
non-volatile, non-sig_atomic_t variables is also hazardous, we should probably
just stick to reloading these configs using a custom SQL function, rather than
using a sighup handler.

Thanks for your patience in this line of questioning.

Regards,

-- 
Mike Palmiotto
Software Engineer
Crunchy Data Solutions
https://crunchydata.com



Re: Allow reload recovery.conf during recovery

2018-05-04 Thread Sergei Kornilov
Hello Michael
Thank you, I understand your opinion.
I did try to find a discussion about reloading recovery.conf (or changing the
walreceiver conninfo and similar things), not about GUCs.

regards, Sergei



Negative DST, or, should we delay using tzdata 2018e update?

2018-05-04 Thread Tom Lane
Earlier this week I pushed tzdata 2018d into our tree, expecting that
that would be the current IANA timezone data for next week's releases.
Well, I was wrong; today they pushed out a new version:

https://mm.icann.org/pipermail/tz-announce/2018-May/50.html

on account of North Korea deciding to change their time zone with a
week's notice.

Ordinarily I'd just absorb 2018e and be done with it.  But I am feeling
hesitant about this update, because it also incorporates this change:

 Bring back the negative-DST changes of 2018a, except be more
 compatible with data parsers that do not support negative DST.
 Also, this now affects historical time stamps in Namibia and the
 former Czechoslovakia, not just Ireland.  The main format now uses
 negative DST to model time stamps in Europe/Dublin (from 1971 on),
 Europe/Prague (1946/7), and Africa/Windhoek (1994/2017).  This
 does not affect UT offsets, only time zone abbreviations and the
 tm_isdst flag.  Also, this does not affect rearguard or vanguard
 formats; effectively the main format now uses vanguard instead of
 rearguard format.  Data parsers that do not support negative DST
 can still use data from the rearguard tarball described below.

As the announcement hints, this "negative DST" business was sufficiently
controversial that the initial attempt to push it was rescinded, on the
grounds that negative DST offsets might break people's software, and
some lead time was required to deal with that.

Basically, the point here is that it seems that in Ireland, legal
"standard time" is what they use in summer, and in winter they use
"daylight savings" time that is one hour behind standard time, rather
than one hour ahead as is usually meant by "daylight savings".  Up to
now the IANA database has just ignored the fact that "IST" is legally
"Irish Standard Time" not "Irish Summer Time", but now they want to
represent it properly.

From our point of view, the question is whether any applications are
likely to get confused if the "is_dst" column shown in pg_timezone_names
and pg_timezone_abbrevs is turned on for offsets that are one hour behind
their zone's "regular" time instead of one hour ahead.  I do not think
anything inside PG will get confused (though I could be wrong about that).
Application-level problems of this sort seem possible but not very likely.

We have a couple of alternatives:

1. Bull ahead and push tzdata 2018e into our tree, and ship that with
Monday's releases.  I am scared of this choice, because there is
basically no time remaining to detect or recover from any problems.

2. Delay applying 2018e until after next week's releases, so that we have
three months for any problems to get noticed.  (Also, 2018e could go out
with v11 betas, so that we can expect there would actually be some field
testing of it before it'd be released in back branches.)

3. As the announcement mentions, there's a "rear guard" version of
the tzdata files in which the negative-DST changes haven't been made.
Perhaps we could use that version for awhile, maybe only in the back
branches.  But eventually we'd have to change --- it's pretty clear that
the IANA crew isn't going to maintain this "rear guard" format forever ---
and it's not clear that postponing the pain would make anything better.

An important consideration here is that for a very large fraction of
our users, our choices here won't affect things in the least: any package
that uses --with-system-tzdata is going to be subject to the platform's
decisions about which tzdata to use and when to push it, not ours.

We can be pretty certain that bleeding-edge-loving platforms (e.g. Gentoo)
will push negative-DST-using tzdata soon.  Therefore, our code has to be
prepared to work with that even if we don't ship it.  This leads me to
think that we'd best incorporate 2018e into HEAD and ship it in 11beta,
regardless of what we do for the back branches.

I don't particularly want to get into a situation where we're shipping
different tzdata in v11+ than in older branches; that'd be a maintenance
nightmare.  So I'm not enamored of the variants of option 3 that would
have us doing that.  But the variant in which we ship rear-guard data
in all branches seems like it's just postponing a problem that we
can't postpone forever.

On balance I'm leaning towards option #2, ie go with the new data but
wait till just after the minor releases.  Thoughts?

regards, tom lane



Re: Allow reload recovery.conf during recovery

2018-05-04 Thread Michael Paquier
On Fri, May 04, 2018 at 05:19:26PM +0300, Sergei Kornilov wrote:
> I did not find previous discussions.

There have been a lot of discussions across the years about switching
recovery parameters to use the GUC infrastructure, please see those
two ones:
https://www.postgresql.org/message-id/CABUevEy5aWuwySXEC6i3JA6cvy8agGZHQbOn0RVZ4h4oxM0Dkw%40mail.gmail.com
https://www.postgresql.org/message-id/CAJKUy5id1eyweK0W4%2ByyCM6%2B-qYs9erLidUmb%3D1a-QYBgTW4Qw%40mail.gmail.com

And more recently this one which reached a kind of agreement:
https://www.postgresql.org/message-id/CANP8%2BjLO5fmfudbB1b1iw3pTdOK1HBM%3DxMTaRfOa5zpDVcqzew%40mail.gmail.com

Parameters in recovery.conf currently have a level equivalent to
GUC_POSTMASTER, as they are read only when the startup process finds a
recovery.conf file and are never changed afterwards.  Before making
some of them reloadable, let's switch them to be GUCs first rather
than re-inventing the SIGHUP handling of parameters as your patch
does.  After that, let's discuss switching some of them to GUC_SIGHUP.
--
Michael




Allow reload recovery.conf during recovery

2018-05-04 Thread Sergei Kornilov
Hello all
I want to propose a patch to make it possible to change primary_conninfo,
primary_slot_name, restore_command and trigger_file in recovery.conf without
restarting postgresql.
The startup process will reread recovery.conf on SIGHUP.

My primary use case is a postgresql cluster with streaming replication. When
we lose the current master and want to promote a new one, we need to restart
all replicas with a new primary_conninfo.
Other useful cases from my work:
- switching wal streaming to another NIC
- changing the replication username (or password, when .pgpass is not used)
When restarting, shared buffers are lost and some time is spent in recovery
from the last restart point, but in some cases all we need is to change the
connection string.
Changing restore_command may be useful for users too.

I did not find previous discussions.

I split readRecoveryCommandFile into two functions:
- readRecoveryCommandFile, which reads and validates recovery.conf and avoids
  side effects
- ProcessRecoveryCommandFile, which contains the rest of the logic

Changing primary_conninfo or primary_slot_name will restart the walreceiver
(I do not touch the walreceiver source; I request a standard restart).
trigger_file was not my primary target, but I did not find any reason to
refuse this change on reload (I also did not find any tests using
trigger_file, so I added one). I am not sure about all the other settings,
such as recovery_min_apply_delay, archive_cleanup_command, etc., so I forbid
changing them.

The patch is against current master. Changes to the documentation and a few
additional tests are included.

PS: of course, I do not mean postgresql 11

regards, Sergei

diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index 92825fd..0660938 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -14,7 +14,9 @@
 recovery.confrecovery.conf
 file. They apply only for the duration of the
 recovery.  They must be reset for any subsequent recovery you wish to
-perform.  They cannot be changed once recovery has begun.
+perform. Excepts restore_command, primary_conninfo, 
+primary_slot_name and trigger_file 
+they cannot be changed once recovery has begun.

 

@@ -76,6 +78,10 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
 database server shutdown) or an error by the shell (such as command
 not found), then recovery will abort and the server will not start up.

+   
+   
+Can be changed during recovery with configuration files reload.
+   
   
  
 
@@ -410,6 +416,11 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
  
   This setting has no effect if standby_mode is off.
  
+ 
+  Can be changed during recovery with configuration files reload.
+  walreceiver will be restart after 
+  primary_conninfo was changed.
+ 
 


@@ -427,6 +438,11 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
   This setting has no effect if primary_conninfo is not
   set.
  
+ 
+  Can be changed during recovery with configuration files reload.
+  walreceiver will be restart after 
+  primary_slot_name was changed.
+ 
 


@@ -442,6 +458,9 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
   the standby using pg_ctl promote.
   This setting has no effect if standby_mode is off.
  
+ 
+  Can be changed during recovery with configuration files reload.
+ 
 

 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index c0923d9..55409fd 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -255,26 +255,39 @@ static bool restoredFromArchive = false;
 static char *replay_image_masked = NULL;
 static char *master_image_masked = NULL;
 
-/* options taken from recovery.conf for archive recovery */
-char	   *recoveryRestoreCommand = NULL;
-static char *recoveryEndCommand = NULL;
-static char *archiveCleanupCommand = NULL;
-static RecoveryTargetType recoveryTarget = RECOVERY_TARGET_UNSET;
-static bool recoveryTargetInclusive = true;
-static RecoveryTargetAction recoveryTargetAction = RECOVERY_TARGET_ACTION_PAUSE;
-static TransactionId recoveryTargetXid;
-static TimestampTz recoveryTargetTime;
-static char *recoveryTargetName;
-static XLogRecPtr recoveryTargetLSN;
-static int	recovery_min_apply_delay = 0;
+/* options taken from RECOVERY_COMMAND_FILE */
+typedef struct recoveryCommandData
+{
+	bool		confExists;		/* is RECOVERY_COMMAND_FILE present */
+
+	/* options for archive recovery */
+	char	   *recoveryRestoreCommand;
+	char	   *recoveryEndCommand;
+	char	   *archiveCleanupCommand;
+	RecoveryTargetType recoveryTarget;
+	bool		recoveryTargetInclusive;
+	RecoveryTargetAction recoveryTargetAction;
+	TransactionId 

Re: GSoC 2018: thrift encoding format

2018-05-04 Thread Stephen Frost
Greetings,

* Aleksander Alekseev (a.aleks...@postgrespro.ru) wrote:
> >> Personally I think raw data bytes are OK if functions for getting all
> >> keys and values from this data are provided
> > 
> > What is the purpose of using Thrift "encoding" if it turns out to be a
> > simple wrapper for existing binary data?
> > 
> > Do you mean the goal is to have "get/set" functions to fetch data out of
> > bytea field?
> 
> I mean Charles is free to choose the interface for the extension he
> believes is right. There would be no much learning left in the project
> if all design decisions were made beforehand.

Perhaps the design decisions aren't all made beforehand, but they also
shouldn't be made in a vacuum- there should be discussions on -hackers
about what the right decision is for a given aspect and that's what
should be worked towards.

> Personally I would probably just write a Thrift<->JSONB converter. But
> there are pros and cons of this approach. For instance, CPU and memory
> overhead for creating and storing temporary JSONB object is an obvious
> drawback. On the other hand there are time limits for this project and
> thus it makes sense to implement a feature as fast and as simple as
> possible, and optimize it later (if necessary). 

Just having such a convertor would reduce the usefulness of this
extension dramatically, wouldn't it?  Considering the justification for
the extension used on the GSoC project page, it certainly strikes me as
losing most of the value if we just convert to JSONB.

> Maybe Charles likes to optimize everything. In this case he may choose
> to implement all the getters and setters from scratch. This doesn't
> exclude possibility of implementing the Thrift<->JSONB converter later.

Having a way to cast between the two is entirely reasonable, imv, but
that's very different from having the data only able to be stored as
JSONB..

> Should Thrift objects be represented in the DBMS as a special Thrift
> type, or as raw bytea? Personally I don't care. Once again, there are
> pros and cons. It's good to have a bit of additional type safety. On the
> other hand, it's not convenient to cast Thrift<->bytea all the time, and
> if we add implicit casting there will be little type safety left. In
> pg_protobuf extension I choose to store Protobuf as bytea, but if
> Charles prefer to introduce a separate type that's fine by me.

I understand that you're open to having it as a new data type or as a
bytea, but I don't agree.  This should be a new data type, just as json
is a distinct data type and so is jsonb.

Thanks!

Stephen




Re: Moving libpg{common,port,feutils}.a to pkglibdir?

2018-05-04 Thread Michael Paquier
On Thu, May 03, 2018 at 08:40:49PM +0200, Christoph Berg wrote:
> [*] I haven't checked, but this likely applies to Michael Paquier's
> version of it as well

Confirmed.  That applies as well.
--
Michael




Re: Is there a memory leak in commit 8561e48?

2018-05-04 Thread Michael Paquier
On Thu, May 03, 2018 at 08:45:13AM -0400, Peter Eisentraut wrote:
> I went with the patch I had posted, since I needed to move ahead with
> something.  If it turns out to be a problem, we can easily switch it
> around.

Okay.

> As Tom mentioned, in order to grow the SPI stack to where it has a
> significant size, you might also overrun the OS stack and other things.
> On the other hand, the current/new arrangement is a win for normal SPI
> use, since you don't need to rebuild the stack on every call.

It would be nice to mention that in the release notes..
--
Michael




Re: pg_dump should use current_database() instead of PQdb()

2018-05-04 Thread Peter Eisentraut
On 4/10/18 12:40, Peter Eisentraut wrote:
> A report from a pgbouncer user revealed that running pg_dump -C/--create
> does not work through a connection proxy if the virtual database name on
> the proxy does not match the real database name on the database server.
> That's because pg_dump looks up the database to be dumped using the
> information from PQdb().  It should be using current_database() instead.
>  (The code was quite likely written before current_database() was
> available (PG 7.3)).
> 
> See attached patch.

committed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Proper way to reload config files in backend SIGHUP handler

2018-05-04 Thread Michael Paquier
On Fri, May 04, 2018 at 12:47:09AM -0400, Mike Palmiotto wrote:
> I don't seem to have any problem catching the SIGHUP in my extension
> without BackgroundWorkerUnblockSignals a la:
> 
> pqsignal_sighup(SIGHUP, sighup_handler);

I am pretty sure you mean s/pqsignal_sighup/pqsignal/.

> static void
> sighup_handler(SIGNAL_ARGS)
> {
>   
> }

Signal handlers should not do anything complicated, and should access
only variables of the type volatile sig_atomic_t.  This is also in the
documentation here (see Writing Signal Handlers):
https://www.postgresql.org/docs/devel/static/source-conventions.html

> Perhaps the signal is already unblocked at this point. Unblocking signals
> first doesn't appear to have any effect. Am I missing something here?
> 
> I noticed that 6e1dd27
> (https://github.com/postgres/postgres/commit/6e1dd2773eb60a6ab87b27b8d9391b756e904ac3)
> introduced a ConfigReloadPending flag that can be set by calling
> PostgresSigHupHandler inside the extension's handler, which seemingly allows
> things to work as they usually would, and then we can go on to do other config
> processing.
> 
> Is this approach kosher, or is there another preferred route?

From the documentation of
https://www.postgresql.org/docs/devel/static/bgworker.html:

Signals are initially blocked when control reaches the background
worker's main function, and must be unblocked by it; this is to allow
the process to customize its signal handlers, if necessary. Signals can
be unblocked in the new process by calling
BackgroundWorkerUnblockSignals and blocked by calling
BackgroundWorkerBlockSignals.

I have for ages in one of my github repositories a small example of
bgworker which covers signal handling, located here:
https://github.com/michaelpq/pg_plugins/tree/master/hello_signal

If you comment out the call to BackgroundWorkerUnblockSignals, you will see
that the SIGHUP is not received by the process, which is what the
documentation says, and what I would expect as well.
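
To put both points together, here is a minimal sketch (worker registration
omitted; function and variable names are mine, this is not taken verbatim
from hello_signal):

#include "postgres.h"

#include <signal.h>

#include "miscadmin.h"
#include "pgstat.h"
#include "postmaster/bgworker.h"
#include "storage/ipc.h"
#include "storage/latch.h"
#include "utils/guc.h"

static volatile sig_atomic_t got_sighup = false;

/* Keep the handler trivial: set a flag, poke the latch, preserve errno. */
static void
worker_sighup_handler(SIGNAL_ARGS)
{
	int			save_errno = errno;

	got_sighup = true;
	SetLatch(MyLatch);
	errno = save_errno;
}

void
worker_main(Datum main_arg)
{
	pqsignal(SIGHUP, worker_sighup_handler);
	BackgroundWorkerUnblockSignals();	/* signals stay blocked without this */

	for (;;)
	{
		int			rc;

		rc = WaitLatch(MyLatch,
					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
					   10000L, PG_WAIT_EXTENSION);
		ResetLatch(MyLatch);

		if (rc & WL_POSTMASTER_DEATH)
			proc_exit(1);

		if (got_sighup)
		{
			got_sighup = false;
			ProcessConfigFile(PGC_SIGHUP);	/* re-read postgresql.conf */
		}

		/* ... do the worker's real work here ... */
	}
}
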
--
Michael




Re: pg_rewind and postgresql.conf

2018-05-04 Thread Tatsuo Ishii
> I totally agree.  Ideally, rewind would just rewind data dirs by default
> and provide an option to include other files as specified by the
> administrator.

Something like "--copy-including-files=file1,file2,..."? However I
would like to also have "--copy-excluding-files=". The latter would be
more convenient for most users like me.
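
(If such an option existed, an invocation might look like

    pg_rewind --target-pgdata=/srv/pg/standby \
              --source-server='host=primary port=5432' \
              --copy-excluding-files=postgresql.conf,postgresql.auto.conf

with the listed files simply left untouched on the target. The option name
is of course only a proposal at this point.)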

> However I ran out of time this time to do this patch and hope to submit
> again after 11 is finalized.

Sounds great.

> There are two other things that would really be nice to make work too (but
> think that's another major version away):
> 
> 1.  Make pg_rewind work over the replication protocol so it doesn't require
> db superuser
> 2.  Unify, to the extent possible, the code base with pg_basebackup.

Interesting idea.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: pg_rewind and postgresql.conf

2018-05-04 Thread Tatsuo Ishii
> Even if so, using the ALTER SYSTEM command after pg_rewind might be an easy
> way of correcting the port back to what you want without doing more
> complicated processing or re-writing the entire postgresql.conf.

To make ALTER SYSTEM work, the server needs to start up in the first
place. But without changing the port number, it's impossible to start
the server in some cases. In my case, I create several PostgreSQL
database clusters on a single server by assigning different port
numbers. If the port number is already used by another server (in the
pg_rewind case the source server apparently uses the port), the rewound
server won't start up.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



MAP syntax for arrays

2018-05-04 Thread Ildar Musin

Hello hackers,

Recently I was working with SQL arrays in postgres and it turned out
that postgres doesn't have such convenient functional constructions
as map, reduce and filter. Currently, to map a function over an array,
the user has to write a subquery like:

select u.* from
my_table,
lateral (
select array_agg(lower(elem))
from unnest(arr) as elem
) as u;

This is not only inconvenient but also not very efficient (see the
'Demo' section below).

When I dug into the code I found that postgres already has the needed
infrastructure for implementing map for arrays; actually, array coercion
already works that way (it basically maps a cast function).

In the attached patch there is a simple map implementation which
introduces new expression type and syntax:

MAP(<function> OVER <array>)

For example:

SELECT MAP(upper OVER array['one', 'two', 'three']::text[]);
?column?
-
 {ONE,TWO,THREE}
(1 row)

This is probably not the most useful notation, and it would be better to
have syntax for mapping arbitrary expressions over an array, not just a
function. I'm struggling to come up with a good idea of what it should
look like. It could be something like the following:

MAP(<expression> FOR <variable> IN <array>)

For instance:

SELECT MAP(x*2 FOR x IN array[1, 2, 3]::int[]);

Looking forward to the community's suggestions!

Demo


Here is a small comparison between the map and unnest/aggregate
approaches for per-element processing of arrays, given a table with 1K
rows containing a single column of type text[]. Each array contains
5/10/100 elements.

create table my_table (arr text[]);
insert into my_table
select array_agg(md5(random()::text))
from generate_series(1, 1000) as rows,
 generate_series(1, 10) as elements
group by rows;

There are two scripts for pgbench. One for 'map' syntax:

select map(upper over arr) from my_table;

And one for unnest/aggregate:

select u.* from my_table,
lateral (
select array_agg(upper(elem))
from unnest(arr) as elem
) as u;

Results are:

 elements per array |  map (tps) | unnest/aggregate (tps)
++
  5 | 139.105359 |   74.434010
 10 |  74.089743 |   43.622554
100 |   7.693000 |5.325805

Apparently map is more efficient for small arrays, and as the size of
the array increases the difference decreases.

I'll be glad to any input from the community. Thanks!

--
Ildar Musin
i.mu...@postgrespro.ru
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index e284fd7..85d76b2 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -886,6 +886,56 @@ ExecInitExprRec(Expr *node, ExprState *state,
 break;
 			}
 
+		case T_MapExpr:
+			{
+MapExpr	   *map = (MapExpr *) node;
+ExprState  *elemstate;
+Oid			resultelemtype;
+
+ExecInitExprRec(map->arrexpr, state, resv, resnull);
+
+resultelemtype = get_element_type(map->resulttype);
+if (!OidIsValid(resultelemtype))
+	ereport(ERROR,
+			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+			 errmsg("target type is not an array")));
+
+/* Construct a sub-expression for the per-element expression */
+elemstate = makeNode(ExprState);
+elemstate->expr = map->elemexpr;
+elemstate->parent = state->parent;
+elemstate->ext_params = state->ext_params;
+elemstate->innermost_caseval = (Datum *) palloc(sizeof(Datum));
+elemstate->innermost_casenull = (bool *) palloc(sizeof(bool));
+
+ExecInitExprRec(map->elemexpr, elemstate,
+&elemstate->resvalue, &elemstate->resnull);
+
+/* Append a DONE step and ready the subexpression */
+scratch.opcode = EEOP_DONE;
+ExprEvalPushStep(elemstate, &scratch);
+ExecReadyExpr(elemstate);
+
+scratch.opcode = EEOP_MAP;
+scratch.d.map.elemexprstate = elemstate;
+scratch.d.map.resultelemtype = resultelemtype;
+
+if (elemstate)
+{
+	/* Set up workspace for array_map */
+	scratch.d.map.amstate =
+		(ArrayMapState *) palloc0(sizeof(ArrayMapState));
+}
+else
+{
+	/* Don't need workspace if there's no subexpression */
+	scratch.d.map.amstate = NULL;
+}
+
+ExprEvalPushStep(state, &scratch);
+break;
+			}
+
 		case T_OpExpr:
 			{
 OpExpr	   *op = (OpExpr *) node;
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 9d6e25a..b2cbc45 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -328,6 +328,7 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
		&&CASE_EEOP_FUNCEXPR_STRICT,
		&&CASE_EEOP_FUNCEXPR_FUSAGE,
		&&CASE_EEOP_FUNCEXPR_STRICT_FUSAGE,
+		&&CASE_EEOP_MAP,
		&&CASE_EEOP_BOOL_AND_STEP_FIRST,
		&&CASE_EEOP_BOOL_AND_STEP,
		&&CASE_EEOP_BOOL_AND_STEP_LAST,
@@ -699,6 +700,13 @@ 

Re: pg_rewind and postgresql.conf

2018-05-04 Thread Michael Paquier
On Fri, May 04, 2018 at 02:05:25PM +0200, Chris Travers wrote:
> I totally agree.  Ideally, rewind would just rewind data dirs by default
> and provide an option to include other files as specified by the
> administrator.

That's actually a potential please-shoot-both-my-feet option.  Imagine
if one uses it to accidentally (or, say, to reduce the amount of network
bandwidth) remove critical data like what's in pg_xact/...  So I am
not especially a fan of such options.

> There are two other things that would really be nice to make work too (but
> think that's another major version away):
> 
> 1.  Make pg_rewind work over the replication protocol so it doesn't require
> db superuser

Actually, there is no need to use a superuser now for v11.  A user just
needs to be part of the system group pg_read_server_files.  Switching to
the replication protocol could introduce more bugs which are not worth
the risk.
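
For example (role name purely illustrative), something along these lines on
the source server should be all that's needed for that:

    CREATE ROLE rewind_user LOGIN;
    GRANT pg_read_server_files TO rewind_user;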

> 2.  Unify, to the extent possible, the code base with pg_basebackup.

Yeah, that's what happened with 266b6ac.  You could unify things a bit
more by sharing the exclude filter lists now in basebackup.c and
filemap.c into a common header, but some variables are declared only in
backend-side code, and I would love to avoid tweaks like in pg_resetwal,
which includes postgres.h and forces FRONTEND to 1 without using
postgres_fe.h.
--
Michael




Re: pg_rewind and postgresql.conf

2018-05-04 Thread Christoph Moench-Tegeder
## Tatsuo Ishii (is...@sraoss.co.jp):

> Currently pg_rewind copies all files including postgresql.conf. It
> would be nice if pg_rewind has an option to not copy
> postgresql.conf.

How about including a file outside the data directory with "local"
settings? Like "include /some/where/else/local.conf".
Or use a setup with all config files (except postgresql.auto.conf)
outside the data directory (as on Debian/Ubuntu)?
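
For instance (paths purely illustrative), the latter could look like

    postgres -D /srv/pg/cluster2 -c config_file=/etc/postgresql/cluster2/postgresql.conf

with port = 5433 and friends living in that external file; since pg_rewind
only synchronizes the data directory, those settings survive a rewind.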

Regards,
Christoph

-- 
Spare Space



Re: make installcheck-world in a clean environment

2018-05-04 Thread Alexander Lakhin

04.05.2018 14:58, Robert Haas wrote:

Thanks for your involvement!


> Your proposed fix involves inventing something called
> USE_INSTALLED_ASSETS, but we don't need that anywhere else, so why do
> we need it for ECPG?
We need that not only for ECPG, but for libpostgres, libpgport, and
libpq too, if we want to check the installed ones (instead of recompiled
ones). Imagine that we recompile psql (and use it) to perform installcheck
for the server instance. It will work, but we'll miss an opportunity to
check whether the installed psql is working. I believe it's the same
with ecpg (as it is now, we can even remove the installed ecpg and have
all the tests pass).

> I am not opposed to fixing this, but one might also ask whether it's a
> good use of time, since the workaround is straightforward.


I don't see a workaround to perform a more complete installcheck
without modifying makefiles. So for me the question is whether
increasing the testing surface justifies this use of time.



Best regards,

--
Alexander Lakhin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: pg_rewind and postgresql.conf

2018-05-04 Thread Chris Travers
On Fri, May 4, 2018 at 2:06 PM, Isaac Morland 
wrote:

> Would it work to use ALTER SYSTEM (postgresql.auto.conf)? Or is that
> copied by pg_rewind also?
>

Yes that is copied currently by pg_rewind, as are server logs if they are
in the data directory.  I think replication slots might also be copied, but I
would have to check.

>
> Even if so, using the ALTER SYSTEM command after pg_rewind might be an
> easy way of correcting the port back to what you want without doing more
> complicated processing or re-writing the entire postgresql.conf.
>
> On 4 May 2018 at 06:59, Tatsuo Ishii  wrote:
>
>> Currently pg_rewind copies all files including postgresql.conf. It
>> would be nice if pg_rewind has an option to not copy
>> postgresql.conf. I sometimes create multiple PostgreSQL database
>> clusters with different port number which postmaster is listening on
>> for a testing purpose. So existing postgresql.conf on the target
>> cluster being overwritten by pg_rewind is annoying. I believe there
>> are some use cases where different port numbers are used among
>> PostgreSQL database clusters in the real world.
>>
>> Comments?
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese:http://www.sraoss.co.jp
>>
>>
>


-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: pg_rewind and postgresql.conf

2018-05-04 Thread Isaac Morland
Would it work to use ALTER SYSTEM (postgresql.auto.conf)? Or is that copied
by pg_rewind also?

Even if so, using the ALTER SYSTEM command after pg_rewind might be an easy
way of correcting the port back to what you want without doing more
complicated processing or re-writing the entire postgresql.conf.

On 4 May 2018 at 06:59, Tatsuo Ishii  wrote:

> Currently pg_rewind copies all files including postgresql.conf. It
> would be nice if pg_rewind has an option to not copy
> postgresql.conf. I sometimes create multiple PostgreSQL database
> clusters with different port number which postmaster is listening on
> for a testing purpose. So existing postgresql.conf on the target
> cluster being overwritten by pg_rewind is annoying. I believe there
> are some use cases where different port numbers are used among
> PostgreSQL database clusters in the real world.
>
> Comments?
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
>
>


Re: pg_rewind and postgresql.conf

2018-05-04 Thread Chris Travers
On Fri, May 4, 2018 at 12:59 PM, Tatsuo Ishii  wrote:

> Currently pg_rewind copies all files including postgresql.conf. It
> would be nice if pg_rewind has an option to not copy
> postgresql.conf. I sometimes create multiple PostgreSQL database
> clusters with different port number which postmaster is listening on
> for a testing purpose. So existing postgresql.conf on the target
> cluster being overwritten by pg_rewind is annoying. I believe there
> are some use cases where different port numbers are used among
> PostgreSQL database clusters in the real world.
>
> Comments?
>

I totally agree.  Ideally, rewind would just rewind data dirs by default
and provide an option to include other files as specified by the
administrator.

However I ran out of time this time to do this patch and hope to submit
again after 11 is finalized.

There are two other things that would really be nice to make work too (but
think that's another major version away):

1.  Make pg_rewind work over the replication protocol so it doesn't require
db superuser
2.  Unify, to the extent possible, the code base with pg_basebackup.


> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
>
>


-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: make installcheck-world in a clean environment

2018-05-04 Thread Robert Haas
On Mon, Apr 2, 2018 at 5:12 AM, Alexander Lakhin  wrote:
> Is it a supported scenario to make installcheck-world without performing
> "make" first?

The evidence suggests that the answer is "no".

> (If I do "make -C src/interfaces/ecpg" and then "make installcheck-world",
> then this error is gone. And when I set up all the extensions, all tests
> passed successfully.)

Your proposed fix involves inventing something called
USE_INSTALLED_ASSETS, but we don't need that anywhere else, so why do
we need it for ECPG?

I am not opposed to fixing this, but one might also ask whether it's a
good use of time, since the workaround is straightforward.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Clock with Adaptive Replacement

2018-05-04 Thread Vladimir Sitnikov
>I don't have time to check this out just now, but it seems like an
excellent idea

There are a couple of sad things:
1) DTrace probes seem to be disabled by default. At least, I had to build
PostgreSQL with --enable-dtrace on my macOS.
Does that (the requirement to enable dtrace within PostgreSQL) prevent
capturing trace files from real-life applications?

There's an option to use system-level IO trace points (e.g.
https://github.com/omniti-labs/pgtreats/blob/master/tools/pg_file_stress ),
however it would be harder to get DB/relation kind of ids.

2) (database, tablespace, relation) oids cannot be easily mapped from
within a DTrace script. One can work around that by using a SQL connection
to the database; however, it gets a bit complicated if there are multiple DB
instances. What I mean is that it is hard to tell which connection URL to use
in order to resolve the oids in question.
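
(The lookup itself is simple once you are connected to the right database;
assuming the probe reports the tablespace oid and the relation filenode,
something like

    SELECT pg_filenode_relation(<spcNode>, <relNode>);
    -- or, if you already have the relation oid:
    SELECT relname FROM pg_class WHERE oid = <relation oid>;

does the job. The hard part is knowing which instance and database to
connect to in the first place.)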

Vladimir


Re: GSoC 2018: thrift encoding format

2018-05-04 Thread Aleksander Alekseev
Hello Vladimir,

> I'm just trying to figure out what the use cases are for that Thrift
> extension.

You can find an answer in the project description:

https://wiki.postgresql.org/wiki/GSoC_2018#Thrift_datatype_support_.282018.29

-- 
Best regards,
Aleksander Alekseev




Re: [HACKERS] Clock with Adaptive Replacement

2018-05-04 Thread Юрий Соколов
Fri, 4 May 2018, 12:27 Andrey Borodin :

>
> > On 4 May 2018, at 0:37, Robert Haas  wrote:
> >
> > On Wed, May 2, 2018 at 3:06 PM, Vladimir Sitnikov
> >  wrote:
> >> Sample output can be seen here:
> >> https://github.com/vlsi/pgsqlstat/tree/pgsqlio#pgsqlio
> >
> > Neat.  Not sure what generated this trace, but note this part:
> >
> > 32718388813748820500 16631638516604  0
> > 3271840973321 436800 16631638516604  1
> > 3271842680626 450200 16631638516604  1
> > 3271846077927 417300 16631638516604  1
> >
> > If we want to avoid artificial inflation of usage counts, that kind of
> > thing would be a good place to start -- obviously 4 consecutive
> > accesses to the same buffer by the same backend doesn't justify a
> > separate usage count bump each time.
>
> Earlier in this thread Yura suggested that usages should not create an equal
> bump each time. He effectively suggested a log scale of usages, so that many
> consecutive usages are taken into account but are not dramatically more
> important than just a few recent usages.
>

I didn't suggest a log scale of usages, but rather a "replacement-period based"
increment: the usage count can be incremented at most once per NBlocks/32
replaced items. Once it is incremented, its "replacement time" is
remembered, and for the next NBlocks/32 replacements the usage count of this
buffer is not incremented.
This way, the increment is synchronized with replacement activity.

Digging further, I suggest the following improvements to the GClock algorithm:
- place a new buffer with usage count = 0 (and for the next NBlocks/32
replacements its usage count is not increased);
- increment not by 1, but by 8 (this simulates the "hot queue" of popular
algorithms), with a limit of 32;
- scan at most 25 buffers for eviction; if no buffer with a zero usage count
is found, the least-used buffer (among the 25 scanned) is evicted
(new buffers are not evicted during their first NBlocks/32 replacements).
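
Just to illustrate the idea, a rough standalone sketch (names and constants
are mine; this is not a patch against the actual bufmgr code):

#include <stdint.h>
#include <stddef.h>

#define USAGE_BUMP     8
#define USAGE_LIMIT    32
#define EVICTION_SCAN  25

typedef struct SketchBufDesc
{
    int      usage_count;
    uint64_t last_bump_at;    /* value of replacement_counter at last bump */
} SketchBufDesc;

static uint64_t replacement_counter = 0;  /* advanced on every eviction */

/* On buffer access: bump at most once per nblocks/32 replacements. */
static void
bump_usage(SketchBufDesc *buf, int nblocks)
{
    if (replacement_counter - buf->last_bump_at >= (uint64_t) (nblocks / 32))
    {
        buf->usage_count += USAGE_BUMP;
        if (buf->usage_count > USAGE_LIMIT)
            buf->usage_count = USAGE_LIMIT;
        buf->last_bump_at = replacement_counter;
    }
}

/* Clock sweep: look at no more than EVICTION_SCAN buffers. */
static SketchBufDesc *
pick_victim(SketchBufDesc *bufs, int nblocks, int *clock_hand)
{
    SketchBufDesc *victim = NULL;
    int            i;

    for (i = 0; i < EVICTION_SCAN; i++)
    {
        SketchBufDesc *b = &bufs[*clock_hand];

        *clock_hand = (*clock_hand + 1) % nblocks;
        /* A full version would also skip buffers installed less than
         * nblocks/32 replacements ago, so fresh pages get a grace period. */
        if (b->usage_count == 0)
        {
            victim = b;
            break;
        }
        b->usage_count--;       /* usual gradual decay of the sweep */
        if (victim == NULL || b->usage_count < victim->usage_count)
            victim = b;
    }
    replacement_counter++;
    return victim;
}

(A newly installed buffer would start with usage_count = 0 and last_bump_at =
replacement_counter, so it is neither bumped nor, in a full version, evicted
during its first nblocks/32 replacements.)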

Regards,
Yura Sokolov.


pg_rewind and postgresql.conf

2018-05-04 Thread Tatsuo Ishii
Currently pg_rewind copies all files including postgresql.conf. It
would be nice if pg_rewind has an option to not copy
postgresql.conf. I sometimes create multiple PostgreSQL database
clusters with different port number which postmaster is listening on
for a testing purpose. So existing postgresql.conf on the target
cluster being overwritten by pg_rewind is annoying. I believe there
are some use cases where different port numbers are used among
PostgreSQL database clusters in the real world.

Comments?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: GSoC 2018: thrift encoding format

2018-05-04 Thread Vladimir Sitnikov
>I mean Charles is free to choose the interface for the extension he
believes is right

I'm just trying to figure out what the use cases are for that Thrift
extension.

For instance, it would be interesting if Thrift was an alternative way to
transfer data between client and the database. I guess it could simplify
building efficient clients.

Vladimir


Re: nbtsort.c performs unneeded (though harmless) truncation

2018-05-04 Thread Teodor Sigaev

Thank you, pushed.

Peter Geoghegan wrote:

I noticed that we're calling _bt_nonkey_truncate() needlessly when a
minimum key is needed at the leftmost page on each level of the tree.
This was always a special case, and I think that it should remain as
one. Attached patch avoids unneeded truncations, while preserving the
generic BTreeTupleGetNAtts() assertions.

This isn't a correctness issue, and the extra overhead of unneeded
truncation should be negligible, but what we have now seems confusing
to me.



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/



Re: [HACKERS] path toward faster partition pruning

2018-05-04 Thread Marina Polyakova

Hello everyone in this thread!

I got a server crash similar to the one in [1] on the master branch since
commit 9fdb675fc5d2de825414e05939727de8b120ae81: the assertion fails
because the second argument of the ScalarArrayOpExpr is not a Const or an
ArrayExpr, but an ArrayCoerceExpr (see [2]):


=# create table list_parted (
  a varchar
) partition by list (a);
=# create table part_ab_cd partition of list_parted for values in ('ab', 
'cd');

=# CREATE OR REPLACE FUNCTION public.x_stl_text_integer (
)
RETURNS text STABLE AS
$body$
BEGIN
RAISE NOTICE 's text integer';
RETURN 1::text;
END;
$body$
LANGUAGE 'plpgsql';
=# explain (costs off) select * from list_parted where a in ('ab', 'cd', 
x_stl_text_integer());

server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>

[1] 
https://www.postgresql.org/message-id/CAKcux6nCsCmu9oUnnuKZkeBenYvUFbU2Lt4q2MFNEb7QErzn8w%40mail.gmail.com


[2] partprune.c, function match_clause_to_partition_key:
if (IsA(rightop, Const))
{
...
}
else
{
ArrayExpr  *arrexpr = castNode(ArrayExpr, rightop); # fails here
...
}

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Clock with Adaptive Replacement

2018-05-04 Thread Andrey Borodin

> On 4 May 2018, at 0:37, Robert Haas  wrote:
> 
> On Wed, May 2, 2018 at 3:06 PM, Vladimir Sitnikov
>  wrote:
>> Sample output can be seen here:
>> https://github.com/vlsi/pgsqlstat/tree/pgsqlio#pgsqlio
> 
> Neat.  Not sure what generated this trace, but note this part:
> 
> 32718388813748820500 16631638516604  0
> 3271840973321 436800 16631638516604  1
> 3271842680626 450200 16631638516604  1
> 3271846077927 417300 16631638516604  1
> 
> If we want to avoid artificial inflation of usage counts, that kind of
> thing would be a good place to start -- obviously 4 consecutive
> accesses to the same buffer by the same backend doesn't justify a
> separate usage count bump each time.

Earlier in this thread Yura suggested that usages should not create an equal
bump each time. He effectively suggested a log scale of usages, so that many
consecutive usages are taken into account but are not dramatically more
important than just a few recent usages.

Best regards, Andrey Borodin.


Re: Built-in connection pooling

2018-05-04 Thread Konstantin Knizhnik



On 03.05.2018 20:01, Robert Haas wrote:

On Fri, Apr 27, 2018 at 4:43 PM, Merlin Moncure  wrote:

What _I_ (maybe not others) want is a
faster pgbouncer that is integrated into the database; IMO it does
everything exactly right.

I have to admit that I find that an amazing statement.  Not that
pgbouncer is bad technology, but saying that it does everything
exactly right seems like a vast overstatement.  That's like saying
that you don't want running water in your house, just a faster motor
for the bucket you use to draw water from the well.

Maybe if you are engaged in agriculture at your country house, then
having a well with a good motor pump is better for watering plants than
a water faucet in your kitchen.
But most homeowners prefer to open a tap to wash their hands rather than
perform some complex manipulations with a motor pump.


I am absolutely sure that external connection poolers will always have
their niche: they can be used as a natural proxy between multiple clients
and the DBMS.

Usually HA/load balancing can also be done at this level.

But there are many cases when users just do not want to worry about
connection pooling: they simply have some number of clients (which can be
quite large, several times larger than the optimal number of Postgres
backends) and they want those clients to access the database without
introducing intermediate layers. In this case a built-in connection pooler
will be the ideal solution.


This is from the user's point of view. From a Postgres developer's point of
view, a built-in pooler has some technical advantages compared with an
external pooler.
Some of these advantages can be eliminated by a significant redesign of
the Postgres architecture, for example introducing a shared cache of prepared
statements...
But in any case, the notion of a session context and the possibility of
maintaining a larger number of open sessions will always be relevant.



Some update on the status of the built-in connection pooler prototype: I
managed to run the regression and isolation tests for pooled connections.
Right now 6 of 185 regression tests fail and 2 of 67 isolation tests fail.
For the regression tests the result may vary depending on the parallel
schedule, because of manipulations with roles/permissions which are not
currently supported.
The best results are for the sequential schedule: 5 failed tests; these
failures are caused by differences in pg_prepared_statements due to
"mangled" prepared names.


Failures of isolation tests are caused by unsupported advisory locks.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: GSoC 2018: thrift encoding format

2018-05-04 Thread Aleksander Alekseev
Hello Vladimir,

>> Personally I think raw data bytes are OK if functions for getting all
>> keys and values from this data are provided
> 
> What is the purpose of using Thrift "encoding" if it turns out to be a
> simple wrapper for existing binary data?
> 
> Do you mean the goal is to have "get/set" functions to fetch data out of
> bytea field?

I mean Charles is free to choose the interface for the extension he
believes is right. There would not be much learning left in the project
if all design decisions were made beforehand.

Personally I would probably just write a Thrift<->JSONB converter. But
there are pros and cons of this approach. For instance, CPU and memory
overhead for creating and storing temporary JSONB object is an obvious
drawback. On the other hand there are time limits for this project and
thus it makes sense to implement a feature as fast and as simple as
possible, and optimize it later (if necessary). 

Maybe Charles likes to optimize everything. In this case he may choose
to implement all the getters and setters from scratch. This doesn't
exclude possibility of implementing the Thrift<->JSONB converter later.

Should Thrift objects be represented in the DBMS as a special Thrift
type, or as raw bytea? Personally I don't care. Once again, there are
pros and cons. It's good to have a bit of additional type safety. On the
other hand, it's not convenient to cast Thrift<->bytea all the time, and
if we add implicit casting there will be little type safety left. In the
pg_protobuf extension I chose to store Protobuf as bytea, but if
Charles prefers to introduce a separate type that's fine by me.

-- 
Best regards,
Aleksander Alekseev




Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

2018-05-04 Thread Teodor Sigaev

Thanks to everyone, v3 is pushed.

Teodor Sigaev wrote:
I'm not very happy with rootBuffer added everywhere. ginInsertItemPointers()
and ginPrepareDataScan() now take both args, rootBlkno and rootBuffer, the
second of which could be invalid. As I can see, you did it to call
CheckForSerializableConflictIn() in ginEntryInsert(). It seems it could be
reverted and CheckForSerializableConflictIn() should be added to
ginFindLeafPage() with searchMode = false.

Implemented, v3 is attached.



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/



Re: Optimze usage of immutable functions as relation

2018-05-04 Thread Andrew Gierth
> "Aleksandr" == Aleksandr Parfenov  writes:

 Aleksandr> The idea of the fix for this situation is to check whether the
 Aleksandr> result of the function is constant during the planning
 Aleksandr> of the query. The attached patch does this by processing Var
 Aleksandr> entries at the planner stage and replacing them with a constant
 Aleksandr> value where possible. Plans after applying the patch
 Aleksandr> (SeqScan query for comparison):

From an implementation point of view your patch is obviously broken in
many ways (starting with not checking varattno anywhere, and not
actually checking anywhere if the expression is volatile).

But more importantly the plans that you showed seem _worse_ in that
you've created extra calls to to_tsquery even though the query has been
written to try and avoid that.

I think you need to take a step back and explain more precisely what you
think you're trying to do and what the benefit (and cost) is.

Specific coding style points (not exhaustive):

 Aleksandr>  pointedNode->functions->length == 1

list_length()

 Aleksandr> pointedNode->functions->head->data.ptr_value

linitial() or linitial_node()

 Aleksandr> if (result->type == T_FuncExpr)

IsA()

(what if the result is the inline expansion of a volatile function?)

 Aleksandr> pfree(result);

result is a whole tree of nodes, pfree is pointless here

(don't bother trying to fix the above points in this patch, that won't
address the fundamental flaws)

-- 
Andrew (irc:RhodiumToad)



Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?

2018-05-04 Thread Heikki Linnakangas

On 24/04/18 13:57, Kyotaro HORIGUCHI wrote:

At Mon, 23 Apr 2018 03:41:47 -0400, Heikki Linnakangas  wrote in 
<89e33d4f-5c75-0738-3dcb-44c4df59e...@iki.fi>

Looking at the patch linked above
(https://www.postgresql.org/message-id/20171026.190551.208996945.horiguchi.kyotaro%40lab.ntt.co.jp):


--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11693,6 +11693,10 @@ retry:
Assert(reqLen <= readLen);
*readTLI = curFileTLI;
+
+ if (!XLogReaderValidatePageHeader(xlogreader, targetPagePtr, readBuf))
+   goto next_record_is_invalid;
+
return readLen;
   next_record_is_invalid:


What is the point of adding this XLogReaderValidatePageHeader() call?
The caller of this callback function, ReadPageInternal(), checks the
page header anyway. Earlier in this thread, you said:


Without the lines, server actually fails to start replication.

(I try to remember the details...)

The difference is whether the function can cause a retry for the
same portion of a set of continued records (without changing the
plugin API). XLogPageRead can do that. On the other hand, all
ReadPageInternal can do is just let the caller, ReadRecords,
retry from the beginning of the set of continued records, since
the caller side handles only complete records.

In the failure case, in XLogPageRead, when read() fails, it can
try the next source in the midst of a continued record. On the other
hand, if the segment was read but it was a recycled one, it passes
"success" to ReadPageInternal, which leads to retrying from the
beginning of the record. Infinite loop.


I see. You have the same problem if you have a WAL file that's corrupt 
in some other way, but one of the sources would have a correct copy. 
ValidXLogPageHeader() only checks the page header, after all. But unlike 
a recycled WAL segment, that's not supposed to happen as part of normal 
operation, so I guess we can live with that.



Calling XLogReaderValidatePageHeader in ReadPageInternal is
redundant, but removing it may be an interface change for the xlogreader
plugin, and I am not sure that is allowed.


We should avoid doing that in back-branches, at least. But in 'master', 
I wouldn't mind redesigning the API. Dealing with all the retrying is 
pretty complicated as it is, if we can simplify that somehow, I think 
that'd be good.


- Heikki