date:20191007

Re: Change atoi to strtol in same place

2019-10-07 Thread Joe Nelson

David Rowley wrote:
> It's not for this patch to decide what ports clients can connect to
> PostgreSQL on. As far as I'm aware Windows has no restrictions on what
> ports unprivileged users can listen on. I think we're likely going to
> upset a handful of people if we block the client tools from connecting
> to ports < 1024.

Makes sense. We can instead allow any port number and if it errors at
connection time then the user will find out at that point.

> This part seems over-complex to me. What's wrong with just returning a
> bool and if the caller gets "false", then just have it emit the error,
> such as:
> 
> "compression level must be between %d and %d"
> 
> I see Michael's patch is adding this new return type, but really, is
> there a good reason why we need to do something special when the user
> does not pass in an integer?

Displaying the range when given a non-integer input could be misleading.
For instance, suppose we put as little restriction on the port number
range as possible, enforcing only that it's positive, between 1 and
INT_MAX.  If someone passes a non-integer value, they would get a
message like:

invalid port number: must be an integer in range 1..2147483647

Sure, the parsing code will accept such a big number, but we don't want
to *suggest* the number. Notice if the user had passed a well-formed
number for the port it's unlikely to be greater than INT_MAX, so they
wouldn't have to see this weird message.

Perhaps you weren't suggesting to remove the upper limit from port
checking, just to change the lower limit from 1024 back to 1. In that
case we could keep it capped at 65535 and the error message above would
be OK.

Other utilities do have command line args that are allowed the whole
non-negative (but signed) int range, and their error message would show
the big number. It's not misleading in that case, but a little
ostentatious.

> Current patch:
> $ pg_dump -Z blah
> invalid compression level: could not parse 'blah' as integer
> 
> I propose:
> $ pg_dump -Z blah
> compression level must be an integer in range 0..9
> 
> This might save a few round trips, e.g the current patch will do:

I do see your reasoning that we're teasing people with a puzzle they
have to solve with multiple invocations. On the other hand, passing a
non-number for the compression level is pretty strange, and perhaps
explicitly calling out the mistake might make someone look more
carefully at what they -- or their script -- is doing.

> The translation string here must be consistent over all platforms. I
> think this will cause issues if the translation string uses %ld and
> the platform requires %lld?

A very good and subtle point. I'll change it to %lld so that a single
format string will work everywhere.

> I think you should maybe aim for 2 patches here.
> 
> Patch 1: ...
> 
> Patch 2: Add additional validation where we don't currently do
> anything. e.g pg_dump -j
> 
> We can then see if there's any merit in patch 2 of if it's adding more
> complexity than is really needed.

Are you saying that my current patch adds extra constraints for
pg_dump's -j argument, or that in the future we could do that? Because I
don't believe the current patch adds any appreciable complexity for that
particular argument, other than ensuring the value is positive, which
doesn't seem too contentious.

Maybe we can see whether we can reach consensus on the current
parse-and-check combo patch, and if discussion drags on much longer then
try to split it up?

> I also think some compilers won't like:
> 
> + compressLevel = parsed;
> 
> given that "parsed" is int64 and "compressLevel" is int, surely some
> compilers will warn of possible truncation? An explicit cast to int
> should fix those

Good point, I bet some compilers (justly) warn about truncation. We've
checked the range so I'll add an explicit cast.

> or you could consider just writing a version of the function for int32
> and int64 and directly passing in the variable to be set.

One complication is that the destination values are often int rather
than int32, and I don't know their width in general (probably 32, maybe
16, but *possibly* 64?).  The pg_strtoint64_range() function with range
argument of INT_MAX is flexible enough to handle whatever situation we
encounter. Am I overthinking this part?

-- 
Joe Nelson  https://begriffs.com

Re: Ordering of header file inclusion

2019-10-07 Thread Amit Kapila

On Wed, Oct 2, 2019 at 2:57 PM vignesh C  wrote:
>
> Hi,
>
> I noticed that some of the header files inclusion is not ordered as
> per the usual standard that is followed.
> The attached patch contains the fix for the order in which the header
> files are included.
> Let me know your thoughts on the same.
>

+1.   I think this will make an order of header inclusions consistent
throughout code.   One thing which will be slightly tricky is we might
not be able to back-patch this as some of this belongs to a recent
version(s) and others to older versions as well.   OTOH, I have not
investigated how much of this is relevant to back branches.  I think
most of these will apply to 12, but I am not sure if it is worth the
effort to segregate the changes which apply to back branches.  What do
you think?

 Few minor comments after a quick read:
 #include "lib/ilist.h"
-
+#include "miscadmin.h"

I think we shouldn't remove the extra line as part of the above change.

--- a/src/bin/psql/variables.c
+++ b/src/bin/psql/variables.c
@@ -8,10 +8,8 @@
 #include "postgres_fe.h"

 #include "common.h"
-#include "variables.h"
-
 #include "common/logging.h"
-
+#include "variables.h"

Same as above.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Updated some links which are not working with new links

2019-10-07 Thread vignesh C

On Tue, Oct 8, 2019 at 11:06 AM Michael Paquier  wrote:
>
> On Mon, Oct 07, 2019 at 02:14:05PM +0530, vignesh C wrote:
> > Sorry Michael for the miscommunication, the patch was present in the
> > first mail of this mail thread.
> > I'm re-attaching the patch in this mail.
> > Let me know if anything is required.
>
> Thanks.  It looks like I have been able to miss the actual patch, and
> got confused as there were two threads about more or less the same
> matter.  The links were redirected to an https equivalent, so applied
> with that.
Thanks Michael.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

Re: maintenance_work_mem used by Vacuum

2019-10-07 Thread Amit Kapila

On Tue, Oct 8, 2019 at 12:57 AM Robert Haas  wrote:
>
> On Sun, Oct 6, 2019 at 6:55 AM Amit Kapila  wrote:
> > We wanted to decide how a parallel vacuum should use memory?  Can each 
> > worker consume maintenance_work_mem to clean up the gin Index or all 
> > workers should use no more than maintenance_work_mem?  We were thinking of 
> > later but before we decide what is the right behavior for parallel vacuum, 
> > I thought it is better to once discuss if the current memory usage model is 
> > right.
>
> Well, I had the idea when we were developing parallel query that we
> should just ignore the problem of work_mem: every node can use X
> amount of work_mem, and if there are multiple copies of the node in
> multiple processes, then you probably end up using more memory.  I
> have been informed by Thomas Munro -- in very polite terminology --
> that this was a terrible decision which is causing all kinds of
> problems for users. I haven't actually encountered that situation
> myself, but I don't doubt that it's an issue.
>
> I think it's a lot easier to do better when we're talking about
> maintenance commands rather than queries. Maintenance operations
> typically don't have the problem that queries do with an unknown
> number of nodes using memory; you typically know all of your memory
> needs up front.  So it's easier to budget that out across workers or
> whatever. It's a little harder in this case, because you could have
> any number of GIN indexes (1 to infinity) and the amount of memory you
> can use depends on not only on how many of them there are but,
> presumably also, the number of those that are going to be vacuumed at
> the same time.  So you might have 8 indexes, 3 workers, and 2 of the
> indexes are GIN. In that case, you know that you can't have more than
> 2 GIN indexes being processed at the same time, but it's likely to be
> only one, and maybe with proper scheduling you could make it sure it's
> only one. On the other hand, if you dole out the memory assuming it's
> only 1, what happens if you start that one, then process all 6 of the
> non-GIN indexes, and that one isn't done yet. I guess you could wait
> to start cleanup on the other GIN indexes until the previous index
> cleanup finishes, but that kinda sucks too. So I'm not really sure how
> to handle this particular case. I think the principle of dividing up
> the memory rather than just using more is probably a good one, but
> figuring out exactly how that should work seems tricky.
>

Yeah and what if we have workers equal to indexes, so doing the clean
up of Gin indexes serially (wait for the prior index to finish before
starting the clean up of next Gin index) in that case would be bad
too.  I think we can do something simple like choose minimum among
'number of Gin Indexes', 'number of workers requested for parallel
vacuum' and 'number of max_parallel_maintenance_workers' and then
divide the maintenance_work_mem by that to get the memory used by each
of the Gin indexes.  I think it has some caveats like we might not be
able to launch the number of workers we decided and in that case we
probably could have computed bigger value of work_mem that can be used
by Gin indexes.   I think whatever we pick here can be good for some
cases and not-so-good for others, so why not pick something general
and simple.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: maintenance_work_mem used by Vacuum

2019-10-07 Thread Amit Kapila

On Tue, Oct 8, 2019 at 1:48 AM Peter Geoghegan  wrote:
>
> On Mon, Oct 7, 2019 at 12:28 PM Robert Haas  wrote:
> > I would say that sucks, because it makes it harder to set
> > maintenance_work_mem correctly.  Not sure how hard it would be to fix,
> > though.
>
> ginInsertCleanup() may now be the worst piece of code in the entire
> tree, so no surprised that it gets this wrong too.
>
> 2016's commit e2c79e14d99 ripped out the following comment about the
> use of maintenance_work_mem by ginInsertCleanup():
>
> @@ -821,13 +847,10 @@ ginInsertCleanup(GinState *ginstate,
>   * Is it time to flush memory to disk? Flush if we are at the end of
>   * the pending list, or if we have a full row and memory is getting
>   * full.
> - *
> - * XXX using up maintenance_work_mem here is probably unreasonably
> - * much, since vacuum might already be using that much.
>   */
>
> ISTM that the use of maintenance_work_mem wasn't given that much
> thought originally.
>

One idea to something better could be to check, if there is a GIN
index on a table, then use 1/4 (25% or whatever) of
maintenance_work_mem for GIN indexes and 3/4 (75%) of
maintenance_work_mem for collection dead tuples.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Updated some links which are not working with new links

2019-10-07 Thread Michael Paquier

On Mon, Oct 07, 2019 at 02:14:05PM +0530, vignesh C wrote:
> Sorry Michael for the miscommunication, the patch was present in the
> first mail of this mail thread.
> I'm re-attaching the patch in this mail.
> Let me know if anything is required.

Thanks.  It looks like I have been able to miss the actual patch, and
got confused as there were two threads about more or less the same
matter.  The links were redirected to an https equivalent, so applied
with that.
--
Michael


signature.asc
Description: PGP signature

Re: Non-Active links being referred in our source code

2019-10-07 Thread Michael Paquier

On Mon, Oct 07, 2019 at 05:11:40PM +0200, Juan José Santamaría Flecha wrote:
> About the broken links in win32_port.h, they are all referring to
> ntstatus. As for first case that shows the code groups, there is an up
> to date alternative. There is also an alternative for second case that
> points to their codes and descriptions. On the other hand, the last
> case is quoting a document that is no longer available, I would
> suggest to rephrase the comment, thus eliminating the quote.
> 
> Please find attached a patch with the proposed alternatives.

Thanks Juan for the patch.  I have checked your suggestions and it
looks good to me, so committed.  Good idea to tell about
WIN32_NO_STATUS.  I have noticed one typo on the way.
--
Michael


signature.asc
Description: PGP signature

Re: expressive test macros (was: Report test_atomic_ops() failures consistently, via macros)

2019-10-07 Thread Noah Misch

On Mon, Oct 07, 2019 at 09:56:20PM +0200, Peter Eisentraut wrote:
> On 2019-10-06 04:20, Noah Misch wrote:
> >  elog(ERROR, \
> >   "%s yielded %u, expected %s (%u) in file \"%s\" line %u", \
> >   #result_expr, result, #expected_expr, expected, __FILE__, __LINE__); \

> I've been meaning to propose some JUnit-style more-specific Assert
> variants such as AssertEquals for this reason.  But as Tom writes
> nearby, it should be a straight wrapper around Assert, not elog.  So
> these need to be named separately.

Agreed.

> Btw., JUnit uses the ordering convention assertEquals(expected, actual),
> whereas Perl Test::More uses is(actual, expected).  Let's make sure we
> pick something and stick with it.

Since we write "if (actual == expected)", I prefer f(actual, expected).  CUnit
uses CU_ASSERT_EQUAL(actual, expected).

Re: Two pg_rewind patches (auto generate recovery conf and ensure clean shutdown)

2019-10-07 Thread Michael Paquier

On Mon, Oct 07, 2019 at 03:31:45PM +0300, Alexey Kondratov wrote:
> On 07.10.2019 4:06, Michael Paquier wrote:
>> - --dry-run coverage is basically the same with the local and remote
>> modes, so it seems like a waste of resource to run it for all the
>> tests and all the modes.
> 
> My point was to test --dry-run + --write-recover-conf in remote, since the
> last one may cause recovery configuration write without doing any actual
> work, due to some wrong refactoring for example.

Yes, that's possible.  I agree that it would be nice to have an extra
test for that, still I would avoid making that run in all the tests.

>> Patch v3-0002 also had a test to make sure that the source server is
>> shut down cleanly before using it.  I have included that part as
>> well, as the flow feels right.
>> 
>> So, Alexey, what do you think?
> 
> It looks good for me. Two minor remarks:
> 
> +    # option combinations.  As the code paths taken by those tests
> +    # does not change for the "local" and "remote" modes, just run them
> 
> I am far from being fluent in English, but should it be 'do not change'
> instead?

That was wrong, fixed.

> +command_fails(
> +    [
> +        'pg_rewind', '--target-pgdata',
> +        $primary_pgdata, '--source-pgdata',
> +        $standby_pgdata, 'extra_arg1'
> +    ],
> 
> Here and below I would prefer traditional options ordering "'--key',
> 'value'". It should be easier to recognizefrom the reader perspective:

While I agree with you, the perl indentation we use has formatted the
code this way.  There is also an argument for keeping it at the end
for clarity (I recall that Windows also requires extra args to be
last when parsing options).  Anyway, I have used a trick by adding
--debug to reach command, which is still useful, so the order of the
options is better at the end.
--
Michael


signature.asc
Description: PGP signature

RE: Proposal: Make use of C99 designated initialisers for nulls/values arrays

2019-10-07 Thread Smith, Peter

From: Amit Kapila  Sent: Friday, 4 October 2019 4:50 PM

>>How about I just define them both the same?
>>#define INIT_ALL_ELEMS_ZERO     {0}
>>#define INIT_ALL_ELEMS_FALSE    {0}
>
>I think using one define would be preferred, but you can wait and see if 
>others prefer defining different macros for the same thing.

While nowhere near unanimous, it seems majority favour using a macro (if only 
to protect the unwary and document the behaviour).
And of those in favour of macros, using INIT_ALL_ELEMS_ZERO even for bool array 
is a clear preference.

So, please find attached the updated patch, which now has just 1 macro.

Kind Regards
--
Peter Smith
Fujitsu Australia


c99_init_nulls_4.patch
Description: c99_init_nulls_4.patch

Re: PATCH: Add uri percent-encoding for binary data

2019-10-07 Thread Isaac Morland

On Mon, 7 Oct 2019 at 03:15, Anders Åstrand  wrote:

> Hello
>
> Attached is a patch for adding uri as an encoding option for
> encode/decode. It uses what's called "percent-encoding" in rfc3986
> (https://tools.ietf.org/html/rfc3986#section-2.1).
>
> The background for this patch is that I could easily build urls in
> plpgsql, but doing the actual encoding of the url parts is painfully
> slow. The list of available encodings for encode/decode looks quite
> arbitrary to me, so I can't see any reason this one couldn't be in
> there.
>
> In modern web scenarios one would probably most likely want to encode
> the utf8 representation of a text string for inclusion in a url, in
> which case correct invocation would be ENCODE(CONVERT_TO('some text in
> database encoding goes here', 'UTF8'), 'uri'), but uri
> percent-encoding can of course also be used for other text encodings
> and arbitrary binary data.
>

This seems like a useful idea to me. I've used the equivalent in Python and
it provides more options:

https://docs.python.org/3/library/urllib.parse.html#url-quoting

I suggest reviewing that documentation there, because there are a few
details that need to be checked carefully. Whether or not space should be
encoded as plus and whether certain byte values should be exempt from
%-encoding is something that depends on the application. Unfortunately, as
far as I can tell there isn't a single version of URL encoding that
satisfies all situations (thus explaining the complexity of the Python
implementation). It might be feasible to suppress some of the Python
options (I'm wondering about the safe= parameter) but I'm pretty sure you
at least need the equivalent of quote and quote_plus.

Re: maintenance_work_mem used by Vacuum

2019-10-07 Thread Peter Geoghegan

On Mon, Oct 7, 2019 at 12:28 PM Robert Haas  wrote:
> I would say that sucks, because it makes it harder to set
> maintenance_work_mem correctly.  Not sure how hard it would be to fix,
> though.

ginInsertCleanup() may now be the worst piece of code in the entire
tree, so no surprised that it gets this wrong too.

2016's commit e2c79e14d99 ripped out the following comment about the
use of maintenance_work_mem by ginInsertCleanup():

@@ -821,13 +847,10 @@ ginInsertCleanup(GinState *ginstate,
  * Is it time to flush memory to disk? Flush if we are at the end of
  * the pending list, or if we have a full row and memory is getting
  * full.
- *
- * XXX using up maintenance_work_mem here is probably unreasonably
- * much, since vacuum might already be using that much.
  */

ISTM that the use of maintenance_work_mem wasn't given that much
thought originally.

-- 
Peter Geoghegan

Re: expressive test macros (was: Report test_atomic_ops() failures consistently, via macros)

2019-10-07 Thread Andres Freund

Hi,

On 2019-10-07 21:58:08 +0200, Peter Eisentraut wrote:
> On 2019-10-07 19:57, Tom Lane wrote:
> > I'd just put them all in c.h.  I see no reason why a new header
> > is helpful.
> 
> Assert stuff is already in there, but surely stuff that calls elog()
> doesn't belong in there?

Make it call an ExpectFailure() (or similar) function that takes the
various parameters (expression, file, line, severity, format string,
args) as an argument? And that's implemented in terms of elog() in the
backend, and pg_log* + abort() (when appropriate) in the frontend?

Greetings,

Andres Freund

Re: expressive test macros (was: Report test_atomic_ops() failures consistently, via macros)

2019-10-07 Thread Andres Freund

Hi,

On 2019-10-07 13:57:41 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2019-10-05 17:08:38 +, Noah Misch wrote:
> >> Report test_atomic_ops() failures consistently, via macros.
>
> > I wonder if we should put these (and a few more, for other types), into
> > a more general place. I would like to have them for writing both tests
> > like regress.c:test_atomic_ops(), and for writing assertions that
> > actually display useful error messages.  For the former it makes sense
> > to ERROR out, for the latter they ought to abort, as currently.
>
> IMO, anything named like "assert" ought to act like Assert does now,
> ie (1) it's a no-op in a non-assert build and (2) you get an abort()
> on failure.

No disagreement at all.


> No strong opinions about what the test-and-elog variant
> should be called -- but it seems like we might have some difficulty
> agreeing on what the appropriate error level is for that.  If it's
> morally like an Assert except we want it on all the time, should
> it be PANIC?

Perhaps it ought to just take elevel as a parameter? Could even be
useful for debugging...


> What will happen in frontend code?

Hm. Map to pg_log_*, and abort() if it's an erroring elevel?


> > Seems like putting ASSERT_{EQ,LT,...}_{U32,S32,...} (or Assert_Eq_...,
> > but that'd imo look weirder than the inconsistency) into c.h would make
> > sense, and EXPECT_ somewhere in common/pg_test.h or such?
>
> I'd just put them all in c.h.  I see no reason why a new header
> is helpful.

WFM.

Greetings,

Andres Freund

Re: expressive test macros (was: Report test_atomic_ops() failures consistently, via macros)

2019-10-07 Thread Tom Lane

Peter Eisentraut  writes:
> On 2019-10-07 19:57, Tom Lane wrote:
>> I'd just put them all in c.h.  I see no reason why a new header
>> is helpful.

> Assert stuff is already in there, but surely stuff that calls elog()
> doesn't belong in there?

True, though I had the impression that Andres wanted to propose things
that would work in either frontend or backend, presumably with different
implementations.  You could argue it either way as to whether to have
that in c.h (with an #ifdef) or separately in postgres.h and
postgres_fe.h.

regards, tom lane

Re: Missed check for too-many-children in bgworker spawning

2019-10-07 Thread Tom Lane

Robert Haas  writes:
> On Sun, Oct 6, 2019 at 1:17 PM Tom Lane  wrote:
>> The attached proposed patch fixes this by making bgworker spawning
>> include a canAcceptConnections() test.

> I think it used to work this way -- not sure if it was ever committed
> this way, but it at least did during development -- and we ripped it
> out because somebody (Magnus?) pointed out that if you got close to
> the connection limit, you could see parallel queries start failing,
> and that would suck. Falling back to non-parallel seems more OK in
> that situation than actually failing.

I'm not following your point?  Whatever you might think the appropriate
response is, I'm pretty sure "elog(FATAL) out of the postmaster" is not
it.  Moreover, we have to --- and already do, I trust --- deal with
other resource-exhaustion errors in exactly the same code path, notably
fork(2) failure which we simply can't predict or prevent.  Doesn't the
parallel query logic already deal sanely with failure to obtain as many
workers as it wanted?

regards, tom lane

Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-07 Thread Bruce Momjian

On Mon, Oct  7, 2019 at 09:40:22PM +0200, Tomas Vondra wrote:
> On Mon, Oct 07, 2019 at 10:22:22AM -0400, Bruce Momjian wrote:
> > > So essentially the argument is - if you have hw crypto acceleration (aka
> > > AES-NI), then the overhead of all-cluster encryption is so low it does
> > > not make sense to bother with lowering it with column encryption.
> > 
> > Yes, I think that is true.  Column-level encryption can be useful in
> > giving different people control of the keys, but I think that feature
> > should be developed at the SQL level so clients can unlock the key and
> > backups include the encryption keys.
> > 
> 
> FWIW that's not how the column encryption (at least in Oracle works). It
> uses the same encryption keys (with 2-tier key architecture), and the
> keys are stored in a wallet. The user only supplies a passphrase (well,
> a DBA does that, because it happens only once after the instance starts).
> 
> Not sure what exactly you mean by "SQL level" but I agree it's clearly
> much higher up the stack than encryption at the block level.

Right, what I was saying is that column encryption where they keys are
unlocked by the administrator are really only useful to reduce
encryption overhead, and I think we will find it just isn't worth the
API complexity to allow that.

Per-user keys are useful for cases beyond performance, but require
SQL-level control.

> > > IMO that's a good argument against column encryption (at least when used
> > > to reduce overhead), although 10% still quite a bit.
> > 
> > I think that test was a worst-case one and I think it needs to be
> > optimized before we draw any conclusions.
> 
> What test? I was really referring to the PDF, which talks about 10%
> threshold for the tablespace encryption. And in another section it says
> 
>  Internal benchmark tests and customers reported a performance impact of 4
>  to 8% in end-user response time, and an increase of 1 to 5% in CPU usage.
> 
> Of course, this is not on PostgreSQL, but I'd expect to have comparable
> overhead, despite architectural differences. Ultimately, even if it's 15
> or 20%, the general rule is likely to remain the same, i.e. column
> encryption has significantly higher overhead, and can only beat
> tablespace encryption when very small fraction of columns is encrypted.

Right, and I doubt it will be worth it, but I think we need to complete
all-cluster encryption and then run some tests so see what the overhead
is.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +

Re: expressive test macros (was: Report test_atomic_ops() failures consistently, via macros)

2019-10-07 Thread Peter Eisentraut

On 2019-10-07 19:57, Tom Lane wrote:
> I'd just put them all in c.h.  I see no reason why a new header
> is helpful.

Assert stuff is already in there, but surely stuff that calls elog()
doesn't belong in there?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: expressive test macros (was: Report test_atomic_ops() failures consistently, via macros)

2019-10-07 Thread Peter Eisentraut

On 2019-10-06 04:20, Noah Misch wrote:
>> Seems like putting ASSERT_{EQ,LT,...}_{U32,S32,...} (or Assert_Eq_...,
>> but that'd imo look weirder than the inconsistency) into c.h would make
>> sense, and EXPECT_ somewhere in common/pg_test.h or such?
> 
> Sounds reasonable.  For broader use, I would include the expected value, not
> just expected_expr:
> 
>  elog(ERROR, \
>   "%s yielded %u, expected %s (%u) in file \"%s\" line %u", \
>   #result_expr, result, #expected_expr, expected, __FILE__, __LINE__); \
> 
> I didn't do that for the atomics tests, where expected_expr is always trivial.
> The codebase has plenty of Assert(x == y) where either of x or y could have
> the surprising value.

I've been meaning to propose some JUnit-style more-specific Assert
variants such as AssertEquals for this reason.  But as Tom writes
nearby, it should be a straight wrapper around Assert, not elog.  So
these need to be named separately.

Btw., JUnit uses the ordering convention assertEquals(expected, actual),
whereas Perl Test::More uses is(actual, expected).  Let's make sure we
pick something and stick with it.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Missed check for too-many-children in bgworker spawning

2019-10-07 Thread Robert Haas

On Sun, Oct 6, 2019 at 1:17 PM Tom Lane  wrote:
> Over in [1] we have a report of a postmaster shutdown that seems to
> have occurred because some client logic was overaggressively spawning
> connection requests, causing the postmaster's child-process arrays to
> be temporarily full, and then some parallel query tried to launch a
> new bgworker process.  The postmaster's bgworker-spawning logic lacks
> any check for the arrays being full, so when AssignPostmasterChildSlot
> failed to find a free slot, kaboom!
>
> The attached proposed patch fixes this by making bgworker spawning
> include a canAcceptConnections() test.  That's perhaps overkill, since
> we really just need to check the CountChildren() total; but I judged
> that going through that function and having it decide what to test or
> not test was a better design than duplicating the CountChildren() test
> elsewhere.
>
> I'd first imagined also replacing the one-size-fits-all check
>
> if (CountChildren(BACKEND_TYPE_ALL) >= MaxLivePostmasterChildren())
> result = CAC_TOOMANY;
>
> with something like
>
> switch (backend_type)
> {
> case BACKEND_TYPE_NORMAL:
> if (CountChildren(backend_type) >= 2 * MaxConnections)
> result = CAC_TOOMANY;
> break;
> case BACKEND_TYPE_AUTOVAC:
> if (CountChildren(backend_type) >= 2 * autovacuum_max_workers)
> result = CAC_TOOMANY;
> break;
> ...
> }
>
> so as to subdivide the pool of child-process slots and prevent client
> requests from consuming slots meant for background processes.  But on
> closer examination that's not really worth the trouble, because this
> pool is already considerably bigger than MaxBackends; so even if we
> prevented a failure here we could still have bgworker startup failure
> later on when it tries to acquire a PGPROC.
>
> Barring objections, I'll apply and back-patch this soon.

I think it used to work this way -- not sure if it was ever committed
this way, but it at least did during development -- and we ripped it
out because somebody (Magnus?) pointed out that if you got close to
the connection limit, you could see parallel queries start failing,
and that would suck. Falling back to non-parallel seems more OK in
that situation than actually failing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: PATCH: Add uri percent-encoding for binary data

2019-10-07 Thread Bruce Momjian

On Mon, Oct  7, 2019 at 09:14:38AM +0200, Anders Åstrand wrote:
> Hello
> 
> Attached is a patch for adding uri as an encoding option for
> encode/decode. It uses what's called "percent-encoding" in rfc3986
> (https://tools.ietf.org/html/rfc3986#section-2.1).

Oh, that's a cool idea.  Can you add it to the commit-fest?

https://commitfest.postgresql.org/25/

---


> 
> The background for this patch is that I could easily build urls in
> plpgsql, but doing the actual encoding of the url parts is painfully
> slow. The list of available encodings for encode/decode looks quite
> arbitrary to me, so I can't see any reason this one couldn't be in
> there.
> 
> In modern web scenarios one would probably most likely want to encode
> the utf8 representation of a text string for inclusion in a url, in
> which case correct invocation would be ENCODE(CONVERT_TO('some text in
> database encoding goes here', 'UTF8'), 'uri'), but uri
> percent-encoding can of course also be used for other text encodings
> and arbitrary binary data.
> 
> Regards,
> Anders

> diff --git a/src/backend/utils/adt/encode.c b/src/backend/utils/adt/encode.c
> index 7293d66de5..33cf7bb57c 100644
> --- a/src/backend/utils/adt/encode.c
> +++ b/src/backend/utils/adt/encode.c
> @@ -512,6 +512,131 @@ esc_dec_len(const char *src, unsigned srclen)
>   return len;
>  }
>  
> +/*
> + * URI percent encoding
> + *
> + * Percent encodes all byte values except the unreserved ASCII characters as 
> per RFC3986.
> + */
> +
> +static const char upper_hex_digits[] = "0123456789ABCDEF";
> +
> +static unsigned
> +uri_encode(const char *src, unsigned srclen, char *dst)
> +{
> + char*d = dst;
> +
> + for (const char *s = src; s < src + srclen; s++)
> + {
> + if ((*s >= 'A' && *s <= 'Z') ||
> + (*s >= 'a' && *s <= 'z') ||
> + (*s >= '0' && *s <= '9') ||
> + *s == '-' ||
> + *s == '_' ||
> + *s == '.' ||
> + *s == '~')
> + {
> + *d++ = *s;
> + }
> + else
> + {
> + *d++ = '%';
> + *d++ = upper_hex_digits[(*s >> 4) & 0xF];
> + *d++ = upper_hex_digits[*s & 0xF];
> + }
> + }
> + return d - dst;
> +}
> +
> +static unsigned
> +uri_decode(const char *src, unsigned srclen, char *dst)
> +{
> + const char *s = src;
> + const char *srcend = src + srclen;
> + char*d = dst;
> + charval;
> +
> + while (s < srcend)
> + {
> + if (*s == '%')
> + {
> + if (s > srcend - 3) {
> + /* This will never get triggered since 
> uri_dec_len already takes care of validation
> +  */
> + ereport(ERROR,
> + 
> (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +  errmsg("invalid uri percent 
> encoding"),
> +  errhint("Input data ends 
> prematurely.")));
> + }
> +
> + /* Skip '%' */
> + s++;
> +
> + val = get_hex(*s++) << 4;
> + val += get_hex(*s++);
> + *d++ = val;
> + }
> + else
> + {
> + *d++ = *s++;
> + }
> + }
> + return d - dst;
> +}
> +
> +static unsigned
> +uri_enc_len(const char *src, unsigned srclen)
> +{
> + int len = 0;
> +
> + for (const char *s = src; s < src + srclen; s++)
> + {
> + if ((*s >= 'A' && *s <= 'Z') ||
> + (*s >= 'a' && *s <= 'z') ||
> + (*s >= '0' && *s <= '9') ||
> + *s == '-' ||
> + *s == '_' ||
> + *s == '.' ||
> + *s == '~')
> + {
> + len++;
> + }
> + else
> + {
> + len += 3;
> + }
> + }
> + return len;
> +}
> +
> +static unsigned
> +uri_dec_len(const char *src, unsigned srclen)
> +{
> + const char *s = src;
> + const char *srcend = src + srclen;
> + int len = 0;
> +
> + while (s < srcend)
> + {
> + if (*s == '%')
> + {
> + if (s > srcend - 3) {
> + ereport(ERROR,
> + 
> (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +  errmsg("invalid uri percent 
> encoding"),
> +  errhint("Input data ends 
>

Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-07 Thread Robert Haas

On Mon, Oct 7, 2019 at 3:01 PM Antonin Houska  wrote:
> However the design doesn't seem to be stable enough at the
> moment for coding to make sense.

Well, I think the question is whether working further on your patch
could produce some things that everyone would agree are a step
forward.  If every iota of that patch is garbage dredged up from the
depths of the Mos Eisley sewers, then let's forget about it, but I
don't think that's the case. As I said on the thread about that patch,
and have also said here, what I learned from looking at that patch is
that the system probably needs some significant restructuring before
there's any hope of incorporating encryption in a reasonably-sized,
reasonably clean patch.  For example, some files need to be written a
block at a time instead of a character at a time. The idea just
discussed -- changing the CLOG page format to leave room for the
encryption nonce and a checksum -- also fall into that category. I
think there are probably a number of others.

No matter what anybody thinks about whether we should have one key,
multiple keys, passwords inside the database, passwords outside the
database, whatever ... that kind of restructuring work has got to be
done first. And it seems like by having all this discussion about the
design, we're basically getting to a situation where we're making no
progress on that stuff. So that's bad. There's nothing *wrong* with
talking about how many keys we had and how key management ought to
work and where passwords should be stored, and we need to make sure
that whatever we do initially doesn't close the door to doing more and
better things later. But, if those discussions have the effect of
blocking work on the basic infrastructure tasks that need to be done,
that's actually counterproductive at this stage.

We should all put our heads together and agree that however we think
key management ought to be handled, it'll be a lot easier to get our
preferred form of key management into PostgreSQL if, while that
discussion rages on, we knocked down some of the infrastructure
problems that *absolutely any patch* for this kind of feature is
certain to face.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-07 Thread Tomas Vondra

On Mon, Oct 07, 2019 at 10:22:22AM -0400, Bruce Momjian wrote:

On Sat, Oct  5, 2019 at 09:13:59PM +0200, Tomas Vondra wrote:

On Fri, Oct 04, 2019 at 08:14:44PM -0400, Bruce Momjian wrote:
> On Sat, Oct  5, 2019 at 12:54:35AM +0200, Tomas Vondra wrote:
> > On Fri, Oct 04, 2019 at 06:06:10PM -0400, Bruce Momjian wrote:
> > > For full-cluster TDE with AES-NI-enabled, the performance impact is
> > > usually ~4%, so doing anything more granular doesn't seem useful.  See
> > > this PGCon presentation with charts:
> > >
> > >  https://www.youtube.com/watch?v=TXKoo2SNMzk#t=27m50s
> > >
> > > Having anthing more fine-grained that all-cluster didn't seem worth it.
> > > Using per-user keys is useful, but also much harder to implement.
> > >
> >
> > Not sure I follow. I thought you are asking why Oracle apparently does
> > not leverage AES-NI for column-level encryption (at least according to
> > the document I linked)? And I don't know why that's the case.
>
> No, I read it as Oracle saying that there isn't much value to per-column
> encryption if you have crypto hardware acceleration, because the
> all-cluster encryption overhead is so minor.
>

So essentially the argument is - if you have hw crypto acceleration (aka
AES-NI), then the overhead of all-cluster encryption is so low it does
not make sense to bother with lowering it with column encryption.

Yes, I think that is true.  Column-level encryption can be useful in
giving different people control of the keys, but I think that feature
should be developed at the SQL level so clients can unlock the key and
backups include the encryption keys.

FWIW that's not how the column encryption (at least in Oracle works). It
uses the same encryption keys (with 2-tier key architecture), and the
keys are stored in a wallet. The user only supplies a passphrase (well,
a DBA does that, because it happens only once after the instance starts).

Not sure what exactly you mean by "SQL level" but I agree it's clearly
much higher up the stack than encryption at the block level.

IMO that's a good argument against column encryption (at least when used
to reduce overhead), although 10% still quite a bit.

I think that test was a worst-case one and I think it needs to be
optimized before we draw any conclusions.

What test? I was really referring to the PDF, which talks about 10%
threshold for the tablespace encryption. And in another section it says

 Internal benchmark tests and customers reported a performance impact of 4
 to 8% in end-user response time, and an increase of 1 to 5% in CPU usage.

Of course, this is not on PostgreSQL, but I'd expect to have comparable
overhead, despite architectural differences. Ultimately, even if it's 15
or 20%, the general rule is likely to remain the same, i.e. column
encryption has significantly higher overhead, and can only beat
tablespace encryption when very small fraction of columns is encrypted.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: maintenance_work_mem used by Vacuum

2019-10-07 Thread Robert Haas

On Sun, Oct 6, 2019 at 6:55 AM Amit Kapila  wrote:
> As per docs [1] (see maintenance_work_mem), the maximum amount of memory used 
> by the Vacuum command must be no more than maintenance_work_mem.  However, 
> during the review/discussion of the "parallel vacuum" patch [2], we observed 
> that it is not true.  Basically, if there is a gin index defined on a table, 
> then the vacuum on that table can consume up to 2 * maintenance_work_mem 
> memory space.  The vacuum can use maintenance_work_mem memory space to keep 
> track of dead tuples and another maintenance_work_mem memory space to move 
> tuples from pending pages into regular GIN structure (see ginInsertCleanup).  
>  The behavior related to Gin index consuming extra maintenance_work_mem 
> memory is introduced by commit  e2c79e14d998cd31f860854bc9210b37b457bb01.  It 
> is not clear to me if this is acceptable behavior and if so, shouldn't we 
> document it?

I would say that sucks, because it makes it harder to set
maintenance_work_mem correctly.  Not sure how hard it would be to fix,
though.

> We wanted to decide how a parallel vacuum should use memory?  Can each worker 
> consume maintenance_work_mem to clean up the gin Index or all workers should 
> use no more than maintenance_work_mem?  We were thinking of later but before 
> we decide what is the right behavior for parallel vacuum, I thought it is 
> better to once discuss if the current memory usage model is right.

Well, I had the idea when we were developing parallel query that we
should just ignore the problem of work_mem: every node can use X
amount of work_mem, and if there are multiple copies of the node in
multiple processes, then you probably end up using more memory.  I
have been informed by Thomas Munro -- in very polite terminology --
that this was a terrible decision which is causing all kinds of
problems for users. I haven't actually encountered that situation
myself, but I don't doubt that it's an issue.

I think it's a lot easier to do better when we're talking about
maintenance commands rather than queries. Maintenance operations
typically don't have the problem that queries do with an unknown
number of nodes using memory; you typically know all of your memory
needs up front.  So it's easier to budget that out across workers or
whatever. It's a little harder in this case, because you could have
any number of GIN indexes (1 to infinity) and the amount of memory you
can use depends on not only on how many of them there are but,
presumably also, the number of those that are going to be vacuumed at
the same time.  So you might have 8 indexes, 3 workers, and 2 of the
indexes are GIN. In that case, you know that you can't have more than
2 GIN indexes being processed at the same time, but it's likely to be
only one, and maybe with proper scheduling you could make it sure it's
only one. On the other hand, if you dole out the memory assuming it's
only 1, what happens if you start that one, then process all 6 of the
non-GIN indexes, and that one isn't done yet. I guess you could wait
to start cleanup on the other GIN indexes until the previous index
cleanup finishes, but that kinda sucks too. So I'm not really sure how
to handle this particular case. I think the principle of dividing up
the memory rather than just using more is probably a good one, but
figuring out exactly how that should work seems tricky.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] Deadlock in XLogInsert at AIX

2019-10-07 Thread Tom Lane

Noah Misch  writes:
> [ fetch-add-gcc-xlc-unify-v2.patch ]

This still fails on Apple's compilers.  The first failure I get is

ccache gcc -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute 
-Wformat-security -fno-strict-aliasing -fwrapv -g -O2 -I../../../src/include 
-I/usr/local/include -isysroot /Developer/SDKs/MacOSX10.5.sdk-c -o 
nodeHashjoin.o nodeHashjoin.c
/var/tmp//ccXUM8ep.s:449:Parameter error: r0 not allowed for parameter 2 (code 
as 0 not r0)

Line 449 of the assembly file is the addi in

LM87:
sync
lwarx   r0,0,r2
addir11,r0,1
stwcx.  r11,0,r2
bne $-12
isync

which I suppose comes out of PG_PPC_FETCH_ADD.  I find this idea of
constructing assembly code by string-pasting pretty unreadable and am not
tempted to try to debug it, but I don't immediately see why this doesn't
work when the existing s_lock.h code does.  I think that the assembler
error message is probably misleading: while it seems to be saying to
s/r0/0/ in the addi, gcc consistently uses "rN" syntax for the second
parameter elsewhere.  I do note that gcc never generates r0 as addi's
second parameter in several files I checked through, so maybe what it
means is "you need to use some other register"?  (Which would imply that
the constraint for this asm argument is too loose.)

I'm also wondering why this isn't following s_lock.h's lead as to
USE_PPC_LWARX_MUTEX_HINT and USE_PPC_LWSYNC.

regards, tom lane

Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-07 Thread Antonin Houska

Robert Haas  wrote:

> On Fri, Oct 4, 2019 at 5:49 PM Bruce Momjian  wrote:
> > We spend a lot of time figuring out exactly how to safely encrypt WAL,
> > heap, index, and pgsql_tmp files.   The idea of doing this for another
> > 20 types of files --- to find a safe nonce, to be sure a file rewrite
> > doesn't reuse the nonce, figuring the API, crash recovery, forensics,
> > tool interface --- is something I would like to avoid.  I want to avoid
> > it not because I don't like work, but because I am afraid the code
> > impact and fragility will doom the feature.
> 
> I'm concerned about that, too, but there's no getting around the fact
> that there are a bunch of types of files and that they do all need to
> be dealt with. If we have a good scheme for doing that, hopefully
> extending it to additional types of files is not that bad, which would
> then spare us the trouble of arguing about each one individually, and
> also be more secure.
> 
> As I also said to Stephen, the people who are discussing this here
> should *really really really* be looking at the Cybertec patch instead
> of trying to invent everything from scratch -

Maybe it's enough to check the README.encryption file that [1] contains. Or
should I publish this (in shorter form) on the wiki [2] ?

> In fact, I can think of a couple pretty clear examples, like the stats
> files, which clearly contain user data.

Specifically this part was removed because I expected that [3] will be
committed earlier than the encryption. This expectation still seems to be
valid.

The thread on encryption was very alive when I was working on the last version
of our patch, so it was hard to participate in the discussion. I tried to
catch up later, and I think I could understand most of the problems. It became
clear that it's better to collaborate then to incorporate the new ideas into
[1]. I proposed to Masahiko Sawada that we're ready to collaborate on coding
and he agreed. However the design doesn't seem to be stable enough at the
moment for coding to make sense.

As for the design, I spent some time thinking about it, especially on the
per-table/tablespace keys (recovery issues etc.), but haven't invented
anything new. If there's anything useful I can do about the feature, I'll be
glad to help.

[1] https://commitfest.postgresql.org/25/2104/

[2] https://wiki.postgresql.org/wiki/Transparent_Data_Encryption

[3] https://commitfest.postgresql.org/25/1708/

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com

Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-07 Thread Magnus Hagander

On Mon, Oct 7, 2019 at 5:48 PM Bruce Momjian  wrote:

> On Mon, Oct  7, 2019 at 11:26:24AM -0400, Robert Haas wrote:
> > On Mon, Oct 7, 2019 at 11:02 AM Bruce Momjian  wrote:
> > > For clog, it is not append-only, and bytes are rewritten (from zero to
> > > non-zero), so there would have to be a new nonce for every clog file
> > > write to the file system.  We can store the nonce in a separate file,
> > > but the clog contents and nonce would have to be always synchronized or
> > > the file could not be properly read.  Basically every file we want to
> > > encrypt, needs this kind of study.
> >
> > Yeah. It's a big problem/project.
> >
> > Another approach to this problem would be to adjust the block format
> > to leave room for the nonce. If encryption is not in use, then those
> > bytes would just be zeroed or something. That would make upgrading a
> > bit tricky, but pg_upgrade could be taught to do the necessary
> > conversions for SLRUs without too much pain, I think.
>
> Yes, that is exactly the complexity we have deal with, both in terms of
> code complexity, reliability, and future maintenance.  Currently the
> file format is unchanged, but as we add more encrypted files, we might
> need to change it.  Fortunately, I think heap/index files don't need to
> change, so pg_upgrade will not require changes.
>

It does sound very similar to the problem of being able to add checksums to
the clog files (and other SLRUs). So if that can get done, it would help
both of those cases (if done right).

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/

Re: identity column behavior in WHEN condition for BEFORE EACH ROW trigger

2019-10-07 Thread Peter Eisentraut

On 2019-10-03 16:08, Suraj Kharage wrote:
> It is been observed that when we define the generated columns in WHEN
> condition for BEFORE EACH ROW trigger then server throw an error from
> CreateTrigger().

> whereas, for identity columns, server allows us to create trigger for
> same and trigger gets invoked as defined. Is this behavior expected? or
> we need to restrict the identity columns in such scenario because anyone
> one override the identity column value in trigger.

This is per SQL standard: Identity columns are assigned before triggers,
generated columns are computed after BEFORE triggers.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: expressive test macros (was: Report test_atomic_ops() failures consistently, via macros)

2019-10-07 Thread Tom Lane

Andres Freund  writes:
> On 2019-10-05 17:08:38 +, Noah Misch wrote:
>> Report test_atomic_ops() failures consistently, via macros.

> I wonder if we should put these (and a few more, for other types), into
> a more general place. I would like to have them for writing both tests
> like regress.c:test_atomic_ops(), and for writing assertions that
> actually display useful error messages.  For the former it makes sense
> to ERROR out, for the latter they ought to abort, as currently.

IMO, anything named like "assert" ought to act like Assert does now,
ie (1) it's a no-op in a non-assert build and (2) you get an abort()
on failure.  No strong opinions about what the test-and-elog variant
should be called -- but it seems like we might have some difficulty
agreeing on what the appropriate error level is for that.  If it's
morally like an Assert except we want it on all the time, should
it be PANIC?  What will happen in frontend code?

> Seems like putting ASSERT_{EQ,LT,...}_{U32,S32,...} (or Assert_Eq_...,
> but that'd imo look weirder than the inconsistency) into c.h would make
> sense, and EXPECT_ somewhere in common/pg_test.h or such?

I'd just put them all in c.h.  I see no reason why a new header
is helpful.

regards, tom lane

Re: [PATCH] Do not use StdRdOptions in Access Methods

2019-10-07 Thread Dent John

Hi Nikolay,

I like the new approach. And I agree with the ambition — to split out the 
representation from StdRdOptions.

However, with that change, in the AM’s *options() function, it looks as if you 
could simply add new fields to the relopt_parse_elt list. That’s still not 
true, because parseRelOptions() will fail to find a matching entry, causing 
numoptions to be left zero, and an early exit made. (At least, that’s if I 
correctly understand how things work.)

I think that is fine as an interim limitation, because your change has not yet 
fully broken the connection to the boolRelOpts, intRelOpts, realRelOpts and 
stringRelOpts strutures. But perhaps a comment would help to make it clear. 
Perhaps add something like this above the tab[]: "When adding or changing a 
relopt in the relopt_parse_elt tab[], ensure the corresponding change is made 
to boolRelOpts, intRelOpts, realRelOpts or stringRelOpts."

denty.

> On 6 Oct 2019, at 14:45, Nikolay Shaplov  wrote:
> 
> Hi! I am starting a new thread as commitfest wants new thread for new patch.
> 
> This new thread is a follow up to an https://www.postgresql.org/message-id/
> 2620882.s52SJui4ql%40x200m thread, where I've been trying to get rid of 
> StdRdOpions structure, and now I've splitted the patch into smaller parts.
> 
> Read the quote below, to get what this patch is about
> 
>> I've been thinking about this patch and came to a conclusion that it
>> would be better to split it to even smaller parts, so they can be
>> easily reviewed, one by one. May be even leaving most complex
>> Heap/Toast part for later.
>> 
>> This is first part of this patch. Here we give each Access Method it's
>> own option binary representation instead of StdRdOptions.
>> 
>> I think this change is obvious. Because, first, Access Methods do not
>> use most of the values defined in StdRdOptions.
>> 
>> Second this patch make Options structure code unified for all core
>> Access Methods. Before some AM used StdRdOptions, some AM used it's own
>> structure, now all AM uses own structures and code is written in the
>> same style, so it would be more easy to update it in future.
>> 
>> John Dent, would you join reviewing this part of the patch? I hope it
>> will be more easy now...
> 
> And now I have a newer version of the patch, as I forgot to remove 
> vacuum_cleanup_index_scale_factor from StdRdOptions as it was used only in 
> Btree index and now do not used at all. New version fixes it.
> 
> -- 
> Software Developer: https://www.upwork.com/freelancers/~014a87e140ff02c0da
> Body-oriented Therapist: https://vk.com/nataraj_rebalancing  
> (Russian)

Re: format of pg_upgrade loadable_libraries warning

2019-10-07 Thread Bruce Momjian

On Fri, Oct  4, 2019 at 11:55:21PM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > On Fri, Oct  4, 2019 at 05:40:08PM -0500, Justin Pryzby wrote:
> >> I would argue to include in 12.1, since 12 is what most everyone will use 
> >> for
> >> upgrades, and patch for .1 will help people upgrading for 11 of the next 12
> >> months.  (But, your patch is more general than mine).
> 
> > No, there might be tools that depend on the existing format, and this is
> > the first report of confusion I have read.
> 
> Translations will also lag behind any such change.  Speaking of which,
> it might be a good idea to include some translator: annotations to
> help translators understand what context these fragmentary phrases
> are used in.  I'd actually say that my biggest concern with these
> messages is whether they can translate into something sane.

Uh, I looked at the pg_ugprade code and the error message is:

pg_fatal("Your installation contains \"contrib/isn\" functions which 
rely on the\n"
 "bigint data type.  Your old and new clusters pass bigint 
values\n"
 "differently so this cluster cannot currently be upgraded.  
You can\n"
 "manually upgrade databases that use \"contrib/isn\" 
facilities and remove\n"
 "\"contrib/isn\" from the old cluster and restart the upgrade. 
 A list of\n"
 "the problem functions is in the file:\n"
 "%s\n\n", output_path);

and the "in database" (which I have changed to capitalized "In database"
in the attached patch), looks like:

   fprintf(script, "In database: %s\n", active_db->db_name);

meaning it _isn't_ an output error message, but rather something that
appears in an error file.  I don't think either of these are translated.
Is that wrong?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
new file mode 100644
index 617270f..e7bf48a
*** a/src/bin/pg_upgrade/check.c
--- b/src/bin/pg_upgrade/check.c
*** check_for_isn_and_int8_passing_mismatch(
*** 858,864 
  		 output_path, strerror(errno));
  			if (!db_used)
  			{
! fprintf(script, "Database: %s\n", active_db->db_name);
  db_used = true;
  			}
  			fprintf(script, "  %s.%s\n",
--- 858,864 
  		 output_path, strerror(errno));
  			if (!db_used)
  			{
! fprintf(script, "In database: %s\n", active_db->db_name);
  db_used = true;
  			}
  			fprintf(script, "  %s.%s\n",
*** check_for_tables_with_oids(ClusterInfo *
*** 937,943 
  		 output_path, strerror(errno));
  			if (!db_used)
  			{
! fprintf(script, "Database: %s\n", active_db->db_name);
  db_used = true;
  			}
  			fprintf(script, "  %s.%s\n",
--- 937,943 
  		 output_path, strerror(errno));
  			if (!db_used)
  			{
! fprintf(script, "In database: %s\n", active_db->db_name);
  db_used = true;
  			}
  			fprintf(script, "  %s.%s\n",
*** check_for_reg_data_type_usage(ClusterInf
*** 1046,1052 
  		 output_path, strerror(errno));
  			if (!db_used)
  			{
! fprintf(script, "Database: %s\n", active_db->db_name);
  db_used = true;
  			}
  			fprintf(script, "  %s.%s.%s\n",
--- 1046,1052 
  		 output_path, strerror(errno));
  			if (!db_used)
  			{
! fprintf(script, "In database: %s\n", active_db->db_name);
  db_used = true;
  			}
  			fprintf(script, "  %s.%s.%s\n",
*** check_for_jsonb_9_4_usage(ClusterInfo *c
*** 1137,1143 
  		 output_path, strerror(errno));
  			if (!db_used)
  			{
! fprintf(script, "Database: %s\n", active_db->db_name);
  db_used = true;
  			}
  			fprintf(script, "  %s.%s.%s\n",
--- 1137,1143 
  		 output_path, strerror(errno));
  			if (!db_used)
  			{
! fprintf(script, "In database: %s\n", active_db->db_name);
  db_used = true;
  			}
  			fprintf(script, "  %s.%s.%s\n",
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
new file mode 100644
index 0c66d1c..3cbaab6
*** a/src/bin/pg_upgrade/function.c
--- b/src/bin/pg_upgrade/function.c
*** check_loadable_libraries(void)
*** 256,262 
  		}
  
  		if (was_load_failure)
! 			fprintf(script, _("Database: %s\n"),
  	old_cluster.dbarr.dbs[os_info.libraries[libnum].dbnum].db_name);
  	}
  
--- 256,262 
  		}
  
  		if (was_load_failure)
! 			fprintf(script, _("In database: %s\n"),
  	old_cluster.dbarr.dbs[os_info.libraries[libnum].dbnum].db_name);
  	}
  
diff --git a/src/bin/pg_upgrade/version.c b/src/bin/pg_upgrade/version.c
new file mode 100644
index 10cb362..42f1ce7
*** a/src/bin/pg_upgrade/version.c
--- b/src/bin/pg_upgrade/version.c
*** old_9_3_check_for_line_data_type_usage(C
*** 157,163

Re: recovery_min_apply_delay in archive recovery causes assertion failure in latch

2019-10-07 Thread Fujii Masao

On Fri, Oct 4, 2019 at 9:03 PM Fujii Masao  wrote:
>
> On Mon, Sep 30, 2019 at 12:49 AM Fujii Masao  wrote:
> >
> > Hi,
> >
> > I got the following assertion failure when I enabled 
> > recovery_min_apply_delay
> > and started archive recovery (i.e., I put only recovery.signal not
> > standby.signal).
> >
> > TRAP: FailedAssertion("latch->owner_pid == MyProcPid", File:
> > "latch.c", Line: 522)
> >
> > Here is the example to reproduce the issue:
> >
> > 
> > initdb -D data
> > pg_ctl -D data start
> > psql -c "alter system set recovery_min_apply_delay to '60s'"
> > psql -c "alter system set archive_mode to on"
> > psql -c "alter system set archive_command to 'cp %p ../arch/%f'"
> > psql -c "alter system set restore_command to 'cp ../arch/%f %p'"
> > mkdir arch
> > pg_basebackup -D bkp -c fast
> > pgbench -i
> > pgbench -t 1000
> > pg_ctl -D data -m i stop
> > rm -rf bkp/pg_wal
> > mv data/pg_wal bkp
> > rm -rf data
> > mv bkp data
> > touch data/recovery.signal
> > pg_ctl -D data -W start
> > 
> >
> > The latch that causes this assertion failure is recoveryWakeupLatch.
> > The ownership of this latch is taken only when standby mode is
> > requested. But this latch can be used when starting archive recovery
> > with recovery_min_apply_delay set even though it's unowned.
> > So the assertion failure happened.
> >
> > Attached patch fixes this issue by making archive recovery always ignore
> > recovery_min_apply_delay. This change is OK because
> > recovery_min_apply_delay was introduced for standby mode, I think.
>
> No, I found the following description in the doc.
>
> --
> This parameter is intended for use with streaming replication deployments;
> however, if the parameter is specified it will be honored in all cases
> --
>
> So archive recovery with recovery_min_apply_delay enabled would be
> intended configuration. My patch that changes archive recovery so that
> it always ignores thesetting might be in wrong direction. Maybe we should
> make recovery_min_apply_delay work properly even in archive recovery.
> Thought?

Patch attached. This patch allows archive recovery with
recovery_min_apply_delay set, but not crash recovery.

Regards,

-- 
Fujii Masao


allow_recovery_min_apply_delay_during_archive_recovery_v1.patch
Description: Binary data

Re: [PATCH] Add some useful asserts into View Options macroses

2019-10-07 Thread Robert Haas

On Sat, Oct 5, 2019 at 5:23 PM Nikolay Shaplov  wrote:
> This thread is a follow up to the thread 
> https://www.postgresql.org/message-id/2620882.s52SJui4ql@x200m where I've 
> been trying to remove StdRdOptions
> structure and replace it with unique structure for each relation kind.
>
> I've decided to split that patch into smaller parts.
>
> This part adds some asserts to ViewOptions macroses.
> Since an option pointer there is converted into (ViewOptions *) it would be
> really good to make sure that this macros is called in proper context, and we
> do the convertation properly. At least when running tests with asserts turned
> on.

Seems like a good idea.  Should we try to do something similar for the
macros in that header file that cast to StdRdOptions?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: How to retain lesser paths at add_path()?

2019-10-07 Thread Tom Lane

Kohei KaiGai  writes:
> 2019年10月7日(月) 23:44 Robert Haas :
>> But if we want to stick with the ad-hoc method, we could also just
>> have four possible return values: dominates, dominated-by, both, or
>> neither.

> It seems to me this is a bit different from the purpose of this hook.
> I never intend to overwrite existing cost-based decision by this hook.
> The cheapest path at a particular level is the cheapest one regardless
> of the result of this hook. However, this hook enables to prevent
> immediate elimination of a particular path that we (extension) want to
> use it later and may have potentially cheaper cost (e.g; a pair of
> custom GpuAgg + GpuJoin by reduction of DMA cost).

I do not think this will work for the purpose you wish, for the reason
Tomas already pointed out: if you don't also modify the accept_new
decision, there's no guarantee that the path you want will get into
the relation's path list in the first place.

Another problem with trying to manage this only by preventing removals
is that it is likely to lead to keeping extra paths and thereby wasting
planning effort.  What if there's more than one path having the property
you want to keep?

Given the way that add_path works, you really have to think about the
problem as adding an additional figure-of-merit or comparison dimension.
Anything else is going to have some unpleasant behaviors.

> One other design in my mind is, add a callback function pointer on Path
> structure. Only if Path structure has valid pointer (not-NULL), add_path()
> calls extension's own logic to determine whether the Path can be
> eliminated now.

While I'm not necessarily against having a per-path callback, I don't
see how it helps us solve this problem, especially not in the presence
of multiple extensions trying to add different paths.  I do not wish
to see things ending up with extensions saying "don't delete my path
no matter what", because that's going to be costly in terms of later
planning effort.  But I'm not seeing how this wouldn't degenerate to
pretty much that behavior.

regards, tom lane

Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-07 Thread Robert Haas

On Mon, Oct 7, 2019 at 12:34 PM Bruce Momjian  wrote:
> On Mon, Oct  7, 2019 at 12:30:37PM -0400, Robert Haas wrote:
> > On Mon, Oct 7, 2019 at 11:48 AM Bruce Momjian  wrote:
> > > Well, I am starting with the things I _know_ need encrypting, and am
> > > then waiting for others to tell me what to add.   Cybertec has not
> > > provided a list and reasons yet, that I have seen.  This is why I
> > > started this public thread, so we could get a list and agree on it.
> >
> > Well that's fine, but you could also open up the patch and have a look
> > at it. Even if you just looked at which files it modifies, it would
> > enable you to add some important things do your list.
>
> Uh, I am really then just importing what one group decided, which seems
> unsafe.  I think it needs a fresh look at all files.

A fresh look at all files is a good idea, but that doesn't making
looking at the work other people have already done a bad idea.

I don't understand the theory that it's useful to have multiple
100+-message email threads about what we ought to do, but that looking
at the already-written code is not useful.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-07 Thread Bruce Momjian

On Mon, Oct  7, 2019 at 12:30:37PM -0400, Robert Haas wrote:
> On Mon, Oct 7, 2019 at 11:48 AM Bruce Momjian  wrote:
> > Well, I am starting with the things I _know_ need encrypting, and am
> > then waiting for others to tell me what to add.   Cybertec has not
> > provided a list and reasons yet, that I have seen.  This is why I
> > started this public thread, so we could get a list and agree on it.
> 
> Well that's fine, but you could also open up the patch and have a look
> at it. Even if you just looked at which files it modifies, it would
> enable you to add some important things do your list.

Uh, I am really then just importing what one group decided, which seems
unsafe.  I think it needs a fresh look at all files.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +

Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-07 Thread Robert Haas

On Mon, Oct 7, 2019 at 11:48 AM Bruce Momjian  wrote:
> Well, I am starting with the things I _know_ need encrypting, and am
> then waiting for others to tell me what to add.   Cybertec has not
> provided a list and reasons yet, that I have seen.  This is why I
> started this public thread, so we could get a list and agree on it.

Well that's fine, but you could also open up the patch and have a look
at it. Even if you just looked at which files it modifies, it would
enable you to add some important things do your list.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: abort-time portal cleanup

2019-10-07 Thread Robert Haas

On Tue, Sep 24, 2019 at 6:34 AM Dilip Kumar  wrote:
> On Fri, Sep 13, 2019 at 2:13 AM Robert Haas  wrote:
> >
> /*
> * Otherwise, do nothing to cursors held over from a previous
> * transaction.
> */
> if (portal->createSubid == InvalidSubTransactionId)
> continue;
>
> /*
> * Do nothing to auto-held cursors.  This is similar to the case of a
> * cursor from a previous transaction, but it could also be that the
> * cursor was auto-held in this transaction, so it wants to live on.
> */
> if (portal->autoHeld)
> continue;
>
> I have one doubt that why do we need the second check. Because before
> setting portal->autoHeld to true we always call HoldPortal therein we
> set portal->createSubid to InvalidSubTransactionId.  So it seems to me
> that the second condition will never reach.  Am I missing something?

Not that I can see, but I don't necessarily think this patch needs to
change it, either.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: abort-time portal cleanup

2019-10-07 Thread Robert Haas

On Tue, Sep 24, 2019 at 5:31 AM Amit Kapila  wrote:
> After this cleanup, I think we don't need At(Sub)Abort_Portals in
> AbortOutOfAnyTransaction() for the states TBLOCK_(SUB)ABORT and
> friends. This is because AbortTransaction itself would have zapped the
> portal.

Not if the ROLLBACK itself failed - in that case, the portal would
have been active at the time, and thus not subject to removal. And, as
the existing comments in xact.c state, that's exactly why that
function call is there.

> 2. You seem to forgot removing AtCleanup_Portals() from portal.h

Oops. Fixed in the attached version.

> - if (portal->status == PORTAL_READY)
> - MarkPortalFailed(portal);
>
> Why it is safe to remove this check?  It has been explained in commit
> 7981c342 why we need that check.  I don't see any explanation in email
> or patch which justifies this code removal.  Is it because you removed
> PortalCleanup?  If so, that is still called from PortalDrop?

All MarkPortalFailed() does is change the status to PORTAL_FAILED and
call the cleanup hook. PortalDrop() calls the cleanup hook, and we
don't need to change the status if we're removing it completely.

> 4.
> + * If the status is PORTAL_ACTIVE, then we must be
> executing a command
> + * that uses multiple transactions internally. In that
> case, the
> + * command in question must be one that does not
> internally rely on
> + * any transaction-lifetime resources, because they
> would disappear
> + * in the upcoming transaction-wide cleanup.
>   */
>   if (portal->status == PORTAL_ACTIVE)
>
> I am not able to understand how we can reach with the portal state as
> 'active' for a multi-transaction command.  It seems wherever we mark
> portal as active, we don't relinquish the control unless its state is
> changed.  Can you share some example where this can happen?

Yeah -- a plpgsql function or procedure that does "ROLLBACK;"
internally. The calling code doesn't relinquish control, but it does
reach AbortTransaction().

If you want to see it happen, just put an elog() inside that block and
run make -C src/pl/plpgsql check.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

v2-0001-Improve-handling-of-portals-after-sub-transaction.patch
Description: Binary data

Re: How to retain lesser paths at add_path()?

2019-10-07 Thread Kohei KaiGai

2019年10月7日(月) 23:44 Robert Haas :
>
> On Mon, Oct 7, 2019 at 9:56 AM Tom Lane  wrote:
> > We could imagine, maybe, that a hook for the purpose of allowing an
> > additional dimension to be considered would be essentially a path
> > comparison function, returning -1, +1, or 0 depending on whether
> > path A is dominated by path B (on this new dimension), dominates
> > path B, or neither.  However, I do not see how multiple extensions
> > could usefully share use of such a hook.
>
> Typically, we support hook-sharing mostly by ad-hoc methods: when
> installing a hook, you remember the previous value of the function
> pointer, and arrange to call that function yourself.  That's not a
> great method.  One problem with it is that you can't reasonably
> uninstall a hook function, which would be a nice thing to be able to
> do. We could do better by reusing the technique from on_shmem_exit or
> RegisterXactCallbacks: keep an array of pointers, and call them in
> order. I wish we'd retrofit all of our hooks to work more like that;
> being able to unload shared libraries would be a good feature.
>
> But if we want to stick with the ad-hoc method, we could also just
> have four possible return values: dominates, dominated-by, both, or
> neither.
>
It seems to me this is a bit different from the purpose of this hook.
I never intend to overwrite existing cost-based decision by this hook.
The cheapest path at a particular level is the cheapest one regardless
of the result of this hook. However, this hook enables to prevent
immediate elimination of a particular path that we (extension) want to
use it later and may have potentially cheaper cost (e.g; a pair of
custom GpuAgg + GpuJoin by reduction of DMA cost).

So, I think expected behavior when multiple extensions would use
this hook is clear. If any of call-chain on the hook wanted to preserve
the path, it should be kept on the pathlist. (Of couse, it is not a cheapest
one)

> Still, this doesn't feel like very scalable paradigm, because this
> code gets called a lot.  Unless both calling the hook functions and
> the hook functions themselves are dirt-cheap, it's going to hurt, and
> TBH, I wonder if even the cost of detecting that the hook is unused
> might be material.
>
> I wonder whether we might get a nicer solution to this problem if our
> method of managing paths looked less something invented by a LISP
> hacker, but I don't have a specific proposal in mind.
>
One other design in my mind is, add a callback function pointer on Path
structure. Only if Path structure has valid pointer (not-NULL), add_path()
calls extension's own logic to determine whether the Path can be
eliminated now.
This design may minimize the number of callback invocation.

One potential downside of this approach is, function pointer makes
hard to implement readfuncs of Path nodes, even though we have
no read handler of them, right now.

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei

Re: How to retain lesser paths at add_path()?

2019-10-07 Thread Tom Lane

Robert Haas  writes:
> On Mon, Oct 7, 2019 at 9:56 AM Tom Lane  wrote:
>> We could imagine, maybe, that a hook for the purpose of allowing an
>> additional dimension to be considered would be essentially a path
>> comparison function, returning -1, +1, or 0 depending on whether
>> path A is dominated by path B (on this new dimension), dominates
>> path B, or neither.  However, I do not see how multiple extensions
>> could usefully share use of such a hook.

> ... if we want to stick with the ad-hoc method, we could also just
> have four possible return values: dominates, dominated-by, both, or
> neither.

Right, and then *each* user of the hook would have to be prepared
to merge its result with the result from the previous user(s),
which is a complicated bit of logic that somebody would surely
get wrong, especially if (a) there's no prototype to copy from
and (b) testing only their own extension would not exercise it.

[ thinks a bit... ]  Maybe that could be improved if we can express
the result as a bitmask, defined in such a way that OR'ing (or maybe
AND'ing? haven't worked it out) the results from different comparisons
does the right thing.

> Still, this doesn't feel like very scalable paradigm, because this
> code gets called a lot.  Unless both calling the hook functions and
> the hook functions themselves are dirt-cheap, it's going to hurt, and
> TBH, I wonder if even the cost of detecting that the hook is unused
> might be material.

Yeah, I'm worried about that too.  This is quite a hot code path,
and so I don't think we can just assume that changes are free.
Still, if we could come up with a cleaner paradigm, maybe we could
buy back a few cycles in the core-code comparison logic, and thus
not come out behind from adding a hook.

regards, tom lane

Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-07 Thread Bruce Momjian

On Mon, Oct  7, 2019 at 11:26:24AM -0400, Robert Haas wrote:
> On Mon, Oct 7, 2019 at 11:02 AM Bruce Momjian  wrote:
> > For clog, it is not append-only, and bytes are rewritten (from zero to
> > non-zero), so there would have to be a new nonce for every clog file
> > write to the file system.  We can store the nonce in a separate file,
> > but the clog contents and nonce would have to be always synchronized or
> > the file could not be properly read.  Basically every file we want to
> > encrypt, needs this kind of study.
> 
> Yeah. It's a big problem/project.
> 
> Another approach to this problem would be to adjust the block format
> to leave room for the nonce. If encryption is not in use, then those
> bytes would just be zeroed or something. That would make upgrading a
> bit tricky, but pg_upgrade could be taught to do the necessary
> conversions for SLRUs without too much pain, I think.

Yes, that is exactly the complexity we have deal with, both in terms of
code complexity, reliability, and future maintenance.  Currently the
file format is unchanged, but as we add more encrypted files, we might
need to change it.  Fortunately, I think heap/index files don't need to
change, so pg_upgrade will not require changes.

> In my opinion, it is desirable to maintain as much consistency as
> possible between what we store on disk in the encrypted case and what
> we store on disk in the not-encrypted case.  If we have to add
> additional forks in the encrypted case, or change the file of the
> format and not just the contents, it seems likely to add complexity
> and bugs that we might be able to avoid via another approach.

Agreed.

> > > In other words: maybe I'm wrong here, but it looks to me like we're
> > > laboriously reinventing the wheel when we could be working on
> > > improving the working prototype.
> >
> > The work being done is building on that prototype.
> 
> That's good, but then I'm puzzled as to why your list of things to
> encrypt doesn't include all the things it already covers.

Well, I am starting with the things I _know_ need encrypting, and am
then waiting for others to tell me what to add.   Cybertec has not
provided a list and reasons yet, that I have seen.  This is why I
started this public thread, so we could get a list and agree on it.

FYI, I realize this is all very complex, and requires cryptography and
server internals knowledge.  I am happy to discuss it via voice with
anyone.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +

Re: Standby accepts recovery_target_timeline setting?

2019-10-07 Thread Robert Haas

On Mon, Oct 7, 2019 at 9:14 AM Fujii Masao  wrote:
> Agreed, too. Do you have any idea to implement that? I've not found out
> "smart" way to do that yet.
>
> One idea is, as Michael suggested, to use SetConfigOption() for all the
> archive recovery parameters at the beginning of the startup process as 
> follows,
> to forcibly set the default values if crash recovery is running. But this
> seems not smart for me.
>
> SetConfigOption("restore_command", ...);
> SetConfigOption("archive_cleanup_command", ...);
> SetConfigOption("recovery_end_command", ...);
> ...
>
> Maybe we should make GUC mechanism notice signal files and ignore
> archive recovery-related parameters when none of those files exist?
> This change seems overkill at least in v12, though.

I think this approach is going in the wrong direction. In every other
part of the system, it's the job of the code around the GUC system to
use parameters when they're relevant and ignore them when they should
be ignored. Deciding that the parameters that were formerly part of
recovery.conf are an exception to that rule and that the GUC system is
responsible for making sure they're set only when we pay attention to
them seems like it's bringing back or exacerbating a code-level split
between recovery.conf parameters and postgresql.conf parameters when,
meanwhile, we've been wanting to eradicate that split so that the
things we allow for postgresql.conf parameters -- e.g. changing them
while they are running -- can be applied to these parameters also.

I don't particularly like the use of SetConfigOption() either,
although it does have some precedent in autovacuum, for example.
Generally, it's right and proper that the GUC system sets the
variables to which the parameters it controls are tied -- and then the
rest of the code has to do the right thing around that. It sounds like
the patch that got rid of recovery.conf wasn't considered carefully
enough, and missed the fact that it was introducing some inadvertent
behavior changes. That's too bad, but let's not overreact. It seems
totally fine to me to just add ad-hoc checks that rule out
inappropriately relying on these parameters while performing crash
recovery - and be done with it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-07 Thread Robert Haas

On Mon, Oct 7, 2019 at 11:02 AM Bruce Momjian  wrote:
> For clog, it is not append-only, and bytes are rewritten (from zero to
> non-zero), so there would have to be a new nonce for every clog file
> write to the file system.  We can store the nonce in a separate file,
> but the clog contents and nonce would have to be always synchronized or
> the file could not be properly read.  Basically every file we want to
> encrypt, needs this kind of study.

Yeah. It's a big problem/project.

Another approach to this problem would be to adjust the block format
to leave room for the nonce. If encryption is not in use, then those
bytes would just be zeroed or something. That would make upgrading a
bit tricky, but pg_upgrade could be taught to do the necessary
conversions for SLRUs without too much pain, I think.

In my opinion, it is desirable to maintain as much consistency as
possible between what we store on disk in the encrypted case and what
we store on disk in the not-encrypted case.  If we have to add
additional forks in the encrypted case, or change the file of the
format and not just the contents, it seems likely to add complexity
and bugs that we might be able to avoid via another approach.

> > In other words: maybe I'm wrong here, but it looks to me like we're
> > laboriously reinventing the wheel when we could be working on
> > improving the working prototype.
>
> The work being done is building on that prototype.

That's good, but then I'm puzzled as to why your list of things to
encrypt doesn't include all the things it already covers.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Non-Active links being referred in our source code

2019-10-07 Thread Juan José Santamaría Flecha

On Sun, Oct 6, 2019 at 9:41 AM Michael Paquier  wrote:
>
> On Sun, Oct 06, 2019 at 09:06:44AM +0530, vignesh C wrote:
> > I could not find the equivalent links for the same.
> > Should we update the links for the same?
>
> If it could be possible to find equivalent links which could update
> to, it would be nice.

About the broken links in win32_port.h, they are all referring to
ntstatus. As for first case that shows the code groups, there is an up
to date alternative. There is also an alternative for second case that
points to their codes and descriptions. On the other hand, the last
case is quoting a document that is no longer available, I would
suggest to rephrase the comment, thus eliminating the quote.

Please find attached a patch with the proposed alternatives.

Regards,

Juan José Santamaría Flecha

0001-Non-Active-links-in-win32_port.patch
Description: Binary data

Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-07 Thread Bruce Momjian

On Mon, Oct  7, 2019 at 09:44:30AM -0400, Robert Haas wrote:
> On Fri, Oct 4, 2019 at 5:49 PM Bruce Momjian  wrote:
> > We spend a lot of time figuring out exactly how to safely encrypt WAL,
> > heap, index, and pgsql_tmp files.   The idea of doing this for another
> > 20 types of files --- to find a safe nonce, to be sure a file rewrite
> > doesn't reuse the nonce, figuring the API, crash recovery, forensics,
> > tool interface --- is something I would like to avoid.  I want to avoid
> > it not because I don't like work, but because I am afraid the code
> > impact and fragility will doom the feature.
> 
> I'm concerned about that, too, but there's no getting around the fact
> that there are a bunch of types of files and that they do all need to
> be dealt with. If we have a good scheme for doing that, hopefully
> extending it to additional types of files is not that bad, which would
> then spare us the trouble of arguing about each one individually, and
> also be more secure.

Well, do to encryption properly, there is the requirement of the nonce. 
If you ever rewrite a bit, you technically have to have a new nonce. 
For WAL, since it is append-only, you can use the WAL file name.  For
heap/index files, we change the LSN on every rewrite (with
wal_log_hints=on), and we never use the same LSN for writing multiple
relations, so LSN+page-offset is a sufficient nonce.

For clog, it is not append-only, and bytes are rewritten (from zero to
non-zero), so there would have to be a new nonce for every clog file
write to the file system.  We can store the nonce in a separate file,
but the clog contents and nonce would have to be always synchronized or
the file could not be properly read.  Basically every file we want to
encrypt, needs this kind of study.

> As I also said to Stephen, the people who are discussing this here
> should *really really really* be looking at the Cybertec patch instead
> of trying to invent everything from scratch - unless that patch has,

Someone from Cybertec is on the voice calls we have, and is actively
involved.

> like, typhoid, or something, in which case please let me know so that
> I, too, can avoid looking at it. Even if you wanted to use 0% of the
> code, you could look at the list of file types that they consider
> encrypting and think about whether you agree with the decisions they
> made. I suspect that you would quickly find that you've left some
> things out of your list. In fact, I can think of a couple pretty clear
> examples, like the stats files, which clearly contain user data.

I am asking here because I don't think the Cybertec approach has gotten
enough study compared to what this group can contribute.

> Another reason that you should go look at that patch is because it
> actually tries to grapple with the exact problem that you're worrying
> about in the abstract: there are a LOT of different kinds of files and
> they all need to be handled somehow. Even if you can convince yourself
> that things like pg_clog don't need encryption, which I think is a
> pretty tough sell, there are LOT of file types that directly contain
> user data and do need to be handled. A lot of the code that writes
> those various types of files is pretty ad-hoc. It doesn't necessarily
> do nice things like build up a block of data and then write it out
> together; it may for example write a byte a time. That's not going to
> work well for encryption, I think, so the Cybertec patch changes that

Actually, byte-at-a-time works fine with CTR mode, though that mode is
very sensitive to the reuse of the nonce since the user data is not part
of the input for future encryption blocks.

> stuff around. I personally don't think that the patch does that in a
> way that is sufficiently clean and carefully considered for it to be
> integrated into core, and my plan had been to work on that with the
> patch authors.
> 
> However, that plan has been somewhat derailed by the fact that we now
> have hundreds of emails arguing about the design, because I don't want
> to be trying to push water up a hill if everyone else is going in a
> different direction. It looks to me, though, like we haven't really
> gotten beyond the point where that patch already was. The issues of
> nonce and many file types have already been thought about carefully
> there. I rather suspect that they did not get it all right. But, it
> seems to me that it would be a lot more useful to look at the code
> actually written and think about what it gets right and wrong than to
> discuss these points as a strictly theoretical matter.
> 
> In other words: maybe I'm wrong here, but it looks to me like we're
> laboriously reinventing the wheel when we could be working on
> improving the working prototype.

The work being done is building on that prototype.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+

Re: Revert back to standard AC_STRUCT_TIMEZONE Autoconf macro

2019-10-07 Thread Peter Eisentraut

On 2019-10-02 07:30, Peter Eisentraut wrote:
> On 2019-09-30 21:36, Tom Lane wrote:
>> Peter Eisentraut  writes:
>>> Instead of AC_STRUCT_TIMEZONE we use our own variant called
>>> PGAC_STRUCT_TIMEZONE that checks for tzname even if other variants were
>>> found first.  But since 63bd0db12199c5df043e1dea0f2b574f622b3a4c we
>>> don't use tzname anymore, so we don't need this anymore.
>>
>> Hmm.  I wonder if we need AC_STRUCT_TIMEZONE either?  Seems like
>> we should only be using our own struct pg_tm.
> 
> There are a few places that seem to need it, such as initdb/findtimezone.c.
> 
>> If we could get
>> rid of that configure macro altogether, we could remove some dubious
>> junk like plpython.h's "#undef HAVE_TZNAME".
> 
> We could keep just the part of AC_STRUCT_TIMEZONE that we need, namely
> the check for tm_zone, and remove the part about tzname.
> 
> New patch attached.

committed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: How to retain lesser paths at add_path()?

2019-10-07 Thread Robert Haas

On Mon, Oct 7, 2019 at 9:56 AM Tom Lane  wrote:
> We could imagine, maybe, that a hook for the purpose of allowing an
> additional dimension to be considered would be essentially a path
> comparison function, returning -1, +1, or 0 depending on whether
> path A is dominated by path B (on this new dimension), dominates
> path B, or neither.  However, I do not see how multiple extensions
> could usefully share use of such a hook.

Typically, we support hook-sharing mostly by ad-hoc methods: when
installing a hook, you remember the previous value of the function
pointer, and arrange to call that function yourself.  That's not a
great method.  One problem with it is that you can't reasonably
uninstall a hook function, which would be a nice thing to be able to
do. We could do better by reusing the technique from on_shmem_exit or
RegisterXactCallbacks: keep an array of pointers, and call them in
order. I wish we'd retrofit all of our hooks to work more like that;
being able to unload shared libraries would be a good feature.

But if we want to stick with the ad-hoc method, we could also just
have four possible return values: dominates, dominated-by, both, or
neither.

Still, this doesn't feel like very scalable paradigm, because this
code gets called a lot.  Unless both calling the hook functions and
the hook functions themselves are dirt-cheap, it's going to hurt, and
TBH, I wonder if even the cost of detecting that the hook is unused
might be material.

I wonder whether we might get a nicer solution to this problem if our
method of managing paths looked less something invented by a LISP
hacker, but I don't have a specific proposal in mind.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-07 Thread Bruce Momjian

On Sat, Oct  5, 2019 at 09:13:59PM +0200, Tomas Vondra wrote:
> On Fri, Oct 04, 2019 at 08:14:44PM -0400, Bruce Momjian wrote:
> > On Sat, Oct  5, 2019 at 12:54:35AM +0200, Tomas Vondra wrote:
> > > On Fri, Oct 04, 2019 at 06:06:10PM -0400, Bruce Momjian wrote:
> > > > For full-cluster TDE with AES-NI-enabled, the performance impact is
> > > > usually ~4%, so doing anything more granular doesn't seem useful.  See
> > > > this PGCon presentation with charts:
> > > >
> > > > https://www.youtube.com/watch?v=TXKoo2SNMzk#t=27m50s
> > > >
> > > > Having anthing more fine-grained that all-cluster didn't seem worth it.
> > > > Using per-user keys is useful, but also much harder to implement.
> > > >
> > > 
> > > Not sure I follow. I thought you are asking why Oracle apparently does
> > > not leverage AES-NI for column-level encryption (at least according to
> > > the document I linked)? And I don't know why that's the case.
> > 
> > No, I read it as Oracle saying that there isn't much value to per-column
> > encryption if you have crypto hardware acceleration, because the
> > all-cluster encryption overhead is so minor.
> > 
> 
> So essentially the argument is - if you have hw crypto acceleration (aka
> AES-NI), then the overhead of all-cluster encryption is so low it does
> not make sense to bother with lowering it with column encryption.

Yes, I think that is true.  Column-level encryption can be useful in
giving different people control of the keys, but I think that feature
should be developed at the SQL level so clients can unlock the key and
backups include the encryption keys.

> IMO that's a good argument against column encryption (at least when used
> to reduce overhead), although 10% still quite a bit.

I think that test was a worst-case one and I think it needs to be
optimized before we draw any conclusions.

> But I'm not sure it's what the document is saying. I'm sure if they
> could, they'd use AES-NI even for column encryption, to make it more
> efficient. Because why wouldn't you do that? But the doc explicitly
> says:
> 
>Hardware cryptographic acceleration for TDE column encryption is
>not supported.

Oh, wow, that is something!

> So there has to be a reason why that's not supported. Either there's
> something that prevents this mode from using AES-NI at all, or it simply
> can't be sped-up.

Yeah, good question.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +

Re: How to retain lesser paths at add_path()?

2019-10-07 Thread Tom Lane

Robert Haas  writes:
> On Sun, Oct 6, 2019 at 3:23 PM Tomas Vondra
>  wrote:
>> Can you be more specific why you don't think this approach is not
>> useful? I'm not sure whether you consider all hooks to have this issue
>> or just this proposed one.

> I'll start by admitting that that remark was rather off-the-cuff.  On
> further reflection, add_path() is not really a crazy place to try to
> add a new dimension of merit, which is really what KaiGai wants to do
> here.  On the other hand, as Tom and I noted upthread, that system is
> creaking under its weight as it is, and making it extensible seems
> like it might therefore be a bad idea - specifically, because it might
> slow down planning quite a bit on large join problems, either because
> of the additional cost testing the additional dimension of merit or
> because of the additional cost of dealing with the extra paths that
> get kept.

FWIW, I think that the patch as proposed would most certainly have
nasty performance problems.  To make intelligent decisions, the
hook function would basically have to re-do a large fraction of
the calculations that add_path itself does.  It's also fairly
unclear (or at least undocumented) how things would work if multiple
path providers wanted to make use of the hook; except that that
performance issue would get worse, as each of them redoes that work.

Also, as Robert says, the real goal here is to allow some additional
comparison dimension to be considered.  Simply preventing pre-existing
paths from being removed isn't sufficient for that --- you have to be
able to change the accept_new decisions as well, as Tomas already
worried about.  But if we phrase that as an additional hook that
only concerns itself with accept_new, then the duplicate-calculation
problem gets really substantially worse: I think actual use of a hook
like that would require reconsidering the entire existing path list.

I'm not very sure what a good design for adding new comparison dimensions
would look like, but I think this isn't it.

We could imagine, maybe, that a hook for the purpose of allowing an
additional dimension to be considered would be essentially a path
comparison function, returning -1, +1, or 0 depending on whether
path A is dominated by path B (on this new dimension), dominates
path B, or neither.  However, I do not see how multiple extensions
could usefully share use of such a hook.

regards, tom lane

Re: WIP/PoC for parallel backup

2019-10-07 Thread Robert Haas

On Mon, Oct 7, 2019 at 9:43 AM Ibrar Ahmed  wrote:
> What about have an API to get the single file or list of files? We will use a 
> single file in
> our application and other tools can get the benefit of list of files.

That sounds a bit speculative to me. Who is to say that anyone will
find that useful? I mean, I think it's fine and good to build the
functionality that we need in a way that maximizes the likelihood that
other tools can reuse that functionality, and I think we should do
that. But I don't think it's smart to build functionality that we
don't really need in the hope that somebody else will find it useful
unless we're pretty sure that they actually will. I don't see that as
being the case here; YMMV.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-07 Thread Robert Haas

On Fri, Oct 4, 2019 at 5:49 PM Bruce Momjian  wrote:
> We spend a lot of time figuring out exactly how to safely encrypt WAL,
> heap, index, and pgsql_tmp files.   The idea of doing this for another
> 20 types of files --- to find a safe nonce, to be sure a file rewrite
> doesn't reuse the nonce, figuring the API, crash recovery, forensics,
> tool interface --- is something I would like to avoid.  I want to avoid
> it not because I don't like work, but because I am afraid the code
> impact and fragility will doom the feature.

I'm concerned about that, too, but there's no getting around the fact
that there are a bunch of types of files and that they do all need to
be dealt with. If we have a good scheme for doing that, hopefully
extending it to additional types of files is not that bad, which would
then spare us the trouble of arguing about each one individually, and
also be more secure.

As I also said to Stephen, the people who are discussing this here
should *really really really* be looking at the Cybertec patch instead
of trying to invent everything from scratch - unless that patch has,
like, typhoid, or something, in which case please let me know so that
I, too, can avoid looking at it. Even if you wanted to use 0% of the
code, you could look at the list of file types that they consider
encrypting and think about whether you agree with the decisions they
made. I suspect that you would quickly find that you've left some
things out of your list. In fact, I can think of a couple pretty clear
examples, like the stats files, which clearly contain user data.

Another reason that you should go look at that patch is because it
actually tries to grapple with the exact problem that you're worrying
about in the abstract: there are a LOT of different kinds of files and
they all need to be handled somehow. Even if you can convince yourself
that things like pg_clog don't need encryption, which I think is a
pretty tough sell, there are LOT of file types that directly contain
user data and do need to be handled. A lot of the code that writes
those various types of files is pretty ad-hoc. It doesn't necessarily
do nice things like build up a block of data and then write it out
together; it may for example write a byte a time. That's not going to
work well for encryption, I think, so the Cybertec patch changes that
stuff around. I personally don't think that the patch does that in a
way that is sufficiently clean and carefully considered for it to be
integrated into core, and my plan had been to work on that with the
patch authors.

However, that plan has been somewhat derailed by the fact that we now
have hundreds of emails arguing about the design, because I don't want
to be trying to push water up a hill if everyone else is going in a
different direction. It looks to me, though, like we haven't really
gotten beyond the point where that patch already was. The issues of
nonce and many file types have already been thought about carefully
there. I rather suspect that they did not get it all right. But, it
seems to me that it would be a lot more useful to look at the code
actually written and think about what it gets right and wrong than to
discuss these points as a strictly theoretical matter.

In other words: maybe I'm wrong here, but it looks to me like we're
laboriously reinventing the wheel when we could be working on
improving the working prototype.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: WIP/PoC for parallel backup

2019-10-07 Thread Ibrar Ahmed

On Mon, Oct 7, 2019 at 6:06 PM Robert Haas  wrote:

> On Mon, Oct 7, 2019 at 8:48 AM Asif Rehman  wrote:
> > Sure. Though the backup manifest patch calculates and includes the
> checksum of backup files and is done
> > while the file is being transferred to the frontend-end. The manifest
> file itself is copied at the
> > very end of the backup. In parallel backup, I need the list of filenames
> before file contents are transferred, in
> > order to divide them into multiple workers. For that, the manifest file
> has to be available when START_BACKUP
> >  is called.
> >
> > That means, backup manifest should support its creation while excluding
> the checksum during START_BACKUP().
> > I also need the directory information as well for two reasons:
> >
> > - In plain format, base path has to exist before we can write the file.
> we can extract the base path from the file
> > but doing that for all files does not seem a good idea.
> > - base backup does not include the content of some directories but those
> directories although empty, are still
> > expected in PGDATA.
> >
> > I can make these changes part of parallel backup (which would be on top
> of backup manifest patch) or
> > these changes can be done as part of manifest patch and then parallel
> can use them.
> >
> > Robert what do you suggest?
>
> I think we should probably not use backup manifests here, actually. I
> initially thought that would be a good idea, but after further thought
> it seems like it just complicates the code to no real benefit.  I
> suggest that the START_BACKUP command just return a result set, like a
> query, with perhaps four columns: file name, file type ('d' for
> directory or 'f' for file), file size, file mtime. pg_basebackup will
> ignore the mtime, but some other tools might find that useful
> information.
>
> I wonder if we should also split START_BACKUP (which should enter
> non-exclusive backup mode) from GET_FILE_LIST, in case some other
> client program wants to use one of those but not the other.  I think
> that's probably a good idea, but not sure.
>
> I still think that the files should be requested one at a time, not a
> huge long list in a single command.
>

What about have an API to get the single file or list of files? We will use
a single file in
our application and other tools can get the benefit of list of files.

>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
>

-- 
Ibrar Ahmed

Re: WIP/PoC for parallel backup

2019-10-07 Thread Asif Rehman

On Mon, Oct 7, 2019 at 6:05 PM Robert Haas  wrote:

> On Mon, Oct 7, 2019 at 8:48 AM Asif Rehman  wrote:
> > Sure. Though the backup manifest patch calculates and includes the
> checksum of backup files and is done
> > while the file is being transferred to the frontend-end. The manifest
> file itself is copied at the
> > very end of the backup. In parallel backup, I need the list of filenames
> before file contents are transferred, in
> > order to divide them into multiple workers. For that, the manifest file
> has to be available when START_BACKUP
> >  is called.
> >
> > That means, backup manifest should support its creation while excluding
> the checksum during START_BACKUP().
> > I also need the directory information as well for two reasons:
> >
> > - In plain format, base path has to exist before we can write the file.
> we can extract the base path from the file
> > but doing that for all files does not seem a good idea.
> > - base backup does not include the content of some directories but those
> directories although empty, are still
> > expected in PGDATA.
> >
> > I can make these changes part of parallel backup (which would be on top
> of backup manifest patch) or
> > these changes can be done as part of manifest patch and then parallel
> can use them.
> >
> > Robert what do you suggest?
>
> I think we should probably not use backup manifests here, actually. I
> initially thought that would be a good idea, but after further thought
> it seems like it just complicates the code to no real benefit.


Okay.


>   I
> suggest that the START_BACKUP command just return a result set, like a
> query, with perhaps four columns: file name, file type ('d' for
> directory or 'f' for file), file size, file mtime. pg_basebackup will
> ignore the mtime, but some other tools might find that useful
> information.
>
yes current patch already returns the result set. will add the additional
information.


> I wonder if we should also split START_BACKUP (which should enter
> non-exclusive backup mode) from GET_FILE_LIST, in case some other
> client program wants to use one of those but not the other.  I think
> that's probably a good idea, but not sure.
>

Currently pg_basebackup does not enter in exclusive backup mode and other
tools have to
use pg_start_backup() and pg_stop_backup() functions to achieve that. Since
we are breaking
backup into multiple command, I believe it would be a good idea to have
this option. I will include
it in next revision of this patch.


>
> I still think that the files should be requested one at a time, not a
> huge long list in a single command.
>
sure, will make the change.


--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

Re: Standby accepts recovery_target_timeline setting?

2019-10-07 Thread Fujii Masao

On Fri, Oct 4, 2019 at 6:09 PM Michael Paquier  wrote:
>
> On Wed, Oct 02, 2019 at 03:30:38AM -0400, Stephen Frost wrote:
> > * David Steele (da...@pgmasters.net) wrote:
> >> On 9/28/19 1:26 PM, Fujii Masao wrote:
> >>> There might be some recovery parameters that we can safely use
> >>> even in crash recovery, e.g., maybe recovery_end_command
> >>> (now, you can see that recovery_end_command is executed in
> >>> crash recovery in v12). But at this stage of v12, it's worth thinking to
> >>> just cause crash recovery to exit with an error when any recovery
> >>> parameter is set. Thought?
> >>
> >> I dislike the idea of crash recovery throwing fatal errors because there
> >> are recovery settings in postgresql.auto.conf.  Since there is no
> >> defined mechanism for cleaning out old recovery settings we have to
> >> assume that they will persist (and accumulate) more or less forever.
>
> Yeah, I don't think that's a good thing either.  That's a recipe to
> make the user experience more confusing.
>
> >>> Or if that change is overkill, alternatively we can make crash recovery
> >>> "ignore" any recovery parameters, e.g., by forcibly disabling
> >>> the parameters.
> >>
> >> I'd rather load recovery settings *only* if recovery.signal or
> >> standby.signal is present and do this only after crash recovery is
> >> complete, i.e. in the absence of backup_label.
> >>
> >> I think blindly loading recovery settings then trying to ignore them
> >> later is pretty much why we are having these issues in the first place.
> >>  I'd rather not extend that pattern if possible.
> >
> > Agreed.

Agreed, too. Do you have any idea to implement that? I've not found out
"smart" way to do that yet.

One idea is, as Michael suggested, to use SetConfigOption() for all the
archive recovery parameters at the beginning of the startup process as follows,
to forcibly set the default values if crash recovery is running. But this
seems not smart for me.

SetConfigOption("restore_command", ...);
SetConfigOption("archive_cleanup_command", ...);
SetConfigOption("recovery_end_command", ...);
...

Maybe we should make GUC mechanism notice signal files and ignore
archive recovery-related parameters when none of those files exist?
This change seems overkill at least in v12, though.

Regards,

-- 
Fujii Masao

Re: Shared memory

2019-10-07 Thread Robert Haas

On Fri, Oct 4, 2019 at 5:51 AM Natarajan R  wrote:
> Why postgres not providing freeing shared memory?

Because it's intended to be used mostly for data structures that live
for the entire server lifetime.

There are some cases, such as various hash tables, where the number of
entries can grow and shrink over time. It might be useful to return
memory that is freed up when the hash table shrinks to the common
pool, but it would be complex, because then we'd have to keep track of
multiple chunks of freed memory and consolidate adjacent chunks and so
forth. I don't see that we'd be likely to get much benefit from such a
system, since a lot of cases memory fragmentation would prevent us
from getting any real benefit.

If you need a shared data structure that is temporary, you might want
to check out DSM and DSA.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: WIP/PoC for parallel backup

2019-10-07 Thread Robert Haas

On Mon, Oct 7, 2019 at 8:48 AM Asif Rehman  wrote:
> Sure. Though the backup manifest patch calculates and includes the checksum 
> of backup files and is done
> while the file is being transferred to the frontend-end. The manifest file 
> itself is copied at the
> very end of the backup. In parallel backup, I need the list of filenames 
> before file contents are transferred, in
> order to divide them into multiple workers. For that, the manifest file has 
> to be available when START_BACKUP
>  is called.
>
> That means, backup manifest should support its creation while excluding the 
> checksum during START_BACKUP().
> I also need the directory information as well for two reasons:
>
> - In plain format, base path has to exist before we can write the file. we 
> can extract the base path from the file
> but doing that for all files does not seem a good idea.
> - base backup does not include the content of some directories but those 
> directories although empty, are still
> expected in PGDATA.
>
> I can make these changes part of parallel backup (which would be on top of 
> backup manifest patch) or
> these changes can be done as part of manifest patch and then parallel can use 
> them.
>
> Robert what do you suggest?

I think we should probably not use backup manifests here, actually. I
initially thought that would be a good idea, but after further thought
it seems like it just complicates the code to no real benefit.  I
suggest that the START_BACKUP command just return a result set, like a
query, with perhaps four columns: file name, file type ('d' for
directory or 'f' for file), file size, file mtime. pg_basebackup will
ignore the mtime, but some other tools might find that useful
information.

I wonder if we should also split START_BACKUP (which should enter
non-exclusive backup mode) from GET_FILE_LIST, in case some other
client program wants to use one of those but not the other.  I think
that's probably a good idea, but not sure.

I still think that the files should be requested one at a time, not a
huge long list in a single command.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: stress test for parallel workers

2019-10-07 Thread Robert Haas

On Tue, Jul 23, 2019 at 7:29 PM Tom Lane  wrote:
> Parallel workers aren't ever allowed to write, in the current
> implementation, so it's not real obvious why they'd have any
> WAL log files open at all.

Parallel workers are not forbidden to write WAL, nor are they
forbidden to modify blocks. They could legally HOT-prune, for example,
though I'm not positive whether they actually do.

The prohibition is at a higher level: they can't create new tuples or
delete existing ones.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: WIP/PoC for parallel backup

2019-10-07 Thread Asif Rehman

On Mon, Oct 7, 2019 at 1:52 PM Rushabh Lathia 
wrote:

> Thanks Asif for the patch.  I am opting this for a review.  Patch is
> bit big, so here are very initial comments to make the review process
> easier.
>

Thanks Rushabh for reviewing the patch.

> 1) Patch seems doing lot of code shuffling, I think it would be easy
> to review if you can break the clean up patch separately.
>
> Example:
> a: setup_throttle
> b: include_wal_files
>
> 2) As I can see this patch basically have three major phase.
>
> a) Introducing new commands like START_BACKUP, SEND_FILES_CONTENT and
> STOP_BACKUP.
> b) Implementation of actual parallel backup.
> c) Testcase
>
> I would suggest, if you can break out in three as a separate patch that
> would be nice.  It will benefit in reviewing the patch.
>

Sure, why not. I will break them into multiple patches.

>
> 3) In your patch you are preparing the backup manifest (file which
> giving the information about the data files). Robert Haas, submitted
> the backup manifests patch on another thread [1], and I think we
> should use that patch to get the backup manifests for parallel backup.
>

Sure. Though the backup manifest patch calculates and includes the checksum
of backup files and is done
while the file is being transferred to the frontend-end. The manifest file
itself is copied at the
very end of the backup. In parallel backup, I need the list of filenames
before file contents are transferred, in
order to divide them into multiple workers. For that, the manifest file has
to be available when START_BACKUP
 is called.

That means, backup manifest should support its creation while excluding the
checksum during START_BACKUP().
I also need the directory information as well for two reasons:

- In plain format, base path has to exist before we can write the file. we
can extract the base path from the file
but doing that for all files does not seem a good idea.
- base backup does not include the content of some directories but those
directories although empty, are still
expected in PGDATA.

I can make these changes part of parallel backup (which would be on top of
backup manifest patch) or
these changes can be done as part of manifest patch and then parallel can
use them.

Robert what do you suggest?

-- 
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

Re: Two pg_rewind patches (auto generate recovery conf and ensure clean shutdown)

2019-10-07 Thread Alexey Kondratov


On 07.10.2019 4:06, Michael Paquier wrote:

On Fri, Oct 04, 2019 at 05:21:25PM +0300, Alexey Kondratov wrote:

I've checked your patch, but it seems that it cannot be applied as is, since
it e.g. adds a comment to 005_same_timeline.pl without actually changing the
test. So I've slightly modified your patch and tried to fit both dry-run and
ensureCleanShutdown testing together. It works just fine and fails
immediately if any of recent fixes is reverted. I still think that dry-run
testing is worth adding, since it helped to catch this v12 refactoring
issue, but feel free to throw it way if it isn't commitable right now, of
course.

I can guarantee the last patch I sent can be applied on top of HEAD:
https://www.postgresql.org/message-id/20191004083721.ga1...@paquier.xyz


Yes, it did, but my comment was about these lines:

diff --git a/src/bin/pg_rewind/t/005_same_timeline.pl 
b/src/bin/pg_rewind/t/005_same_timeline.pl

index 40dbc44caa..df469d3939 100644
--- a/src/bin/pg_rewind/t/005_same_timeline.pl
+++ b/src/bin/pg_rewind/t/005_same_timeline.pl
@@ -1,3 +1,7 @@
+#
+# Test that running pg_rewind with the source and target clusters
+# on the same timeline runs successfully.
+#

You have added this new comment section, but kept the old one, which was 
pretty much the same [1].



Regarding the rest, I have hacked my way through as per the attached.
The previous set of patches did the following, which looked either
overkill or not necessary:
- Why running test 005 with the remote mode?


OK, it was definitely an overkill, since remote control file fetch will 
be also tested in any other remote test case.



- --dry-run coverage is basically the same with the local and remote
modes, so it seems like a waste of resource to run it for all the
tests and all the modes.


My point was to test --dry-run + --write-recover-conf in remote, since 
the last one may cause recovery configuration write without doing any 
actual work, due to some wrong refactoring for example.



- There is no need for the script checking for options combinations to
initialize a data folder.  It is important to design the tests to be
cheap and meaningful.


Yes, I agree, moving some of those tests to just a 001_basic seems to be 
a proper optimization.



Patch v3-0002 also had a test to make sure that the source server is
shut down cleanly before using it.  I have included that part as
well, as the flow feels right.

So, Alexey, what do you think?


It looks good for me. Two minor remarks:

+    # option combinations.  As the code paths taken by those tests
+    # does not change for the "local" and "remote" modes, just run them

I am far from being fluent in English, but should it be 'do not change' 
instead?


+command_fails(
+    [
+        'pg_rewind', '--target-pgdata',
+        $primary_pgdata, '--source-pgdata',
+        $standby_pgdata, 'extra_arg1'
+    ],

Here and below I would prefer traditional options ordering "'--key', 
'value'". It should be easier to recognizefrom the reader perspective:


+command_fails(
+    [
+        'pg_rewind',
+        '--target-pgdata', $primary_pgdata,
+        '--source-pgdata', $standby_pgdata,
+    'extra_arg1'
+    ],


[1] 
https://github.com/postgres/postgres/blob/caa078353ecd1f3b3681c0d4fa95ad4bb8c2308a/src/bin/pg_rewind/t/005_same_timeline.pl#L15



--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

Re: [HACKERS] Block level parallel vacuum

2019-10-07 Thread Masahiko Sawada

On Fri, Oct 4, 2019 at 7:05 PM Dilip Kumar  wrote:
>
> On Fri, Oct 4, 2019 at 11:01 AM Amit Kapila  wrote:
> >
> > On Fri, Oct 4, 2019 at 10:28 AM Masahiko Sawada  
> > wrote:
> >>
> Some more comments..

Thank you!

> 1.
> + for (idx = 0; idx < nindexes; idx++)
> + {
> + if (!for_cleanup)
> + lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples,
> +   vacrelstats->old_live_tuples);
> + else
> + {
> + /* Cleanup one index and update index statistics */
> + lazy_cleanup_index(Irel[idx], &stats[idx], vacrelstats->new_rel_tuples,
> +vacrelstats->tupcount_pages < vacrelstats->rel_pages);
> +
> + lazy_update_index_statistics(Irel[idx], stats[idx]);
> +
> + if (stats[idx])
> + pfree(stats[idx]);
> + }
>
> I think instead of checking for_cleanup variable for every index of
> the loop we better move loop inside, like shown below?

Fixed.

>
> if (!for_cleanup)
> for (idx = 0; idx < nindexes; idx++)
> lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples,
> else
> for (idx = 0; idx < nindexes; idx++)
> {
> lazy_cleanup_index
> lazy_update_index_statistics
> ...
> }
>
> 2.
> +static void
> +lazy_vacuum_or_cleanup_indexes(LVRelStats *vacrelstats, Relation *Irel,
> +int nindexes, IndexBulkDeleteResult **stats,
> +LVParallelState *lps, bool for_cleanup)
> +{
> + int idx;
> +
> + Assert(!IsParallelWorker());
> +
> + /* no job if the table has no index */
> + if (nindexes <= 0)
> + return;
>
> Wouldn't it be good idea to call this function only if nindexes > 0?
>

I realized the callers of this function should pass nindexes > 0
because they attempt to do index vacuuming or index cleanup. So it
should be an assertion rather than returning. Thoughts?

> 3.
> +/*
> + * Vacuum or cleanup indexes with parallel workers. This function must be 
> used
> + * by the parallel vacuum leader process.
> + */
> +static void
> +lazy_parallel_vacuum_or_cleanup_indexes(LVRelStats *vacrelstats,
> Relation *Irel,
> + int nindexes, IndexBulkDeleteResult **stats,
> + LVParallelState *lps, bool for_cleanup)
>
> If you see this function there is no much common code between
> for_cleanup and without for_cleanup except these 3-4 statement.
> LaunchParallelWorkers(lps->pcxt);
> /* Create the log message to report */
> initStringInfo(&buf);
> ...
> /* Wait for all vacuum workers to finish */
> WaitForParallelWorkersToFinish(lps->pcxt);
>
> Other than that you have got a lot of checks like this
> + if (!for_cleanup)
> + {
> + }
> + else
> + {
> }
>
> I think code would be much redable if we have 2 functions one for
> vaccum (lazy_parallel_vacuum_indexes) and another for
> cleanup(lazy_parallel_cleanup_indexes).

Seems good idea. Fixed.

>
> 4.
>  * of index scans performed.  So we don't use maintenance_work_mem memory for
>   * the TID array, just enough to hold as many heap tuples as fit on one page.
>   *
> + * Lazy vacuum supports parallel execution with parallel worker processes. In
> + * parallel lazy vacuum, we perform both index vacuuming and index cleanup 
> with
> + * parallel worker processes. Individual indexes are processed by one vacuum
>
> Spacing after the "." is not uniform, previous comment is using 2
> space and newly
> added is using 1 space.
>

FIxed.

The code has been fixed in my local repository. After incorporated the
all comments I got so far I'll submit the updated version patch.

Regards,

--
Masahiko Sawada

Re: [PATCH] use separate PartitionedRelOptions structure to store partitioned table options

2019-10-07 Thread Nikolay Shaplov

В письме от понедельник, 7 октября 2019 г. 14:57:14 MSK пользователь Michael 
Paquier написал:
> On Sun, Oct 06, 2019 at 03:47:46PM +0300, Nikolay Shaplov wrote:
> > This message is follow up to the "Get rid of the StdRdOptions" patch
> > thread: https://www.postgresql.org/message-id/2620882.s52SJui4ql@x200m
> > 
> > I've split patch into even smaller parts and commitfest want each patch in
> > separate thread. So it is new thread.
> 
> Splitting concepts into different threads may be fine, and usually
> recommended.  Splitting a set of patches into multiple entries to ease
> review and your goal to get a patch integrated and posted all these
> into the same thread is usually recommended.  Now posting a full set
> of patches across multiple threads, in way so as they have
> dependencies with each other, is what I would call a confusing
> situation.  That's hard to follow.
I understand that. I've tried to add new patches to original thread, but 
commitfest did not accept that for some reason. You can try to add patch from 
this letter https://www.postgresql.org/message-id/2620882.s52SJui4ql@x200m 
just to see how it works.

Since discussion actually did not started yet, it can be moved anywhere you 
suggest, but please tell how exactly it should be done, because I do not 
understand what is the better way.

> > The idea of this patch is following: If you read the code, partitioned
> > tables do not have any options (you will not find RELOPT_KIND_PARTITIONED
> > in boolRelOpts, intRelOpts, realRelOpts, stringRelOpts and enumRelOpts in
> > reloption.c), but it uses StdRdOptions to store them (these no options).
> I am not even sure that we actually need that.  What kind of reloption
> you would think would suit into this subset?

Actually I do not know. But the author of partitioned patch, added a stub for 
partitioned tables to have some reloptions in future. But this stub is 
designed to use StdRdOptions. Which is not correct, as I presume. So here I am 
correcting the stub.

-- 
Software Developer: https://www.upwork.com/freelancers/~014a87e140ff02c0da
Body-oriented Therapist: https://vk.com/nataraj_rebalancing  (Russian)

signature.asc
Description: This is a digitally signed message part.

Re: WIP/PoC for parallel backup

2019-10-07 Thread Rushabh Lathia

Thanks Asif for the patch.  I am opting this for a review.  Patch is
bit big, so here are very initial comments to make the review process
easier.

1) Patch seems doing lot of code shuffling, I think it would be easy
to review if you can break the clean up patch separately.

Example:
a: setup_throttle
b: include_wal_files

2) As I can see this patch basically have three major phase.

a) Introducing new commands like START_BACKUP, SEND_FILES_CONTENT and
STOP_BACKUP.
b) Implementation of actual parallel backup.
c) Testcase

I would suggest, if you can break out in three as a separate patch that
would be nice.  It will benefit in reviewing the patch.

3) In your patch you are preparing the backup manifest (file which
giving the information about the data files). Robert Haas, submitted
the backup manifests patch on another thread [1], and I think we
should use that patch to get the backup manifests for parallel backup.

Further, I will continue to review patch but meanwhile if you can
break the patches - so that review process be easier.

[1]
https://www.postgresql.org/message-id/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=f7heo2zna5t...@mail.gmail.com

Thanks,

On Fri, Oct 4, 2019 at 4:32 PM Asif Rehman  wrote:

>
>
> On Thu, Oct 3, 2019 at 6:40 PM Robert Haas  wrote:
>
>> On Fri, Sep 27, 2019 at 12:00 PM Asif Rehman 
>> wrote:
>> >> > - SEND_FILES_CONTENTS (file1, file2,...) - returns the files in
>> given list.
>> >> > pg_basebackup will then send back a list of filenames in this
>> command. This commands will be send by each worker and that worker will be
>> getting the said files.
>> >>
>> >> Seems reasonable, but I think you should just pass one file name and
>> >> use the command multiple times, once per file.
>> >
>> > I considered this approach initially,  however, I adopted the current
>> strategy to avoid multiple round trips between the server and clients and
>> save on query processing time by issuing a single command rather than
>> multiple ones. Further fetching multiple files at once will also aid in
>> supporting the tar format by utilising the existing ReceiveTarFile()
>> function and will be able to create a tarball for per tablespace per worker.
>>
>> I think that sending multiple filenames on a line could save some time
>> when there are lots of very small files, because then the round-trip
>> overhead could be significant.
>>
>> However, if you've got mostly big files, I think this is going to be a
>> loser. It'll be fine if you're able to divide the work exactly evenly,
>> but that's pretty hard to do, because some workers may succeed in
>> copying the data faster than others for a variety of reasons: some
>> data is in memory, some data has to be read from disk, different data
>> may need to be read from different disks that run at different speeds,
>> not all the network connections may run at the same speed. Remember
>> that the backup's not done until the last worker finishes, and so
>> there may well be a significant advantage in terms of overall speed in
>> putting some energy into making sure that they finish as close to each
>> other in time as possible.
>>
>> To put that another way, the first time all the workers except one get
>> done while the last one still has 10GB of data to copy, somebody's
>> going to be unhappy.
>>
>
> I have updated the patch (see the attached patch) to include tablespace
> support, tar format support and all other backup base backup options to
> work in parallel mode as well. As previously suggested, I have removed
> BASE_BACKUP [PARALLEL] and have added START_BACKUP instead to start the
> backup. The tar format will write multiple tar files depending upon the
> number of workers specified. Also made all commands
> (START_BACKUP/SEND_FILES_CONTENT/STOP_BACKUP) to accept the
> base_backup_opt_list. This way the command-line options can also be
> provided to these commands. Since the command-line options don't change
> once the backup initiates, I went this way instead of storing them in
> shared state.
>
> The START_BACKUP command will now return a sorted list of files in
> descending order based on file sizes. This way, the larger files will be on
> top of the list. hence these files will be assigned to workers one by one,
> making it so that the larger files will be copied before other files.
>
> Based on my understanding your main concern is that the files won't be
> distributed fairly i.e one worker might get a big file and take more time
> while others get done early with smaller files? In this approach I have
> created a list of files in descending order based on there sizes so all the
> big size files will come at the top. The maximum file size in PG is 1GB so
> if we have four workers who are picking up file from the list one by one,
> the worst case scenario is that one worker gets a file of 1GB to process
> while others get files of smaller size. However with this approach of
> descending files based on size and handing it out to workers one by one,

Re: Updated some links which are not working with new links

2019-10-07 Thread vignesh C

On Mon, Oct 7, 2019 at 1:18 PM Michael Paquier  wrote:
>
> Hi Vignesh,
>
> On Mon, Oct 07, 2019 at 09:38:41AM +0530, vignesh C wrote:
> > The attached patch in previous mail contain the changes for the updated
> > links requested in [1]. It is not the complete set, but it is the first set
> > for which I could find the equivalent links.
> > https://www.postgresql.org/message-id/20191006074122.GC14532%40paquier.xyz
>
> I may be missing something of course, but I do not see a patch neither
> on this message nor on the previous one.  Could you send a patch you
> think is correct?
Sorry Michael for the miscommunication, the patch was present in the
first mail of this mail thread.
I'm re-attaching the patch in this mail.
Let me know if anything is required.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com


0001-Updated-the-links-that-are-not-working.patch
Description: Binary data

Re: Updated some links which are not working with new links

2019-10-07 Thread Michael Paquier

Hi Vignesh,

On Mon, Oct 07, 2019 at 09:38:41AM +0530, vignesh C wrote:
> The attached patch in previous mail contain the changes for the updated
> links requested in [1]. It is not the complete set, but it is the first set
> for which I could find the equivalent links.
> https://www.postgresql.org/message-id/20191006074122.GC14532%40paquier.xyz

I may be missing something of course, but I do not see a patch neither
on this message nor on the previous one.  Could you send a patch you
think is correct?
--
Michael


signature.asc
Description: PGP signature

Re: refactoring - share str2*int64 functions

2019-10-07 Thread Michael Paquier

On Fri, Oct 04, 2019 at 02:27:44PM +0530, Ashutosh Sharma wrote:
> Is there any specific reason for hard coding the *base* of a number
> representing the string in strtouint64(). I understand that currently
> strtouint64() is being used just to convert an input string to decimal
> unsigned value but what if we want it to be used for hexadecimal
> values or may be some other values, in that case it can't be used.
> Further, the function name is strtouint64() but the comments atop it's
> definition says it's pg_strtouint64(). That needs to be corrected.

Performance, as Andres has already stated upthread.  Moving away from
strtol gives roughly a 40% improvement with a call-to-call comparison:
https://www.postgresql.org/message-id/20190909052814.ga26...@paquier.xyz

> At few places, I could see that the function call to
> pg_strtoint32_check() is followed by an error handling. Isn't that
> already being done in pg_strtoint32_check function itself. For e.g. in
> refint.c the function call to pg_strtoint32_check is followed by a if
> condition that checks for an error which I assume shouldn't be there
> as it is already being done by pg_strtoint32_check.

pg_strtoint32_check is used for a signed integer, so it would complain
about incorrect input syntax, but not when the parsed integer is less
or equal than 0, which is what refint.c complains about.
--
Michael


signature.asc
Description: PGP signature

PATCH: Add uri percent-encoding for binary data

2019-10-07 Thread Anders Åstrand

Hello

Attached is a patch for adding uri as an encoding option for
encode/decode. It uses what's called "percent-encoding" in rfc3986
(https://tools.ietf.org/html/rfc3986#section-2.1).

The background for this patch is that I could easily build urls in
plpgsql, but doing the actual encoding of the url parts is painfully
slow. The list of available encodings for encode/decode looks quite
arbitrary to me, so I can't see any reason this one couldn't be in
there.

In modern web scenarios one would probably most likely want to encode
the utf8 representation of a text string for inclusion in a url, in
which case correct invocation would be ENCODE(CONVERT_TO('some text in
database encoding goes here', 'UTF8'), 'uri'), but uri
percent-encoding can of course also be used for other text encodings
and arbitrary binary data.

Regards,
Anders
diff --git a/src/backend/utils/adt/encode.c b/src/backend/utils/adt/encode.c
index 7293d66de5..33cf7bb57c 100644
--- a/src/backend/utils/adt/encode.c
+++ b/src/backend/utils/adt/encode.c
@@ -512,6 +512,131 @@ esc_dec_len(const char *src, unsigned srclen)
 	return len;
 }
 
+/*
+ * URI percent encoding
+ *
+ * Percent encodes all byte values except the unreserved ASCII characters as per RFC3986.
+ */
+
+static const char upper_hex_digits[] = "0123456789ABCDEF";
+
+static unsigned
+uri_encode(const char *src, unsigned srclen, char *dst)
+{
+	char		*d = dst;
+
+	for (const char *s = src; s < src + srclen; s++)
+	{
+		if ((*s >= 'A' && *s <= 'Z') ||
+			(*s >= 'a' && *s <= 'z') ||
+			(*s >= '0' && *s <= '9') ||
+			*s == '-' ||
+			*s == '_' ||
+			*s == '.' ||
+			*s == '~')
+		{
+			*d++ = *s;
+		}
+		else
+		{
+			*d++ = '%';
+			*d++ = upper_hex_digits[(*s >> 4) & 0xF];
+			*d++ = upper_hex_digits[*s & 0xF];
+		}
+	}
+	return d - dst;
+}
+
+static unsigned
+uri_decode(const char *src, unsigned srclen, char *dst)
+{
+	const char *s = src;
+	const char *srcend = src + srclen;
+	char		*d = dst;
+	char		val;
+
+	while (s < srcend)
+	{
+		if (*s == '%')
+		{
+			if (s > srcend - 3) {
+/* This will never get triggered since uri_dec_len already takes care of validation
+ */
+ereport(ERROR,
+		(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+		 errmsg("invalid uri percent encoding"),
+		 errhint("Input data ends prematurely.")));
+			}
+
+			/* Skip '%' */
+			s++;
+
+			val = get_hex(*s++) << 4;
+			val += get_hex(*s++);
+			*d++ = val;
+		}
+		else
+		{
+			*d++ = *s++;
+		}
+	}
+	return d - dst;
+}
+
+static unsigned
+uri_enc_len(const char *src, unsigned srclen)
+{
+	int			len = 0;
+
+	for (const char *s = src; s < src + srclen; s++)
+	{
+		if ((*s >= 'A' && *s <= 'Z') ||
+			(*s >= 'a' && *s <= 'z') ||
+			(*s >= '0' && *s <= '9') ||
+			*s == '-' ||
+			*s == '_' ||
+			*s == '.' ||
+			*s == '~')
+		{
+			len++;
+		}
+		else
+		{
+			len += 3;
+		}
+	}
+	return len;
+}
+
+static unsigned
+uri_dec_len(const char *src, unsigned srclen)
+{
+	const char *s = src;
+	const char *srcend = src + srclen;
+	int			len = 0;
+
+	while (s < srcend)
+	{
+		if (*s == '%')
+		{
+			if (s > srcend - 3) {
+ereport(ERROR,
+		(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+		 errmsg("invalid uri percent encoding"),
+		 errhint("Input data ends prematurely.")));
+			}
+			s++;
+			get_hex(*s++);
+			get_hex(*s++);
+		}
+		else {
+			s++;
+		}
+		len++;
+	}
+	return len;
+}
+
 /*
  * Common
  */
@@ -541,6 +666,12 @@ static const struct
 			esc_enc_len, esc_dec_len, esc_encode, esc_decode
 		}
 	},
+	{
+		"uri",
+		{
+			uri_enc_len, uri_dec_len, uri_encode, uri_decode
+		}
+	},
 	{
 		NULL,
 		{
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index 2483966576..f89c5ec1c3 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -1870,3 +1870,24 @@ SELECT encode(overlay(E'Th\\000omas'::bytea placing E'\\002\\003'::bytea from 5
  Th\000o\x02\x03
 (1 row)
 
+SET bytea_output TO hex;
+SELECT encode(E'en\\300\\336d'::bytea, 'uri');
+  encode   
+---
+ en%C0%DEd
+(1 row)
+
+SELECT decode('%De%c0%DEd', 'uri');
+   decode   
+
+ \xdec0de64
+(1 row)
+
+SELECT decode('error%Ex', 'uri');
+ERROR:  invalid hexadecimal digit: "x"
+SELECT decode('error%E', 'uri');
+ERROR:  invalid uri percent encoding
+HINT:  Input data ends prematurely.
+SELECT decode('error%', 'uri');
+ERROR:  invalid uri percent encoding
+HINT:  Input data ends prematurely.
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index b5e75c344f..1d03836b6e 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -641,3 +641,10 @@ SELECT btrim(E'\\000trim\\000'::bytea, ''::bytea);
 SELECT encode(overlay(E'Th\\000omas'::bytea placing E'Th\\001omas'::bytea from 2),'escape');
 SELECT encode(overlay(E'Th\\000omas'::bytea placing E'\\002\\003'::bytea from 8),'escape');
 SELECT encode(overlay(E'Th\\000omas'::bytea placing E'\\002\\003'::bytea from 5 for 3),'escape');

Re: Change atoi to strtol in same place

2019-10-07 Thread Ashutosh Sharma

On Mon, Oct 7, 2019 at 11:05 AM David Rowley
 wrote:
>
> On Mon, 7 Oct 2019 at 18:27, Ashutosh Sharma  wrote:
> > AFAIU from the information given in the wiki page -[1], the port
> > numbers in the range of 1-1023 are for the standard protocols and
> > services. And there is nowhere mentioned that it is only true for some
> > OS and not for others. But, having said that I've just verified it on
> > Linux so I'm not aware of the behaviour on Windows.
> >
> > [1] - https://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_numbers
>
> Here are the results of a quick test on a windows machine:
>
> L:\Projects\Postgres\install\bin>echo test > c:\windows\test.txt
> Access is denied.
>
> L:\Projects\Postgres\install\bin>cat ../data/postgresql.conf | grep "port = "
> port = 543  # (change requires restart)
>
> L:\Projects\Postgres\install\bin>psql -p 543 postgres
> psql (11.5)
> WARNING: Console code page (850) differs from Windows code page (1252)
>  8-bit characters might not work correctly. See psql reference
>  page "Notes for Windows users" for details.
> Type "help" for help.
>
> postgres=#
>

Oh then that means all the unused ports (be it dedicated for some
particular protocol or service) can be used on Windows. I just tried
using port number 21 and 443 for postgres on my old Windows setup and
it worked. See below,

C:\Users\ashu\git-clone-postgresql\postgresql\TMP\test\bin>.\pg_ctl -D
..\data -c -w -l logfile -o "
-p 21" start
waiting for server to start done
server started

C:\Users\ashu\git-clone-postgresql\postgresql\TMP\test\bin>.\psql -d
postgres -p 21
psql (10.5)
WARNING: Console code page (437) differs from Windows code page (1252)
 8-bit characters might not work correctly. See psql reference
 page "Notes for Windows users" for details.
Type "help" for help.

postgres=# \q

C:\Users\ashu\git-clone-postgresql\postgresql\TMP\test\bin>.\pg_ctl -D
..\data -c -w -l logfile stop

waiting for server to shut down done
server stopped

C:\Users\ashu\git-clone-postgresql\postgresql\TMP\test\bin>.\pg_ctl -D
..\data -c -w -l logfile -o "
-p 443" start
waiting for server to start done
server started

C:\Users\ashu\git-clone-postgresql\postgresql\TMP\test\bin>.\psql -d
postgres -p 443
psql (10.5)
WARNING: Console code page (437) differs from Windows code page (1252)
 8-bit characters might not work correctly. See psql reference
 page "Notes for Windows users" for details.
Type "help" for help.

This looks a weird behaviour to me. I think this is probably one
reason why people don't prefer using Windows. Anyways, thanks David
for putting that point, it was really helpful.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

Re: Fix for Bug #16032

2019-10-07 Thread Michael Paquier

Hi Rob,

On Thu, Oct 03, 2019 at 06:27:34PM +0100, Rob wrote:
> I stumbled on a windows-only bug in pg_basebackup which I've reported as
> #16032 
> (https://www.postgresql.org/message-id/16032-4ba56823a2b2805f%40postgresql.org).
> 
> I'm pretty sure I've fixed it in the attached patch.

Could it be possible to keep the discussion on the original thread?  I
already replied to it, and there are no problems with discussing
patches dealing with bugs directly on pgsql-bugs.  Thanks for caring.

Thanks,
--
Michael

signature.asc
Description: PGP signature

Re: pgsql: Remove pqsignal() from libpq's official exports list.

2019-10-07 Thread Michael Paquier

On Fri, Oct 04, 2019 at 11:56:31AM +0200, Christoph Berg wrote:
> This is starting to hurt in several places:
> 
> 04 11:41  mha@xindi:~$ psql
> 04 11:41  /usr/lib/postgresql/9.2/bin/psql: symbol lookup error:
>/usr/lib/postgresql/9.2/bin/psql: undefined symbol: 
> pqsignal
> 
> pg_repack linked against libpq5 11 breaks with libpq5 12:

Ouch.  So that's commit f7ab802.  I agree that this is not cool, and
libpq so version is not likely going to be bumped up (if that happens
I have code I could wipe out).  Could we reconsider this decision?  It
seems to me that we should not silently break things.
--
Michael


signature.asc
Description: PGP signature

70 matches

Mail list logo