Re: [HACKERS] Proposal: Generic WAL logical messages

2016-01-30 Thread Simon Riggs
On 29 January 2016 at 21:11, Alexander Korotkov 
wrote:

> Hi, Petr!
>
> On Sat, Jan 23, 2016 at 1:22 AM, Petr Jelinek 
> wrote:
>
>> here is an updated version of this patch, calling the messages logical
>> (decoding) messages consistently everywhere and removing any connection to
>> standby messages. Moving this to its own module gave me a place to write
>> a brief explanation, so the code documentation has hopefully improved as
>> well.
>>
>> The functionality itself didn't change.
>
>
> I'd like to mention that there is my upcoming patch which is named generic
> WAL records.
> http://www.postgresql.org/message-id/capphfdsxwzmojm6dx+tjnpyk27kt4o7ri6x_4oswcbyu1rm...@mail.gmail.com
> But it has to be a distinct feature from your generic WAL logical messages.
> Theoretically, we could have generic messages with arbitrary content, both
> having a custom WAL replay function and being decoded by an output plugin.
> But a custom WAL replay function would let an extension bug break recovery,
> archiving and physical replication. And that doesn't seem to be acceptable.
> This is why we have to develop these as separate features.
>
> Should we think more about naming? Would two kinds of generic records
> confuse people?
>

Logical messages

Generic WAL records

Seems like I can tell them apart. Worth checking, but I think we're OK.
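
For anyone who trips over the names later: a logical message is emitted
from SQL for logical decoding output plugins to consume, along the lines
of the interface proposed in this thread (so subject to change before
commit):

SELECT pg_logical_emit_message(true, 'my_prefix', 'my payload');

It is ignored by redo and physical replication. A generic WAL record, by
contrast, is a C-level facility for describing arbitrary data page
changes so that extension-managed pages can be made crash-safe.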

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] [PATCH] Refactoring of LWLock tranches

2016-01-30 Thread Alexander Korotkov
On Sat, Jan 30, 2016 at 10:58 AM, Amit Kapila 
wrote:

> On Fri, Jan 29, 2016 at 6:47 PM, Robert Haas 
> wrote:
> >
> > On Fri, Jan 29, 2016 at 6:59 AM, Alexander Korotkov
> >  wrote:
> > > On Thu, Jan 21, 2016 at 12:37 AM, Alvaro Herrera <
> alvhe...@2ndquadrant.com>
> > > wrote:
> > >> So far as I can tell, there are three patches in flight here:
> > >>
> > >> * replication slot IO lwlocks
> > >> * ability of extensions to request tranches dynamically
> > >> * PGPROC
> > >>
> > >> The first one hasn't been reviewed at all, but the other two have
> seen a
> > >> bit of discussion and evolution.  Is anyone doing any more reviewing?
> > >
> > > I'd like to add another one: fixed tranche id for each SLRU.
> >
> > +1 for this.  The patch looks good and I will commit it if nobody
> objects.
> >
>
> +1. Patch looks good to me as well, but I have one related question:
> Is there a reason why we should not assign ReplicationOrigins a
> fixed tranche id, and then we might even be able to do away with
> LWLockRegisterTranche()?
>

+1. I think we should do this.
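
To spell out the two styles under discussion - a rough sketch only, using
the simplified name-based registration this refactoring series is moving
towards rather than the exact current signature (my_lock, slru_ctl and
the constant name are illustrative):

/* Dynamic registration: an extension asks for an id at runtime and
 * must then register a display name for it in each backend. */
int		my_tranche_id = LWLockNewTrancheId();
LWLockRegisterTranche(my_tranche_id, "my_extension_locks");
LWLockInitialize(&my_lock, my_tranche_id);

/* Fixed id: for built-in users such as each SLRU, the id is a
 * compile-time constant from the builtin tranche list, so no runtime
 * registration is needed at the lock-initialization site. */
LWLockInitialize(&slru_ctl->lock, LWTRANCHE_CLOG_BUFFERS);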

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Fwd: Core dump with nested CREATE TEMP TABLE

2016-01-30 Thread Robert Haas
On Sat, Jan 30, 2016 at 2:13 PM, Noah Misch  wrote:
> You could offer that paragraph as an objection to almost all Assert(), elog(),
> and automated tests.  Why levy it against this patch?  The valuable ways
> assertions and tests supplement review are well-established.

Sure, that's true, but I don't view all situations in the same way, so
I don't write the same thing in answer to each one.  I think I've
pretty much said what I have to say about this; if nothing I wrote up
until now swayed you, it's unlikely that anything else I say after
this point will either.

>> You may be right, but then again Tom had a different opinion, even
>> after seeing your patch, and he's no dummy.
>
> Eh?  Tom last posted to this thread before I first posted a patch.

http://www.postgresql.org/message-id/29758.1451780...@sss.pgh.pa.us
seems to me to be a vote against the concept embodied by the patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] better systemd integration

2016-01-30 Thread Peter Eisentraut
On 1/29/16 4:15 PM, Pavel Stehule wrote:
> Hi
> 
> >
> >
> > You sent only rebased code of the previous version. I didn't find the
> > additional checks.
> 
> Oops.  Here is the actual new code.
> 
> 
> New test is working as expected.
> 
> I did a lot of tests - and this code works perfectly in single server
> mode, and with a hot-standby slave.
> 
> It doesn't work in standby-only mode.

Yeah, I hadn't thought of that.  How about this change in addition:

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 2e7f1d7..d983a50 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -4933,6 +4933,11 @@ sigusr1_handler(SIGNAL_ARGS)
if (XLogArchivingAlways())
PgArchPID = pgarch_start();

+#ifdef USE_SYSTEMD
+   if (!EnableHotStandby)
+   sd_notify(0, "READY=1");
+#endif
+
pmState = PM_RECOVERY;
}
if (CheckPostmasterSignal(PMSIGNAL_BEGIN_HOT_STANDBY) &&

> The default timeout on FC is 90 sec - it may not be enough for
> large servers with large shared buffers and a high checkpoint segments
> setting. It should be mentioned in the service file.

Good point.  I think we should set TimeoutSec=0 in the suggested service file.
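
Roughly, the relevant part of the suggested service file would then read
(a sketch; TimeoutSec=0 disables both the start and stop timeouts):

[Service]
Type=notify
# Recovery on a big server can exceed any fixed value, so disable the
# timeout rather than trying to guess one.
TimeoutSec=0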





Re: [HACKERS] [PATCH] better systemd integration

2016-01-30 Thread Peter Eisentraut
On 1/28/16 9:46 AM, Christoph Berg wrote:
> If a cluster is configured for non-hot-standby replication, the
> READY=1 seems to never happen. Did you check whether that triggers
> any timeouts which would make the unit "fail" or the like?

As Pavel showed, it doesn't work for that.  I'll look into that.

> Also, I'm wondering how hard it would be to get socket activation work
> with that? (I wouldn't necessarily recommend that for production use,
> but on my desktop it would certainly be helpful not to have all those
> 8.4/9.0/.../9.6 clusters running all the time doing nothing.)

I had looked into socket activation, and it looks feasible, but it's a
separate feature.  I couldn't really think of a strong use case, but
what you describe makes sense.





Re: [HACKERS] [PATCH] better systemd integration

2016-01-30 Thread Pavel Stehule
2016-01-30 22:38 GMT+01:00 Peter Eisentraut :

> On 1/29/16 4:15 PM, Pavel Stehule wrote:
> > Hi
> >
> > >
> > >
> > > You sent only rebased code of the previous version. I didn't find
> > > the additional checks.
> >
> > Oops.  Here is the actual new code.
> >
> >
> > New test is working as expected.
> >
> > I did a lot of tests - and this code works perfectly in single server
> > mode, and with a hot-standby slave.
> >
> > It doesn't work in standby-only mode.
>
> Yeah, I hadn't thought of that.  How about this change in addition:
>
> diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
> index 2e7f1d7..d983a50 100644
> --- a/src/backend/postmaster/postmaster.c
> +++ b/src/backend/postmaster/postmaster.c
> @@ -4933,6 +4933,11 @@ sigusr1_handler(SIGNAL_ARGS)
> if (XLogArchivingAlways())
> PgArchPID = pgarch_start();
>
> +#ifdef USE_SYSTEMD
> +   if (!EnableHotStandby)
> +   sd_notify(0, "READY=1");
> +#endif
> +
> pmState = PM_RECOVERY;
> }
> if (CheckPostmasterSignal(PMSIGNAL_BEGIN_HOT_STANDBY) &&
>
> > The default timeout on FC is 90 sec - it may not be enough for
> > large servers with large shared buffers and a high checkpoint segments
> > setting. It should be mentioned in the service file.
>
> Good point.  I think we should set TimeoutSec=0 in the suggested service
> file.
>

Probably no other value is safe.

Pavel


Re: [HACKERS] Template for commit messages

2016-01-30 Thread Christopher Browne
On 30 January 2016 at 05:11, Robert Haas  wrote:
>
> Well, this gets at one of the problems here, which is that you can't
> fix a commit message once the commit has been pushed.  So even if we
> all agreed in principle to a standard format, it's not clear that you
> could enforce compliance with that format to a degree sufficient to
> make machine-parseability a reality.

Yep, there's the rub in it.

Commit messages are authoritative for the things that ARE found in
commit messages.

If making them authoritative for a whole bunch of things means it is
necessary to force everyone to run some piece of commit-message-
monitoring code against their own repo, and any failure to run the
message monitoring code causes the project to fail to have
authoritative information, then it should be clear that we've
constructed something a wee bit too fragile.

A more plausible thing is to collect the authoritative-at-commit-time
message information into an external system (hey, I wonder if anyone
has some sort of structured data store implemented that could be
good at this!), and then have a way to feed additional information into
that system that would become the authoritative source for any
requirements for the extra metadata.

That, of course, smells like a database that draws metadata from git,
and augments with further streams of inputs.  There is certainly
something problematic about assuming that we can *always* get
supplementary data.  That raises the question of how we shame people
into going back and filling in the blanks we wanted filled.

It seems foolish to me to imagine that we can ensure that the
data *always* arrives at commit time; any laziness there represents
a permanent "data fault"; making it asynchronous shifts the problem
to a different spot.  I suspect we can only approach success on it,
and get *partial* metadata, at best.  If it's enough better than nothing,
then maybe that's good enough.  And I'll bet that the Commitfest
database already contains a lot of the data desired to fill blanks...

Further, if the point is to encourage reviews by making sure credit
(and hence data to support GIVING credit) is given, then it is not
inapropos for those keen on receiving credit to be responsible for
marking off the patches they know they contributed to.  That's
less fragile than expecting all credit to be attached by the
committer at commit time.

--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: [HACKERS] [PATCH] better systemd integration

2016-01-30 Thread Peter Eisentraut
On 1/28/16 10:08 AM, Alvaro Herrera wrote:
> I wonder if instead of HAVE_SYSTEMD at each callsite we shouldn't
> instead have a pg_sd_notify() call that's a no-op when not systemd.

We do this for other optional features as well, and I think it keeps the
code clearest, especially if the ifdef'ed sections are short.
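
For reference, the wrapper idea would look something like this
(pg_sd_notify is a hypothetical name, not something the patch adds):

#ifdef USE_SYSTEMD
#include <systemd/sd-daemon.h>
#define pg_sd_notify(state) sd_notify(0, (state))
#else
#define pg_sd_notify(state) ((void) 0)
#endif

so call sites would collapse to a single unconditional
pg_sd_notify("READY=1").  With only a handful of short call sites,
though, the explicit #ifdef blocks read about as well.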





Re: [HACKERS] Additional role attributes && superuser review

2016-01-30 Thread Craig Ringer
On 29 January 2016 at 22:41, Stephen Frost  wrote:

> Michael,
>
> * Michael Paquier (michael.paqu...@gmail.com) wrote:
> > On Fri, Jan 29, 2016 at 6:37 AM, Stephen Frost 
> wrote:
> > > * Robert Haas (robertmh...@gmail.com) wrote:
> > >> On Thu, Jan 28, 2016 at 11:04 AM, Stephen Frost 
> > wrote:
> > >> > Personally, I don't have any particular issue having both, but the
> > >> > desire was stated that it would be better to have the regular
> > >> > GRANT EXECUTE ON catalog_func() working before we consider having
> > >> > default roles for same.  That moves the goal posts awful far
> though, if
> > >> > we're to stick with that and consider how we might extend the GRANT
> > >> > system in the future.
> > >>
> > >> I don't think it moves the goal posts all that far.  Convincing
> > >> pg_dump to dump grants on system functions shouldn't be a crazy large
> > >> patch.
> > >
> > > I wasn't clear as to what I was referring to here.  I've already
> written
> > > a patch to pg_dump to support grants on system objects and agree that
> > > it's at least reasonable.
> >
> > Is it already posted somewhere? I don't recall seeing it. Robert and Noah
> > have a point that this would be useful for users who would like to dump
> > GRANT/REVOKE rights on system functions & all, using a new option in
> > pg_dumpall, say --with-system-acl or --with-system-privileges.
>
> Multiple versions were posted on this thread.  I don't fault you for not
> finding it, this thread is a bit long in the tooth.  The one I'm
> currently working from is
>
>
It strikes me that this thread would possibly benefit from a wiki page
outlining the permissions, overall concepts, etc, as it's getting awfully
hard to follow.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] pglogical_output - a general purpose logical decoding output plugin

2016-01-30 Thread Craig Ringer
On 29 January 2016 at 18:16, Andres Freund  wrote:

> Hi,
>
> so, I'm reviewing the output of:
>

Thank you very much for the review.


> > + pglogical_output_plhooks \
>
> I'm doubtful we want these plhooks. You aren't allowed to access normal
> (non user catalog) tables in output plugins. That seems too much to
> expose to plpgsql function imo.
>

You're right. We've got no way to make sure the user sticks to things
that're reasonably safe.

The intent of that module was to allow people to write row and origin
filters in plpgsql, to serve as an example of how to implement hooks, and
to provide a tool usable in testing pglogical_output from pg_regress.

An example can be in C, since it's not safe to do it in plpgsql as you
noted. A few toy functions will be sufficient for test use.

As for allowing users to filter flexibly, I'm starting to think that hooks
in pglogical_output aren't really the best long term option. They're needed
for now, but for 9.7+ we should look at whether it's practical to separate
"what gets forwarded" policy from the mechanics of how it gets sent.
pglogical_output currently just farms part of the logical decoding hook out
to its own hooks, but it wouldn't have to do that if logical decoding let
you plug in policy on what you send separately to how you send it. Either
via catalogs or plugin functions separate to the output plugin.

(Kinda off topic, though, and I think we need the hooks for now, just not
the plpgsql implementation).



> > +++ b/contrib/pglogical_output/README.md
>
> I don't think we have markdown in postgres so far - so let's just keep the
> current content and remove the .md
>

I'm halfway through turning it all into SGML anyway. I just got sidetracked
by other work. I'd be just as happy to leave it as markdown but figured
SGML would likely be required.


>
> > + Table metadata header
> > +
> > +|===
> > +|*Message*|*Type/Size*|*Notes*
> > +
> > +|Message type|signed char|Literal '**R**' (0x52)
> > +|flags|uint8| * 0-6: Reserved, client _must_ ERROR if set and not
> > recognised.
> > +|relidentifier|uint32|Arbitrary relation id, unique for this upstream.
> > In practice this will probably be the upstream table's oid, but the
> > downstream can't assume anything.
> > +|nspnamelength|uint8|Length of namespace name
> > +|nspname|signed char[nspnamelength]|Relation namespace (null terminated)
> > +|relnamelength|uint8|Length of relation name
> > +|relname|char[relnamelength]|Relation name (null terminated)
> > +|attrs block|signed char|Literal: '**A**' (0x41)
> > +|natts|uint16|number of attributes
> > +|[fields]|[composite]|Sequence of 'natts' column metadata blocks, each
> > of which begins with a column delimiter followed by zero or more column
> > metadata blocks, each with the same column metadata block header.
>
> That's a fairly high overhead. Hm.
>

Yeah, and it shows in Oleksandr's measurements.  However, that's a metadata
message that is sent only infrequently if you enable relation metadata
caching, which is really necessary to get reasonable performance on anything
but the simplest workloads; it is only optional because that makes it easier
to write and test a client without it first.


> > +== JSON protocol
> > +
> > +If `proto_format` is set to `json` then the output plugin will emit JSON
> > +instead of the custom binary protocol. JSON support is intended mainly
> for
> > +debugging and diagnostics.
> > +
>
> I'm fairly strongly opposed to including two formats in one output
> plugin. I think the demand for being able to look into the binary
> protocol should instead be satisfied by having a function that "expands"
> the binary data returned into something easier to understand.
>

Per our discussion yesterday I think I agree with you on that now.

My thinking is that we should patch pg_recvlogical to be able to load a
decoder plugin. Then extract the protocol decoding support from pglogical
into a separately usable library that can be loaded by pg_recvlogical,
pglogical downstream, and by SQL-level debug/test helper functions.

pg_recvlogical won't be able to decode binary or internal format field
values, but you can simply not request that they be sent.


> > + case PARAM_BINARY_BASETYPES_MAJOR_VERSION:
> > +     val = get_param_value(elem, false, OUTPUT_PARAM_TYPE_UINT32);
> > +     data->client_binary_basetypes_major_version = DatumGetUInt32(val);
> > +     break;
>
> Why is the major version tied to basetypes (by name)? Seem more
> generally useful.
>

I found naming that param rather awkward.

The idea is that we can rely on the Pg major version only for types defined
in core.  It's mostly safe for built-in extensions in that few if any
people ever upgrade them, but it's not strictly correct even there. Most of
them (hstore, etc) don't expose their own versions so it's hard to know
what to do about them.

What I want(ed?) to do is let a downstream enumerate 

Re: [HACKERS] Template for commit messages

2016-01-30 Thread Robert Haas
On Fri, Jan 29, 2016 at 6:05 PM, Alvaro Herrera
 wrote:
>> I think the best question to ask is:
>>
>> "What is the problem we are trying to solve?"
>
> The problem is attracting more patch reviewers, beta testers and bug
> reporters.  One of the offers is to credit them (I'm not exactly clear
> on which group is meant to benefit from this, but the phrasing used in the
> meeting was "contributors to the release") by having a section somewhere
> in the release notes with a list of their names.  This proposal is
> different from the previous proposal because their names wouldn't appear
> next to each feature.
>
> So the problem, of course, is collating that list of names, and the
> point of having a commit template is to have a single, complete source
> of truth from where to extract the info.
>
> Personally I don't see value in having the commit message follow a
> machine-parseable format; as if saying "Backpatch to" instead of
> "Backpatch-through:" made your commit message wrong.  I think what was
> being proposed is to have committers ensure that the commit messages
> always carried the necessary info (which, as far as I know, they do.)

Well, this gets at one of the problems here, which is that you can't
fix a commit message once the commit has been pushed.  So even if we
all agreed in principle to a standard format, it's not clear that you
could enforce compliance with that format to a degree sufficient to
make machine-parseability a reality.

But let's say you could somehow make it so that every commit message
was machine-parseable.  Then you could write a script which went
through the commit log since the branch point for the prior release
and extracted every name listed as a reviewer and printed them all
out.  Then you could put that in the release notes or on the project
web site someplace and say "The PostgreSQL Project thanks the
following list of people for reviewing patches during the 9.23 release
cycle".  Doing that same thing without machine-parseable commit
messages can of course also be done, but it's going to require
somebody to manually review every commit, which is a lot more work.

Similarly, you could write a script to do things like "show me all
patches with person X as an author".  Right now that's actually kind
of hard to do, especially if that author happens to be a committer.
Or you could write a script to print out all commits from the last
week in the format: COMMITTER committed a patch from AUTHOR to do
SUMMARY, and then you could stick that into PWN, again instead of
having to do it by manually reviewing the commit message.

I'm not sure that those benefits are sufficient to justify the hassle
of trying to make a fixed template work.  But I think it's a little
strange to hear a bunch of people who develop the world's most
advanced open source relational database system argue that structured
data is an intrinsically useless thing to have, and that therefore we
should just store all of our information freeform in one very wide text
column.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Sequence Access Method WIP

2016-01-30 Thread Petr Jelinek
On 29 January 2016 at 23:59, Robert Haas  wrote:
> On Fri, Jan 29, 2016 at 5:24 PM, Tom Lane  wrote:
>> Alexander Korotkov  writes:
>>> On Fri, Jan 29, 2016 at 6:36 PM, Alvaro Herrera 
>>> wrote:
>>>> I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or something
>>>> like that.
>>
>>> I would prefer "CREATE {INDEX | SEQUENCE | ... } ACCESS METHOD name HANDLER
>>> handler;", but I don't insist.
>>
>> I think that Alvaro's idea is less likely to risk future grammar
>> conflicts.  We've had enough headaches from CREATE [ UNIQUE ] INDEX
>> [ CONCURRENTLY ] to make me really wary of variables in the statement-name
>> part of the syntax.
>
> Strong +1.  If we put the type of access method immediately after
> CREATE, I'm almost positive we'll regret it for exactly that reason.
>

Just as a note, CREATE SEQUENCE ACCESS METHOD already causes a grammar
conflict now; that's why my proposal was different, I didn't want to
add more keywords. I think Alvaro's proposal is fine as well.

The other point is that we are creating an ACCESS METHOD object, so that's
what should come after CREATE.

In any case this is slightly premature IMHO, as the DDL is somewhat useless
until we have a sequence access methods implementation we can agree on,
or the generic WAL logging so that custom indexes can be crash safe.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Sequence Access Method WIP

2016-01-30 Thread Petr Jelinek
On 30 January 2016 at 13:48, Robert Haas  wrote:
> On Sat, Jan 30, 2016 at 7:37 AM, Petr Jelinek  wrote:
>> Just as a note, CREATE SEQUENCE ACCESS METHOD already causes a grammar
>> conflict now; that's why my proposal was different, I didn't want to
>> add more keywords. I think Alvaro's proposal is fine as well.
>
> I missed your proposal, I guess, so please don't take as having any
> position on whether it's better or worse than Alvaro's.  I was only
> intending to vote for the proposition that the type of access method
> should follow the name of the access method.
>

No worries, I didn't mean it that way.

>> In any case this is slightly premature IMHO, as the DDL is somewhat useless
>> until we have a sequence access methods implementation we can agree on,
>> or the generic WAL logging so that custom indexes can be crash safe.
>
> Generic WAL logging seems like a *great* idea to me.  But I think it's
> largely independent from the access method stuff.  If we have generic
> WAL logging, people can create WAL-logged stuff that is not a new
> access method.  If we have access methods, they can create new access
> methods that are not WAL-logged.  If we have both, then they can
> create WAL-logged access methods which of course is the payoff pitch,
> but I don't see it as necessary or desirable for the two systems to be
> tied together in any way.

Okay, I am only debating the usefulness of DDL for access methods in
the current situation, where we only have custom index access methods
which can't create WAL records. In other words, I am trying to nudge
people slightly back towards the actual patch(es).

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] ALTER TABLE behind-the-scenes effects' CONTEXT

2016-01-30 Thread Marko Tiikkaja

On 2016-01-21 04:17, Simon Riggs wrote:

Marko, I was/am waiting for an updated patch. Could you comment please?


Sorry, I've not found time to work on this recently.

Thanks for everyone's comments so far.  I'll move this to the next CF 
and try and get an updated patch done in time for that one.



.m




Re: [HACKERS] Fwd: Core dump with nested CREATE TEMP TABLE

2016-01-30 Thread Noah Misch
On Sat, Jan 30, 2016 at 07:37:45AM -0500, Robert Haas wrote:
> On Fri, Jan 29, 2016 at 11:19 PM, Noah Misch  wrote:
> > As you say, forbidding things makes friction in the event that someone comes
> > along wanting to do the forbidden thing.  Forbidding things also simplifies
> > the system, making it easier to verify.  This decision should depend on the
> > API's audience.  We have prominently-public APIs like lsyscache.h,
> > stringinfo.h and funcapi.h.  They should cater to reasonably-foreseeable use
> > cases, not just what yesterday's users have actually used.  We then have
> > narrow-interest APIs like subselect.h and view.h.  For those, the value of a
> > simpler system exceeds the risk-adjusted cost of friction.  They should 
> > impose
> > strict constraints on their callers.
> >
> > Portal status belongs to the second category.  PortalRun(), PortalRunFetch()
> > and PersistHoldablePortal() are the three core functions that place portals
> > into PORTAL_ACTIVE status.  No PGXN module does so; none so much as 
> > references
> > a PortalStatus constant or MarkPortal{Active,Done,Failed}() function.  If
> > someone adds a fourth core portal runner, the system will be simpler and
> > better if that function replicates the structure of the existing three.
> 
> I won't argue with that, but it strikes me that if I were reviewing a
> patch to do such a thing, I'd surely compare the new caller against
> the existing ones, so any failure to adhere to the same design pattern
> would be caught that way.  I expect other reviewers and/or committers
> would almost certainly do something similar.  If we presuppose
> incompetence on the part of future reviewers and committers, no action
> taken now can secure us.

You could offer that paragraph as an objection to almost all Assert(), elog(),
and automated tests.  Why levy it against this patch?  The valuable ways
assertions and tests supplement review are well-established.

> You may be right, but then again Tom had a different opinion, even
> after seeing your patch, and he's no dummy.

Eh?  Tom last posted to this thread before I first posted a patch.




Re: [HACKERS] Template for commit messages

2016-01-30 Thread Alvaro Herrera
Robert Haas wrote:
> On Fri, Jan 29, 2016 at 6:05 PM, Alvaro Herrera
>  wrote:

> > Personally I don't see value in having the commit message follow a
> > machine-parseable format; as if saying "Backpatch to" instead of
> > "Backpatch-through:" made your commit message wrong.  I think what was
> > being proposed is to have committers ensure that the commit messages
> > always carried the necessary info (which, as far as I know, they do.)
> 
> Well, this gets at one of the problems here, which is that you can't
> fix a commit message once the commit has been pushed.

Yes, I'm aware that this is a problem.  I tried to raise the point that
we could use "git notes" to provide additional information after the
fact but was quickly made to shut up before it could be recorded in the
minutes.

If we were to adopt git notes or a similar system(*), we could use those
as a mechanism to install the machine-parseable data for each commit,
which I think fixes all the problems you point out.

(*) Another idea that comes to mind now that you mention this database
thingy of yours is to make a table or tables with commits and their
associated data, which could initially be populated from the commit
message and later updated.
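
As a sketch of that idea (every name here is hypothetical):

CREATE TABLE commit_metadata (
    commit_hash  text PRIMARY KEY,
    committed_at timestamptz NOT NULL,
    committer    text NOT NULL,
    authors      text[],
    reviewers    text[],
    backpatch_to text,         -- e.g. '9.1'; NULL if master-only
    discussion   text          -- message-id URL
);

-- "reviewers for this release" then reduces to:
SELECT DISTINCT unnest(reviewers) AS reviewer
  FROM commit_metadata
 WHERE committed_at >= '2015-07-01'    -- branch point, for example
 ORDER BY 1;

with rows seeded from the commit messages and corrected after the fact.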

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Fwd: Core dump with nested CREATE TEMP TABLE

2016-01-30 Thread Robert Haas
On Fri, Jan 29, 2016 at 11:19 PM, Noah Misch  wrote:
> As you say, forbidding things makes friction in the event that someone comes
> along wanting to do the forbidden thing.  Forbidding things also simplifies
> the system, making it easier to verify.  This decision should depend on the
> API's audience.  We have prominently-public APIs like lsyscache.h,
> stringinfo.h and funcapi.h.  They should cater to reasonably-foreseeable use
> cases, not just what yesterday's users have actually used.  We then have
> narrow-interest APIs like subselect.h and view.h.  For those, the value of a
> simpler system exceeds the risk-adjusted cost of friction.  They should impose
> strict constraints on their callers.
>
> Portal status belongs to the second category.  PortalRun(), PortalRunFetch()
> and PersistHoldablePortal() are the three core functions that place portals
> into PORTAL_ACTIVE status.  No PGXN module does so; none so much as references
> a PortalStatus constant or MarkPortal{Active,Done,Failed}() function.  If
> someone adds a fourth core portal runner, the system will be simpler and
> better if that function replicates the structure of the existing three.

I won't argue with that, but it strikes me that if I were reviewing a
patch to do such a thing, I'd surely compare the new caller against
the existing ones, so any failure to adhere to the same design pattern
would be caught that way.  I expect other reviewers and/or committers
would almost certainly do something similar.  If we presuppose
incompetence on the part of future reviewers and committers, no action
taken now can secure us.

>> The code you quote emits a warning
>> about a reasonably forseeable situation that can never be right, but
>> there's no particular reason to think that MarkPortalFailed is the
>> wrong thing to do here if that situation came up.
>
> I have two reasons to expect these MarkPortalFailed() calls will be the wrong
> thing for hypothetical code reaching them.  First, PortalRun() and peers reset
> ActivePortal and PortalContext on error in addition to fixing portal status
> with MarkPortalFailed().  If code forgets to update the status, there's a
> substantial chance it forgot the other two.  My patch added a comment
> explaining the second reason:
>
> +   /*
> +    * See similar code in AtSubAbort_Portals().  This would fire if code
> +    * orchestrating multiple top-level transactions within a portal, such
> +    * as VACUUM, caught errors and continued under the same portal with a
> +    * fresh transaction.  No part of core PostgreSQL functions that way,
> +    * though it's a fair thing to want.  Such code would wish the portal
> +    * to remain ACTIVE, as in PreCommit_Portals(); we don't cater to
> +    * that.  Instead, like AtSubAbort_Portals(), interpret this as a bug.
> +    */

You may be right, but then again Tom had a different opinion, even
after seeing your patch, and he's no dummy.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] extend pgbench expressions with functions

2016-01-30 Thread Michael Paquier
On Fri, Jan 29, 2016 at 11:21 PM, Fabien COELHO  wrote:
> +/* overflow check (needed for INT64_MIN) */
> +if (lval != 0 && (*retval < 0 == lval < 0))
>
> Why not use "if (lval == INT64_MIN)" instead of this complicated condition?
> If it is really needed for some reason, I think that a comment could help.

Checking for PG_INT64_MIN only would be fine as well, so let's do so.
I honestly thought that we had better check whether the result and the
left argument are of the same sign, but so be it.
-- 
Michael
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index d5f242c..6a17990 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -967,7 +967,28 @@ evaluateExpr(CState *st, PgBenchExpr *expr, int64 *retval)
 			fprintf(stderr, "division by zero\n");
 			return false;
 		}
-		*retval = lval / rval;
+		/*
+		 * INT64_MIN / -1 is problematic, since the result
+		 * can't be represented on a two's-complement
+		 * machine. Some machines produce INT64_MIN, some
+		 * produce zero, some throw an exception. We can
+		 * dodge the problem by recognizing that division
+		 * by -1 is the same as negation.
+		 */
+		if (rval == -1)
+		{
+			/* overflow check: INT64_MIN cannot be negated */
+			if (lval == PG_INT64_MIN)
+			{
+				fprintf(stderr, "bigint out of range\n");
+				return false;
+			}
+
+			*retval = -lval;
+		}
+		else
+			*retval = lval / rval;
+
 		return true;
 
 	case '%':
@@ -976,7 +997,15 @@ evaluateExpr(CState *st, PgBenchExpr *expr, int64 *retval)
 			fprintf(stderr, "division by zero\n");
 			return false;
 		}
-		*retval = lval % rval;
+		/*
+		 * Some machines throw a floating-point exception
+		 * for INT64_MIN % -1, the correct answer being
+		 * zero in any case.
+		 */
+		if (rval == -1)
+			*retval = 0;
+		else
+			*retval = lval % rval;
 		return true;
 }
 



Re: [HACKERS] Template for commit messages

2016-01-30 Thread Robert Haas
On Sat, Jan 30, 2016 at 5:22 AM, Alvaro Herrera
 wrote:
> Robert Haas wrote:
>> On Fri, Jan 29, 2016 at 6:05 PM, Alvaro Herrera
>>  wrote:
>
>> > Personally I don't see value in having the commit message follow a
>> > machine-parseable format; as if saying "Backpatch to" instead of
>> > "Backpatch-through:" made your commit message wrong.  I think what was
>> > being proposed is to have committers ensure that the commit messages
>> > always carried the necessary info (which, as far as I know, they do.)
>>
>> Well, this gets at one of the problems here, which is that you can't
>> fix a commit message once the commit has been pushed.
>
> Yes, I'm aware that this is a problem.  I tried to raise the point that
> we could use "git notes" to provide additional information after the
> fact but was quickly made to shut up before it could be recorded in the
> minutes.

Yeah, shut up, Alvaro!  We don't want to hear about you and your fancy
technological solutions!  :-)

> If we were to adopt git notes or a similar system(*), we could use those
> as a mechanism to install the machine-parseable data for each commit,
> which I think fixes all the problems you point out.

I haven't experimented enough to know whether it would or not, but I
think it would be interesting to find out.

> (*) Another idea that comes to mind now that you mention this database
> thingy of yours is to make a table or tables with commits and their
> associated data, which could initially be populated from the commit
> message and later updated.

Yeah.  It's a crazy thought, but it's almost like this databasey thing
was designed precisely for the purpose of allowing people to structure
their data in relational ways and then interact with it in convenient
fashion.

Personally, I think that if we had a database of commit metadata, we
could use that to do all sorts of interesting reporting.  I don't
think any of that reporting is absolutely necessary for the survival
of the project, but sometimes things that aren't absolutely necessary
can still be awfully nice.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Using quicksort for every external sort run

2016-01-30 Thread Robert Haas
On Sat, Jan 30, 2016 at 2:25 AM, Peter Geoghegan  wrote:
> On Fri, Jan 29, 2016 at 2:58 PM, Robert Haas  wrote:
>> I don't quite know what you mean by these numbers.  Add a generic,
>> conservative threshold to what?
>
> I meant use "quicksort with spillover" simply because an estimated
> 90%+ of all tuples have already been consumed. Don't consider the
> tuple width, etc.

Hmm, it's a thought.

>> Thinking about this some more, I really think we should think hard
>> about going back to the strategy which you proposed and discarded in
>> your original post: always generate the first run using replacement
>> selection, and every subsequent run by quicksorting.  In that post you
>> mention powerful advantages of this method: "even without a strong
>> logical/physical correlation, the algorithm tends to produce runs that
>> are about twice the size of work_mem. (It's also notable that
>> replacement selection only produces one run with mostly presorted
>> input, even where input far exceeds work_mem, which is a neat trick.)"
>>  You went on to dismiss that strategy, saying that "despite these
>> upsides, replacement selection is obsolete, and should usually be
>> avoided."  But I don't see that you've justified that statement.
>
> Really? Just try it with a heap that is not tiny. Performance tanks.
> The fact that replacement selection can produce one long run then
> becomes a liability, not a strength. With a work_mem of something like
> 1GB, it's *extremely* painful.

I'm not sure exactly what you think I should try.  I think a couple of
people have expressed the concern that your patch might regress things
on data that is all in order, but I'm not sure if you think I should
try that case or some case that is not-quite-in-order.  "I don't see
that you've justified that statement" is referring to the fact that
you presented no evidence in your original post that it's important to
sometimes use quicksorting even for run #1.  If you've provided some
test data illustrating that point somewhere, I'd appreciate a pointer
back to it.

> A compromise that may be acceptable is to always do a "quicksort with
> spillover" when there is a very low work_mem setting and the estimate
> of the number of input tuples is less than 10x of what we've seen so
> far. Maybe less than 20MB. That will achieve the same thing.

How about always starting with replacement selection, but limiting the
amount of memory that can be used with replacement selection to some
small value?  It could be a separate GUC, or a hard-coded constant
like 20MB if we're fairly confident that the same value will be good
for everyone.  If the tuples aren't in order, then we'll pretty
quickly come to the end of the first run and switch to quicksort.  If
we do end up using replacement selection for the whole sort, the
smaller heap is an advantage.  What I like about this sort of thing is
that it adds no reliance on any estimate; it's fully self-tuning.
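
To make the "runs about twice work_mem" behavior concrete, here is a toy,
self-contained sketch of classic replacement selection (nothing like
tuplesort.c's real code; just the textbook algorithm).  With room for only
4 tuples it turns the 10-element input below into a first run of 7 tuples
and a second run of 3:

#include <stdio.h>

#define M 4						/* tiny "work_mem": 4 tuples */

typedef struct { int run; long key; } Ent;

static Ent	h[M];				/* binary min-heap on (run, key) */
static int	n = 0;

static int
less(Ent a, Ent b)
{
	return a.run != b.run ? a.run < b.run : a.key < b.key;
}

static void
swap(int i, int j)
{
	Ent t = h[i]; h[i] = h[j]; h[j] = t;
}

static void
sift_up(int i)
{
	while (i > 0 && less(h[i], h[(i - 1) / 2]))
	{
		swap(i, (i - 1) / 2);
		i = (i - 1) / 2;
	}
}

static void
sift_down(int i)
{
	for (;;)
	{
		int l = 2 * i + 1, r = 2 * i + 2, m = i;

		if (l < n && less(h[l], h[m])) m = l;
		if (r < n && less(h[r], h[m])) m = r;
		if (m == i) break;
		swap(i, m);
		i = m;
	}
}

int
main(void)
{
	long in[] = {5, 1, 8, 2, 9, 3, 7, 4, 6, 0};
	int  nin = 10, i = 0, currun = -1;

	while (n < M && i < nin)	/* fill memory from the input */
	{
		h[n].run = 0;
		h[n].key = in[i++];
		sift_up(n);
		n++;
	}
	while (n > 0)
	{
		Ent e = h[0];			/* pop the smallest (run, key) */

		if (e.run != currun)
			printf("\n-- run %d:", (currun = e.run));
		printf(" %ld", e.key);
		if (i < nin)
		{
			/* Replace the emitted tuple with the next input tuple; if
			 * it sorts below what we just wrote, it must wait for the
			 * next run. */
			h[0].run = (in[i] < e.key) ? e.run + 1 : e.run;
			h[0].key = in[i++];
		}
		else
			h[0] = h[--n];		/* input exhausted: drain the heap */
		sift_down(0);
	}
	printf("\n");
	return 0;
}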

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2016-01-30 Thread Robert Haas
On Fri, Jan 29, 2016 at 3:46 AM, Ashutosh Bapat
 wrote:
> PFA a patch that moves the code to deparse a SELECT statement into a
> function deparseSelectStmtForRel(). This code is duplicated in
> estimate_path_cost_size() and postgresGetForeignPlan(), so moving it into a
> function avoids that duplication. As a side note, estimate_path_cost_size()
> doesn't add a FOR SHARE/UPDATE clause to the statement being EXPLAINed, even
> if the actual statement during execution would have it. This difference
> looks unintentional to me. This patch corrects it as well.
> appendOrderByClause and appendWhereClause both create a context within
> themselves and pass it to deparseExpr. This patch creates the context once
> in deparseSelectStmtForRel() and then passes it to the other deparse
> functions, avoiding repeated context creation.

Right, OK.  I think this is good, so, committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Sequence Access Method WIP

2016-01-30 Thread Robert Haas
On Sat, Jan 30, 2016 at 7:37 AM, Petr Jelinek  wrote:
> Just as a note, CREATE SEQUENCE ACCESS METHOD already causes a grammar
> conflict now; that's why my proposal was different, I didn't want to
> add more keywords. I think Alvaro's proposal is fine as well.

I missed your proposal, I guess, so please don't take as having any
position on whether it's better or worse than Alvaro's.  I was only
intending to vote for the proposition that the type of access method
should follow the name of the access method.

> The other point is that we are creating ACCESS METHOD object so that's
> what should be after CREATE.

Agreed.

> In any case this is slightly premature IMHO, as the DDL is somewhat useless
> until we have a sequence access methods implementation we can agree on,
> or the generic WAL logging so that custom indexes can be crash safe.

Generic WAL logging seems like a *great* idea to me.  But I think it's
largely independent from the access method stuff.  If we have generic
WAL logging, people can create WAL-logged stuff that is not a new
access method.  If we have access methods, they can create new access
methods that are not WAL-logged.  If we have both, then they can
create WAL-logged access methods which of course is the payoff pitch,
but I don't see it as necessary or desirable for the two systems to be
tied together in any way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-01-30 Thread Michael Paquier
On Fri, Jan 29, 2016 at 9:13 PM, Michael Paquier wrote:
> Well, to put it short, I am just trying to find a way to make the
> backend skip unnecessary checkpoints on an idle system, which results
> in the following WAL pattern if the system is completely idle:
> CHECKPOINT_ONLINE
> RUNNING_XACTS
> RUNNING_XACTS
> [etc..]
>
> The thing is that I am lost with the meaning of this condition to
> decide if a checkpoint should be skipped or not:
> if (prevPtr == ControlFile->checkPointCopy.redo &&
> prevPtr / XLOG_SEG_SIZE == curInsert / XLOG_SEG_SIZE)
> {
> WALInsertLockRelease();
> LWLockRelease(CheckpointLock);
> return;
> }
> As at least one standby snapshot is logged before the checkpoint
> record, the redo position is never going to match the previous insert
> LSN, so checkpoints will never be skipped if wal_level >= hot_standby.
> Skipping such unnecessary checkpoints is what you would like to
> address, no? Because that's what I would like to do here first. And
> once we got that right, we could think about addressing the case where
> WAL segments are forcibly archived for idle systems.

I have put a bit more brain power into that, and finished with the
attached patch. A new field called ckptProgressLSN is added to
XLogCtl; it is set to the current insert position at checkpoint time
when wal_level <= archive, or to the LSN of the standby snapshot
taken after a checkpoint. The bgwriter code is modified as well so
that it uses this progress LSN and compares it with the current insert
LSN to determine whether a standby snapshot should be logged. On a
completely idle instance of Postgres, no useless checkpoints and no
useless standby snapshots are generated anymore.
-- 
Michael
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index a2846c4..ccef3f0 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -527,6 +527,8 @@ typedef struct XLogCtlData
 	TransactionId ckptXid;
 	XLogRecPtr	asyncXactLSN;	/* LSN of newest async commit/abort */
 	XLogRecPtr	replicationSlotMinLSN;	/* oldest LSN needed by any slot */
+	XLogRecPtr	ckptProgressLSN;	/* LSN tracking progress of checkpoint
+	 * and standby snapshots */
 
 	XLogSegNo	lastRemovedSegNo;		/* latest removed/recycled XLOG
 		 * segment */
@@ -6314,6 +6316,7 @@ StartupXLOG(void)
 	 checkPoint.newestCommitTsXid);
 	XLogCtl->ckptXidEpoch = checkPoint.nextXidEpoch;
 	XLogCtl->ckptXid = checkPoint.nextXid;
+	XLogCtl->ckptProgressLSN = checkPoint.redo;
 
 	/*
 	 * Initialize replication slots, before there's a chance to remove
@@ -7869,6 +7872,22 @@ GetFlushRecPtr(void)
 }
 
 /*
+ * GetProgressRecPtr -- Returns the reference position of last checkpoint,
+ * taking into account standby snapshots generated by checkpoints.
+ */
+XLogRecPtr
+GetProgressRecPtr(void)
+{
+	XLogRecPtr	progress_lsn;
+
+	SpinLockAcquire(&XLogCtl->info_lck);
+	progress_lsn = XLogCtl->ckptProgressLSN;
+	SpinLockRelease(&XLogCtl->info_lck);
+
+	return progress_lsn;
+}
+
+/*
  * Get the time of the last xlog segment switch
  */
 pg_time_t
@@ -8133,6 +8152,7 @@ CreateCheckPoint(int flags)
 	XLogRecPtr	PriorRedoPtr;
 	XLogRecPtr	curInsert;
 	XLogRecPtr	prevPtr;
+	XLogRecPtr	progress_lsn;
 	VirtualTransactionId *vxids;
 	int			nvxids;
 
@@ -8214,10 +8234,13 @@ CreateCheckPoint(int flags)
 
 	/*
 	 * We must block concurrent insertions while examining insert state to
-	 * determine the checkpoint REDO pointer.
+	 * determine the checkpoint REDO pointer. The progress LSN of this
+	 * to-be-created checkpoint is for now initialized as the current insert
+	 * position in WAL. This will get updated later on depending on if
+	 * a standby snapshot is logged.
 	 */
 	WALInsertLockAcquireExclusive();
-	curInsert = XLogBytePosToRecPtr(Insert->CurrBytePos);
+	curInsert = progress_lsn = XLogBytePosToRecPtr(Insert->CurrBytePos);
 	prevPtr = XLogBytePosToRecPtr(Insert->PrevBytePos);
 
 	/*
@@ -8239,7 +8262,7 @@ CreateCheckPoint(int flags)
 	if ((flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY |
   CHECKPOINT_FORCE)) == 0)
 	{
-		if (prevPtr == ControlFile->checkPointCopy.redo &&
+		if (GetProgressRecPtr() == prevPtr &&
 			prevPtr / XLOG_SEG_SIZE == curInsert / XLOG_SEG_SIZE)
 		{
 			WALInsertLockRelease();
@@ -8406,9 +8429,15 @@ CreateCheckPoint(int flags)
 	 *
 	 * If we are shutting down, or Startup process is completing crash
 	 * recovery we don't need to write running xact data.
+	 *
+	 * progress_lsn is aimed at tracking the WAL progress that happens
+	 * since this checkpoint. If a standby snapshot is logged, its record
+	 * position needs to be taken into account in the progress LSN that is
+	 * used to evaluate whether checkpoints are necessary or standby
+	 * snapshots need to be logged, and to skip them on idle systems.
 	 */
 	if (!shutdown && XLogStandbyInfoActive())
-		LogStandbySnapshot();
+		progress_lsn = LogStandbySnapshot();
 
 	

Re: [HACKERS] Additional role attributes && superuser review

2016-01-30 Thread Michael Paquier
On Sun, Jan 31, 2016 at 5:32 AM, Craig Ringer  wrote:
> On 29 January 2016 at 22:41, Stephen Frost  wrote:
>>
>> Michael,
>>
>> * Michael Paquier (michael.paqu...@gmail.com) wrote:
>> > On Fri, Jan 29, 2016 at 6:37 AM, Stephen Frost 
>> > wrote:
>> > > * Robert Haas (robertmh...@gmail.com) wrote:
>> > >> On Thu, Jan 28, 2016 at 11:04 AM, Stephen Frost 
>> > wrote:
>> > >> > Personally, I don't have any particular issue having both, but the
>> > >> > desire was stated that it would be better to have the regular
>> > >> > GRANT EXECUTE ON catalog_func() working before we consider having
>> > >> > default roles for same.  That moves the goal posts awful far
>> > >> > though, if
>> > >> > we're to stick with that and consider how we might extend the GRANT
>> > >> > system in the future.
>> > >>
>> > >> I don't think it moves the goal posts all that far.  Convincing
>> > >> pg_dump to dump grants on system functions shouldn't be a crazy large
>> > >> patch.
>> > >
>> > > I wasn't clear as to what I was referring to here.  I've already
>> > > written
>> > > a patch to pg_dump to support grants on system objects and agree that
>> > > it's at least reasonable.
>> >
>> > Is it already posted somewhere? I don't recall seeing it. Robert and
>> > Noah
>> > have a point that this would be useful for users who would like to dump
>> > GRANT/REVOKE rights on system functions & all, using a new option in
>> > pg_dumpall, say --with-system-acl or --with-system-privileges.
>>
>> Multiple versions were posted on this thread.  I don't fault you for not
>> finding it, this thread is a bit long in the tooth.  The one I'm
>> currently working from is
>>
>
> It strikes me that this thread would possibly benefit from a wiki page
> outlining the permissions, overall concepts, etc, as it's getting awfully
> hard to follow.

+1. This has proved to be very beneficial for UPSERT.
-- 
Michael




[HACKERS] statistics for shared catalogs not updated when autovacuum is off

2016-01-30 Thread Peter Eisentraut
When autovacuum is off, the statistics in pg_stat_sys_tables for shared
catalogs (e.g., pg_authid, pg_database) never update.  So seq_scan
doesn't update when you read the table, last_analyze doesn't update when
you run analyze, etc.

But when you shut down the server and restart it with autovacuum on, the
updated statistics magically appear right away.  So seq_scan is updated
with the number of reads you did before the shutdown, last_analyze
updates with the time of the analyze you did before the shutdown, etc.
So the data is saved, just not propagated to the view properly.

I can reproduce this back to 9.3, but not 9.2.  Any ideas?
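
For anyone who wants to reproduce it, something like this (as superuser,
with autovacuum = off) shows the symptom:

SELECT count(*) FROM pg_authid;     -- any read of a shared catalog
ANALYZE pg_authid;
SELECT seq_scan, last_analyze
  FROM pg_stat_sys_tables
 WHERE relname = 'pg_authid';
-- seq_scan and last_analyze keep their old values until the server is
-- restarted with autovacuum on.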




Re: [HACKERS] custom function for converting human readable sizes to bytes

2016-01-30 Thread Vitaly Burovoy
Hello, Pavel!

On 1/26/16, Pavel Stehule  wrote:
> I am grateful for the review - now this patch is better and, I hope, near
> the final stage.
>
> Regards
> Pavel

I'm pretty sure we are close to it.

Now, besides the code design issue I mentioned[1] before (and no one has
answered me about it), there are a few small notes.

Now you have all messages in one style ("invalid size: \"%s\"") except
size units ("invalid unit \"%s\""; there are two places). I guess
according to the Error Style Guide[2] it should be like
"errmsg("invalid size: \"%s\"", str), errdetail("Invalid size unit
\"%s\"", unitstr),".

If it is not hard for you, it'll look better if "select" is uppercased
in tests and a comment about sharing "memory_unit_conversion_table" is
close to the "SELECT pg_size_bytes('1PB');". Moreover your comment has
a flaw: "pg_size_pretty" doesn't use that array at all, so the phrase
"These functions shares" is wrong.

Also I've just found that numeric format supports values with exponent
like "1e5" which is not allowed in pg_size_bytes. I guess the phrase
"The format is numeric with an optional size unit" should be replaced
to "The format is a fixed point number with an optional size unit" in
the documentation.

Have a look at the formatting: in the rows "num = DatumGetNumeric",
"mul_num = DatumGetNumeric", "num = DatumGetNumeric" and in the error
reporting close to the "/* only alphabetic characters are allowed */"
comment, the parameters on continuation lines are not aligned under the
first parameters.

I insist there is no reason to call "pfree" for "str" and "buffer".
But if you do so, also clean up the buffers from numeric_in, numeric_mul
and int8_numeric. If you insist it should be left as is, I leave that
decision to a committer.

P.S.: Have you thought of wrapping the call to "numeric_in" in a
PG_TRY/PG_CATCH instead of checking for correctness yourself?

[1]http://www.postgresql.org/message-id/cakoswnm+ssggkysv09n_6hhfwzmrr+csjpgfhbpff5nnpfj...@mail.gmail.com
[2]http://www.postgresql.org/docs/devel/static/error-style-guide.html#AEN65
-- 
Best regards,
Vitaly Burovoy




Re: [HACKERS] custom function for converting human readable sizes to bytes

2016-01-30 Thread Pavel Stehule
Hi


> P.S.: Have you thought of wrapping the call to "numeric_in" in a
> PG_TRY/PG_CATCH instead of checking for correctness yourself?
>

I thought about it, but it is not possible. Every PG_TRY/PG_CATCH must be
finished by a RE_THROW. You can only avoid that by opening a subtransaction
and playing with the resource manager, and that is pretty expensive.

You can see in our code lots of situations where some function returns true,
false, or an "error message" instead of raising an exception. I would not
want to refactor the numeric_in function in this style. This function is in
the critical path of COPY FROM, and any extra calls can decrease
performance. So I have to do these checks before calling it.
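
To illustrate, the only safe shape is roughly this (a sketch, not code
from the patch):

	Datum		num;

	PG_TRY();
	{
		num = DirectFunctionCall3(numeric_in,
								  CStringGetDatum(str),
								  ObjectIdGetDatum(InvalidOid),
								  Int32GetDatum(-1));
	}
	PG_CATCH();
	{
		/* cannot swallow the error here without a subtransaction;
		 * control must leave via PG_RE_THROW() */
		PG_RE_THROW();
	}
	PG_END_TRY();

which buys nothing over not catching at all - hence the explicit checks
before calling it.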

Regards

Pavel


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-01-30 Thread Dilip Kumar
On Sat, Jan 30, 2016 at 3:05 AM, Merlin Moncure  wrote:

> Probably want to run for at least 5 minutes via -T 300


Last time I ran for 5 minutes and took the median of three runs; I just
missed mentioning "-T 300" in the mail.

By looking at the results with scale factors 1000 and 100, I don't see any
reason why it would regress with scale factor 300.

So I will run the test again with scale factor 300, and this time I am
planning to run 2 cases.
1. When the data fits in shared buffers.
2. When the data doesn't fit in shared buffers.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Optimizer questions

2016-01-30 Thread Konstantin Knizhnik

Unfortunately these two statements are not equivalent: the second one can (in
theory, but not for this particular data set) return more than 10 result
records.
Such an optimization would be correct only if t2.i were declared as unique.

But the most efficient plan for this query will be generated if there is an
index on t1.v.
In this case no explicit sort is needed. The limit is still not pushed down,
but that is not a problem, because a nested loop join is used, which does not
materialize its result (it produces records on demand):

# explain analyze select * from t1 left outer join t2 on t1.k=t2.k order by 
t1.v limit 10;
  QUERY PLAN
--
 Limit  (cost=0.58..4.38 rows=10 width=16) (actual time=0.069..0.157 rows=10 
loops=1)
   ->  Nested Loop Left Join  (cost=0.58..37926.63 rows=11 width=16) 
(actual time=0.067..0.154 rows=10 loops=1)
 ->  Index Scan using idxv on t1  (cost=0.29..3050.31 rows=11 
width=8) (actual time=0.046..0.053 rows=10 loops=1)
 ->  Index Scan using t2_pkey on t2  (cost=0.29..0.34 rows=1 width=8) 
(actual time=0.007..0.007 rows=1 loops=10)
   Index Cond: (t1.k = k)
 Planning time: 0.537 ms
 Execution time: 0.241 ms
(7 rows)


On 01/30/2016 01:01 AM, Alexander Korotkov wrote:

On Fri, Jan 8, 2016 at 11:58 AM, Konstantin Knizhnik wrote:

Attached please find an improved version of the optimizer patch for the
LIMIT clause.
Now I try to apply this optimization only for non-trivial columns requiring
evaluation.
Maybe it would be better to take into account the cost of evaluating these
columns, but right now I just detect non-variable columns.


We may think about a more general feature: LIMIT pushdown. In Konstantin's
patch the planner pushes the LIMIT below the tlist calculation,
but there are other cases where calculating the LIMIT first would be
beneficial. For instance, we can apply the LIMIT before JOINs. That is
possible only for a LEFT JOIN whose right side is not used in the filter and
ordering clauses. See the example below.

# create table t1 as (select i, random() v from generate_series(1,100) i);
SELECT 100

# create table t2 as (select i, random() v from generate_series(1,100) i);
SELECT 100

# explain analyze select * from t1 left join t2 on t1.i = t2.i order by t1.v 
limit 10;
                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Limit  (cost=87421.64..87421.67 rows=10 width=24) (actual time=1486.276..1486.278 rows=10 loops=1)
   ->  Sort  (cost=87421.64..89921.64 rows=100 width=24) (actual time=1486.275..1486.275 rows=10 loops=1)
         Sort Key: t1.v
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Hash Left Join  (cost=27906.00..65812.00 rows=100 width=24) (actual time=226.180..1366.238 rows=100
               Hash Cond: (t1.i = t2.i)
               ->  Seq Scan on t1  (cost=0.00..15406.00 rows=100 width=12) (actual time=0.010..77.041 rows=100 l
               ->  Hash  (cost=15406.00..15406.00 rows=100 width=12) (actual time=226.066..226.066 rows=100 loop
                     Buckets: 131072  Batches: 1  Memory Usage: 46875kB
                     ->  Seq Scan on t2  (cost=0.00..15406.00 rows=100 width=12) (actual time=0.007..89.002 rows=100
 Planning time: 0.123 ms
 Execution time: 1492.118 ms
(12 rows)

# explain analyze select * from (select * from t1 order by t1.v limit 10) t1 
left join t2 on t1.i = t2.i;
                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Hash Right Join  (cost=37015.89..56171.99 rows=10 width=24) (actual time=198.478..301.278 rows=10 loops=1)
   Hash Cond: (t2.i = t1.i)
   ->  Seq Scan on t2  (cost=0.00..15406.00 rows=100 width=12) (actual time=0.005..74.303 rows=100 loops=1)
   ->  Hash  (cost=37015.77..37015.77 rows=10 width=12) (actual time=153.584..153.584 rows=10 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 1kB
         ->  Limit  (cost=37015.64..37015.67 rows=10 width=12) (actual time=153.579..153.580 rows=10 loops=1)
               ->  Sort  (cost=37015.64..39515.64 rows=100 width=12) (actual time=153.578..153.578 rows=10 loops=1)
                     Sort Key: t1.v
                     Sort Method: top-N heapsort  Memory: 25kB
                     ->  Seq Scan on t1  (cost=0.00..15406.00 rows=100 width=12) (actual time=0.012..78.828 rows=100
 Planning time: 0.132 ms
 Execution time: 301.308 ms
(12 rows)

In this example LIMIT pushdown makes the query 5 times faster. It would be
very nice if the optimizer did this automatically.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com 
The Russian Postgres Company



--
Konstantin Knizhnik
Postgres Professional: 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-01-30 Thread Michael Paquier
On Thu, Jan 28, 2016 at 10:10 PM, Masahiko Sawada wrote:
> From the discussions so far, I'm planning to have several replication
> methods such as 'quorum' and 'complex' in the future, and each
> replication method specifies the syntax of s_s_names.
> It means that s_s_names could contain the number of sync standbys, like
> the current patch does.

What if the application_name of a standby node has the format of an integer?
(With a leading sync-standby count in s_s_names, is '2, node1, node2' a
count plus two names, or three names?)
-- 
Michael




Re: [HACKERS] custom function for converting human readable sizes to bytes

2016-01-30 Thread Vitaly Burovoy
On 1/30/16, Pavel Stehule  wrote:
>> P.S.: Have you thought about wrapping the "numeric_in" call in
>> PG_TRY/PG_CATCH instead of doing the correctness checks yourself?
>
> I thought about it, but it is not possible. Every PG_TRY/CATCH must be
> finished by a PG_RE_THROW().

No, src/include/utils/elog.h says otherwise (emphasis mine):
"The error recovery code can _optionally_ do PG_RE_THROW() to
propagate the _same_ error outwards."

So you can use it without rethrowing.

> Only when you open a subtransaction and play with the resource
> manager can you do it.

Really? I did not find anything like that near the "#define PG_TRY()"
definition in "src/include/utils/elog.h".

I guess a subtransaction is important if you want to catch an exception
and continue. But in the case of calling "numeric_in" from
"pg_size_bytes" there is no reason to use a subtransaction that may
close any open relations etc., because a new ereport with a different
message is emitted, which fails the current transaction anyway.

PG_TRY only calls sigsetjmp and sets PG_exception_stack. If no
exception is raised, the only penalty is the sigsetjmp call (though I
don't know how heavy that is); if an exception is raised, it doesn't
matter how long the handling takes.
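
For example, a minimal sketch of catching the error without re-throwing
(a hypothetical wrapper just to show the control flow; your point about
subtransactions and the state of the aborted path still applies):

#include "postgres.h"
#include "fmgr.h"
#include "utils/builtins.h"
#include "utils/numeric.h"

/*
 * If numeric_in() ereport()s, control jumps from the sigsetjmp saved
 * by PG_TRY() into the PG_CATCH block; FlushErrorState() discards the
 * error instead of PG_RE_THROW()ing it.
 */
static bool
parse_numeric_quietly(char *str, Numeric *result)
{
    bool        ok = true;

    PG_TRY();
    {
        *result = DatumGetNumeric(
            DirectFunctionCall3(numeric_in,
                                CStringGetDatum(str),
                                ObjectIdGetDatum(InvalidOid),
                                Int32GetDatum(-1)));
    }
    PG_CATCH();
    {
        FlushErrorState();
        ok = false;
    }
    PG_END_TRY();
    return ok;
}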

> It is pretty expensive.

Ok. Performance makes sense.

> You can see many places in our code where a function returns true, false,
> and an "error message" instead of raising an exception.

I know. It is a common style in C programs.

> I would not want to refactor the numeric_in function in this style.

No doubt. It is not necessary.

> This function is in the critical path of COPY FROM, and any additional
> calls can decrease performance. So I have to do these checks before
> calling it.
>
> Regards
> Pavel

-- 
Best regards,
Vitaly Burovoy




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-01-30 Thread Amit Kapila
On Thu, Jan 28, 2016 at 9:16 AM, Amit Kapila 
wrote:
>
> On Thu, Jan 28, 2016 at 2:12 AM, Robert Haas 
wrote:
> >
> > On Tue, Jan 26, 2016 at 3:10 AM, and...@anarazel.de 
wrote:
> > > I do think there's a considerable benefit in improving the
> > > instrumentation here, but this strikes me as making life more complex
> > > for more users than it makes it easier. At the very least this should
> > > be split into two fields (type & what we're actually waiting on). I
> > > also strongly suspect we shouldn't use in-band signaling ("process not
> > > waiting"), but rather make the field NULL if we're not waiting on
> > > anything.
> >
> > +1 for splitting it into two fields.
> >
>
> I will take care of this.
>

As discussed, I have added a new field wait_event_type along with
wait_event in pg_stat_activity.  Changed the code to return NULL if the
backend is not waiting.  Updated the docs as well.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


extend_pg_stat_activity_v9.patch
Description: Binary data



[HACKERS]

2016-01-30 Thread Vitaly Burovoy
Hackers,

I've just found a little bug: extracting "epoch" from the last 30
years before Postgres' "+Infinity" leads to an integer overflow:

postgres=# SELECT x::timestamptz, extract(epoch FROM x::timestamptz)
postgres-# FROM
postgres-# (VALUES
postgres(#   ('294247-01-10 04:00:54.775805'),
postgres(#   ('294247-01-10 04:00:55.775806'),
postgres(#   ('294277-01-09 04:00:54.775806'), -- the last value before 'Inf'
postgres(#   ('294277-01-09 04:00:54.775807')  -- we've discussed, it should be fixed
postgres(# ) as t(x);
                x                |     date_part
---------------------------------+-------------------
 294247-01-10 04:00:54.775805+00 |  9223372036854.78
 294247-01-10 04:00:55.775806+00 | -9223372036853.78
 294277-01-09 04:00:54.775806+00 | -9222425352054.78
 infinity                        |          Infinity
(4 rows)

With the attached patch it becomes positive:
                x                |    date_part
---------------------------------+------------------
 294247-01-10 04:00:54.775805+00 | 9223372036854.78
 294247-01-10 04:00:55.775806+00 | 9223372036855.78
 294277-01-09 04:00:54.775806+00 | 9224318721654.78
 infinity                        |         Infinity
(4 rows)
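
For reference, my understanding of the overflow, as a sketch (assuming the
usual int64 microseconds-since-2000 timestamp representation; this is an
illustration, not the attached patch): SetEpochTimestamp() is about
-946684800000000, so the naive "(timestamp - epoch) / 1000000.0" overflows
int64 for the last ~30 years before infinity. Doing the whole-second part
of the subtraction first keeps every intermediate value in range:

#include "postgres.h"
#include "utils/timestamp.h"

static double
timestamptz_to_epoch(TimestampTz timestamp)
{
    TimestampTz epoch = SetEpochTimestamp();

    /*
     * epoch is a whole number of seconds, so its microsecond remainder
     * is zero and can be dropped; subtracting the second parts as
     * int64 can no longer overflow before we widen to double.
     */
    return (double) (timestamp / USECS_PER_SEC - epoch / USECS_PER_SEC) +
           (double) (timestamp % USECS_PER_SEC) / 1000000.0;
}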

-- 
Best regards,
Vitaly Burovoy


fix_extract_overflow_v1.patch
Description: Binary data
