Re: [HACKERS] pg_basebackup --progress output for batch execution

2017-11-09 Thread Jeff Janes
On Fri, Sep 29, 2017 at 4:00 PM, Martin Marques <
martin.marq...@2ndquadrant.com> wrote:

> Hi,
>
> Some time ago I had to work on a system where I was cloning a standby
> using pg_basebackup, that didn't have screen or tmux. For that reason I
> redirected the output to a file and ran it with nohup.
>
> I normally (always actually ;) ) run pg_basebackup with --progress and
> --verbose so I can follow how much has been done. When done on a tty you
> get a nice progress bar with the percentage that has been cloned.
>
> The problem came with the execution and redirection of the output, as
> the --progress option will write a *very* long line!
>
> Back then I thought of writing a patch (actually someone suggested I do
> so) to add a --batch-mode option which would change the behavior
> pg_basebackup has when printing the output messages.
>


While separate lines in the output file are better than one very long line,
it still doesn't seem so useful.  If you aren't watching it in real time,
then you really need to have a timestamp on each line so that you can
interpret the output.  The lines are about one second apart, but I don't
know how robust that timing is under all conditions.

I think I agree with Arthur that I'd rather have the decision made by
inspecting whether output is going to a tty, rather than by adding another
command line option.  But maybe that is not detected robustly enough across
all platforms and circumstances?
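
For what it's worth, a minimal sketch of what tty-sensitive reporting could
look like, with a timestamp added in the non-tty case (the function and its
output format here are hypothetical, not pg_basebackup's actual code):

#include <stdio.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical progress reporter: use a carriage-return progress bar only
 * when stderr is a terminal; otherwise emit one timestamped line per
 * report, so redirected output stays readable and interpretable.
 */
static void
report_progress(long done_kb, long total_kb)
{
	int			percent = (total_kb > 0) ? (int) (done_kb * 100 / total_kb) : 0;

	if (isatty(fileno(stderr)))
		fprintf(stderr, "\r%ld/%ld kB (%d%%)", done_kb, total_kb, percent);
	else
	{
		char		timebuf[32];
		time_t		now = time(NULL);

		strftime(timebuf, sizeof(timebuf), "%Y-%m-%d %H:%M:%S",
				 localtime(&now));
		fprintf(stderr, "%s %ld/%ld kB (%d%%)\n",
				timebuf, done_kb, total_kb, percent);
	}
}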

Cheers,

Jeff


Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-03 Thread Jeff Janes
On Fri, Nov 3, 2017 at 1:34 PM, Tom Lane  wrote:

> Jeff Janes  writes:
> > With this v3 patch (assuming this is the one you just committed
> > as ec42a1dcb30de235b252f6d4) I am now getting make check failures.
>
> There's a followup commit already :-(
>
> regards, tom lane
>

Yeah, that fixed it.  Thanks.

Jeff


Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-03 Thread Jeff Janes
Hi Alvaro,

With this v3 patch (assuming this is the one you just committed
as ec42a1dcb30de235b252f6d4) I am now getting make check failures.

brin_summarize_range is returning unexpected values.

CentOS6,
PostgreSQL 11devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7
20120313 (Red Hat 4.4.7-18), 64-bit

Cheers,

Jeff


regression.diffs
Description: Binary data



[HACKERS] bgwriter_lru_maxpages range in postgresql.conf

2017-10-27 Thread Jeff Janes
With v10, the upper limit on bgwriter_lru_maxpages was changed from 1000 to
INT_MAX / 2, but the postgresql.conf.sample was not updated.

#bgwriter_lru_maxpages = 100   # 0-1000 max buffers written/round

I don't see any precedent for including INT_MAX-type limits in the sample
config file, so maybe something like this:

#bgwriter_lru_maxpages = 100   # max buffers written/round, 0 to turn off

Cheers,

Jeff


Re: [HACKERS] postgres_fdw super user checks

2017-10-05 Thread Jeff Janes
On Thu, Oct 5, 2017 at 6:44 AM, Robert Haas  wrote:

> On Wed, Oct 4, 2017 at 6:13 PM, Jeff Janes  wrote:
> > OK.  And if you want the first one, you can wrap it in a view currently,
> but
> > if it were changed I don't know what you would do if you want the 2nd one
> > (other than having every user create their own set of foreign tables).
> So I
> > guess the current situation is more flexible.
>
> So where does that leave this patch?


Sorry, I thought we were just having a digression.  I didn't think that
part was about this patch specifically, but about what more could be done
later.


> I don't think it makes this
> patch a bad idea, although I kind of lean towards the view that the
> patch should just be checking superuser_arg(), not superuser() ||
> superuser_arg().
>

I don't see a reason to block a directly-logged-in superuser from using a
mapping.  I asked on the closed list whether the current (released)
behavior was a security bug, and the answer was no.  And I don't know of
any reason to block superusers from doing something other than it being a
security bug.  Also, it would create a backwards-compatibility hazard to
revoke the ability now.

Cheers,

Jeff


Re: [HACKERS] postgres_fdw super user checks

2017-10-04 Thread Jeff Janes
On Thu, Sep 14, 2017 at 1:08 PM, Robert Haas  wrote:

> On Thu, Sep 14, 2017 at 2:33 PM, Jeff Janes  wrote:
> > I think that foreign tables ought to behave as views do, where they run
> > as the owner rather than the invoker.  No one has talked me out of it,
> > but no one has supported me on it either.  But I think it is too late to
> > change that now.
>
> That's an interesting point.  I think that you can imagine use cases
> for either method.  Obviously, if what you want to do is drill a hole
> through the Internet to another server and then expose it to some of
> your fellow users, having the FDW run with the owner's permissions
> (and credentials) is exactly right.  But there's another use case too,
> which is where you have something that looks like a multi-user
> sharding cluster.  You want each person's own credentials to carry
> over to everything they do remotely.
>

OK.  And if you want the first one, you can wrap it in a view currently,
but if it were changed I don't know what you would do if you want the 2nd
one (other than having every user create their own set of foreign tables).
So I guess the current situation is more flexible.

It does seem like it would then be a good idea to have a user mapping
option of "pass_the_hash", which would look up the md5 hash from pg_authid
(if the local user name is the same as the remote user name) and use that
to connect to the foreign server, as an alternative to recording the
password in plain text in the mapping itself.  But that would require some
changes to libpq, not just postgres_fdw.

And that wouldn't work for SCRAM.  I guess that SCRAM does have some
feature to allow this kind of delegation, but I don't know enough about it
to know how hard it would be to implement in postgres_fdw or how useful it
would be to have.


>
> I feel like the USER MAPPING stuff is a pretty clunky and annoying way
> of trying to make this work, no matter which of those use cases you
> happen to have.  But I'm not exactly sure what would be better,
> either, and like you say, it's a bit late to be breaking compatibility
> at this point.
>

Yeah, I have not been finding it enjoyable.  How much flexibility does the
SQL/MED spec even give us (I don't have access to the spec)?  From what I
could tell, it requires USER MAPPING to exist but doesn't give any details,
and doesn't say there can't be something else one could optionally use
instead.

Cheers,

Jeff


Re: [HACKERS] Possible SSL improvements for a newcomer to tackle

2017-10-04 Thread Jeff Janes
On Tue, Oct 3, 2017 at 6:44 AM, Tom Lane  wrote:

> Magnus Hagander  writes:
> > On Tue, Oct 3, 2017 at 6:33 AM, Tom Lane  wrote:
> >> I'm not an SSL expert, so insert appropriate grain of salt, but AIUI the
> >> question is what are you going to verify against?
>
> > One way to do it would be to default to the "system global certificate
> > store", which is what most other SSL apps do. For example on a typical
> > debian/ubuntu, that'd be the store in /etc/ssl/certs/ca-certificates.crt.
> > Exactly where to find them would be distribution-specific though, and we
> > would need to actually add support for a second certificate store. But
> > that would probably be a useful feature in itself.
>
> Maybe.  The impression I have is that it's very common for installations
> to use a locally-run CA to generate server and client certs.  I would not
> expect them to put such certs into /etc/ssl/certs.


Well, I would do it that way if it worked.  Not directly /etc/ssl/certs,
but /etc/pki/ca-trust/source/anchors/.

I would like the locally-run CA to be able to sign not just postgresql
server certs, but also apache server certs.  And then install the CA cert
file in one place per client and have it work for psql, curl, wget, etc.

Cheers,

Jeff


Re: [HACKERS] Possible SSL improvements for a newcomer to tackle

2017-10-04 Thread Jeff Janes
On Mon, Oct 2, 2017 at 9:33 PM, Tom Lane  wrote:

>
> It's possible that we could adopt some policy like "if the root.crt file
> exists then default to verify" ... but that seems messy and unreliable,
> so I'm not sure it would really add any security.
>

That is what we do.  If root.crt exists, we default to verify-ca.

And yes, it is messy and unreliable.  I don't know if it adds any security
or not.

Or do you mean we could default to verify-full instead of verify-ca?

Cheers,

Jeff


Re: [HACKERS] 10RC1 crash testing MultiXact oddity

2017-09-30 Thread Jeff Janes
On Fri, Sep 22, 2017 at 1:19 PM, Robert Haas  wrote:

> On Fri, Sep 22, 2017 at 3:39 PM, Jeff Janes  wrote:
> > It turns out it is not new in pg10.  I spotted it in the log file only
> > by accident while looking for something else.  Now that I am looking
> > for it, I do see it in 9.6 as well.
>
> So I guess the next question is whether it also shows up if you initdb
> with 9.4.latest and then run the same test.
>

git bisect shows that it first appears in 9.5, at this commit:

commit bd7c348d83a4576163b635010e49dbcac7126f01
Author: Andres Freund 
Date:   Sat Sep 26 19:04:25 2015 +0200

Rework the way multixact truncations work.

The patches which enable the crashes and the rapid consumption of xid and
multixact both need a little adjustment from the 10rc1 versions, so I'm
attaching a combined patch that applies to bd7c348d83.

Not really sure what the next step is here.  I could promote the
ereport(LOG...) to a PANIC to get a core dump, but I don't think that would
help because presumably the problem occurred early, when the truncation was
done, not when it was detected.

Cheers,

Jeff


crash_instrument.patch
Description: Binary data



Re: [HACKERS] pgbench - minor fix for meta command only scripts

2017-09-29 Thread Jeff Janes
On Mon, Sep 11, 2017 at 6:27 PM, Fabien COELHO  wrote:

>
> Hello Jeff,
>
>> Shouldn't we use pg_usleep to ensure portability?  It is defined for
>> front-end code.  But it returns void, so the error check will have to be
>> changed.
>>
>
> Attached v3 with pg_usleep called instead.
>
>> I didn't see the problem before the commit I originally indicated, so I
>> don't think it has to be back-patched to before v10.
>>
>
> Hmmm you've got a point, although I'm not sure how it could work
> without sleeping explicitly. Maybe the path was calling select with an
> empty wait list plus timeout, and select is kind enough to just sleep on an
> empty list, or some other miracle.


Not really a miracle: calling select with an empty list of file handles is
a standard way to sleep on Unix-like platforms.  (Indeed, that is how
pg_usleep is implemented on non-Windows platforms; see
"src/port/pgsleep.c".)  The problem is that it is reportedly not portable
to Windows.  But I tested pgbench.exe for 9.6.5-1 from the EDB installer,
and I don't see excessive CPU usage for a throttled run, and it throttles
to about the correct speed.  So maybe the non-portability is more rumor
than reality, and I don't know if this needs backpatching or not.  But it
should be fixed for v10, as there it becomes a demonstrably live issue.
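
For reference, the idiom is essentially this standalone sketch of what
"src/port/pgsleep.c" does on non-Windows platforms:

#include <sys/select.h>
#include <sys/time.h>

/*
 * Sleep for the given number of microseconds by calling select() with
 * empty fd sets and only a timeout -- the same trick pg_usleep uses on
 * Unix-like platforms.
 */
static void
sleep_usec(long microsec)
{
	struct timeval delay;

	delay.tv_sec = microsec / 1000000L;
	delay.tv_usec = microsec % 1000000L;
	(void) select(0, NULL, NULL, NULL, &delay);
}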



> ISTM clearer to explicitly sleep in that case.


Yes.

 Cheers,

Jeff


Re: [HACKERS] v10 pg_ctl compatibility

2017-09-26 Thread Jeff Janes
On Tue, Sep 26, 2017 at 4:31 PM, Tom Lane  wrote:

> Jeff Janes  writes:
> > I was not using -l.  Instead I set logging_collector=on in
> > postgresql.conf, but I suppose that that is not sufficient?
>
> No, because initial stderr is still connected to whatever.
>
> > But I just retried with -l, and it still gets the fast shutdown.
>
> Hmph.  Doesn't work that way for me on RHEL6, which surely oughta
> behave the same as your CentOS.
>
> regards, tom lane
>

I happened to have a virgin install of CentOS7 here, so I tried it on
that.  I installed 9.2 and 10rc1 from repositories (CentOS and PGDG,
respectively) rather than from source, repeated the test, and still got the
fast shutdown when hitting ctrl-C after 10rc1 had restarted the 9.2 server.

I did the initdb, start, and restart from pg_ctl, not from whatever
management tool comes with the packages.  I did it both with the database
running as OS user jjanes, and separately running as OS user 'postgres', so
it doesn't seem to matter whether uid is high or low.  And with '-l
logfile'.

So, I don't know.

Cheers,

Jeff


Re: [HACKERS] v10 pg_ctl compatibility

2017-09-26 Thread Jeff Janes
On Tue, Sep 26, 2017 at 3:54 PM, Tom Lane  wrote:

> Jeff Janes  writes:
> > On Tue, Sep 26, 2017 at 1:10 PM, Tom Lane  wrote:
> >> Really?  The server should have detached itself from your terminal
> >> group long before that.  What platform is this?
>
> > CentOS release 6.9 (Final)
>
> Hm, same as here.  Are you perhaps not using pg_ctl's -l option?
> If not, the postmaster's stderr would remain attached to your tty,
> which might be the reason why a terminal ^C affects it.
>

I was not using -l.  Instead I set logging_collector=on in postgresql.conf,
but I suppose that that is not sufficient?

But I just retried with -l, and it still gets the fast shutdown.

Cheers,

Jeff


Re: [HACKERS] v10 pg_ctl compatibility

2017-09-26 Thread Jeff Janes
On Tue, Sep 26, 2017 at 1:10 PM, Tom Lane  wrote:

> Jeff Janes  writes:
> > To add insult to injury, when v10 pg_ctl restarts a pre-10 server, it
> > sits there for a long time waiting for the server to start up even
> > though it has already started up.  If I then hit ctrl-C because I assume
> > something is horribly wrong, it goes ahead and kills the successfully
> > started-and-running server.
>
> Really?  The server should have detached itself from your terminal
> group long before that.  What platform is this?
>


CentOS release 6.9 (Final)

The server log file (9.6) says:


   64926  2017-09-26 13:56:38.284 PDT LOG:  database system was shut down at 2017-09-26 13:56:37 PDT
   64926  2017-09-26 13:56:38.299 PDT LOG:  MultiXact member wraparound protections are now enabled
   64930  2017-09-26 13:56:38.311 PDT LOG:  autovacuum launcher started
   64924  2017-09-26 13:56:38.313 PDT LOG:  database system is ready to accept connections
...  hit ctrl-C
   64924  2017-09-26 13:56:47.237 PDT LOG:  received fast shutdown request
   64924  2017-09-26 13:56:47.237 PDT LOG:  aborting any active transactions
   64930  2017-09-26 13:56:47.244 PDT LOG:  autovacuum launcher shutting down
   64927  2017-09-26 13:56:47.261 PDT LOG:  shutting down
...


I can try a different platform (ubuntu 16.04, probably) tonight and see if
it does the same thing.

Cheers,

Jeff


Re: [HACKERS] v10 pg_ctl compatibility

2017-09-26 Thread Jeff Janes
On Tue, Sep 26, 2017 at 12:29 PM, Andres Freund  wrote:

> Hi,
>
> On 2017-09-26 11:59:42 -0700, Jeff Janes wrote:
> > Should the release notes have a compatibility entry about pg_ctl restart,
> > being used against a running pre-10 server, no longer being able to
> detect
> > when startup is complete?
> >
> > I don't know if cross-version use of pg_ctl restart was ever officially
> > supported, but the current behavior is rather confusing (waiting for a
> long
> > time, and then reporting failure, even though it started successfully).
>
> I'm actually tempted to just make pg_ctl verify the right version of
> postgres is being used. Maybe I'm missing something, but what's the
> use-case for allowing it, and every couple releases have some breakage?
>

The use case for me is that I have multiple versions of postgres running on
the same machine, and don't want to check which pg_ctl is in my path each
time I change one of their configurations and need to restart it.  Or at
least, I never had to check before, as pg_ctl goes out of its way to
restart the server with the same bin/postgres it is currently running
with, rather than with the bin/postgres currently in the path, or whichever
bin/postgres is in the same directory as bin/pg_ctl.  (Which is itself not
an unmitigated win, but it is what I'm used to.)

Admittedly, this is a much bigger problem with hacking/testing than it is
with production use, but still I do have production machines with mixed
versions running, and can see myself screwing this up a few times once one
of them gets upgraded to v10, even with an incompatibility warning in the
release notes.  But at least I had fair warning.

To add insult to injury, when v10 pg_ctl restarts a pre-10 server, it sits
there for a long time waiting for the server to start up even though it has
already started up.  If I then hit ctrl-C because I assume something is
horribly wrong, it goes ahead and kills the successfully
started-and-running server.

If we do want to do a version check and fail out, I think it should check
before shutting it down, rather than shutting it down and then refusing to
start.

Cheers,

Jeff


[HACKERS] v10 pg_ctl compatibility

2017-09-26 Thread Jeff Janes
Should the release notes have a compatibility entry about pg_ctl restart,
being used against a running pre-10 server, no longer being able to detect
when startup is complete?

I don't know if cross-version use of pg_ctl restart was ever officially
supported, but the current behavior is rather confusing (waiting for a long
time, and then reporting failure, even though it started successfully).

Cheers,

Jeff


Re: [HACKERS] 10RC1 crash testing MultiXact oddity

2017-09-22 Thread Jeff Janes
On Fri, Sep 22, 2017 at 8:47 AM, Alvaro Herrera 
wrote:

> Jeff Janes wrote:
> > I am running some crash recovery testing against 10rc1 by injecting torn
> > page writes, using a test case which generates a lot of multixact, some
> > naturally by doing a lot of fk updates, but most artificially by calling
> > the pg_burn_multixact function from one of the attached patches.
>
> Is this new in pg10, or do you also see it in 9.6?
>

It turns out it is not new in pg10.  I spotted it in the log file only by
accident while looking for something else.  Now that I am looking for it, I
do see it in 9.6 as well.

Cheers,

Jeff


[HACKERS] 10RC1 crash testing MultiXact oddity

2017-09-22 Thread Jeff Janes
I am running some crash recovery testing against 10rc1 by injecting torn
page writes, using a test case which generates a lot of multixact, some
naturally by doing a lot of fk updates, but most artificially by calling
the pg_burn_multixact function from one of the attached patches.

In 22 hours of running I got 12 instances where messages like this appear:

MultiXact member wraparound protections are disabled because oldest
checkpointed MultiXact 681012168 does not exist on disk

This is not a fatal error, and no inconsistent data is found at the end of
the run.  But the code comments suggest that this should only happen on a
server that has been upgraded from 9.3 or 9.4, which this server has not
been.

Is the presence of this log message something that needs to be investigated
further?

Thanks,

Jeff


0002-pg_burn_multixact-utility_v10.patch
Description: Binary data


count.pl
Description: Binary data


crash_REL11.patch
Description: Binary data


do.sh
Description: Bourne shell script



Re: [HACKERS] SCRAM in the PG 10 release notes

2017-09-21 Thread Jeff Janes
On Thu, Sep 21, 2017 at 7:42 AM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 9/19/17 20:45, Peter Eisentraut wrote:
> > On 9/19/17 17:55, Jeff Janes wrote:
> >> I guess I'm late to the party, but I don't see why this is needed at
> >> all.  We encourage people to use any and all new features which are
> >> appropriate to them--that is why we implement new features.  Why does
> >> this feature need a special invitation?
> >
> > It's not clear to me how an average user would get from the press
> > release or release notes to upgrading their installation to use
> > SCRAM-based authentication and passwords.  A little bit more guidance
> > somewhere would be helpful.
>
> Here is a patch that expands the SCRAM documentation a bit, adds more
> explanation how the different options are related, and sets some better
> links.  I think now you can get from the release notes to the relevant
> documentation and have enough information on how to put the new features
> into use.
>


This looks good to me.  I might suggest adding client verification as a
specific step:

"To upgrade an existing installation from md5 to scram-sha-256, verify that
all client software supports it, set password_encryption = 'scram-sha-256'
in postgresql.conf..."

Cheers,

Jeff


Re: [HACKERS] why not parallel seq scan for slow functions

2017-09-20 Thread Jeff Janes
On Tue, Sep 19, 2017 at 9:15 PM, Amit Kapila 
wrote:

> On Wed, Sep 20, 2017 at 3:05 AM, Jeff Janes  wrote:
> > On Tue, Sep 19, 2017 at 1:17 PM, Thomas Munro
> >  wrote:
> >>
> >> On Thu, Sep 14, 2017 at 3:19 PM, Amit Kapila 
> >> wrote:
> >> > The attached patch fixes both the review comments as discussed above.
> >
> >
> > that should be fixed by turning costs off in the explain, as is the
> > tradition.
> >
>
> Right.  BTW, did you get a chance to run the original test (for which
> you have reported the problem) with this patch?
>

Yes, this patch makes it use a parallel scan, with great improvement.  No
more having to \copy the data out, then run GNU split, then run my perl or
python program with GNU parallel on each file.  Instead I just have to put
a pl/perl wrapper around the function.

(next up, how to put a "create temp table alsdkfjaslfdj as" in front of it
and keep it running in parallel)

Thanks,

Jeff


Re: [HACKERS] SCRAM in the PG 10 release notes

2017-09-20 Thread Jeff Janes
On Wed, Sep 20, 2017 at 6:42 AM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 9/19/17 21:44, Michael Paquier wrote:
> >> The patch that Heikki posted seemed reasonable to me as a starting
> >> point, but there probably needs to be more "how" information somewhere.
> >
> > I agree with that.
> >
> > +   
> > +Installations using MD5 authentication are encouraged to switch to
> > +SCRAM-SHA-256, unless using older client programs or drivers that don't
> > +support it yet.
> > +   
> > I think that the addition of a link to
> > https://wiki.postgresql.org/wiki/List_of_drivers would be appropriate.
>
> I don't have any expectation that that list will be kept up to date.
>

I am not confident that it will be either, but what could we ever have more
confidence in being kept up-to-date than something anyone can update which
is hosted on a community asset?  If we can't collectively keep it
up-to-date, then shame on us and what hope is there for anything else?
Certainly if it is not linked to from the docs, then it is just that much
less likely to be kept up to date.  The problem with it currently is that
it implies anything using libpq supports scram, even though a libpq which
supports scram has not even been released yet.

Cheers,

Jeff


Re: [HACKERS] SCRAM in the PG 10 release notes

2017-09-19 Thread Jeff Janes
On Tue, Sep 19, 2017 at 4:29 PM, Michael Paquier 
wrote:

> On Wed, Sep 20, 2017 at 6:55 AM, Jeff Janes  wrote:
> > On Tue, Sep 19, 2017 at 1:32 PM, Heikki Linnakangas 
> > wrote:
> >> I'm not sure what exactly to do here. Where should we stick that notice?
> >> We could put it in the release notes, where the bullet point about
> >> SCRAM is, but it would be well hidden. If we want to give advice to
> >> people who might not otherwise pay attention, it should go to a more
> >> prominent place. In the "Migration to version 10" section perhaps.
> >> Currently, it only lists incompatibilities, which this isn't. Perhaps
> >> put the notice after the list of incompatibilities (patch attached)?
> >
> > I guess I'm late to the party, but I don't see why this is needed at all.
> > We encourage people to use any and all new features which are
> > appropriate to them--that is why we implement new features.  Why does
> > this feature need a special invitation?
>
> There have been continuous complaints on those lists for the last 5
> years or so that MD5 is "weak" and should be avoided. Well, Postgres
> is not wrong in the way it uses MD5 in itself, backups including raw
> MD5 hashes being more of a problem. But I would think that it is fair
> to tell in a louder to such folks that Postgres has actually done
> something on the matter.
>

People who are stressed out about it but use PostgreSQL anyway will see it
in the release notes and recognize the importance (to them) without being
told. People who don't use PostgreSQL at all due to the issue aren't going
to be reading the release notes anyway.  The place to be louder about "this
is now available" is in the announcement and press release, and in the
(currently unwritten) "E.1.1. Overview", not down in the guts.

Meanwhile, the people who don't know enough about it to understand why our
use of md5 "is not wrong" will just see "encourage" and "better security",
and then go lock their users and themselves out of their database and have
a generally miserable experience.

I think the proposed invitation is too strong and the warning too weak,
especially as there seems to be no way to audit server-side what
drivers/versions people are connecting with.  You either have to track down
every client and identify the correct binaries and run ldd against them (or
whatever you would have to do on Windows), or just go ahead and break it
and see who calls.

The people who need this don't need to be encouraged to use it; they just
need to know it exists.  The people who need to be encouraged are going to
shoot themselves in the foot.

Cheers,

Jeff


Re: [HACKERS] SCRAM in the PG 10 release notes

2017-09-19 Thread Jeff Janes
On Tue, Sep 19, 2017 at 1:32 PM, Heikki Linnakangas  wrote:


> I'm not sure what exactly to do here. Where should we stick that notice?
> We could put it in the release notes, where the bullet point about SCRAM
> is, but it would be well hidden. If we want to give advice to people who
> might not otherwise pay attention, it should go to a more prominent place.
> In the "Migration to version 10" section perhaps. Currently, it only lists
> incompatibilities, which this isn't. Perhaps put the notice after the list
> of incompatibilities (patch attached)?
>
>
I guess I'm late to the party, but I don't see why this is needed at all.
We encourage people to use any and all new features which are appropriate
to them--that is why we implement new features.  Why does this feature need
a special invitation?

Cheers,

Jeff


Re: [HACKERS] why not parallel seq scan for slow functions

2017-09-19 Thread Jeff Janes
On Tue, Sep 19, 2017 at 1:17 PM, Thomas Munro  wrote:

> On Thu, Sep 14, 2017 at 3:19 PM, Amit Kapila 
> wrote:
> > The attached patch fixes both the review comments as discussed above.
>
> This cost stuff looks unstable:
>
> test select_parallel  ... FAILED
>
> !  Gather  (cost=0.00..623882.94 rows=9976 width=8)
>      Workers Planned: 4
> !    ->  Parallel Seq Scan on tenk1  (cost=0.00..623882.94 rows=2494 width=8)
>   (3 rows)
>
>   drop function costly_func(var1 integer);
> --- 112,120 ----
>   explain select ten, costly_func(ten) from tenk1;
>                              QUERY PLAN
>   ------------------------------------------------------------------
> !  Gather  (cost=0.00..625383.00 rows=1 width=8)
>      Workers Planned: 4
> !    ->  Parallel Seq Scan on tenk1  (cost=0.00..625383.00 rows=2500 width=8)
>   (3 rows)
>

that should be fixed by turning costs off in the explain, as is the tradition.


See attached.

Cheers,

Jeff


parallel_paths_include_tlist_cost_v4.patch
Description: Binary data



Re: [HACKERS] Create replication slot in pg_basebackup if requested and not yet present

2017-09-19 Thread Jeff Janes
On Thu, Sep 14, 2017 at 8:23 AM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 9/12/17 16:39, Michael Banck wrote:
> > We could split up the logic here and create the optional physical
> > replication slot in the main connection and the temporary one in the WAL
> > streamer connection, but this would keep any fragility around for
> > (likely more frequently used) temporary replication slots. It would make
> > the patch much smaller though if I revert touching temporary slots at
> > all.
>
> That's what I was thinking.
>
> But:
>
> If the race condition concern that Jeff was describing is indeed
> correct, then the current use of temporary replication slots would be
> equally affected.  So I think either we already have a problem, or we
> don't and then this patch wouldn't introduce a new one.
>
> I don't know the details of this well enough.
>

It is possible the race condition is entirely theoretical, as the WAL files
won't be eligible for recycling until the end of the *next* (future)
checkpoint (for no good reason, as far as I can tell, although fortunate in
this case), so that should give plenty of opportunity for the WAL streamer
to connect in all but the weirdest cases.  When it reserves the WAL, I
assume it does so back-dating to the LSN position reported by
pg_start_backup, and will promptly fail if those are no longer available?

I don't want to hold up the patch based on theoretical questions when it
solves an actual problem.

Cheers,

Jeff


Re: [HACKERS] GnuTLS support

2017-09-18 Thread Jeff Janes
On Sun, Sep 17, 2017 at 2:17 PM, Andreas Karlsson  wrote:

> On 09/15/2017 06:55 PM, Jeff Janes wrote:
>
>> I can't build against gnutls-2.12.23-21.el6.x86_64 from CentOS 6.9
>>
>
> Thanks for testing my patch. I have fixed both these issues plus some of
> the other feedback. A new version of my patch is attached which should, at
> least in theory, support all GnuTLS versions >= 2.11.
>

You fixed the first issue, but I still get the second one:

be-secure-gnutls.c: In function 'get_peer_certificate':
be-secure-gnutls.c:667: error: 'GNUTLS_X509_CRT_LIST_SORT' undeclared
(first use in this function)
be-secure-gnutls.c:667: error: (Each undeclared identifier is reported only
once
be-secure-gnutls.c:667: error: for each function it appears in.)

Cheers,

Jeff


Re: [HACKERS] CLUSTER command progress monitor

2017-09-15 Thread Jeff Janes
On Wed, Aug 30, 2017 at 7:12 PM, Tatsuro Yamada <
yamada.tats...@lab.ntt.co.jp> wrote:

>
> The view provides information on CLUSTER command progress, as follows:
>
> postgres=# \d pg_stat_progress_cluster
>          View "pg_catalog.pg_stat_progress_cluster"
>         Column        |  Type   | Collation | Nullable | Default
>  ---------------------+---------+-----------+----------+---------
>   pid                 | integer |           |          |
>   datid               | oid     |           |          |
>   datname             | name    |           |          |
>   relid               | oid     |           |          |
>   phase               | text    |           |          |
>   scan_method         | text    |           |          |
>   scan_index_relid    | bigint  |           |          |
>   heap_tuples_total   | bigint  |           |          |
>   heap_tuples_scanned | bigint  |           |          |
>

I think it should be cluster_index_relid, not scan_index_relid.  If the
scan_method is seq, then the index isn't being scanned.

Cheers,

Jeff


Re: [HACKERS] GnuTLS support

2017-09-15 Thread Jeff Janes
On Thu, Aug 31, 2017 at 10:52 AM, Andreas Karlsson 
wrote:

>
>
>
> = Work left to do
>
> - Test code with older versions of GnuTLS
>


I can't build against gnutls-2.12.23-21.el6.x86_64 from CentOS 6.9:

be-secure-gnutls.c: In function 'be_tls_init':
be-secure-gnutls.c:168: warning: implicit declaration of function
'gnutls_certificate_set_pin_function'
be-secure-gnutls.c: In function 'get_peer_certificate':
be-secure-gnutls.c:656: error: 'GNUTLS_X509_CRT_LIST_SORT' undeclared
(first use in this function)
be-secure-gnutls.c:656: error: (Each undeclared identifier is reported only
once
be-secure-gnutls.c:656: error: for each function it appears in.)

But I can build on ubuntu 16.04, using whatever baffling salami of package
versions they turned gnutls into on that system.

I can interoperate in both directions: gnutls client to openssl server and
the reverse (and gnutls to gnutls).

Cheers,

Jeff


Re: [HACKERS] postgres_fdw super user checks

2017-09-14 Thread Jeff Janes
On Tue, Sep 12, 2017 at 1:13 AM, Andreas Karlsson  wrote:

> On 07/27/2017 09:45 PM, Jeff Janes wrote:
>> Here is an updated patch.  This version allows you to use the
>> password-less connection if you either are the super-user directly
>> (which is the existing committed behavior), or if you are using the
>> super-user's mapping because you are querying a super-user-owned view
>> which you have been granted access to.
>>
>
> I have tested the patch and it passes the tests and works, and the code
> looks good (I have a small nitpick below).
>
> The feature seems useful, especially for people who already use views for
> security, so the question is if this is a potential footgun. I am leaning
> towards no, since the superuser should be careful when granting access to
> its views anyway.
>
> It would have been nice if there was a more generic way to handle this
> since 1) the security issue is not unique to postgres_fdw and 2) this
> requires you to create a view. But since the patch is simple, an
> improvement in itself, and does not prevent any future further improvements
> in this area, I see no reason to let perfect be the enemy of good.
>

Thanks for the review.

I think that foreign tables ought to behave as views do, where they run as
the owner rather than the invoker.  No one has talked me out of it, but no
one has supported me on it either.  But I think it is too late to change
that now.  Wrapping it in a view is not hard, but it sure clutters up a
schema.  I don't think this can be made too generic, because each database
has a quite different security model, so the solution will be much
different.

Attached is a new patch which fixes the style issue you mentioned.

Cheers,

Jeff


postgres_fdw_superuser_v3.patch
Description: Binary data



Re: [HACKERS] uninterruptible state in 10beta4

2017-09-13 Thread Jeff Janes
On Wed, Sep 13, 2017 at 2:41 PM, Andres Freund  wrote:

> Hi,
>
> On 2017-09-13 14:28:34 -0700, Jeff Janes wrote:
> > In 10beta4 and 11dev, if I run the below it enters an uninterruptible
> > state.  After the insert starts, I give it 15 seconds or so until the
> > memory usage starts to grow due to enqueued trigger checks. Then I can't
> > interrupt it with either ctrl-C in psql or kill -15  from another
> > terminal.
> >
> > I have to do kill -9 
> >
> > create table foo  (x int);
> > create or replace function notice () returns trigger as $$ begin raise
> > notice 'asdfsdf'; return NEW; END;$$ language plpgsql;
> > create trigger foobar after insert on foo for each row execute procedure
> > notice();
> > insert into foo select * from generate_series(1,1);
> >
> > Git bisect lays the blame here which certainly seems plausible:
> >
> > commit d47cfef7116fb36349949f5c757aa2112c249804
> > Author: Andres Freund 
> > Date:   Tue Jul 25 17:37:17 2017 -0700
> >
> > Move interrupt checking from ExecProcNode() to executor nodes.
>
> Indeed that seems plausible. I guess something like the attached should
> fix the issue?
>

Yep, that fixes it.

Thanks,

Jeff


[HACKERS] uninterruptible state in 10beta4

2017-09-13 Thread Jeff Janes
In 10beta4 and 11dev, if I run the below it enters an uninterruptible
state.  After the insert starts, I give it 15 seconds or so until the
memory usage starts to grow due to enqueued trigger checks. Then I can't
interrupt it with either ctrl-C in psql or kill -15  from another
terminal.

I have to do kill -9 

create table foo  (x int);
create or replace function notice () returns trigger as $$ begin raise
notice 'asdfsdf'; return NEW; END;$$ language plpgsql;
create trigger foobar after insert on foo for each row execute procedure
notice();
insert into foo select * from generate_series(1,1);

Git bisect lays the blame here which certainly seems plausible:

commit d47cfef7116fb36349949f5c757aa2112c249804
Author: Andres Freund 
Date:   Tue Jul 25 17:37:17 2017 -0700

Move interrupt checking from ExecProcNode() to executor nodes.



#0  0x003e4e8db7d0 in __write_nocancel () at
../sysdeps/unix/syscall-template.S:82
#1  0x004ef3e1 in XLogFileInit (logsegno=189,
use_existent=0x7ffe52b505df "\001", use_lock=1 '\001') at xlog.c:3222
#2  0x004f15e8 in XLogWrite (WriteRqst=..., flexible=0 '\000') at
xlog.c:2408
#3  0x004f1d89 in AdvanceXLInsertBuffer (upto=3175088128,
opportunistic=) at xlog.c:2114
#4  0x004f1e99 in GetXLogBuffer (ptr=3175088128) at xlog.c:1879
#5  0x004f79fe in CopyXLogRecordToWAL (rdata=0xc921f0,
fpw_lsn=, flags=1 '\001') at xlog.c:1498
#6  XLogInsertRecord (rdata=0xc921f0, fpw_lsn=,
flags=1 '\001') at xlog.c:1073
#7  0x004fb40b in XLogInsert (rmid=10 '\n', info=0 '\000') at
xloginsert.c:462
#8  0x004af8fd in heap_insert (relation=0x7f1456e7be58,
tup=0x7f14568e12d8, cid=, options=,
bistate=) at heapam.c:2537
#9  0x0060db9f in ExecInsert (node=0x14f44b8) at
nodeModifyTable.c:601
#10 ExecModifyTable (node=0x14f44b8) at nodeModifyTable.c:1741
#11 0x005f4ef8 in ExecProcNode (node=0x14f44b8) at
execProcnode.c:422
#12 0x005f1d76 in ExecutePlan (queryDesc=0x1442bb8,
direction=, count=0, execute_once=-72 '\270') at
execMain.c:1693
#13 standard_ExecutorRun (queryDesc=0x1442bb8, direction=, count=0, execute_once=-72 '\270') at execMain.c:362
#14 0x0071c95b in ProcessQuery (plan=,
sourceText=0x149d018 "insert into foo select * from
generate_series(1,1);",
params=0x0, queryEnv=0x0, dest=,
completionTag=0x7ffe52b50cc0 "") at pquery.c:161
#15 0x0071cb95 in PortalRunMulti (portal=0x14caea8, isTopLevel=1
'\001', setHoldSnapshot=0 '\000', dest=0x14f0ca8, altdest=0x14f0ca8,
completionTag=0x7ffe52b50cc0 "") at pquery.c:1286
#16 0x0071d260 in PortalRun (portal=0x14caea8,
count=9223372036854775807, isTopLevel=1 '\001', run_once=1 '\001',
dest=0x14f0ca8,
altdest=0x14f0ca8, completionTag=0x7ffe52b50cc0 "") at pquery.c:799
#17 0x00719987 in exec_simple_query (query_string=0x149d018 "insert
into foo select * from generate_series(1,1);") at postgres.c:1099
#18 0x0071a8c9 in PostgresMain (argc=,
argv=, dbname=0x14473c0 "jjanes", username=)
at postgres.c:4090
#19 0x006aff4a in BackendRun (argc=,
argv=) at postmaster.c:4357
#20 BackendStartup (argc=, argv=)
at postmaster.c:4029
#21 ServerLoop (argc=, argv=) at
postmaster.c:1753
#22 PostmasterMain (argc=, argv=)
at postmaster.c:1361
#23 0x00631410 in main (argc=1, argv=0x141c2f0) at main.c:228



Cheers,

Jeff


Re: [HACKERS] pg_basebackup behavior on non-existent slot

2017-09-12 Thread Jeff Janes
On Wed, Sep 6, 2017 at 2:50 AM, Alvaro Herrera 
wrote:

> Magnus Hagander wrote:
> > On Mon, Sep 4, 2017 at 3:21 PM, Jeff Janes  wrote:
>
> > > Should the parent process of pg_basebackup be made to respond to
> > > SIGCHLD?  Or call waitpid(bgchild, &status, WNOHANG) in some strategic
> > > loop?
> >
> > I think it's ok to just call waitpid() -- we don't need to react super
> > quickly, but we should react.
>
> Hmm, not sure about that ... in the normal case (slotname is correct)
> you'd be doing thousands of useless waitpid() system calls during the
> whole operation, no?  I think it'd be better to have a SIGCHLD handler
> that sets a flag (just once), which can be quickly checked without
> accessing kernel space.
>

If we don't want polling by waitpid, then my next thought would be to move
the data copy into another process, then have the main process do nothing
but wait for the first child to exit.  If the first to exit is the WAL
receiver, then we must have an error and the data receiver can be killed.
I don't know how to translate that to Windows, however.
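
To make the flag idea concrete, here is a minimal self-contained sketch
(the names and structure are hypothetical, not pg_basebackup's actual
code):

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set once by the handler; testing it costs no system call. */
static volatile sig_atomic_t child_exited = 0;

static void
sigchld_handler(int signo)
{
	(void) signo;				/* unused */
	child_exited = 1;
}

int
main(void)
{
	struct sigaction sa;
	pid_t		child;

	sa.sa_handler = sigchld_handler;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;
	sigaction(SIGCHLD, &sa, NULL);

	child = fork();
	if (child == 0)
		_exit(1);				/* stand-in for the WAL streamer dying early */

	for (;;)					/* stand-in for the base-copy loop */
	{
		if (child_exited)
		{
			int			status;

			waitpid(child, &status, WNOHANG);
			fprintf(stderr, "background child exited, aborting\n");
			exit(1);
		}
		sleep(1);				/* placeholder for copying the next chunk */
	}
}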

Cheers,

Jeff


Re: [HACKERS] pgbench - minor fix for meta command only scripts

2017-09-11 Thread Jeff Janes
On Mon, Sep 11, 2017 at 1:49 AM, Fabien COELHO  wrote:

>
> Hello Jeff,
>
> Ok, the problem was a little bit more trivial than I thought.
>
> The issue is that under a low rate there may be no transaction in
> progress, however the wait procedure was relying on select's timeout. If
> nothing is active there is nothing to wait for, thus it was an active loop
> in this case...
>
> I've introduced a usleep call in place of select for this particular case.
> Hopefully this is portable.
>

Shouldn't we use pg_usleep to ensure portability?  It is defined for
front-end code.  But it returns void, so the error check will have to be
changed.

I didn't see the problem before the commit I originally indicated, so I
don't think it has to be back-patched to before v10.

Cheers,

Jeff


Re: [HACKERS] Automatic testing of patches in commit fest

2017-09-10 Thread Jeff Janes
On Sun, Sep 10, 2017 at 4:40 PM, Michael Paquier 
wrote:

> Hi all,
>
> Thomas Munro has hacked up a prototype application that automatically
> tests whether submitted patches apply and build:
> http://commitfest.cputube.org/
>
> I would recommend having a look at it from time to time if you are a
> patch author (or a reviewer) as any failure may say that your patch
> has rotten over time and needs a rebase. It is good to keep the commit
> fest entries build-clean at least for testers.
> Thanks,


This looks very interesting.  But when I click on a "build: failing" icon,
it takes me to a generic page,  not one describing how that specific build
is failing.

Cheers,

Jeff


Re: [HACKERS] Create replication slot in pg_basebackup if requested and not yet present

2017-09-08 Thread Jeff Janes
On Wed, Sep 6, 2017 at 9:22 AM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 8/18/17 05:28, Michael Banck wrote:
> > >>> Rebased, squashed and slightly edited version attached. I've added this
> >>> to the 2017-07 commitfest now as well:
> >>>
> >>> https://commitfest.postgresql.org/14/1112/
> >> Can you rebase this past some conflicting changes?
> > Thanks for letting me know, PFA a rebased version.
>
> I have reviewed the thread so far.  I think there is general agreement
> that something like this would be good to have.
>
> I have not found any explanation, however, why the "if not exists"
> behavior is desirable, let alone as the default.  I can only think of
> two workflows here:  Either you have scripts for previous PG versions
> that create the slot externally, in which case you omit --create, or you
> use the new functionality to have pg_basebackup create the slot.  I
> don't see any use for pg_basebackup to opportunistically use a slot if
> it happens to exist.  Even if there is one, it should not be the
> default.  So please change that.
>

+1.  I'd rather just get an error and re-run without the --create switch.
That way you are forced to think about what you are doing.


Is there a race condition here?  The slot is created after the checkpoint
is completed.  But you have to start streaming from the LSN where the
checkpoint started, so shouldn't the slot be created before the checkpoint
is started?

Cheers,

Jeff


Re: [HACKERS] Fix performance of generic atomics

2017-09-05 Thread Jeff Janes
On Tue, Sep 5, 2017 at 11:39 AM, Jesper Pedersen  wrote:

> On 09/05/2017 02:24 PM, Tom Lane wrote:
>
>> Jesper Pedersen  writes:
>>
>>> I have tested this patch on a 2-socket machine, but don't see any
>>> performance change in the various runs. However, there is no regression
>>> either in all cases.
>>>
>>
>> Hm, so if we can't demonstrate a performance win, it's hard to justify
>> risking touching this code.  What test case(s) did you use?
>>
>>
> I ran pgbench (-M prepared) with synchronous_commit 'on' and 'off' using
> both logged and unlogged tables. Also ran an internal benchmark which
> didn't show anything either.
>

What scale factor and client count? How many cores per socket?  It looks
like Sokolov was just starting to see gains at 200 clients on 72 cores,
using -N transaction.

Cheers,

Jeff


[HACKERS] pg_basebackup behavior on non-existent slot

2017-09-04 Thread Jeff Janes
If I tell pg_basebackup to use a non-existent slot, it immediately reports
an error.  And then it exits with an error, but only after streaming the
entire database contents.

If you are doing this interactively and are on the ball, of course, you can
hit ctrl-C when you see the error message.

I don't know if this is exactly a bug, but it seems rather unfortunate.

Should the parent process of pg_basebackup be made to respond to SIGCHLD?
Or call waitpid(bgchild, &status, WNOHANG) in some strategic loop?
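
For illustration, the polling variant might look something like this sketch
(hypothetical names, not the actual pg_basebackup code):

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Hypothetical check to drop into a strategic loop in the parent: reap
 * the background WAL streamer without blocking, and abort the backup
 * immediately if that child has already died.
 */
static void
check_bgchild(pid_t bgchild)
{
	int			status;

	if (waitpid(bgchild, &status, WNOHANG) == bgchild)
	{
		fprintf(stderr, "background WAL receiver exited unexpectedly\n");
		exit(1);
	}
}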



$ /usr/local/pgsql9_6/bin/pg_basebackup -D data_replica -P --slot foobar -Xs

pg_basebackup: could not send replication command "START_REPLICATION":
ERROR:  replication slot "foobar" does not exist
22384213/22384213 kB (100%), 1/1 tablespace
pg_basebackup: child process exited with error 1
pg_basebackup: removing data directory "data_replica"


Cheers,

Jeff


Re: [HACKERS] pgbench - minor fix for meta command only scripts

2017-09-04 Thread Jeff Janes
On Mon, Sep 4, 2017 at 1:56 PM, Fabien COELHO  wrote:

>
> Hello Jeff,
>
>> I have fixed a bug introduced in the patch by changing && by || in the
>> (min_sec > 0 && maxsock != -1) condition which was inducing errors with
>> multi-threads & clients...
>>
>> Since this commit (12788ae49e1933f463bc5), if I use --rate to throttle
>> the transaction rate, it does get throttled to about the indicated speed,
>> but pgbench consumes the entire CPU.
>>
>>
>> At the block of code starting
>>if (min_usec > 0 && maxsock != -1)
>>
>> If maxsock == -1, then there is no sleep happening.
>>
>
> Argh, shame on me:-(
>
> I cannot find the "induced errors" I was referring to in the message...
> Sleeping is definitely needed to avoid a hard loop.
>
> The attached patch fixes it and does not seem to introduce any special issue...
>
> Should probably be backpatched.
>
> Thanks for the debug!


Thanks Fabien, that works for me.

But if min_usec <= 0, do we want to do whatever it is that we already know
is overdue, before stopping to do the select?  If it is safe to go through
this code path when maxsock == -1, then should we just change it to this?

if (min_usec > 0)

Cheers,

Jeff


Re: [HACKERS] pgbench - minor fix for meta command only scripts

2017-09-04 Thread Jeff Janes
On Mon, Sep 26, 2016 at 1:01 AM, Heikki Linnakangas  wrote:

> On 09/24/2016 12:45 PM, Fabien COELHO wrote:
>
>>
>> Attached are some small changes to your version:
>>
>> I have added the sleep_until fix.
>>
>> I have fixed a bug introduced in the patch by changing && by || in the
>> (min_sec > 0 && maxsock != -1) condition which was inducing errors with
>> multi-threads & clients...
>>
>> I have factored out several error messages in "commandFailed", in place of
>> the "metaCommandFailed", and added the script number as well in the error
>> messages. All messages are now specific to the failed command.
>>
>> I have added two states to the machine:
>>
>>   - CSTATE_CHOOSE_SCRIPT which simplifies threadRun, there is now one call
>> to chooseScript instead of two before.
>>
>>   - CSTATE_END_COMMAND which manages is_latencies and proceeding to the
>> next command, thus merging the three instances of updating the stats
>> that were in the first version.
>>
>> The later state means that processing query results is included in the per
>> statement latency, which is an improvement because before I was getting
>> some transaction latency significantly larger that the apparent sum of the
>> per-statement latencies, which did not make much sense...
>>
>
> Ok. I agree that makes more sense.
>
> I have added & updated a few comments.
>>
>
> Thanks! Committed.
>
>> There are some places where the break could be a pass-through
>> instead, not sure how desirable it is, I'm fine with break.
>>
>
> I left them as "break". Pass-throughs are error-prone, and make it more
> difficult to read, IMHO. The compiler will optimize it into a pass-through
> anyway, if possible and worthwhile, so there should be no performance
> difference.



Since this commit (12788ae49e1933f463bc5), if I use --rate to throttle
the transaction rate, it does get throttled to about the indicated speed,
but pgbench consumes the entire CPU.


At the block of code starting
if (min_usec > 0 && maxsock != -1)

If maxsock == -1, then there is no sleep happening.

Cheers,

Jeff


Re: [HACKERS] pg_basebackup throttling doesn't throttle as promised

2017-09-01 Thread Jeff Janes
On Fri, Sep 1, 2017 at 1:32 PM, Jeff Janes  wrote:

> The "-r" option to pg_basebackup is supposed to throttle the rate of the
> backup.  But it only works properly if the server is mostly idle.
>
> Every non-trivial call to XLogFlush or XLogBackgroundFlush will wake up
> the wal sender (the one which is not really sending wal, but base files),
> and the throttling routine doesn't go back to sleep after being woken
> early.  Rather, it releases another 32kb of data.
>
>
> Should the basebackup.c throttle sleep in a loop until its time has
> expired?
>
> Or should walsender.c WalSndWakeup not wake a wal sender whose status
> is WALSNDSTATE_BACKUP?
>
> Or both?
>


I'm attaching a patch for each option.  Each one independently solves the
problem.  But I think we should do both.  There is no point in issuing
unnecessary kill system calls, and there may also be more spurious wake-ups
than just these ones.
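
As a sketch of the first option, the throttle's sleep could loop until an
absolute deadline rather than treating any wakeup as permission to release
more data.  This is a standalone illustration; the real basebackup.c would
presumably use the server's own latch and timing machinery rather than raw
clock_nanosleep:

#define _POSIX_C_SOURCE 200809L
#include <time.h>

/*
 * Sleep until an absolute deadline, going back to sleep after any early
 * wakeup instead of returning as soon as something wakes us.
 */
static void
sleep_until(const struct timespec *deadline)
{
	struct timespec now;

	for (;;)
	{
		clock_gettime(CLOCK_MONOTONIC, &now);
		if (now.tv_sec > deadline->tv_sec ||
			(now.tv_sec == deadline->tv_sec &&
			 now.tv_nsec >= deadline->tv_nsec))
			break;				/* time is up; release the next chunk */

		/* may return early on a signal; loop around and re-check the clock */
		clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, deadline, NULL);
	}
}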

Cheers,

Jeff


pg_basebackup_throttle_1_v1.patch
Description: Binary data


pg_basebackup_throttle_2_v1.patch
Description: Binary data



[HACKERS] pg_basebackup throttling doesn't throttle as promised

2017-09-01 Thread Jeff Janes
The "-r" option to pg_basebackup is supposed to throttle the rate of the
backup.  But it only works properly if the server is mostly idle.

Every non-trivial call to XLogFlush or XLogBackgroundFlush will wake up the
wal sender (the one which is not really sending wal, but base files), and
the throttling routine doesn't go back to sleep after being woken early.
Rather, it releases another 32kb of data.


Should the basebackup.c throttle sleep in a loop until its time has
expired?

Or should walsender.c WalSndWakeup not wake a wal sender whose status
is WALSNDSTATE_BACKUP?

Or both?

Cheers,

Jeff


Re: [HACKERS] pgbench: faster version of tpcb-like transaction

2017-08-26 Thread Jeff Janes
On Sat, Aug 26, 2017 at 4:28 PM, Peter Geoghegan  wrote:

> On Sat, Aug 26, 2017 at 3:53 PM, Jeff Janes  wrote:
> > I get nearly a 3-fold speed-up using the new transaction, from 9184 to
> > 26383 TPS, on an 8 CPU machine using scale 50 and:
> >
> > PGOPTIONS="-c synchronous_commit=off" pgbench -c32 -j32 -T60 -b tpcb-like
>
> What about with "-M prepared"? I think that most of us use that
> setting already, especially with CPU-bound workloads.
>

I still get a 2-fold improvement, from 13668 to 27036, when both
transactions are tested with -M prepared.

I am surprised; I usually haven't seen that much difference for the default
queries between prepared or not, to the point that I got out of the habit
of testing with it.  But back when I was testing with and without it
systematically, I did notice that it changed a lot depending on hardware
and concurrency.  And of course from version to version different
bottlenecks come and go.

And thanks to Tom for letting me put -M at the end of the command line now.

Cheers,

Jeff


[HACKERS] pgbench: faster version of tpcb-like transaction

2017-08-26 Thread Jeff Janes
If all the data is in memory and you have a system with fast fsyncs (or are
running with fsync off, or unlogged tables, or synchronous_commit off),
then the big bottleneck in pgbench is the amount of back and forth between
the pgbench program and the backend.  There are 7 commands per transaction.

It is easy to package 5 of those commands into a single PL/pgSQL function,
with the other two being implicit via the standard auto-commit behavior
when explicit transactions are not opened.  The attached patch does that,
under the name tpcb-func.  I first named it tpcb-like-func, but one builtin
name can't be a prefix of another, so that won't work.

It creates the function unconditionally during -i, because there is no way
to know if the run-time will end up using it or not.  I think this is OK.
PL/pgSQL is installed by default in all supported versions. If someone has
gone through the bother of uninstalling it, I don't see a need to
accommodate them here.

I get nearly a 3-fold speed-up using the new transaction, from 9184
to 26383 TPS, on an 8 CPU machine using scale 50 and:

PGOPTIONS="-c synchronous_commit=off" pgbench -c32 -j32 -T60 -b tpcb-like

I think this should be committed as a built-in, not just a user-defined
transaction, because I would like to see it widely used.  In fact, if it
weren't for historical consistency I would say it should be the default
transaction.  Wanting to measure IPC overhead is a valid thing to do, but
certainly isn't the most common thing people want to do with pgbench.  If a
user is limited by IO, it wouldn't matter which transaction they use, and
if they are not limited by IO then this transaction is more likely to be
the right one for them than the current default transaction is.

Also, as a user-defined transaction with -f, you have to go out of your way
to create the function (no "-i" support) and to make sure :scale gets set
correctly during runs (as it won't be automatically read from the
pgbench_branches table; you have to manually give -D).

Cheers,

Jeff


pgbench_function_v1.patch
Description: Binary data



Re: [HACKERS] subscription worker signalling wal writer too much

2017-08-26 Thread Jeff Janes
On Mon, Jul 3, 2017 at 8:26 PM, Jeff Janes  wrote:

> On Sat, Jun 24, 2017 at 5:09 PM, Andres Freund  wrote:
>
>> On 2017-06-15 15:06:43 -0700, Jeff Janes wrote:
>>
>>
>>
>
>> > Wouldn't it be better, and much simpler, just to have reverted the
>> > first change once the second one was done?
>>
>> I think both can actually happen independently, no? It's pretty common
>> for the page lsn to be *older* than the corresponding commit.  In fact
>> you'll always hit that case unless there's also concurrent writes also
>> touching said page.
>>
>
> That is true, but I don't see any residual regression when going from
> pre-db76b1efbbab2441428 to 7975c5e0a992ae9a4-with-my-patch applied.  So I
> think the hint bit issue only shows up when hitting the same page
> aggressively, in which case the LSN does get advanced quickly enough that
> db76b1efbba solves it.  Perhaps a different test case could show that
> issue.  I also didn't see any improvement from the original 4de82f7d7c50a81e,
> so maybe 8 CPUs just isn't enough to detect the problem with setting
> hint-bits with permanent tables with synchronous_commit=false.
>

I've refined my benchmarking, using a smaller scaling factor (8, same as my
CPU count) and running the tests for longer. The smaller size means there
are more intensive updates on individual pgbench_accounts pages, and the
longer run times allow unset hint bits to build up for longer (usually
I like short test runs with a large number of randomized repetitions, but
that doesn't work well here).  Since the scale factor is the same as the
number of CPUs, I've bumped the thread count so that it is more likely
someone will choose a non-contended value of pgbench_branches.bid to update
at any given moment.

pgbench -c16 -j16 -T1200 -M prepared -b tpcb-func --startup='set
synchronous_commit=false'

This testing did show the importance of waking up the wal writer so that
hint bits can be set:

commit TPS
cfafd8beadcd6f 22200.48
cfafd8beadcd6f_nowake 30766.83
median of 14 runs reported, p-val on difference of means is 2e-22.

Where "nowake" is a hackish disabling of the wake-up introduced in
4de82f7d7c50a8, forward-ported to the 9.6 branch.  (I still wake it if it
is doing the big sleep; I just took out the part that wakes it up even from
small sleeps.)

So my revised benchmark is capable of detecting this effect, even with only
8 CPUs.

When I move to the next commit, where we set hint bits as long as the page
was re-dirtied afterward, I get these numbers:

db76b1efbbab24 30742.69
db76b1efbbab24_nowake 31072.97

The difference between these is not statistically significant, nor is the
difference between them and cfafd8beadcd6f.

So I think the logic you introduced in db76b1efbbab24 captures all the gain
there is to be captured, and 4de82f7d7c50a8 can be reverted.  The only
reason to wake up the wal writer is if it is taking the big sleep.

Of course, I can't prove a negative.  There could always be some test case
which demonstrates that 4de82f7d7c50a8 still matters in spite of
db76b1efbbab24.
So to be ultra-conservative, attached is a patch which should preserve all
wake-ups other than the ones known to be useless.

7975c5e0a 27810.66
7975c5e0a_patched 30821.41

So 7975c5 causes a significant regression (10% regression, p-val 7e-16 on
difference of the means).  It repeatedly wakes the wal-writer up to flush a
page which the wal-writer simply refuses to flush.  This can't serve any
purpose.  My patch still wakes it up to write the page if it has not been
written yet, and also still wakes it to flush a range of pages if
WalWriterFlushAfter is met.  But it just doesn't wake it up to flush a page
which has already been written and will not get flushed.
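
To make that concrete, here is a toy, self-contained model of the intended
wake-up rule (all names and constants below are illustrative stand-ins for
the real xlog.c state, not the patch itself):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;

#define WAL_PAGE 8192                /* stand-in for XLOG_BLCKSZ */
#define FLUSH_AFTER_PAGES 128        /* stand-in for WalWriterFlushAfter */

static XLogRecPtr written_upto;      /* how far the wal writer has written */
static XLogRecPtr flushed_upto;      /* how far the wal writer has flushed */

/* Should an async commit at asyncLSN wake a non-hibernating wal writer? */
static bool
should_wake(XLogRecPtr asyncLSN)
{
    XLogRecPtr boundary = asyncLSN - (asyncLSN % WAL_PAGE);

    if (boundary > written_upto)
        return true;   /* a completed page still needs its initial write */
    if (boundary >= flushed_upto + (XLogRecPtr) FLUSH_AFTER_PAGES * WAL_PAGE)
        return true;   /* enough written-but-unflushed WAL has accumulated */
    return false;      /* already written; a wake-up just to "flush" is futile */
}

int
main(void)
{
    written_upto = 5 * WAL_PAGE;
    flushed_upto = 4 * WAL_PAGE;
    printf("%d\n", should_wake(3 * WAL_PAGE));   /* 0: nothing useful to do */
    printf("%d\n", should_wake(9 * WAL_PAGE));   /* 1: page not written yet */
    return 0;
}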

I'd prefer the code-simplifying change of just not waking it up anymore
except for when it is doing the big sleep, but I can't reliably detect a
performance difference between that and the more conservative change posted
here.

Cheers,

Jeff


change_walwriter_wakeup_v2.patch
Description: Binary data



Re: [HACKERS] distinct estimate of a hard-coded VALUES list

2017-08-17 Thread Jeff Janes
On Wed, Aug 16, 2017 at 12:40 PM, Tom Lane  wrote:

> Jeff Janes  writes:
> > This patch still applies, and I think the argument for it is still valid.
> > So I'm going to make a commit-fest entry for it.  Is there additional
> > evidence we should gather?
>
> I think we had consensus to apply this at the start of the next
> development cycle; I just forgot to do it for v10.  Hence, pushed
> into v11.
>

Thanks,

Jeff


Re: [HACKERS] Create replication slot in pg_basebackup if requested and not yet present

2017-08-15 Thread Jeff Janes
On Sun, Apr 9, 2017 at 4:22 AM, Michael Banck 
wrote:

> Hi,
>
> Am Freitag, den 24.03.2017, 19:32 +0100 schrieb Michael Banck:
> > On Thu, Mar 23, 2017 at 12:41:54PM +0100, Magnus Hagander wrote:
> > > On Tue, Mar 21, 2017 at 8:34 PM, Robert Haas 
> wrote:
> > > > So I tend to think that there should always be some explicit user
> > > > action to cause the creation of a slot, like --create-slot-if-needed
> > > > or --create-slot=name.  That still won't prevent careless use of that
> > > > option but it's less dangerous than assuming that a user who refers
> to
> > > > a nonexistent slot intended to create it when, perhaps, they just
> > > > typo'd it.
> > >
> > > Well, the explicit user action would be "--slot". But sure, I can
> > > definitely live with a --create-if-not-exists.
> >
> > Can we just make that option create slot and don't worry if one exists
> > already? CreateReplicationSlot() can be told to not mind about already
> > existing slots, and I don't see a huge point in erroring out if the slot
> > exists already, unless somebody can show that this leads to data loss
> > somehow.
> >
> > > The important thing, to me, is that you can run it as a single
> > > command, which makes it a lot easier to work with. (And not like we
> > > currently have for pg_receivewal which requires a separate command to
> > > create the slot)
> >
> > Oh, that is how it works with pg_receivewal, I have to admit I've never
> > used it so was a bit confused about this when I read its code.
> >
> > So in that case I think we don't necessarily need to have the same user
> > interface at all. I first thought about just adding "-C, --create" (as
> > in "--create --slot=foo"), but this on second thought this looked a bit
> > shortsighted - who knows what flashy thing pg_basebackup might create in
> > 5 years... So I settled on --create-slot, which is only slightly more to
> > type (albeit repetitive, "--create-slot --slot=foo"), but adding a short
> > option "-C" would be fine I think: "-C -S foo".
> >
> > So attached is a patch which adds that option. If people really think it
> > should be --create-slot-if-not-exists instead I can update the patch, of
> > course.
> >
> > I again added a second patch with some further refactoring which makes
> > it print a message on temporary slot creation in verbose mode.
>
> Rebased, squashed and slightly edited version attached. I've added this
> to the 2017-07 commitfest now as well:
>
> https://commitfest.postgresql.org/14/1112/



Can you rebase this past some conflicting changes?

Thanks,

Jeff


Re: [HACKERS] why not parallel seq scan for slow functions

2017-08-02 Thread Jeff Janes
On Wed, Jul 12, 2017 at 7:08 PM, Amit Kapila 
wrote:

> On Wed, Jul 12, 2017 at 11:20 PM, Jeff Janes  wrote:
> > On Tue, Jul 11, 2017 at 10:25 PM, Amit Kapila 
> > wrote:
> >>
> >> On Wed, Jul 12, 2017 at 1:50 AM, Jeff Janes 
> wrote:
> >> > On Mon, Jul 10, 2017 at 9:51 PM, Dilip Kumar 
> >> > wrote:
> >> >>
> >> >> So because of this high projection cost the seqpath and parallel path
> >> >> both have fuzzily same cost but seqpath is winning because it's
> >> >> parallel safe.
> >> >
> >> >
> >> > I think you are correct.  However, unless parallel_tuple_cost is set very
> >> > low, apply_projection_to_path never gets called with the Gather path as an
> >> > argument.  It gets ruled out at some earlier stage, presumably because it
> >> > assumes the projection step cannot make it win if it is already behind by
> >> > enough.
> >> >
> >>
> >> I think that is genuine because tuple communication cost is very high.
> >
> >
> > Sorry, I don't know which you think is genuine, the early pruning or my
> > complaint about the early pruning.
> >
>
> Early pruning.  See, currently, we don't have a way to maintain both
> parallel and non-parallel paths till later stage and then decide which
> one is better. If we want to maintain both parallel and non-parallel
> paths, it can increase planning cost substantially in the case of
> joins.  Now, surely it can have benefit in many cases, so it is a
> worthwhile direction to pursue.
>

If I understand it correctly, we have a way; it just can lead to an
exponential explosion problem, so we are afraid to use it, correct?  If I
just lobotomize the path domination code (make pathnode.c line 466 always
test false)

if (JJ_all_paths==0 && costcmp != COSTS_DIFFERENT)

Then it keeps the parallel plan and later chooses to use it (after applying
your other patch in this thread) as the overall best plan.  It doesn't even
slow down "make installcheck-parallel" by very much, which I guess just
means the regression tests don't have a lot of complex joins.

But what is an acceptable solution?  Is there a heuristic for when
retaining a parallel path could be helpful, the same way there is for
fast-start paths?  It seems like the best thing would be to include the
evaluation costs in the first place at this step.

Why is the path-cost domination code run before the cost of the function
evaluation is included?  Is that because the information needed to compute
it is not available at that point, or because it would be too slow to
include it at that point? Or just because no one thought it important to do?

Cheers,

Jeff


Re: [HACKERS] tab complete for psql pset pager values

2017-07-28 Thread Jeff Janes
On Wed, Jul 26, 2017 at 8:02 AM, Pavel Stehule 
wrote:

> Hi
>
> attached trivial patch for missing tab complete for \pset pager setting
>
>
Looks good to me.  I've set it ready for committer.

Cheers,

Jeff


Re: [HACKERS] postgres_fdw super user checks

2017-07-27 Thread Jeff Janes
On Thu, Dec 1, 2016 at 7:11 PM, Haribabu Kommi 
wrote:

> On Tue, Oct 18, 2016 at 10:38 AM, Michael Paquier <
> michael.paqu...@gmail.com> wrote:
>
>> On Mon, Oct 17, 2016 at 10:51 PM, Robert Haas 
>> wrote:
>> > On Mon, Oct 17, 2016 at 2:18 AM, Michael Paquier
>> >  wrote:
>> >> On Mon, Oct 17, 2016 at 3:33 AM, Jeff Janes 
>> wrote:
>> >>> postgres_fdw has some checks to enforce that non-superusers must
>> connect to
>> >>> the foreign server with a password-based method.  The reason for this
>> is to
>> >>> prevent the authentication to the foreign server from happening on
>> the basis
>> >>> of the OS user who is running the non-foreign server.
>> >>>
>> >>> But I think these super user checks should be run against the userid
>> of the
>> >>> USER MAPPING being used for the connection, not the userid of
>> currently
>> >>> logged on user.
>> >>
>> >> So, if the user mapping user is a superuser locally, this would allow
>> >> any lambda user of the local server to attempt a connection to the
>> >> remote server. It looks dangerous rather dangerous to me to authorize
>> >> that, even if the current behavior is a bit inconsistent I agree.
>> >
>> > I don't know what "any lambda user" means.  Did you mean to write "any
>> > random user"?
>>
>> Yes, in this context that would be "any non-superuser" or "any user
>> without superuser rights". Actually that's a French-ism. I just
>> translated it naturally to English to define a user that has no access
>> to advanced features :)
>>
>
>
> Thanks for the patch, but it breaking the existing functionality as per
> the other
> mails. Marked as "returned with feedback" in 2016-11 commitfest.
>


Here is an updated patch.  This version allows you to use the password-less
connection if you either are the super-user directly (which is the existing
committed behavior), or if you are using the super-user's mapping because
you are querying a super-user-owned view which you have been granted access
to.

At first I thought the currently committed behavior might be a security bug,
as a directly logged in superuser can use another user's user-defined
mapping but without the password restriction when querying a view made by
someone else.  But consulting with the security list nearly a year ago, the
conclusion was that it is never a good idea for a superuser to blindly
query from other users' views, and that the current situation is no worse
for postgres_fdw than it is for other features, and so nothing needs to be
done about it.  So that is why I've decided to allow the passwordless
solution in either situation--a superuser using someone else's mapping, or
someone else using a super user's mapping.

I didn't update any comments because the existing ones seem to apply
equally well to the new code as the old code.

The regression test passes with this version because I still allow the old
behavior.  I didn't add a new test to also test the new behavior, because I
don't know how to do that with the existing make check framework, and a TAP
test seems like overkill.
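
For reference, the shape of the check is roughly as below.  This is a
hand-written sketch of check_conn_params() in
contrib/postgres_fdw/connection.c, not the patch text itself; superuser(),
superuser_arg(), and UserMapping are the existing backend facilities:

/* Permit a password-less connection if either the logged-in user or the
 * user-mapping owner is a superuser; otherwise insist on a password. */
static void
check_conn_params(const char **keywords, const char **values, UserMapping *user)
{
    int         i;

    if (superuser() || superuser_arg(user->userid))
        return;                     /* no password check required */

    for (i = 0; keywords[i] != NULL; i++)
        if (strcmp(keywords[i], "password") == 0 && values[i][0] != '\0')
            return;                 /* a password was supplied */

    ereport(ERROR,
            (errcode(ERRCODE_S_R_E_PROHIBITED_SQL_STATEMENT_ATTEMPTED),
             errmsg("password is required"),
             errdetail("Non-superusers must provide a password in the user mapping.")));
}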

Cheers,

Jeff


postgres_fdw_superuser_v2.patch
Description: Binary data



[HACKERS] tab completion for "create user mapping for"

2017-07-27 Thread Jeff Janes
If you have typed "CREATE USE", tab completion will recommend both USER and
USER MAPPING FOR.

But once you have the full "CREATE USER " or "CREATE USER M", it will not
complete the rest of the "MAPPING FOR".

Attached patch fixes that.

Cheers,

Jeff


user_mapping_tab_complete_v1.patch
Description: Binary data



Re: [HACKERS] Increase Vacuum ring buffer.

2017-07-24 Thread Jeff Janes
On Thu, Jul 20, 2017 at 12:51 PM, Tom Lane  wrote:

> Robert Haas  writes:
> > I think that's a valid point.  There are also other concerns here -
> > e.g. whether instead of adopting the patch as proposed we ought to (a)
> > use some smaller size, or (b) keep the size as-is but reduce the
> > maximum fraction of shared_buffers that can be consumed, or (c) divide
> > the ring buffer size through by autovacuum_max_workers.  Personally,
> > of those approaches, I favor (b).  I think a 16MB ring buffer is
> > probably just fine if you've got 8GB of shared_buffers but I'm
> > skeptical about it when you've got 128MB of shared_buffers.
>
> WFM.  I agree with *not* dividing the basic ring buffer size by
> autovacuum_max_workers.  If you have allocated more AV workers, I think
> you expect AV to go faster, not for the workers to start fighting among
> themselves.
>

But fighting among themselves is just what they do regarding the
autovacuum_vacuum_cost_limit, so I don't see why it should be one way there
but different here.  The reason for setting autovacuum_max_workers to N is
so that small tables aren't completely starved of vacuuming even if N-1
larger tables are already being vacuumed simultaneously.  Now the small
tables get vacuumed at speed 1/N, which kind of sucks, but that is the
mechanism we currently have.
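
For scale, the sharing works out roughly like this (a toy model; the real,
more careful rebalancing is autovac_balance_cost() in autovacuum.c):

/* The cost limit is split across the workers that are currently
 * vacuuming, so each table proceeds at roughly 1/N of full speed. */
int base_limit = 200;       /* autovacuum_vacuum_cost_limit default */
int n_active   = 3;         /* workers vacuuming right now */
int per_worker = base_limit / n_active;     /* ~66 each: 1/3 speed */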

Of course just because we are in a hole with vacuum_cost_limit doesn't mean
we should dig ourselves deeper, but we are being inconsistent then.

Cheers,

Jeff


Re: [HACKERS] why not parallel seq scan for slow functions

2017-07-24 Thread Jeff Janes
On Sat, Jul 22, 2017 at 8:53 PM, Amit Kapila 
wrote:

> On Thu, Jul 13, 2017 at 7:38 AM, Amit Kapila 
> wrote:
> > On Wed, Jul 12, 2017 at 11:20 PM, Jeff Janes 
> wrote:
> >>
> >>
> >>
> >> Setting parallel_workers to 8 changes the threshold for the parallel to even
> >> be considered from parallel_tuple_cost <= 0.0049 to <= 0.0076.  So it is
> >> going in the correct direction, but not by enough to matter.
> >>
> >
> > You might want to play with cpu_tuple_cost and or seq_page_cost.
> >
>
> I don't know whether the patch will completely solve your problem, but
> this seems to be the right thing to do.  Do you think we should stick
> this for next CF?
>

It doesn't solve the problem for me, but I agree it is an improvement we
should commit.

Cheers,

Jeff


Re: [HACKERS] Better error message for trying to drop a DB with open subscriptions?

2017-07-20 Thread Jeff Janes
On Thu, Jul 20, 2017 at 4:09 PM, Josh Berkus  wrote:

> All:
>
> The problem:
>
> postgres=# drop database bookdata;
> ERROR:  database "bookdata" is being accessed by other users
> DETAIL:  There is 1 other session using the database.
> postgres=# \c bookdata
> You are now connected to database "bookdata" as user "postgres".
> bookdata=# drop subscription wholedb;
> NOTICE:  dropped replication slot "wholedb" on publisher
> DROP SUBSCRIPTION
> bookdata=# \c postgres
> You are now connected to database "postgres" as user "postgres".
> postgres=# drop database bookdata;
> DROP DATABASE
>
> Is there any easy way for us to detect that the "user" accessing the
> target database is actually a logical replication subscription, and give
> the DBA a better error message (e.g. "database 'bookdata' still has open
> subscrptions")?
>


+1

Better yet would be to just cascade the drop, but I assume that would be
harder to do.

Cheers,

Jeff


Re: [HACKERS] Increase Vacuum ring buffer.

2017-07-20 Thread Jeff Janes
On Thu, Jul 20, 2017 at 7:59 AM, Robert Haas  wrote:

> >
> > Initially I wanted to make BAS_BULKWRITE and BAS_VACUUM ring sizes
> > configurable, but after testing I don't see much gain from increasing
> > ring buffer above 16MB. So I propose just 1 line change.
>
> I think the question for this patch is "so, why didn't we do it this
> way originally?".
>
> It's no secret that making the ring buffer larger will improve
> performance -- in fact, not having a ring buffer at all would improve
> performance even more.  But it would also increase the likelihood that
> the background work of vacuum would impact the performance of
> foreground operations, which is already a pretty serious problem that
> we probably don't want to make worse.


But having a very fast sequence of fdatasync calls is not terribly friendly
to the performance of the foreground operations, either.

I think the reason we didn't do it this way originally is tied to the same
reason that autovacuum_vacuum_cost_delay is 20ms by default.  If we want it
to be heavily throttled, there isn't much point in using a larger ring
buffer.  It is just wasted space. Maybe we could have it start out at
BAS_VACUUM's default size, then grow by one buffer every time it had to
issue a WAL sync, until it reached BAS_BULKWRITE's size where it would max
out.
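
Sketched as a toy, that escalation could look like the following (the buffer
counts are the current defaults at 8 kB pages--32 for BAS_VACUUM and 2048 for
BAS_BULKWRITE--and the hook name is made up):

#include <stdio.h>

#define RING_MIN   32     /* 256 kB: BAS_VACUUM's current ring */
#define RING_MAX 2048     /* 16 MB: BAS_BULKWRITE's ring */

static int ring_size = RING_MIN;

/* Hypothetical hook, called each time reusing a dirty ring buffer
 * forced a WAL sync. */
static void
on_forced_wal_sync(void)
{
    if (ring_size < RING_MAX)
        ring_size++;      /* grow only once the small size proves expensive */
}

int
main(void)
{
    for (int i = 0; i < 100; i++)
        on_forced_wal_sync();
    printf("ring grew to %d buffers\n", ring_size);   /* 132 */
    return 0;
}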

Cheers,

Jeff


Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise

2017-07-20 Thread Jeff Janes
On Thu, Jul 20, 2017 at 6:28 AM, Stephen Frost  wrote:

> Greetings,
>
> * Sokolov Yura (y.soko...@postgrespro.ru) wrote:
> > I wrote two days ago about vacuum ring buffer:
> > https://www.postgresql.org/message-id/8737e9bddb82501da1134f021bf492
> 9a%40postgrespro.ru
> >
> > Increasing Vacuum's ring buffer to size of Bulk Writer's one reduces
> > autovacuum time in 3-10 times.
> > (for both patched and unpatched version I used single non-default
> > setting
> > 'autovacuum_cost_delay=2ms').
> >
> > This is single line change, and it improves things a lot.
>
> Right- when the database fits in the OS cache but not in shared_buffers.
>


On a system with a slow fsync, increasing the ring buffer helps a lot even
if the database doesn't fit in the OS cache.  When the next buffer allocation
runs into a dirtied buffer in the ring, it needs to sync the WAL up through
that buffer's LSN before it can write it out and reuse it.  With a small
ring, this means a lot of WAL flushing needs to be done.
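
In outline, that reuse path looks like this (simplified; the real logic is
in BufferAlloc() and FlushBuffer() in bufmgr.c, and the lower-case helper
names here are made up):

buf = StrategyGetBuffer(strategy);      /* next slot in the ring */
if (buffer_is_dirty(buf))
{
    XLogFlush(BufferGetLSN(buf));       /* WAL-before-data: sync up to the page LSN */
    write_page_to_disk(buf);            /* now the old contents may be written out */
}
/* buf can be recycled; a small ring makes the XLogFlush calls frequent */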


> I do agree that's a useful improvement to make based on your testing.
>
> It's not clear off-hand how much that would improve this case, as
> the database size appears to pretty quickly get beyond the OS memory
> size (and only in the first test is the DB starting size less than
> system memory to begin with).
>


Also, this system probably has a pretty fast fdatasync, considering it is
SSD.

Cheers,

Jeff


Re: [HACKERS] SCRAM auth and Pgpool-II

2017-07-15 Thread Jeff Janes
On Fri, Jul 14, 2017 at 7:48 AM, Vladimir Borodin  wrote:

>
> 14 июля 2017 г., в 1:33, Stephen Frost  написал(а):
>
> What would be really nice for such cases is support for Kerberos and
> delegated Kerberos credentials.  Having pgpool support that would remove
> the need to deal with passwords at all.
>
>
> Since nearly all systems with some kind of load nowadays use connection
> poolers (pgpool-II or pgbouncer) between applications and postgres, it is a
> pretty big pain to re-implement all authentication methods supported by
> postgres in such poolers. Kerberos is cool but not the only thing that
> should be supported by FDWs or connection poolers. I.e. many users would
> want to have support for LDAP and SCRAM.
>

For the postgres_fdw, LDAP and SCRAM just work.  In the case of SCRAM (and
MD5), it would be nice if you could store something other than the
plain-text password, but that is a different matter.  If other FDWs connect
to something which can do LDAP or SCRAM, I don't see why those FDWs would
have any difficulty, either.


> And every time when there would be some changes in postgres auth methods,
> exactly the same work (or even worse) should be done in many (at least two)
> other products widely used by people.
>

That is not all that often.


>
> It seems that postgres either should provide connection pooling feature in
> core
>

That would be nice, but since pgpool and pgbouncer co-exist with each
other, I see no reason to think they wouldn't continue to exist even if
there were an in-core pooler.

Cheers,

Jeff


Re: [HACKERS] why not parallel seq scan for slow functions

2017-07-12 Thread Jeff Janes
On Tue, Jul 11, 2017 at 10:25 PM, Amit Kapila 
wrote:

> On Wed, Jul 12, 2017 at 1:50 AM, Jeff Janes  wrote:
> > On Mon, Jul 10, 2017 at 9:51 PM, Dilip Kumar 
> wrote:
> >>
> >> So because of this high projection cost the seqpath and parallel path
> >> both have fuzzily same cost but seqpath is winning because it's
> >> parallel safe.
> >
> >
> > I think you are correct.  However, unless parallel_tuple_cost is set very
> > low, apply_projection_to_path never gets called with the Gather path as an
> > argument.  It gets ruled out at some earlier stage, presumably because it
> > assumes the projection step cannot make it win if it is already behind by
> > enough.
> >
>
> I think that is genuine because tuple communication cost is very high.
>

Sorry, I don't know which you think is genuine, the early pruning or my
complaint about the early pruning.

I agree that the communication cost is high, which is why I don't want to
have to set parallel_tuple_cost very low.  For example, to get the benefit
of your patch, I have to set parallel_tuple_cost to 0.0049 or less (in my
real-world case, not the dummy test case I posted, although the numbers are
around the same for that one too).  But with a setting that low, all kinds
of other things also start using parallel plans, even if they don't benefit
from them and are harmed.

I realize we need to do some aggressive pruning to avoid an exponential
explosion in planning time, but in this case it has some rather unfortunate
consequences.  I wanted to explore it, but I can't figure out where this
particular pruning is taking place.

By the time we get to planner.c line 1787, current_rel->pathlist already
does not contain the parallel plan if parallel_tuple_cost >= 0.0050, so the
pruning is happening earlier than that.



> If your table is reasonable large then you might want to try by
> increasing parallel workers (Alter Table ... Set (parallel_workers =
> ..))
>


Setting parallel_workers to 8 changes the threshold for the parallel plan to
even be considered from parallel_tuple_cost <= 0.0049 to <= 0.0076.  So it
is going in the correct direction, but not by enough to matter.

Cheers,

Jeff


Re: [HACKERS] why not parallel seq scan for slow functions

2017-07-11 Thread Jeff Janes
On Mon, Jul 10, 2017 at 9:51 PM, Dilip Kumar  wrote:

>
>
> In below function, we always multiply the target->cost.per_tuple with
> path->rows, but in case of gather it should multiply this with
> subpath->rows
>
> apply_projection_to_path()
> 
>
> path->startup_cost += target->cost.startup - oldcost.startup;
> path->total_cost += target->cost.startup - oldcost.startup +
> (target->cost.per_tuple - oldcost.per_tuple) * path->rows;
>
>
> So because of this high projection cost the seqpath and parallel path
> both have fuzzily same cost but seqpath is winning because it's
> parallel safe.
>

I think you are correct.  However, unless parallel_tuple_cost is set very
low, apply_projection_to_path never gets called with the Gather path as an
argument.  It gets ruled out at some earlier stage, presumably because it
assumes the projection step cannot make it win if it is already behind by
enough.

So the attached patch improves things, but doesn't go far enough.
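
For context, the row-count swap Dilip describes would look roughly like this
inside apply_projection_to_path (an illustrative fragment, not necessarily
the literal attached patch):

double      rows = path->rows;

/* If the target will be evaluated below the Gather by the workers,
 * charge its per-tuple cost against the subpath's row count. */
if (IsA(path, GatherPath))
    rows = ((GatherPath *) path)->subpath->rows;

path->startup_cost += target->cost.startup - oldcost.startup;
path->total_cost += target->cost.startup - oldcost.startup +
    (target->cost.per_tuple - oldcost.per_tuple) * rows;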

Cheers,

Jeff


subpath_projection_cost.patch
Description: Binary data



[HACKERS] why not parallel seq scan for slow functions

2017-07-10 Thread Jeff Janes
If I have a slow function which is evaluated in a simple seq scan, I do not
get parallel execution, even though it would be massively useful.  Unless
force_parallel_mode=ON is set, in which case I get a dummy parallel plan with
one worker.

explain select aid,slow(abalance) from pgbench_accounts;

CREATE OR REPLACE FUNCTION slow(integer)
 RETURNS integer
 LANGUAGE plperl
 IMMUTABLE PARALLEL SAFE STRICT COST 1000
AS $function$
  my $thing=$_[0];
  foreach (1..1_000_000) {
$thing = sqrt($thing);
$thing *= $thing;
  };
  return ($thing+0);
$function$;

The partial path is getting added to the list of paths, it is just not
getting chosen, even if parallel_*_cost are set to zero.  Why not?

If I do an aggregate, then it does use parallel workers:

explain select sum(slow(abalance)) from pgbench_accounts;

It doesn't use as many as I would like, because there is a limit based on
the logarithm of the table size (I'm using -s 10 and get 3 parallel
processes) , but at least I know how to start looking into that.

Also, how do you debug stuff like this?  Are there some gdb tricks to make
this easier to introspect into the plans?

Cheers,

Jeff


Re: [HACKERS] pgsql 10: hash indexes testing

2017-07-04 Thread Jeff Janes
On Tue, Jul 4, 2017 at 3:57 AM, AP  wrote:

>
> The data being indexed is BYTEA, (quasi)random and 64 bytes in size.
> The table has over 2 billion entries. The data is not unique. There's
> an average of 10 duplicates for every unique value.
>

What is the number of duplicates for the most common value?

Cheers,

Jeff


Re: [HACKERS] subscription worker signalling wal writer too much

2017-07-03 Thread Jeff Janes
On Sat, Jun 24, 2017 at 5:09 PM, Andres Freund  wrote:

> Hi,
>
> On 2017-06-15 15:06:43 -0700, Jeff Janes wrote:
>
> > It looks like only limited consolidation was happening, the number of kills
> > invoked was more than half of the number of SetLatch.  I think the reason
> > is that when kill(owner_pid, SIGUSR1) is called, the kernel immediately
> > suspends the calling process and gives the signalled process a chance to
> > run in that time slice.  The Wal Writer gets woken up, sees that it only
> > has a partial page to write and decides not to do anything, but resets the
> > latch.  It does this fast enough that the subscription worker hasn't
> > migrated CPUs yet.
>
> I think part of the problem here is that latches signal the owning
> process even if the owning process isn't actually sleeping - that's
> obviously quite pointless.  In cases where walwriter is busy, that
> actually causes a lot of superflous interrupted syscalls, because it
> actually never ends up waiting on the latch. There's also a lot of
> superflous context switches.  I think we can avoid doing so quite
> easily, as e.g. with the attached patch.  Could you check how much that
> helps your benchmark?
>

That patch doesn't make any improvement to the pgbench --unlogged-tables
benchmark.  I expected that, because this benchmark exposes the opposite
problem.  The wal writer isn't busy, it is constantly being woken up but
then immediately going back to sleep.



> > The first change made the WALWriter wake up and do a write and flush
> > whenever an async commit occurred and there was a completed WAL page to be
> > written.  This way the hint bits could be set while the heap page was still
> > hot, because they couldn't get set until the WAL covering the hinted-at
> > transaction commit is flushed.
> >
> > The second change said we can set hint bits before the WAL covering the
> > hinted-at transaction is flushed, as long as the page LSN is newer than that
> > (so the wal covering the hinted-at transaction commit must be flushed
> > before the page containing the hint bit can be written).
> >
> > Then the third change makes the wal writer only write the full pages most
> > of the times it is woken up, not flush them.  It only flushes them once
> > every wal_writer_delay or wal_writer_flush_after, regardless of how often
> > it is woken up.
> >
> > It seems like the third change rendered the first one useless.
>
> I don't think so. Isn't the walwriter writing out the contents of the
> WAL is quite important because it makes room in wal_buffers for new WAL?
>

If we wanted to do that, I think the asynchronous committer should just
issue the write directly (unless wal_sync_method is open_datasync
or open_sync).  Writes are usually extremely fast and it is not worth trying
to offload them to a background process, as your half of the signalling
takes longer than just doing the write yourself.  When that is not true and
writes start blocking due to extreme IO congestion, then everything is
going to start blocking anyway (wal_buffers head will run into the tail,
and the tail can't advance because the writes are blocking) so while
offloading writes to a background process is then good in theory, it isn't
very effective.  If everyone is stuck, it doesn't matter which process is
directly stuck and which ones are indirectly stuck.
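
As a rough sketch, the inline-write idea would look something like this at
the async-commit site (hand-waving: have_completed_page and WriteRqst are
assumed locals, and the locking that XLogWrite() requires is elided):

if (have_completed_page &&
    sync_method != SYNC_METHOD_OPEN &&
    sync_method != SYNC_METHOD_OPEN_DSYNC)
{
    /* write (but do not fsync) the finished pages ourselves, rather
     * than paying for a signal to the wal writer */
    XLogWrite(WriteRqst, true);     /* flexible: stop at a convenient point */
}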


>
> I suspect we actually should wake up wal-writer *more* aggressively, to
> offload wal fsyncs from individual backends, even when they're not
> committing.  Right now we fsync whenever a segment is finished - we
> really don't want to do that in backends that could do other useful
> work.  I suspect it'd be a good idea to remove that logic from
> XLogWrite() and replace it with
> if (ProcGlobal->walwriterLatch)
> SetLatch(ProcGlobal->walwriterLatch);
>
>
My understanding is that you can't start on a new file until the old file
is completely synced, because the bookkeeping can currently only handle
one file at a time.  So if you signal the wal writer to do the sync, you
would just end up immediately blocking on it to finish.  Getting around
that would require substantially more changes; we would need to implement a
queue of full log files not yet synced so that we could move on without
waiting.  If we had such a queue, then sure, this would be a good idea to
just wake the wal writer. But currently, it only matters for large
transactions (\copy, insert ... select, bulk updates).  With small
transactions, you can't fill up log files quickly enough for it to make
much of a difference.  So I think this is something to do, but it is
independent from what this patch addresses.

Re: [HACKERS] Reducing pg_ctl's reaction time

2017-06-29 Thread Jeff Janes
On Thu, Jun 29, 2017 at 11:39 AM, Tom Lane  wrote:

> Jeff Janes  writes:
> > In the now-committed version of this, the 'pg_ctl start' returns
> > successfully as soon as the server reaches a consistent state. Which is
> OK,
> > except that it does the same thing when hot_standby=off.  When
> > hot_standby=off, I would expect it to wait for the end of recovery before
> > exiting with a success code.
>
> Um, won't it be waiting forever with that definition?
>
> regards, tom lane
>

No, this isn't streaming.  It hits the PITR limit (recovery_target_*), or
runs out of archived wal, and then it opens for business.


Cheers,

Jeff


Re: [HACKERS] Reducing pg_ctl's reaction time

2017-06-29 Thread Jeff Janes
On Tue, Jun 27, 2017 at 11:59 AM, Tom Lane  wrote:

> I wrote:
> > Andres Freund  writes:
> >> On 2017-06-26 17:38:03 -0400, Tom Lane wrote:
> >>> Hm.  Take that a bit further, and we could drop the connection probes
> >>> altogether --- just put the whole responsibility on the postmaster to
> >>> show in the pidfile whether it's ready for connections or not.
>
> >> Yea, that seems quite appealing, both from an architectural, simplicity,
> >> and log noise perspective. I wonder if there's some added reliability by
> >> the connection probe though? Essentially wondering if it'd be worthwhile
> >> to keep a single connection test at the end. I'm somewhat disinclined
> >> though.
>
> > I agree --- part of the appeal of this idea is that there could be a net
> > subtraction of code from pg_ctl.  (I think it wouldn't have to link libpq
> > anymore at all, though maybe I forgot something.)  And you get rid of a
> > bunch of can't-connect failure modes, eg kernel packet filter in the way,
> > which probably outweighs any hypothetical reliability gain from
> confirming
> > the postmaster state the old way.
>
> Here's a draft patch for that.  I quite like the results --- this seems
> way simpler and more reliable than what pg_ctl has done up to now.
> However, it's certainly arguable that this is too much change for an
> optional post-beta patch.


In the now-committed version of this, the 'pg_ctl start' returns
successfully as soon as the server reaches a consistent state. Which is OK,
except that it does the same thing when hot_standby=off.  When
hot_standby=off, I would expect it to wait for the end of recovery before
exiting with a success code.

Cheers,

Jeff


Re: [HACKERS] distinct estimate of a hard-coded VALUES list

2017-06-26 Thread Jeff Janes
On Tue, Aug 23, 2016 at 5:28 AM, Tom Lane  wrote:

> Tomas Vondra  writes:
> > On 08/22/2016 07:42 PM, Alvaro Herrera wrote:
> >> Also, if we patch it this way and somebody has a slow query because of a
> >> lot of duplicate values, it's easy to solve the problem by
> >> de-duplicating.  But with the current code, people that have the
> >> opposite problem has no way to work around it.
>
> > I certainly agree it's better when a smart user can fix his query plan
> > by deduplicating the values than when we end up generating a poor plan
> > due to assuming some users are somewhat dumb.
>
> Well, that seems to be the majority opinion, so attached is a patch
> to do it.  Do we want to sneak this into 9.6, or wait?
>
> > I wonder how expensive would it be to actually count the number of
> > distinct values - there certainly are complex data types where the
> > comparisons are fairly expensive, but I would not expect those to be
> > used in explicit VALUES lists.
>
> You'd have to sort the values before de-duping, and deal with VALUES
> expressions that aren't simple literals.  And you'd have to do it a
> lot --- by my count, get_variable_numdistinct is invoked six times
> on the Var representing the VALUES output in Jeff's example query.
> Maybe you could cache the results somewhere, but that adds even
> more complication.  Given the lack of prior complaints about this,
> it's hard to believe it's worth the trouble.
>
>
This patch still applies, and I think the argument for it is still valid.
So I'm going to make a commit-fest entry for it.  Is there additional
evidence we should gather?

Cheers,

Jeff


Re: [HACKERS] Reducing pg_ctl's reaction time

2017-06-26 Thread Jeff Janes
On Mon, Jun 26, 2017 at 12:15 PM, Tom Lane  wrote:

> Michael Paquier  writes:
> > On Mon, Jun 26, 2017 at 7:13 AM, Tom Lane  wrote:
> >> The attached proposed patch adjusts pg_ctl to check every 100msec,
> >> instead of every second, for the postmaster to be done starting or
> >> stopping.
>
> >> +#define WAITS_PER_SEC  10  /* should divide 100 evenly */
>
> > As a matter of style, you could define 100 as well in a variable
> > and refer to the variable for the division.
>
> Good idea, done that way.  (My initial thought was to use USECS_PER_SEC
> from timestamp.h, but that's declared as int64 which would have
> complicated matters, so I just made a new symbol.)
>
> > This also pops up more easily failures with 001_stream_rep.pl without
> > a patch applied from the other recent thread, so this patch had better
> > not get in before anything from
> > https://www.postgresql.org/message-id/8962.1498425...@sss.pgh.pa.us.
>
> Check.  I pushed your fix for that first.
>
> Thanks for the review!
>
> regards, tom lane
>



The 10-fold increase in log spam during long PITR recoveries is a bit
unfortunate.

9153  2017-06-26 12:55:40.243 PDT FATAL:  the database system is starting up
9154  2017-06-26 12:55:40.345 PDT FATAL:  the database system is starting up
9156  2017-06-26 12:55:40.447 PDT FATAL:  the database system is starting up
9157  2017-06-26 12:55:40.550 PDT FATAL:  the database system is starting up
...

I can live with it, but could we use an escalating wait time so it slows
back down to once a second after a while?
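
Something as simple as a doubling sleep would do it (a sketch;
postmaster_is_ready() stands in for pg_ctl's actual readiness check):

int delay_ms = 100;

for (;;)
{
    if (postmaster_is_ready())
        break;
    pg_usleep(delay_ms * 1000L);
    if (delay_ms < 1000)
        delay_ms = (delay_ms * 2 > 1000) ? 1000 : delay_ms * 2;  /* 100..1000 ms */
}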

Cheers,

Jeff


[HACKERS] CREATE SUBSCRIPTION log noise

2017-06-20 Thread Jeff Janes
I think this should go away:

ereport(NOTICE,
        (errmsg("created replication slot \"%s\" on publisher",
                slotname)));


It doesn't appear to be contingent on anything other than the content of
the command you just gave it.  I don't think we need a NOTICE saying that
it did the thing I just told it to do--that should be implied by the lack
of an ERROR.

Cheers,

Jeff


Re: [HACKERS] subscription worker signalling wal writer too much

2017-06-20 Thread Jeff Janes
On Thu, Jun 15, 2017 at 3:06 PM, Jeff Janes  wrote:

>
> This new patch is simpler than the previous one, and more effective at
> speeding up replication. I assume it would speed up pgbench with
> synchronous_commit turned off (or against unlogged tables) as well, but I
> don't think I have the hardware needed to test that.
>

If I use the 'tpcb-func' script embodied in the attached patch to pgbench,
then I can see the performance difference against unlogged tables using 8
clients on a 8 CPU virtual machine. The normal tpcb-like script has too
much communication overhead, bouncing from pgbench to the postgres backend
7 times per transaction, to see the difference. I also had to
make autovacuum_vacuum_cost_delay=0, otherwise auto analyze holds a
snapshot long enough to bloat the HOT chains, which injects a great deal of
variability into the timings.

Commit 7975c5e0a992ae9 in the 9.6 branch causes a regression of about 10%,
and my patch from the previous email redeems that regression.  It also
gives the same improvement against 10dev HEAD.

Cheers,

Jeff


pgbench_function.patch
Description: Binary data



[HACKERS] CREATE SUBSCRIPTION documentation

2017-06-19 Thread Jeff Janes
https://www.postgresql.org/docs/devel/static/sql-createsubscription.html

Has the note:

See Section 26.2.5.1 for details on how to configure access control between
the subscription and the publication instance.

But that section seems to describe authentication for physical, not
logical, replication.  Those two no longer use the same access control
methods.  For logical replication, the role has to have replication
attribute or be a superuser, but the role does not need to have replication
listed in the pg_hba.conf.

I think it would be better to instead reference:

https://www.postgresql.org/docs/devel/static/logical-replication-security.html

And on that page, it says:

"The role used for the replication connection must have the REPLICATION
 attribute"

It should also say something like "or be a superuser".  It is a bit tedious to
say that everywhere, but the docs are supposed to be a bit tedious.

Cheers,

Jeff


Re: [HACKERS] subscription worker signalling wal writer too much

2017-06-15 Thread Jeff Janes
On Wed, Jun 14, 2017 at 4:29 PM, Andres Freund  wrote:

> On 2017-06-14 16:24:27 -0700, Jeff Janes wrote:
> > On Wed, Jun 14, 2017 at 3:20 PM, Andres Freund 
> wrote:
> >
> > > On 2017-06-14 15:08:49 -0700, Jeff Janes wrote:
> > > > On Wed, Jun 14, 2017 at 11:55 AM, Jeff Janes 
> > > wrote:
> > > >
> > > > > If I publish a pgbench workload and subscribe to it, the
> subscription
> > > > > worker is signalling the wal writer thousands of times a second,
> once
> > > for
> > > > > every async commit.  This has a noticeable performance cost.
> > > > >
> > > >
> > > > I've used a local variable to avoid waking up the wal writer more
> than
> > > once
> > > > for the same page boundary.  This reduces the number of wake-ups by
> about
> > > > 7/8.
> > >
> > > Maybe I'm missing something here, but isn't that going to reduce our
> > > guarantees about when asynchronously committed xacts are flushed out?
> > > You can easily fit a number of commits into the same page...   As this
> > > isn't specific to logical-rep, I don't think that's ok.
> > >
> >
> > The guarantee is based on wal_writer_delay not on SIGUSR1, so I don't
> think
> > this changes that. (Also, it isn't really a guarantee, the fsync can take
> > many seconds to complete once we do initiate it, and there is absolutely
> > nothing we can do about that, other than do the fsync synchronously in
> the
> > first place).
>
> Well, wal_writer_delay doesn't work if walwriter is in sleep mode, and
> this afaics would allow wal writer to go into sleep mode with half a
> page filled, and it'd not be woken up again until the page is filled.
> No?
>

If it is taking the big sleep, then we always wake it up, since
acd4c7d58baf09f.

I didn't change that part.  I only changed what we do if it is not hibernating.


> > > Have you chased down why there's that many wakeups?  Normally I'd have
> > > expected that a number of the SetLatch() calls get consolidated
> > > together, but I guess walwriter is "too quick" in waking up and
> > > resetting the latch?
>
> > I'll have to dig into that some more.  The 7/8 reduction I cited was just
> > in calls to SetLatch from that part of the code, I didn't measure whether
> > the SetLatch actually called kill(owner_pid, SIGUSR1) or not when I
> > determined that reduction, so it wasn't truly wake ups I measured.
> Actual
> > wake ups were measured only indirectly via the impact on performance.
> I'll
> > need to figure out how to instrument that without distorting the
> > performance too much in the process..
>
> I'd suspect that just measuring the number of kill() calls should be
> doable, if measured via perf or something like hta.t
>


It looks like only limited consolidation was happening, the number of kills
invoked was more than half of the number of SetLatch.  I think the reason
is that when kill(owner_pid, SIGUSR1) is called, the kernel immediately
suspends the calling process and gives the signalled process a chance to
run in that time slice.  The Wal Writer gets woken up, sees that it only
has a partial page to write and decides not to do anything, but resets the
latch.  It does this fast enough that the subscription worker hasn't
migrated CPUs yet.

But, if the WAL Writer sees it has already flushed up to the highest
eligible page boundary, why doesn't the SetAsync code see the same thing?
Because it isn't flushed that high, but only written that high.  It looks
like a story of three git commits:

commit 4de82f7d7c50a81ec8e70e2cb0ab413ab9134c0b
Author: Simon Riggs 
Date:   Sun Nov 13 09:00:57 2011 +

Wakeup WALWriter as needed for asynchronous commit performance.


commit db76b1efbbab2441428a9ef21f7ac9ba43c52482
Author: Andres Freund 
Date:   Mon Feb 15 22:15:35 2016 +0100

Allow SetHintBits() to succeed if the buffer's LSN is new enough.


commit 7975c5e0a992ae9a45e03d145e0d37e2b5a707f5
Author: Andres Freund 
Date:   Mon Feb 15 23:52:38 2016 +0100

Allow the WAL writer to flush WAL at a reduced rate.


The first change made the WALWriter wake up and do a write and flush
whenever an async commit occurred and there was a completed WAL page to be
written.  This way the hint bits could be set while the heap page was still
hot, because they couldn't get set until the WAL covering the hinted-at
transaction commit is flushed.

The second change said we can set hint bits before the WAL covering the
hinted-at transaction is flushed, as long as the page LSN is newer than that
(so the wal covering the hinted-at transaction commit must be flushed
before the page containing the hint bit can be written).

Then the third change makes the wal writer only write the full pages most
of the times it is woken up, not flush them.  It only flushes them once
every wal_writer_delay or wal_writer_flush_after, regardless of how often
it is woken up.

It seems like the third change rendered the first one useless.

Re: [HACKERS] subscription worker signalling wal writer too much

2017-06-14 Thread Jeff Janes
On Wed, Jun 14, 2017 at 3:20 PM, Andres Freund  wrote:

> On 2017-06-14 15:08:49 -0700, Jeff Janes wrote:
> > On Wed, Jun 14, 2017 at 11:55 AM, Jeff Janes 
> wrote:
> >
> > > If I publish a pgbench workload and subscribe to it, the subscription
> > > worker is signalling the wal writer thousands of times a second, once
> for
> > > every async commit.  This has a noticeable performance cost.
> > >
> >
> > I've used a local variable to avoid waking up the wal writer more than
> once
> > for the same page boundary.  This reduces the number of wake-ups by about
> > 7/8.
>
> Maybe I'm missing something here, but isn't that going to reduce our
> guarantees about when asynchronously committed xacts are flushed out?
> You can easily fit a number of commits into the same page...   As this
> isn't specific to logical-rep, I don't think that's ok.
>

The guarantee is based on wal_writer_delay not on SIGUSR1, so I don't think
this changes that. (Also, it isn't really a guarantee, the fsync can take
many seconds to complete once we do initiate it, and there is absolutely
nothing we can do about that, other than do the fsync synchronously in the
first place).

The reason for kicking the wal writer at page boundaries is so that hint
bits can get set earlier than they otherwise could. But I don't think
kicking it multiple times per page boundary can help in that effort.


>
> Have you chased down why there's that many wakeups?  Normally I'd have
> expected that a number of the SetLatch() calls get consolidated
> together, but I guess walwriter is "too quick" in waking up and
> resetting the latch?
>

I'll have to dig into that some more.  The 7/8 reduction I cited was just
in calls to SetLatch from that part of the code, I didn't measure whether
the SetLatch actually called kill(owner_pid, SIGUSR1) or not when I
determined that reduction, so it wasn't truly wake ups I measured.  Actual
wake ups were measured only indirectly via the impact on performance.  I'll
need to figure out how to instrument that without distorting the
performance too much in the process.

Cheers,

Jeff


Re: [HACKERS] subscription worker signalling wal writer too much

2017-06-14 Thread Jeff Janes
On Wed, Jun 14, 2017 at 11:55 AM, Jeff Janes  wrote:

> If I publish a pgbench workload and subscribe to it, the subscription
> worker is signalling the wal writer thousands of times a second, once for
> every async commit.  This has a noticeable performance cost.
>

I've used a local variable to avoid waking up the wal writer more than once
for the same page boundary.  This reduces the number of wake-ups by about
7/8.

I'm testing it by doing 1e6 transactions over 8 clients while replication
is in effect, then waiting for the logical replica to catch up.  This cycle
takes 183.1 seconds in HEAD, and 162.4 seconds with the attached patch.
N=14, p-value for difference of the means 6e-17.

If I suppress all wake-ups just to see what would happen, it further
reduces the runtime to 153.7.

Cheers,

Jeff


wake_wal_writer_less_aggressively.patch
Description: Binary data



[HACKERS] subscription worker signalling wal writer too much

2017-06-14 Thread Jeff Janes
If I publish a pgbench workload and subscribe to it, the subscription
worker is signalling the wal writer thousands of times a second, once for
every async commit.  This has a noticeable performance cost.

I don't think it is ever necessary to signal the wal writer here, unless
the wal writer is taking the long sleep for power-saving purposes.  If up
to wal_writer_delay of "committed" transactions get lost on a crash, it
doesn't really matter because they will be replayed again once replication
resumes. However, might delaying the advance of the hot_standby_feedback by
up to the amount of wal_writer_delay be a problem?  I would think any
set-up which depends on the standby never being a few dozen milliseconds
behind is already doomed.

If I want to suppress signalling, what would be the right way to
communicate to XLogSetAsyncXactLSN that it is being called in a
subscription worker?

Another approach would be to make XLogSetAsyncXactLSN signalling less
aggressive for everyone, not just subscription workers.  There is no point
in signalling it more than once for a given WriteRqstPtr page boundary.  So
each backend could locally remember the last boundary for which it
signalled wal writer, and not signal again for the same boundary.  This
would be especially effective for a subscription worker, as it should be
pretty common for almost all the async commits to be coming from the same
process.  Or, the last boundary could be kept in shared memory (XLogCtl) so
all backends can share it, at the expense of additional work needed to be
done under a spin lock.
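
A minimal sketch of the per-backend variant (toy code; the real bookkeeping
would live next to XLogSetAsyncXactLSN in xlog.c):

#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;
#define WAL_PAGE 8192          /* stand-in for XLOG_BLCKSZ */

/* Last page boundary this backend has already signalled the wal writer for. */
static XLogRecPtr last_signalled = 0;

/* Returns true at most once per completed WAL page for this backend. */
static bool
async_commit_needs_signal(XLogRecPtr asyncXactLSN)
{
    XLogRecPtr boundary = asyncXactLSN - (asyncXactLSN % WAL_PAGE);

    if (boundary <= last_signalled)
        return false;      /* already kicked the wal writer for this page */
    last_signalled = boundary;
    return true;
}

int
main(void)
{
    /* two async commits on the same WAL page: only the first signals */
    int first  = async_commit_needs_signal(20000);   /* 1 */
    int second = async_commit_needs_signal(24000);   /* 0: same boundary */
    return (first == 1 && second == 0) ? 0 : 1;
}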

Other ideas?

Thanks,

Jeff


Re: [HACKERS] logical replication: \dRp+ and "for all tables"

2017-06-12 Thread Jeff Janes
On Sat, Jun 10, 2017 at 7:42 AM, Tom Lane  wrote:

> Jeff Janes  writes:
> > On Fri, Jun 9, 2017 at 10:20 PM, Masahiko Sawada 
> > wrote:
> >> On Sat, Jun 10, 2017 at 7:29 AM, Jeff Janes 
> wrote:
> >>> That seems unfortunate.  Should the "for all tables" be included as
> >>> another column in \dRp and \dRp+, or at least as a footnote tag in
> \dRp+ ?
>
> >> +1. I was thinking the same. Attached patch adds "All Tables" column
> >> to both \dRp and \dRp+.
>
> > Looks good to me.  Attached with regression test expected output
> changes.
>
> This patch confuses me.  In the first place, I don't see the argument for
> adding the "all tables" property to \dRp output; it seems out of place
> there.


Why?  It is an important property of the publication which is single-valued
and easy to represent concisely.  What makes it out of place?


> In the second place, this really fails to respond to what I'd call
> the main usability problem with \dRp+, which is that the all-tables
> property is likely to lead to an unreadably bulky list of affected tables.
> What I'd say the patch ought to do is *replace* \dRp+'s list of affected
> tables with a notation like "(all tables)" when puballtables is true.
>

I'd considered that, but I find the pager does a fine job of dealing with
the bulkiness of the list. I thought it might be a good idea to not only
point out that it is all tables, but also remind people of exactly what
tables those are currently (in case it had slipped their mind that all
tables will include tables from other schemas not in their search_path, for
example).

Cheers,

Jeff


Re: [HACKERS] Why restore_command is called for existing files in pg_xlog?

2017-06-12 Thread Jeff Janes
On Mon, Jun 12, 2017 at 5:25 AM, Alex Kliukin  wrote:

> On Fri, Jun 2, 2017, at 11:51 AM, Alexander Kukushkin wrote:
>
> Hello hackers,
> There is one strange and awful thing I don't understand about
> restore_command: it is always being called for every single WAL segment
> postgres wants to apply (even if such segment already exists in pg_xlog)
> until replica start streaming from the master.
>
>
> The real problem this question is related to is being unable to bring a
> former master, demoted after a crash, online, since the WAL segments
> required to get it to the consistent state were not archived while it was
> still a master, and local segments in pg_xlog are ignored when a
> restore_command is defined. The other replicas wouldn't be good candidates
> for promotion as well, as they were way behind the master (because the last
> N WAL segments were not archived and streaming replication had a few
> seconds delay).
>

I don't really understand the problem.  If the other replicas are not
candidates for promotion, then why was the master ever "demoted" in the
first place?  It should just go through normal crash recovery, not PITR
recovery, and therefore will read the files from pg_xlog just fine.

If you already promoted one of the replicas and accepted data changes into
it, and now are thinking that was not a good idea, then there is no off the
shelf automatic way to merge together the two systems.  You have do a
manual inspection of the differences.  To do that, you would start by
starting up (a copy of) the crashed master, using normal crash recovery,
not PITR.



>
> Is this a correct list for such questions, or would it be more appropriate
> to ask elsewhere (i.e. pgsql-bugs?)
>

Probably more appropriate for pgsql-general or pgsql-admin.

Cheers,

Jeff


Re: [HACKERS] logical replication: \dRp+ and "for all tables"

2017-06-09 Thread Jeff Janes
On Fri, Jun 9, 2017 at 10:20 PM, Masahiko Sawada 
wrote:

> On Sat, Jun 10, 2017 at 7:29 AM, Jeff Janes  wrote:
> > If I create a publication "for all tables", \dRp+ doesn't indicate it is
> for
> > all tables, it just gives a list of the tables.
> >
> > So it doesn't distinguish between a publication specified to be for all
> > tables (which will be dynamic regarding future additions), and one which
> > just happens to include all the table which currently exist.
> >
> > That seems unfortunate.  Should the "for all tables" be included as
> another
> > column in \dRp and \dRp+, or at least as a footnote tag in \dRp+ ?
> >
>
> +1. I was thinking the same. Attached patch adds "All Tables" column
> to both \dRp and \dRp+.
>
>
Looks good to me.  Attached with regression test expected output  changes.

Cheers,

Jeff


psql_publication_v2.patch
Description: Binary data



[HACKERS] logical replication: \dRp+ and "for all tables"

2017-06-09 Thread Jeff Janes
If I create a publication "for all tables", \dRp+ doesn't indicate it is
for all tables, it just gives a list of the tables.

So it doesn't distinguish between a publication specified to be for all
tables (which will be dynamic regarding future additions), and one which
just happens to include all the table which currently exist.

That seems unfortunate.  Should the "for all tables" be included as another
column in \dRp and \dRp+, or at least as a footnote tag in \dRp+ ?

Cheers,

Jeff


[HACKERS] logical replication NOTICE "synchronized table states"

2017-06-09 Thread Jeff Janes
When creating a subscription, I get a NOTICE "synchronized table states"

What is this supposed to be telling the end user?  Is this just a debugging
message not intended to be included in the released code?

When I first saw this NOTICE, I thought, based on the wording, that it was
telling me the data had finished copying over (i.e. the initial state of
the table data had been synced).  But at the time this notice is issued,
the copy has not even started.  In fact, if I create a subscription for a
non-existent publication, I still get this message.

Cheers,

Jeff


Re: [HACKERS] List of hostaddrs not supported

2017-06-09 Thread Jeff Janes
On Fri, Jun 9, 2017 at 11:52 AM, Heikki Linnakangas  wrote:

> On 06/09/2017 05:47 PM, Jeff Janes wrote:
>
>> Your commit to fix this part, 76b11e8a43eca4612d, is giving me compiler
>> warnings:
>>
>> fe-connect.c: In function 'connectDBStart':
>> fe-connect.c:1625: warning: 'ret' may be used uninitialized in this
>> function
>>
>> gcc version 4.4.7 20120313 (Red Hat 4.4.7-18) (GCC)
>>
>
> Oh. Apparently that version of gcc doesn't take it for granted that the
> switch-statement covers all the possible cases. I've added a dummy
> initialization, to silence it. Thanks, and let me know if it didn't help.


It worked.  Thanks.

Jeff


Re: [HACKERS] partial aggregation with internal state type

2017-06-09 Thread Jeff Janes
On Fri, Jun 9, 2017 at 9:06 AM, Tom Lane  wrote:

> Jeff Janes  writes:
> > The docs for creating aggregates for 9.6 and beyond say:
> > "For aggregate functions whose state_data_type is internal, the
> combinefunc
> > must not be strict. In this case the combinefunc must ensure that null
> > states are handled correctly and that the state being returned is
> properly
> > stored in the aggregate memory context."
>
> > Since combinefunc with an internal type is only useful when serialfunc
> and
> > deserialfunc are also defined, why can't the built-in machinery just do
> the
> > right thing when faced with a strict combinefunc?
>
> The issue is how to initialize the state value to begin with.
>

Why does it need to be initialized?  Initializing a NULL state upon first
use is already the job of the sfunc.  Can't it just be left NULL if both inputs
are NULL?  (and use serialize/deserialize to change the memory context of
the not-NULL argument if one is NULL and one is not NULL)
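
To make that concrete, here is roughly the null-handling shim every such
non-strict combinefunc has to carry today, and which the machinery could in
principle perform on behalf of a strict one (the fmgr macros are real;
copy_to_agg_context() and merge_states() are hypothetical helpers):

#include "postgres.h"
#include "fmgr.h"

/* Hypothetical helpers, assumed to be defined elsewhere. */
extern void *copy_to_agg_context(FunctionCallInfo fcinfo, void *state);
extern void *merge_states(void *a, void *b);

PG_FUNCTION_INFO_V1(my_combinefunc);

Datum
my_combinefunc(PG_FUNCTION_ARGS)
{
    /* The part the machinery could do automatically for a strict function: */
    if (PG_ARGISNULL(0) && PG_ARGISNULL(1))
        PG_RETURN_NULL();               /* nothing accumulated on either side */
    if (PG_ARGISNULL(0))                /* adopt the other side's state */
        PG_RETURN_POINTER(copy_to_agg_context(fcinfo, PG_GETARG_POINTER(1)));
    if (PG_ARGISNULL(1))
        PG_RETURN_POINTER(PG_GETARG_POINTER(0));

    /* Only this part is genuinely aggregate-specific. */
    PG_RETURN_POINTER(merge_states(PG_GETARG_POINTER(0), PG_GETARG_POINTER(1)));
}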


Cheers,

Jeff


[HACKERS] partial aggregation with internal state type

2017-06-09 Thread Jeff Janes
The docs for creating aggregates for 9.6 and beyond say:

"For aggregate functions whose state_data_type is internal, the combinefunc
must not be strict. In this case the combinefunc must ensure that null
states are handled correctly and that the state being returned is properly
stored in the aggregate memory context."

Since combinefunc with an internal type is only useful when serialfunc and
deserialfunc are also defined, why can't the built-in machinery just do the
right thing when faced with a strict combinefunc?

Cheers,

Jeff


Re: [HACKERS] List of hostaddrs not supported

2017-06-09 Thread Jeff Janes
On Thu, Jun 8, 2017 at 1:36 AM, Heikki Linnakangas  wrote:

> While testing libpq and GSS the other day, I was surprised by the behavior
> of the host and hostaddr libpq options, if you specify a list of hostnames.
>
> I did this this, and it took me quite a while to figure out what was going
> on:
>
> $ psql "dbname=postgres hostaddr=::1  host=localhost,localhost
>> user=krbtestuser" -c "SELECT 'hello'"
>> psql: GSSAPI continuation error: Unspecified GSS failure.  Minor code may
>> provide more information
>> GSSAPI continuation error: Server postgres/localhost,localhost@PG.EXAMPLE
>> not found in Kerberos database
>>
>
> That was a pilot error; I specified a list of hostnames, but only one
> hostaddr. But I would've expected to get a more helpful error, pointing
> that out.
>
> Some thoughts on this:
>
> 1. You cannot actually specify a list of hostaddrs. Trying to do so gives
> error:
>
> psql: could not translate host name "::1,::1" to address: Name or service
>> not known
>>
>
> That error message is a bit inaccurate, in that it wasn't really a "host
> name" that it tried to translate, but a raw address in string format.
> That's even more confusing if you make the mistake that you specify
> "hostaddr=localhost":
>


Your commit to fix this part, 76b11e8a43eca4612d, is giving me compiler
warnings:

fe-connect.c: In function 'connectDBStart':
fe-connect.c:1625: warning: 'ret' may be used uninitialized in this function

gcc version 4.4.7 20120313 (Red Hat 4.4.7-18) (GCC)


 Cheers,

Jeff


[HACKERS] postgres_fdw cost estimation defaults and documentation

2017-06-05 Thread Jeff Janes
The default value for fdw_tuple_cost is 0.01, which seems way too low.  If
I set up a loop-back foreign server with a large fetch_size, then tests
like:

select * from pgbench_accounts except select * from
loopback.pgbench_accounts

vs

select * from pgbench_accounts except select * from pgbench_accounts

indicate that 0.1 is about the lowest value for fdw_tuple_cost that could
make sense, and a reasonable default would probably be 0.25.  Yes, it is
only a default, but the default should make sense for at least some
situation, and I can't imagine any situation in which 0.01 makes sense.

In the documentation for fdw_startup_cost, it says "This represents the
additional overhead of establishing a connection, parsing and planning the
query on the remote side, etc.".  I think that "establishing a connection"
should be stricken. Either you need a connection or you don't, there is
nothing the optimizer can do about this.  And if you do need one, you
establish it once (at most), not once per query sent to the remote side.
I think the implementation correctly doesn't try to account for the
overhead of establishing a connection, so the docs should remove that claim.

In regards to use_remote_estimate, the documentation says "Running ANALYZE
on the foreign table is the way to update the local statistics; this will
perform a scan of the remote table and then calculate and store statistics
just as though the table were local. Keeping local statistics can be a
useful way to reduce per-query planning overhead for a remote table — but
if the remote table is frequently updated, the local statistics will soon
be obsolete."  This makes it send like local stats is basically equivalent
to use_remote_estimate, other than the staleness issue.  But they are far
from equivalent.  use_remote_estimate has implicit knowledge of the indexes
on the foreign server (implicit via the reduced cost estimates derived from
the foreign side for parameterized queries), whereas local stats of foreign
tables just assumes there are no indexes for planning purposes. Perhaps
adding something like "Also, local statistics do not contain information on
the available indexes on the remote side, while use_remote_estimate does
take these into account"?
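
For anyone wanting to experiment, these are all settable per server or per
table; a minimal sketch, with the server and table names assumed:

ALTER SERVER remote_server OPTIONS (ADD fdw_tuple_cost '0.25');
ALTER FOREIGN TABLE remote1 OPTIONS (ADD use_remote_estimate 'true');

(Use SET rather than ADD if the option has already been set once.)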

Cheers,

Jeff


Re: [HACKERS] logical replication - still unstable after all these months

2017-06-03 Thread Jeff Janes
On Fri, Jun 2, 2017 at 4:10 PM, Petr Jelinek 
wrote:

>
> While I was testing something for different thread I noticed that I
> manage transactions incorrectly in this patch. Fixed here, I didn't test
> it much yet (it takes a while as you know :) ). Not sure if it's related
> to the issue you've seen though.
>

This conflicts again with Peter E's recent commit
3c9bc2157a4f465b3c070d1250597568d2dc285f

Thanks,

Jeff


Re: [HACKERS] psql: Activate pager only for height, not width

2017-05-29 Thread Jeff Janes
On Sun, May 28, 2017 at 10:09 PM, Brendan Jurd  wrote:

> Hello hackers,
>
> I am often frustrated by the default behaviour of the psql pager, which
> will activate a pager if the output is deemed to be "too wide" for the
> terminal, regardless of the number of lines output, and of the
> pager_min_lines setting.
>
> This behaviour is sometimes desirable, but in my use patterns it is more
> often the case that I want the pager to activate for output longer than
> terminal height, whereas for output a little wider than the terminal, I am
> happy for there to be some wrapping.  This is especially the case with "\d"
> output for tables, where, at 80 columns, very often the only wrapping is in
> the table borders and constraint/trigger definitions.
>
> Usually I turn the pager off completely, and only switch it on when I am
> about to execute something that will return many rows, but what I'd really
> like is some way to tell psql to activate the pager as normal for height,
> but to ignore width.  My first thought was an alternate mode to \pset pager
> -- to {'on' | 'off' | 'always'} we could add 'height'.
>
> Another option is to add the ability to specify the number of columns
> which psql considers "too wide", analogous to pager_min_lines.  I could
> then set pager_min_cols to something around 150 which would work nicely for
> my situation.
>
> I don't have strong opinions about how the options are constructed, as
> long as it is possible to obtain the behaviour.
>
> I would be happy to produce a patch, if this seems like an acceptable
> feature add.
>

I'd like a feature like this.  I often run into the problem where one or
two lines of an EXPLAIN plan are wide enough to wrap, which causes the
pager to kick in.  Then when I exit the pager, it clears the contents so I
can't see that plan vertically adjacent to the one I'm trying to compare it
to.  I'd rather have the pager kick in if more than some settable number
of lines are too wide, or if the wrapped lines would push the height
above the height limit.  If just one line is too wide, I'd rather just deal
with them being wrapped.

(You can configure the pager not to redraw the screen when exited, but I
want it to redraw the screen after looking at 10,000 rows, just not after
looking ten rows, one of which was 170 characters wide)
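
To make this concrete, today one can already do:

\pset pager on
\pset pager_min_lines 25

and the knob proposed above would presumably be used the same way
(pager_min_cols is hypothetical; it does not exist yet):

\pset pager_min_cols 150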

Cheers,

Jeff


Re: [HACKERS] logical replication busy-waiting on a lock

2017-05-29 Thread Jeff Janes
On Sat, May 27, 2017 at 6:48 AM, Petr Jelinek 
wrote:

>
>
> Actually, I guess it's the pid 47457 (COPY process) who is actually
> running the xid 73322726. In that case that's the same thing Masahiko
> Sawada reported [1].


Related, but not the same.  It would be nice if they didn't block, but if
they do have to block, shouldn't they wait on a semaphore rather than doing
a tight loop?  It looks like maybe a latch didn't get reset when it should
have or something.


[1]
> https://www.postgresql.org/message-id/flat/CAD21AoC2KJdavS7MFffmSsRc1dn3V
> g_0xmuc%3DUpBrZ-_MUxh-Q%40mail.gmail.com


Cheers,

Jeff


Re: [HACKERS] logical replication - still unstable after all these months

2017-05-28 Thread Jeff Janes
On Sun, May 28, 2017 at 3:17 PM, Mark Kirkwood <
mark.kirkw...@catalyst.net.nz> wrote:

> On 28/05/17 19:01, Mark Kirkwood wrote:
>
>
>> So running in cloud land now...so far no errors - will update.
>>
>>
>>
>>
> The framework ran 600 tests last night, and I see 3 'NOK' results, i.e 3
> failed test runs (all scale 25 and 8 pgbench clients). Given the way the
> test decides on failure (gets tired of waiting for the table md5's to
> match) - it begs the question 'What if it had waited a bit longer'? However
> from what I can see in all cases:
>
> - the rowcounts were the same in master and replica
> - the md5 of pgbench_accounts was different
>

All four tables should be wrong if there is still a transaction it is
waiting for, as all the changes happen in a single transaction.

I also got a failure, after 87 iterations of a similar test case.  It
waited for hours, as mine requires manual intervention to stop waiting.  On
the subscriber, one account still had a zero balance, while the history
table on the subscriber agreed with both history and accounts on the
publisher and the account should not have been zero, so transaction
atomicity definitely got busted.

I altered the script to also save the tellers and branches tables and
repeated the runs, but so far it hasn't failed again in over 800 iterations
using the altered script.


>
> ...so does seem possible that there is some bug being tickled here.
> Unfortunately the test framework blasts away the failed tables and
> subscription and continues on...I'm going to amend it to stop on failure so
> I can have a closer look at what happened.
>

What would you want to look at?  Would saving the WAL from the master be
helpful?

Cheers,

Jeff


[HACKERS] execGrouping.c limit on work_mem

2017-05-27 Thread Jeff Janes
In BuildTupleHashTable


/* Limit initial table size request to not more than work_mem */
nbuckets = Min(nbuckets, (long) ((work_mem * 1024L) / entrysize));


Is this a good idea?  If the caller of this code has no respect for
work_mem, they are still going to blow it out of the water.  Now we will
just do a bunch of hash-table splitting in the process.  That is only going
to add to the pain.
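
To put illustrative numbers on it (assumed values): with work_mem = 4MB and
entrysize = 64 bytes, the clamp works out to

    nbuckets cap = (4096 * 1024) / 64 = 65536

so a caller that actually needs ten million entries starts at 64k buckets
and then pays for roughly eight rounds of doubling (65536 -> 16777216), on
top of the memory it was going to consume anyway.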

Also:

* false if it existed already.  ->additional_data in the new entry has

The field is just ->additional, not ->additional_data

Cheers,

Jeff


[HACKERS] simplehash.h typo

2017-05-27 Thread Jeff Janes
/* round up size to the next power of 2, that's the bucketing works  */


That should probably be "that's the **way** bucketing works".  Or maybe it
is an idiom I don't grok.

Cheers,

Jeff


[HACKERS] logical replication busy-waiting on a lock

2017-05-26 Thread Jeff Janes
When I create a subscription in the disabled state, and then later run
"alter subscription sub enable;", on the master I sometimes get a tight
loop of the deadlock detector:

(log_lock_waits is on, of course)

deadlock_timeout is set to 1s, so I don't know why it seems to be running
several times per millisecond.

47462 idle in transaction 2017-05-26 16:05:20.505 PDT LOG:  logical
decoding found initial starting point at 1B/7BAC9D50
47462 idle in transaction 2017-05-26 16:05:20.505 PDT DETAIL:  Waiting for
transactions (approximately 9) older than 73326615 to end.
47462 idle in transaction waiting 2017-05-26 16:05:21.505 PDT LOG:  process
47462 still waiting for ShareLock on transaction 73322726 after 1000.060 ms
47462 idle in transaction waiting 2017-05-26 16:05:21.505 PDT DETAIL:
 Process holding the lock: 47457. Wait queue: 47462.
47462 idle in transaction waiting 2017-05-26 16:05:21.506 PDT LOG:  process
47462 still waiting for ShareLock on transaction 73322726 after 1000.398 ms
47462 idle in transaction waiting 2017-05-26 16:05:21.506 PDT DETAIL:
 Process holding the lock: 47457. Wait queue: 47462.
47462 idle in transaction waiting 2017-05-26 16:05:21.506 PDT LOG:  process
47462 still waiting for ShareLock on transaction 73322726 after 1000.574 ms
47462 idle in transaction waiting 2017-05-26 16:05:21.506 PDT DETAIL:
 Process holding the lock: 47457. Wait queue: 47462.
47462 idle in transaction waiting 2017-05-26 16:05:21.506 PDT LOG:  process
47462 still waiting for ShareLock on transaction 73322726 after 1000.816 ms
47462 idle in transaction waiting 2017-05-26 16:05:21.506 PDT DETAIL:
 Process holding the lock: 47457. Wait queue: 47462.
47462 idle in transaction waiting 2017-05-26 16:05:21.506 PDT LOG:  process
47462 still waiting for ShareLock on transaction 73322726 after 1001.180 ms
47462 idle in transaction waiting 2017-05-26 16:05:21.506 PDT DETAIL:
 Process holding the lock: 47457. Wait queue: 47462.
47462 idle in transaction waiting 2017-05-26 16:05:21.507 PDT LOG:  process
47462 still waiting for ShareLock on transaction 73322726 after 1001.284 ms
47462 idle in transaction waiting 2017-05-26 16:05:21.507 PDT DETAIL:
 Process holding the lock: 47457. Wait queue: 47462.
47462 idle in transaction waiting 2017-05-26 16:05:21.507 PDT LOG:  process
47462 still waiting for ShareLock on transaction 73322726 after 1001.493 ms
.

And so on out to "after 9616.814", when it finally acquires the lock.

The other process, 47457, is doing the initial COPY of another table as
part of the same publisher/subscriber set.
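
For reference, the sequence that triggers this for me looks roughly like
the following (connection string and object names are placeholders):

CREATE SUBSCRIPTION sub CONNECTION 'host=localhost dbname=postgres'
    PUBLICATION pub WITH (enabled = false);
-- later, while another table's initial COPY is still in progress:
ALTER SUBSCRIPTION sub ENABLE;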

Cheers,

Jeff


Re: [HACKERS] logical replication - still unstable after all these months

2017-05-26 Thread Jeff Janes
On Fri, May 26, 2017 at 12:27 AM, Erik Rijkers  wrote:

> On 2017-05-26 08:58, Simon Riggs wrote:
>
>> On 26 May 2017 at 07:10, Erik Rijkers  wrote:
>>
>> - Do you agree this number of failures is far too high?
>>> - Am I the only one finding so many failures?
>>>
>>
>> What type of failure are you getting?
>>
>
> The failure is that in the result state the replicated tables differ from
> the original tables.
>

But what is the actual failure?  Which tables differ?  Do they have the
same number of rows? Do they only differ in the *balance column or
something else?  Are they transactionally consistent?

I have not been able to replicate the problem.

Cheers,

Jeff


Re: [HACKERS] logical replication - still unstable after all these months

2017-05-26 Thread Jeff Janes
On Fri, May 26, 2017 at 5:17 AM, tushar 
wrote:

>
> run second time =
> ./pgbench -T 20   -c 90 -j 90  -f test.sql  postgres
>
> check the row count on master/standby
> Master=
> postgres=# select count(*) from pgbench_history ;
>  count
> 
>  536836
> (1 row)
>
> Standby =
>
> postgres=#  select count(*) from pgbench_history ;
>   count
> -
>  1090959
> (1 row)


Hi Tushar,

pgbench starts out by truncating pgbench_history.  That truncation does not
get replicated to the subscriber.  The later inserts do.  So your
subscriber ends up with about twice as many rows.

Cheers,

Jeff


Re: [HACKERS] CREATE STATISTICS statistic_type documentation

2017-05-25 Thread Jeff Janes
On Thu, May 25, 2017 at 9:28 AM, Tom Lane  wrote:

> Jeff Janes  writes:
> > The docs for CREATE STATISTICS do not say what happens if the
> > statistic_type clause is omitted.  It should probably say that the
> default
> > action is to create both ndistinct and dependencies.
>
> Hmm, I coulda sworn that it did say that somewhere.
>
> Where would you expect to find that?
>

Probably at the end of the paragraph:

"A statistic type to be computed in this statistics object. Currently
supported types are ndistinct, which enables n-distinct statistics, and
dependencies, which enables functional dependency statistics. For more
information, see Section 14.2.2 and Section 68.2."

Something like "If omitted, both ndistinct and dependencies will be
included."

If we invent more types in the future, would we expect those to be
defaulted to as well?
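
For concreteness (statistics, table, and column names made up), the
suggestion is that the docs state these two commands are equivalent:

CREATE STATISTICS s1 ON a, b FROM t1;
CREATE STATISTICS s1 (ndistinct, dependencies) ON a, b FROM t1;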

Cheers,

Jeff


[HACKERS] CREATE STATISTICS statistic_type documentation

2017-05-25 Thread Jeff Janes
The docs for CREATE STATISTICS do not say what happens if the
statistic_type clause is omitted.  It should probably say that the default
action is to create both ndistinct and dependencies.

Cheers,

Jeff


[HACKERS] ALTER PUBLICATION materializing the list of tables

2017-05-22 Thread Jeff Janes
If I create a publication FOR ALL TABLES and then change my mind, the only
thing I can do is drop the publication and recreate it.

Since "ALTER PUBLICATION name SET TABLE" allows you to replace the entire
table list, shouldn't it also let you change from the dynamic FOR ALL
TABLES to a static specific list?
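
A sketch of the current workaround versus what I would expect to work
(publication and table names made up):

-- today, the only option:
DROP PUBLICATION pub_all;
CREATE PUBLICATION pub_all FOR TABLE t1, t2;

-- what arguably should also be accepted on a FOR ALL TABLES publication:
ALTER PUBLICATION pub_all SET TABLE t1, t2;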

Cheers,

Jeff


[HACKERS] ALTER PUBLICATION documentation

2017-05-22 Thread Jeff Janes
"The first variant of this command listed in the synopsis can change all of
the publication properties specified in CREATE PUBLICATION
."

That referenced first variant no longer exists.  I don't know if it should
just be removed, or reworked to instead describe "ALTER PUBLICATION name
SET".

Cheers,

Jeff


[HACKERS] postgres_fdw aggregation pushdown has collation change in 10beta.

2017-05-17 Thread Jeff Janes
It is shipping collation-sensitive aggregates between servers which have
different collations.

commit 7012b132d07c2b4ea15b0b3cb1ea9f3278801d98
Author: Robert Haas 
Date:   Fri Oct 21 09:54:29 2016 -0400

postgres_fdw: Push down aggregates to remote servers.


I've attached a reproducing case.  Before this commit, the two final
queries give the same answer, and after they give different answers.  Maybe
this isn't considered a bug?  Is it just one of the "surprising semantic
anomalies may arise when types or collations do not match"?  It should be
able to know what collation the local definition of the foreign table has;
couldn't it pass that collation over the foreign side?

I don't really care, I was just using min as a way to get an arbitrary
value in the cases where there are more than one, but I found the change
surprising.

Cheers,

Jeff
drop database foobar_C;
drop database foobar_US;
create database foobar_C encoding 'utf-8' lc_collate "C" template template0;
create database foobar_US encoding 'utf-8' lc_collate "en_US.utf8" template 
template0;

\c foobar_us

create table remote1 (x text);
\copy remote1 from stdin
a
P
\.

\c foobar_c

CREATE EXTENSION IF NOT EXISTS postgres_fdw WITH SCHEMA public;

CREATE SERVER remote_server FOREIGN DATA WRAPPER postgres_fdw options (dbname 
'foobar_us') ;
CREATE USER MAPPING FOR postgres SERVER remote_server OPTIONS (
"user" 'postgres'
);

CREATE FOREIGN TABLE remote1 (
   x text
)
SERVER remote_server
OPTIONS (
schema_name 'public',
table_name 'remote1',
use_remote_estimate 'true'
);

create table local1 (x text);
\copy local1 from stdin
a
P
\.

select min(x) from local1;
select min(x) from remote1;



[HACKERS] Transaction held open by autoanalyze can be a bottleneck

2017-05-10 Thread Jeff Janes
Autovacuum's analyze starts a transaction when it starts on a table, and
holds it for the duration. This holds back the xmin horizon.

On a TPC-B-like benchmark, this can be a problem.  While it is
autoanalyzing pgbench_accounts and pgbench_history, dead-but-for-analyze
tuples accumulate rapidly in pgbench_tellers and pgbench_branches.  Now the
UPDATES to those tables have to walk the unprunable HOT chains to find
their tuples to update, greatly slowing them down.

The analyze actually takes quite a while, because it is frequently setting
hint bits, thereby dirtying pages, and therefore sleeping for
autovacuum_vacuum_cost_delay.

If I set autovacuum_vacuum_cost_delay=0, then tps averaged over an hour
goes from 12,307.6 to 24,955.2.  I can get a similar gain just by changing
the relopts for those two tables to autovacuum_analyze_threshold =
20.  I don't think these are particularly attractive solutions, but
they do demonstrate the nature of the problem.
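
A sketch of those two mitigations; the table choice and threshold value are
illustrative (the intent of the reloption is simply "so large that
autoanalyze effectively never fires on the big tables"):

ALTER SYSTEM SET autovacuum_vacuum_cost_delay = 0;
SELECT pg_reload_conf();

ALTER TABLE pgbench_accounts SET (autovacuum_analyze_threshold = 2000000000);
ALTER TABLE pgbench_history SET (autovacuum_analyze_threshold = 2000000000);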

Does analyze need all of its work done under the same transaction?  Is
there an elegant way to make it periodically discard the transaction and
get a new one, so that the xmin horizon can advance?  Every time
vacuum_delay_point decides to sleep would be a good time to do that,
but that would expand its contract quite a bit. And it is probably possible
to have analyze take a long time without ever deciding to sleep, so doing
it there would not be a fully general solution.

Cheers,

Jeff


Re: [HACKERS] logical replication deranged sender

2017-05-09 Thread Jeff Janes
On Tue, May 9, 2017 at 9:18 AM, Petr Jelinek 
wrote:

> On 08/05/17 13:47, Petr Jelinek wrote:
> > On 08/05/17 01:17, Jeff Janes wrote:
> >> After dropping a subscription, it says it succeeded and that it dropped
> >> the slot on the publisher.
> >>
> >> But the publisher still has the slot, and a full-tilt process described
> >> by ps as
> >>
> >> postgres: wal sender process jjanes [local] idle in transaction
> >>
> >> Strace shows that this process is doing nothing but opening, reading,
> >> lseek, and closing from pg_wal, and calling sbrk.  It never sends
> anything.
> >>
> >> This is not how it should work, correct?
> >>
> >
> > No, and I don't see how this happens though, we only report success if
> > the publisher side said that DROP_REPLICATION_SLOT succeeded. So far I
> > don't see anything in source that would explain this. I will need to
> > reproduce it first to see what's happening (wasn't able to do that yet,
> > but it might just need more time since you say it does no happen always).
> >
>
> Hm I wonder are there any workers left on subscriber when this happens?
>

Yes.  Using ps, I get this:

postgres: bgworker: logical replication worker for subscription 16408 sync
16391
postgres: bgworker: logical replication worker for subscription 16408 sync
16388

They seem to be permanently blocked on a socket to read from the publisher.

On the publisher side, I think it is very slowly assembling a snapshot.  It
seems to be adding one xid at a time, and then re-sorting the entire list.
Over and over.

Cheers,

Jeff


[HACKERS] logical replication deranged sender

2017-05-07 Thread Jeff Janes
After dropping a subscription, it says it succeeded and that it dropped the
slot on the publisher.

But the publisher still has the slot, and a full-tilt process described by
ps as

postgres: wal sender process jjanes [local] idle in transaction

Strace shows that this process is doing nothing but opening, reading,
lseek, and closing from pg_wal, and calling sbrk.  It never sends anything.

This is not how it should work, correct?

I'm running pgbench -c4 -j8 on the publisher, publishing all pgbench tables.

It doesn't seem to happen every time I create and then drop a subscription,
but it happens pretty often.

The subscribing server's logs shows:

21270  2017-05-07 16:04:16.517 PDT LOG:  process 21270 still waiting for
AccessShareLock on relation 6100 of database 0 after 1000.051 ms
21270  2017-05-07 16:04:16.517 PDT DETAIL:  Process holding the lock:
25493. Wait queue: 21270.
21270  2017-05-07 16:04:16.520 PDT LOG:  process 21270 acquired
AccessShareLock on relation 6100 of database 0 after 1003.608 ms
25493 DROP SUBSCRIPTION 2017-05-07 16:04:16.520 PDT LOG:  duration:
1005.176 ms CPU: user: 0.00 s, system: 0.00 s, elapsed: 1.00 s   statement:
drop subscription foobar ;

The publishing server's logs doesn't show anything relevant.

I'm using 86713de; I have not tried this with earlier commits.

Cheers,

Jeff


Re: [HACKERS] tablesync patch broke the assumption that logical rep depends on?

2017-04-26 Thread Jeff Janes
On Wed, Apr 26, 2017 at 8:00 AM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 4/25/17 15:42, Petr Jelinek wrote:
> >> Here is the patch doing just that.
> >
> > And one more revision which also checks in_use when attaching shared
> > memory. This is mainly to improve the user visible behavior in
> > theoretical corner case when the worker is supposed to be cleaned up but
> > actually still manages to start (it would still fail even without this
> > change but the error message would be more obscure).
>
> Committed that, with some editorializing.
>

This gives me compiler warning:

launcher.c: In function 'logicalrep_worker_launch':
launcher.c:257: warning: 'slot' may be used uninitialized in this function

gcc version 4.4.7 20120313 (Red Hat 4.4.7-18) (GCC)


Cheers,

Jeff


Re: [HACKERS] DELETE and UPDATE with LIMIT and ORDER BY

2017-04-24 Thread Jeff Janes
On Mon, Apr 24, 2017 at 8:09 AM, Surafel Temesgen 
wrote:

> the necessity of allowing limit and order by clauses to be used with delete
> and update statements was discussed in the past and added to the todo list
>
>  previous mailing list discussions
>
>  http://archives.postgresql.org/pgadmin-hackers/2010-04/msg00078.php
>  http://archives.postgresql.org/pgsql-hackers/2010-11/msg01997.php
>

See this more recent one:

https://www.postgresql.org/message-id/flat/54102581.2020207%40joh.to#54102581.2020...@joh.to

That patch was not adopted, as I recall, mostly due to the requirement that
it support partitioned tables.


> I attached a small patch for its implementation.
>
> Notice: in order to avoid unpredictable results, the patch does not allow a
> limit clause without order by, and vice versa.
>

I think both of those are ill-advised.  To avoid deadlock, it is perfectly
fine to want an order by without a limit.

And to facilitate the reorganization of partitions or the population of new
columns in bite-size chunks, it is also fine to want limit without order by.
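
For context, the workaround people reach for today in the bite-size-chunks
case looks something like this (table name made up; ctid tricks come with
their own concurrency caveats):

DELETE FROM big_table
WHERE ctid IN (SELECT ctid FROM big_table LIMIT 10000);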

Cheers,

Jeff

