Re: [HACKERS] max_parallel_degree > 0 for 9.6 beta

2016-04-22 Thread Gavin Flower

On 23/04/16 00:56, Robert Haas wrote:

On Thu, Apr 21, 2016 at 7:20 PM, Tom Lane  wrote:

Robert Haas  writes:

On Thu, Apr 21, 2016 at 4:01 PM, Gavin Flower
 wrote:

Why not 4?  As most processors now have at least 4 physical cores, & surely
it would be more likely to flush out race conditions.

Because if we did that, then it's extremely likely that people would
end up writing queries that are faster only if workers are present,
and then not get any workers.

Is that because max_worker_processes is only 8 by default?  Maybe we
need to raise that, at least for beta purposes?

I'm not really in favor of that.  I mean, almost all of our default
settings are optimized for running PostgreSQL on, for example, a
Raspberry Pi 2, so it would seem odd to suddenly swing the other
direction and assume that there are more than 8 unused CPU cores.  It
doesn't make sense to me to roll out settings in beta that we wouldn't
be willing to release with if they work out.  That's why, honestly, I
would prefer max_parallel_degree=1, which I think would be practical
for many real-world deployments.  max_parallel_degree=2 is OK.  Beyond
that, we're just setting people up to fail, I think.  Higher settings
should probably only be used on substantial hardware, and not
everybody has that.

If Java has been able to find out how many processors are available to it
since JDK 1.4, then surely PostgreSQL can do the same?


So how about the default being half the available processors rounded up 
to the nearest integer?


Perhaps the GUC for workers should be a percentage of the available 
processors, with the minimum & maximum workers optionally specified - or 
something of that nature?
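
For what it's worth, the detection part is cheap on most Unix-like platforms.  A 
minimal sketch of the kind of computation being suggested (this uses POSIX 
sysconf(), which is an assumption on my part; it is not something PostgreSQL 
currently does for this GUC):

#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	long	ncpu = sysconf(_SC_NPROCESSORS_ONLN);	/* CPUs currently online */
	long	suggested;

	if (ncpu < 1)
		ncpu = 1;				/* detection failed; fall back to a safe value */
	suggested = (ncpu + 1) / 2;	/* half the cores, rounded up */

	printf("detected %ld CPUs, suggested parallel degree %ld\n",
		   ncpu, suggested);
	return 0;
}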



Cheers,
Gavin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] VS 2015 support in src/tools/msvc

2016-04-22 Thread Christian Ullrich

* Christian Ullrich wrote:


* Andrew Dunstan wrote:


On 04/22/2016 02:46 AM, Michael Paquier wrote:



Progress report:

1. My VS 2015 installations (I now have several) all generate
solution file
with:
Microsoft Visual Studio Solution File, Format Version 12.00
so I propose to set that as the solution file version.


I am wondering why it happens this way for you..


It's not just me. See the reply at


and notice that in both cases the Solution file format is version 12.00.


Try as I may, I cannot get my VS 2015 to write a version 14 .sln file.
It's always "12.00". If I change it to 14 by hand, it still opens and
appears to work fine.

I tried to find a real-world version 14 solution to see if I can spot a
difference between it and the version 12 files, but there appears to be
very little out there, and what there is, looks like it was either
autogenerated or edited manually. Examples:


OK, so searching on GitHub yielded a few more results, but I could not 
identify a single one that was clearly created by Visual Studio itself 
rather than generated or edited by some other tool.


--
Christian




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] VS 2015 support in src/tools/msvc

2016-04-22 Thread Christian Ullrich

* Andrew Dunstan wrote:


On 04/22/2016 02:46 AM, Michael Paquier wrote:



Progress report:

1. My VS 2015 installations (I now have several) all generate
solution file
with:
Microsoft Visual Studio Solution File, Format Version 12.00
so I propose to set that as the solution file version.


I am wondering why it happens this way for you..


It's not just me. See the reply at

and notice that in both cases the Solution file format is version 12.00.


Try as I may, I cannot get my VS 2015 to write a version 14 .sln file. 
It's always "12.00". If I change it to 14 by hand, it still opens and 
appears to work fine.


I tried to find a real-world version 14 solution to see if I can spot a 
difference between it and the version 12 files, but there appears to be 
very little out there, and what there is, looks like it was either 
autogenerated or edited manually. Examples:


- 
 
- "converted to VS2015 in anticipation of the release"


- 
 
- "but I changed the the contents of the .SLN to Microsoft Visual Studio 
Solution File, Format Version 14.00"


- 
 
- "# This file was generated by MPC."


Google returns not quite two pages of results for "Microsoft Visual 
Studio Solution File, Format Version 14.00"; it *has* to be nonstandard.


My best guess at this point is that the solution format really did not 
change between 2013 and 2015. Microsoft may just have added support for 
.sln version 14 for compatibility with people who generate solutions by 
other means and who are under the impression the .sln version has to be 
in step with the VS version, when in fact the VisualStudioVersion line 
in a version 12 .sln file controls which version of VS will open any 
given file by default, and the MinimumVisualStudioVersion line indicates 
which VS version supports the project types in the solution.
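
For reference, the header of a solution generated by VS 2015 typically looks 
something like this (the build numbers here are illustrative and vary by 
installation):

Microsoft Visual Studio Solution File, Format Version 12.00
# Visual Studio 14
VisualStudioVersion = 14.0.23107.0
MinimumVisualStudioVersion = 10.0.40219.1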


If my guess is wrong, and anyone knows a way to make VS 2015 write a 
version 14 .sln file, please tell me.


--
Christian




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] checkpoint_flush_after documentation inconsistency

2016-04-22 Thread Andres Freund
On 2016-04-22 23:33:07 -0400, Robert Haas wrote:
> On Thu, Apr 21, 2016 at 2:20 PM, Andres Freund  wrote:
> > On 2016-04-18 14:33:28 +0900, Fujii Masao wrote:
> >> On Fri, Apr 15, 2016 at 6:56 PM, Magnus Hagander  
> >> wrote:
> >> > The documentation says that the default value is 128Kb on Linux, but the
> >> > code says it's 256Kb.
> >> >
> >> > Not sure which one is correct, but the other one should be updated :) I'm
> >> > guessing it's a copy/paste mistake in the docs, but since I'm not sure 
> >> > I'm
> >> > not just pushing a fix.
> >>
> >> I think you're right.
> >>
> >> I also found several other small problems regarding XXX_flush_after
> >> parameters.
> >
> > Thanks for finding these, once I'm back home/office from pgconf.nyc
> > (tonight / tomorrow) I'll push a fix.
> 
> Ping.

Working on a fix for the invalidation issue [1] right now, because I want to
see that fixed before pushing the fix for the smgr thing [2]. Will get
to this afterwards.

Andres

[1] 
http://www.postgresql.org/message-id/CAD21AoDpZ6Xjg=gFrGPnSn4oTRRcwK1EBrWCq9OqOHuAcMMC=w...@mail.gmail.com
[2] 
http://archives.postgresql.org/message-id/CAA-aLv6Dp_ZsV-44QA-2zgkqWKQq%3DGedBX2dRSrWpxqovXK%3DPg%40mail.gmail.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] checkpoint_flush_after documentation inconsistency

2016-04-22 Thread Robert Haas
On Thu, Apr 21, 2016 at 2:20 PM, Andres Freund  wrote:
> On 2016-04-18 14:33:28 +0900, Fujii Masao wrote:
>> On Fri, Apr 15, 2016 at 6:56 PM, Magnus Hagander  wrote:
>> > The documentation says that the default value is 128Kb on Linux, but the
>> > code says it's 256Kb.
>> >
>> > Not sure which one is correct, but the other one should be updated :) I'm
>> > guessing it's a copy/paste mistake in the docs, but since I'm not sure I'm
>> > not just pushing a fix.
>>
>> I think you're right.
>>
>> I also found several other small problems regarding XXX_flush_after
>> parameters.
>
> Thanks for finding these, once I'm back home/office from pgconf.nyc
> (tonight / tomorrow) I'll push a fix.

Ping.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapshot too old, configured by time

2016-04-22 Thread Bruce Momjian
On Tue, Apr 19, 2016 at 07:38:04AM -0400, Robert Haas wrote:
> 2. Without this feature, you can kill sessions or transactions to
> control bloat, but this feature is properly thought of as a way to
> avoid bloat *without* killing sessions or transactions.  You can let
> the session live, without having it generate bloat, just so long as it
> doesn't try to touch any data that has been recently modified.  We
> have no other feature in PostgreSQL that does something like that.

I kind of agreed with Tom about just aborting transactions that held
snapshots for too long, and liked the idea this could be set per
session, but the idea that we abort only if a backend actually touches
the old data is very nice.  I can see why the patch author worked hard
to do that.

How does/did Oracle handle this?  I assume we can't find a way to set
this per session, right?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] EXPLAIN VERBOSE with parallel Aggregate

2016-04-22 Thread David Rowley
On 23 April 2016 at 13:58, Robert Haas  wrote:
> On Fri, Apr 22, 2016 at 5:36 PM, David Rowley
>  wrote:
>> I really don't think that we should print FILTER details in a combine
>> aggregate node. We'd be claiming to be doing something that we're
>> actually not doing. Please see advance_aggregates() in nodeAgg.c, and
>> compare that to combine_aggregates(), which is used when combineStates
>> == true. Notice that only advance_aggregates() bothers with the
>> aggfilter clause.
>
> I think you're wrong.  The Output line says what that node outputs,
> not how it's computed.  And a FinalizeAggregate on top of a
> PartialAggregate produces the same output as an Aggregate.

The most basic thing I can think of to rationalise my thinking for this is:

# create table v (v varchar);
# create view v_v as select lower(v) v from v;
# explain verbose select upper(v) from v_v;
 QUERY PLAN
--------------------------------------------------------------
 Seq Scan on public.v  (cost=0.00..30.40 rows=1360 width=32)
   Output: upper(lower((v.v)::text))
(2 rows)

It seems that you're proposing that the aggregate equivalent of this should be:

 QUERY PLAN
--------------------------------------------------------------
 Seq Scan on public.v  (cost=0.00..30.40 rows=1360 width=32)
   Output: upper((v)::text)
(2 rows)

which to me seems wrong, as it's hiding the fact that lower() was called.

My arguments don't seem to be holding much weight, so I'll back down,
although it would still be interesting to hear what others have to say
about this.

In any case, thanks for stepping up to fix this. I'm sure what you're
proposing will be much better than what's there at the moment.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] EXPLAIN VERBOSE with parallel Aggregate

2016-04-22 Thread Robert Haas
On Fri, Apr 22, 2016 at 5:36 PM, David Rowley
 wrote:
> I really don't think that we should print FILTER details in a combine
> aggregate node. We'd be claiming to be doing something that we're
> actually not doing. Please see advance_aggregates() in nodeAgg.c, and
> compare that to combine_aggregates(), which is used when combineStates
> == true. Notice that only advance_aggregates() bothers with the
> aggfilter clause.

I think you're wrong.  The Output line says what that node outputs,
not how it's computed.  And a FinalizeAggregate on top of a
PartialAggregate produces the same output as an Aggregate.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump / copy bugs with "big lines" ?

2016-04-22 Thread Bruce Momjian
On Thu, Mar  3, 2016 at 10:31:26AM +0900, Michael Paquier wrote:
> On Thu, Mar 3, 2016 at 12:47 AM, Alvaro Herrera
>  wrote:
> > Well, the CopyData message has an Int32 field for the message length.
> > I don't know the FE/BE protocol very well but I suppose each row
> > corresponds to one CopyData message, or perhaps each column corresponds
> > to one CopyData message.  In either case, it's not possible to go beyond
> > 2GB without changing the protocol ...
> 
> Based on what I know from this stuff (OOM libpq and other stuff
> remnants), one 'd' message means one row. fe-protocol3.c and
> CopySendEndOfRow in backend's copy.c are confirming that as well. I am
> indeed afraid that having extra logic to get chunks of data will
> require extending the protocol with a new message type for this
> purpose.
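
For reference, the framing in question looks roughly like this (a sketch of
the v3-protocol CopyData layout; the struct and field names below are purely
illustrative, not how the backend actually declares anything):

#include <stdint.h>

/*
 * CopyData ('d') message in the v3 FE/BE protocol, approximately:
 *
 *   Byte1    'd'      message type
 *   Int32    length   byte count, including these four bytes, excluding the type byte
 *   Byte[n]  data     one data row in COPY format
 *
 * Because the length field is a signed 32-bit integer, a single row cannot
 * exceed ~2GB without a protocol change (e.g. a new chunked message type).
 */
typedef struct CopyDataMsg
{
	char		type;		/* always 'd' */
	int32_t		length;		/* includes itself, excludes the type byte */
	char		data[];		/* row contents follow */
} CopyDataMsg;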

Is there any documentation that needs updating based on this research?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump dump catalog ACLs

2016-04-22 Thread Stephen Frost
Noah, all,

* Noah Misch (n...@leadboat.com) wrote:
> On Tue, Apr 05, 2016 at 05:50:18PM -0400, Stephen Frost wrote:
> > Subject: [PATCH 4/5] In pg_dump, include pg_catalog and extension ACLs, if
> >  changed
> > 
> > Now that all of the infrastructure exists, add in the ability to
> > dump out the ACLs of the objects inside of pg_catalog or the ACLs
> > for objects which are members of extensions, but only if they have
> > been changed from their original values.
> 
> I wrote the attached test script to verify which types of ACLs dump/reload
> covers.  Based on the commit message, I expected these results:
> 
>   Dumped: type, class, attribute, proc, largeobject_metadata,
>   foreign_data_wrapper, foreign_server,
>   language(in-extension), namespace(in-extension)
>   Not dumped: database, tablespace,
>   language(from-initdb), namespace(from-initdb)
> 
> Did I interpret the commit message correctly?  The script gave these results:
> 
>   Dumped: type, class, attribute, namespace(from-initdb)
>   Not dumped: database, tablespace, proc, language(from-initdb)

You interpreted the commit message correctly and in a number of cases
the correct results are generated, but there's an issue in the WHERE
clause being used for some of the object types.  The earlier versions of
this patch adjusted the WHERE clause by using a sub-select, but later I
turned it into a LEFT JOIN and didn't correctly update the WHERE clause
to reflect that, so some kinds of ACL changes weren't being picked up.

That's relatively straightforward to fix, but I'm going to put
together a series of TAP tests to go with these fixes.  While I tested
various options and bits and pieces of the code while developing, this
really needs a proper test suite that runs through all of these
permutations with each change.

I'm planning to have that done by the end of the weekend.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] if (!superuser) checks

2016-04-22 Thread Andres Freund
Hi,

On 2016-04-22 14:56:44 -0400, Stephen Frost wrote:
> The idea we came up with is to have a pg_replication default role which
> essentially replaces the REPLICATION role attribute.  Andres didn't see
> it as being terribly valuable to disallow a role with the REPLICATION
> attribute from logging into the database directly, if it's allowed to do
> so by pg_hba.conf.

Not just not useful, but rather an anti-feature.

> Lastly, Andres felt that we could keep the syntax for
> setting/unsetting the role attribute and just convert those into
> GRANT/REVOKE statements for the pg_replication role.  I'm not thrilled
> with that idea as it feels a bit unnecessary at this point, given the
> relatively few users of pg_logical_* functions by tools

Uh, that's not just about the pg_logical_ functions. It's primarily
about scripts which set up a postgres cluster, including replicas. Those
will frequently (hopefully, at least) include creating/altering a role to
have the replication flag set?


> Andres, please correct me if any of the above is incorrect.

Nothing but the above point stands out.


- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] kqueue

2016-04-22 Thread Tom Lane
Andres Freund  writes:
>> pg_strtoi?

> I think that's what Thomas did upthread. Are you taking this one then?

I'd go with just "strtoint".  We have "strtoint64" elsewhere.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] kqueue

2016-04-22 Thread Tom Lane
Thomas Munro  writes:
> On Sat, Apr 23, 2016 at 4:36 AM, Andres Freund  wrote:
>> On 2016-04-22 20:39:27 +1200, Thomas Munro wrote:
>>> While doing that I discovered that unpatched master doesn't actually
>>> build on recent NetBSD systems because our static function strtoi
>>> clashes with a non-standard libc function of the same name[1] declared
>>> in inttypes.h.  Maybe we should rename it, like in the attached?

>> Yuck. That's a new function they introduced? That code hasn't changed in
>> a while

> Yes, according to the man page it appeared in NetBSD 7.0.  That was
> released in September 2015, and our buildfarm has only NetBSD 5.x
> systems.  I see that the maintainers of the NetBSD pg package deal
> with this with a preprocessor kludge:

> http://cvsweb.netbsd.org/bsdweb.cgi/pkgsrc/databases/postgresql95/patches/patch-src_backend_utils_adt_datetime.c?rev=1.1

> What is the policy for that kind of thing -- do nothing until someone
> cares enough about the platform to supply a buildfarm animal?

There's no set policy, but certainly a promise to put up a buildfarm
animal would establish that somebody actually cares about keeping
Postgres running on the platform.  Without one, we might fix a specific
problem when reported, but we'd have no way to know about new problems.

Rooting through that patches directory reveals quite a number of
random-looking patches, most of which we certainly wouldn't take
without a lot more than zero explanation.  It's hard to tell which
are actually needed, but at least some don't seem to have anything
to do with building for NetBSD.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] kqueue

2016-04-22 Thread Andres Freund
On 2016-04-22 19:25:06 -0300, Alvaro Herrera wrote:
> Since we haven't, maybe nobody cares, so why should we?

I guess it's to a good degree because netbsd has pg packages, and it's
fixed there?

> would rename our function nonetheless FWIW; the name seems far too
> generic to me.

Yea.

> pg_strtoi?

I think that's what Thomas did upthread. Are you taking this one then?


Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] kqueue

2016-04-22 Thread Andres Freund
On 2016-04-23 10:12:12 +1200, Thomas Munro wrote:
> What is the policy for that kind of thing -- do nothing until someone
> cares enough about the platform to supply a buildfarm animal?

I think we should fix it, I just want to make sure we understand why the
error is appearing now. Since we now do...

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] kqueue

2016-04-22 Thread Alvaro Herrera
Thomas Munro wrote:
> On Sat, Apr 23, 2016 at 4:36 AM, Andres Freund  wrote:
> > On 2016-04-22 20:39:27 +1200, Thomas Munro wrote:
> >> While doing that I discovered that unpatched master doesn't actually
> >> build on recent NetBSD systems because our static function strtoi
> >> clashes with a non-standard libc function of the same name[1] declared
> >> in inttypes.h.  Maybe we should rename it, like in the attached?
> >
> > Yuck. That's a new function they introduced? That code hasn't changed in
> > a while
> 
> Yes, according to the man page it appeared in NetBSD 7.0.  That was
> released in September 2015, and our buildfarm has only NetBSD 5.x
> systems.  I see that the maintainers of the NetBSD pg package deal
> with this with a preprocessor kludge:
> 
> http://cvsweb.netbsd.org/bsdweb.cgi/pkgsrc/databases/postgresql95/patches/patch-src_backend_utils_adt_datetime.c?rev=1.1
> 
> What is the policy for that kind of thing -- do nothing until someone
> cares enough about the platform to supply a buildfarm animal?

Well, if the platform is truly alive, we would have gotten complaints
already.  Since we haven't, maybe nobody cares, so why should we?  I
would rename our function nonetheless FWIW; the name seems far too
generic to me.  pg_strtoi?
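
For context, the clashing helper is a small file-local wrapper around strtol(),
so the fix really is just the rename; a rough sketch (not the exact datetime.c
code) would be:

#include <errno.h>
#include <stdlib.h>

/* was: static int strtoi(const char *nptr, char **endptr, int base) */
static int
pg_strtoi(const char *nptr, char **endptr, int base)	/* or strtoint, whichever name wins */
{
	long		val = strtol(nptr, endptr, base);

	if (val != (long) ((int) val))
		errno = ERANGE;			/* flag values that overflow the int range */
	return (int) val;
}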

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] kqueue

2016-04-22 Thread Thomas Munro
On Sat, Apr 23, 2016 at 4:36 AM, Andres Freund  wrote:
> On 2016-04-22 20:39:27 +1200, Thomas Munro wrote:
>> While doing that I discovered that unpatched master doesn't actually
>> build on recent NetBSD systems because our static function strtoi
>> clashes with a non-standard libc function of the same name[1] declared
>> in inttypes.h.  Maybe we should rename it, like in the attached?
>
> Yuck. That's a new function they introduced? That code hasn't changed in
> a while

Yes, according to the man page it appeared in NetBSD 7.0.  That was
released in September 2015, and our buildfarm has only NetBSD 5.x
systems.  I see that the maintainers of the NetBSD pg package deal
with this with a preprocessor kludge:

http://cvsweb.netbsd.org/bsdweb.cgi/pkgsrc/databases/postgresql95/patches/patch-src_backend_utils_adt_datetime.c?rev=1.1

What is the policy for that kind of thing -- do nothing until someone
cares enough about the platform to supply a buildfarm animal?

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] EXPLAIN VERBOSE with parallel Aggregate

2016-04-22 Thread David Rowley
On 23 April 2016 at 09:19, Robert Haas  wrote:
> 2. Add a field "bool aggcombine" to Aggref, and set it to true in this
> case.  When we see that in deparsing, expect the argument list to be
> one element long, a TargetEntry containing a Var.  Use that to dig out
> the partial Aggref to which it points, and deparse that instead.  I
> guess maybe get_variable() could be used for this purpose.
>

I did do that at one stage; I think one of the above patches did it
that way. The problem was that, because I was just detecting combine
Aggrefs and printing the first item in args, the PARTIAL word was
printed again, which was pretty bogus since it was not a partial agg.
I didn't see a way to do this without adding some extra bool parameter
to that whole series of functions. But I only looked at using
get_rule_expr(); I didn't look at what get_variable() does.

> There might be another approach, too.  Thoughts?
>
> (Note that I'm assuming here that the final aggregate's target list
> output should match what we would have gotten from a regular
> aggregate, despite the combining stage in the middle.  I think that's
> correct; we're outputting the same thing, even though we computed it
> differently.)

Please note that in this case the final and combine node are the same
node, so I'm confused by the "combining stage in the middle" part.
There are only 2 aggregate nodes; I'm not sure how one of those is in
the middle.

I really don't think that we should print FILTER details in a combine
aggregate node. We'd be claiming to be doing something that we're
actually not doing. Please see advance_aggregates() in nodeAgg.c, and
compare that to combine_aggregates(), which is used when combineStates
== true. Notice that only advance_aggregates() bothers with the
aggfilter clause.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] Breakage with VACUUM ANALYSE + partitions

2016-04-22 Thread Robert Haas
On Fri, Apr 22, 2016 at 1:07 PM, Andres Freund  wrote:
> The attached patch basically adds the segment size checks to
> _mdfd_getseg(), and doesn't perform extension, even in recovery, if
> EXTENSION_REALLY_RETURN_NULL is passed.
>
> This fixes the issue for me, both in the originally reported variant,
> and in some more rudely setup version (i.e. rm'ing files).

I think this looks broadly reasonable.  Some specific comments:

+/*
+ * Normally we will create new segments only if authorized by
+ * the caller (i.e., we are doing mdextend()).  But when doing
+ * WAL recovery, create segments anyway; this allows cases
+ * such as replaying WAL data that has a write into a
+ * high-numbered segment of a relation that was later deleted.
+ * We want to go ahead and create the segments so we can
+ * finish out the replay.

Add something like: "However, if the caller has specified
EXTENSION_REALLY_RETURN_NULL, then extension is not desired even in
recovery; we won't reach this point in that case."

+errno = ENOENT;/* some callers check errno, yuck */

I think a considerably more detailed comment would be warranted.

+else
+{
+ereport(ERROR,
+(errcode_for_file_access(),
+ errmsg("could not open file \"%s\"
(target block %u): previous segment is only %u blocks",
+_mdfd_segpath(reln, forknum, nextsegno),
+blkno, nblocks)));
+}

The else and its ensuing level of indentation are unnecessary.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] EXPLAIN VERBOSE with parallel Aggregate

2016-04-22 Thread David Rowley
On 23 April 2016 at 06:35, Robert Haas  wrote:
> 2. You're using the term "combine agg", but as far as the EXPLAIN
> ANALYZE output is concerned, that's not a thing.  There is
> PartialAggregate, Aggregate, and FinalizeAggregate.  I think you mean
> FinalizeAggregate when you say "combine aggregate", since we identify
> a node as a FinalizeAggregate by observing that combineStates = true.

I really do mean combine when I say combine. I'm pretty sure that I've
coded everything properly to work even if there were 10 stages of
aggregation; in that case 9 of those would be combine nodes, and only
1 of them a finalize node. I can see why you'd say that at a glance at
explain.c, but it's more relevant that agg->finalizeAggs is true,
given that the combine test is in the else if condition. The test for
combineStates is only there so we don't output Finalize Aggregate for
normal 1-stage aggregates.

I simply don't think FILTER should be shown if combineStates is true.
This means that it will only be shown at the first aggregate node, the
one with combineStates == false. I think this is a good idea due to
the fact that this is the only place where FILTERing is done.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] EXPLAIN VERBOSE with parallel Aggregate

2016-04-22 Thread Robert Haas
On Sun, Apr 17, 2016 at 10:22 AM, Tom Lane  wrote:
> David Rowley  writes:
>> On 16 April 2016 at 04:27, Tom Lane  wrote:
>>> +1 for the latter, if we can do it conveniently.  I think exposing
>>> the names of the aggregate implementation functions would be very
>>> user-unfriendly, as nobody but us hackers knows what those are.
>
>> It does not really seem all that convenient to do this. It also seems
>> a bit strange to me to have a parent node report a column which does
>> not exist in any nodes descending from it. Remember that the combine
>> Aggref does not have the same ->args as its corresponding partial
>> Aggref. It's not all that clear to me if there is any nice way to do
>> have this work the way you'd like. If we were to just call
>> get_rule_expr() on the first arg of the combine aggregate node, it
>> would re-print the PARTIAL keyword again.
>
> This suggests to me that the parsetree representation for partial
> aggregates was badly chosen.  If EXPLAIN can't recognize them, then
> neither can anything else, and we may soon find needs for telling
> the difference that are not just cosmetic.  Maybe we need more
> fields in Aggref.

So the basic problem is this code in setrefs.c:

newvar = search_indexed_tlist_for_partial_aggref(aggref,
                                                 context->subplan_itlist,
                                                 context->newvarno);
if (newvar)
{
    Aggref     *newaggref;
    TargetEntry *newtle;

    /*
     * Now build a new TargetEntry for the Aggref's arguments which is
     * a single Var which references the corresponding AggRef in the
     * node below.
     */
    newtle = makeTargetEntry((Expr *) newvar, 1, NULL, false);
    newaggref = (Aggref *) copyObject(aggref);
    newaggref->args = list_make1(newtle);

So what ends up happening is that, before setrefs processing, the
FinalizeAggregate's aggref has the same argument list as what the user
originally specified, but during that process, we replace the original
argument list with a 1-element argument list that points to the output
of the partial aggregate step.  After that, it's unsurprising that the
deparsing logic goes wrong; consider especially the case of an
aggregate that originally took any number of arguments other than one.
Offhand, I see two somewhat reasonable strategies for trying to fix
this:

1. Add a field "List *origargs" to Aggref, and in this case set
newaggref->origargs to a copy of the original argument list.  Use
origargs for deparsing, unless it's NULL, in which case use args.  Or
alternative, always populate origargs, but in other cases just make it
equal to args.

2. Add a field "bool aggcombine" to Aggref, and set it to true in this
case.  When we see that in deparsing, expect the argument list to be
one element long, a TargetEntry containing a Var.  Use that to dig out
the partial Aggref to which it points, and deparse that instead.  I
guess maybe get_variable() could be used for this purpose.
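
To make the first of these a bit more concrete, a rough sketch (field
placement and names here are illustrative only; nothing is settled):

/* primnodes.h: remember the pre-setrefs argument list for deparsing */
typedef struct Aggref
{
	Expr		xpr;
	Oid			aggfnoid;		/* pg_proc OID of the aggregate */
	/* ... existing fields ... */
	List	   *args;			/* arguments (replaced during setrefs processing) */
	List	   *origargs;		/* arguments as originally written, or NIL */
} Aggref;

/* setrefs.c, at the point quoted above */
newtle = makeTargetEntry((Expr *) newvar, 1, NULL, false);
newaggref = (Aggref *) copyObject(aggref);
newaggref->origargs = newaggref->args;		/* keep what the user wrote */
newaggref->args = list_make1(newtle);		/* point at the partial Aggref's output */

/* ruleutils.c: deparse origargs when non-NIL, otherwise fall back to args */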

There might be another approach, too.  Thoughts?

(Note that I'm assuming here that the final aggregate's target list
output should match what we would have gotten from a regular
aggregate, despite the combining stage in the middle.  I think that's
correct; we're outputting the same thing, even though we computed it
differently.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PATCH] add option to pg_dumpall to exclude tables from the dump

2016-04-22 Thread Juergen Hannappel
A new option -T/--exclude-table for pg_dumpall. This option is
passed through to pg_dump, which does the real work.
This feature can be used to exclude large tables that are known not
to change from a database backup dump, so that only the changing parts
of the database are dumped.

Signed-off-by: Juergen Hannappel 
---
 doc/src/sgml/ref/pg_dumpall.sgml | 14 ++
 src/bin/pg_dump/pg_dumpall.c |  9 -
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/pg_dumpall.sgml b/doc/src/sgml/ref/pg_dumpall.sgml
index 6c34c25..24408b9 100644
--- a/doc/src/sgml/ref/pg_dumpall.sgml
+++ b/doc/src/sgml/ref/pg_dumpall.sgml
@@ -198,6 +198,20 @@ PostgreSQL documentation
  
 
  
+  -T table
+  --exclude-table=table
+  
+   
+Do not dump any tables matching the table pattern.  The pattern is
+interpreted according to the same rules as for -t.
+-T can be given more than once to exclude tables
+matching any of several patterns.
+   
+  
+ 
+
+ 
   -v
   --verbose
   
diff --git a/src/bin/pg_dump/pg_dumpall.c b/src/bin/pg_dump/pg_dumpall.c
index a7dc41c..979a964 100644
--- a/src/bin/pg_dump/pg_dumpall.c
+++ b/src/bin/pg_dump/pg_dumpall.c
@@ -111,6 +111,7 @@ main(int argc, char *argv[])
{"password", no_argument, NULL, 'W'},
{"no-privileges", no_argument, NULL, 'x'},
{"no-acl", no_argument, NULL, 'x'},
+   {"exclude-table", required_argument, NULL, 'T'},
 
/*
 * the following options don't have an equivalent short option letter
@@ -195,7 +196,7 @@ main(int argc, char *argv[])
 
pgdumpopts = createPQExpBuffer();
 
-   while ((c = getopt_long(argc, argv, "acd:f:gh:l:oOp:rsS:tU:vwWx", long_options, &optindex)) != -1)
+   while ((c = getopt_long(argc, argv, "acd:f:gh:l:oOp:rsS:tU:vwWxT:", long_options, &optindex)) != -1)
{
switch (c)
{
@@ -283,6 +284,11 @@ main(int argc, char *argv[])
appendPQExpBufferStr(pgdumpopts, " -x");
break;
 
+   case 'T':
+   appendPQExpBufferStr(pgdumpopts, " -T");
+   doShellQuoting(pgdumpopts,optarg);
+   break;
+
case 0:
break;
 
@@ -564,6 +570,7 @@ help(void)
printf(_("  -s, --schema-only    dump only the schema, no data\n"));
printf(_("  -S, --superuser=NAME superuser user name to use in the dump\n"));
printf(_("  -t, --tablespaces-only   dump only tablespaces, no databases or roles\n"));
+   printf(_("  -T, --exclude-table  exclude some tables\n"));
printf(_("  -x, --no-privileges  do not dump privileges (grant/revoke)\n"));
printf(_("  --binary-upgrade for use by upgrade utilities only\n"));
printf(_("  --column-inserts dump data as INSERT commands with column names\n"));
-- 
1.8.4.5



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Inline initial comparisons in TestForOldSnapshot()

2016-04-22 Thread Tom Lane
Kevin Grittner  writes:
> Since I failed to find anything in our docs or C comments, and very
> scant clues in the Wiki and list archives, about when to use
> PGDLLIMPORT and PGDLLEXPORT I figured it might be helpful to
> clarify here, and maybe add something near one of the definitions.

> Based on your fix and the meager clues found elsewhere, along with
> randomly looking at a few points of existing usage, I infer that
> these should be specified from the perspective of loadable modules
> which we might want to build for Windows.  That is, an extension
> would use PGDLLEXPORT for symbols it wanted to have the backends
> find, and core code should use PGDLLIMPORT for symbols that
> extensions or other loadable modules might need to find.  These are
> used for both data locations and function names.

I resolutely refuse to become an expert on such things, but from existing
usage it seems we only need PGDLLIMPORT on the "extern"s for core global
variables that need to be accessible to extensions.  If there is a
corresponding requirement for core functions, it must be getting handled
automatically somewhere in the Windows build systems.

There are no manual uses of PGDLLEXPORT anywhere in our tree --- it's
plastered on automatically by PG_MODULE_MAGIC and PG_FUNCTION_INFO_V1,
and that's all.  I rather suspect that this is cargo-cult programming
and those uses are not really necessary, because if they are, how do V0
functions in extensions work?  But as long as it's out of sight I don't
particularly care if they're there or not.
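
For anyone following along, a minimal sketch of the two sides (the global
variable name below is made up, purely to illustrate the pattern):

/*
 * Core side: a header exposing a global that extensions may reference on
 * Windows needs PGDLLIMPORT on the extern, e.g.
 *
 *     extern PGDLLIMPORT int my_core_counter;   (my_core_counter is hypothetical)
 *
 * Extension side: no explicit PGDLLEXPORT is needed for V1 functions;
 * PG_MODULE_MAGIC and PG_FUNCTION_INFO_V1 emit the exported glue.
 */
#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

extern PGDLLIMPORT int my_core_counter;	/* hypothetical core global, see above */

PG_FUNCTION_INFO_V1(read_core_counter);

Datum
read_core_counter(PG_FUNCTION_ARGS)
{
	PG_RETURN_INT32(my_core_counter);
}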

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-22 Thread Jeff Janes
On Thu, Apr 21, 2016 at 11:00 PM, Noah Misch  wrote:
> On Mon, Apr 18, 2016 at 05:48:17PM +0300, Teodor Sigaev wrote:
>> >>Added, see attached patch (based on v3.1)
>> >
>> >With this applied, I am getting a couple errors I have not seen before
>> >after extensive crash recovery testing:
>> >ERROR:  attempted to delete invisible tuple
>> >ERROR:  unexpected chunk number 1 (expected 2) for toast value
>> >100338365 in pg_toast_16425
>> Huh, seems, it's not related to GIN at all... Indexes don't play with toast
>> machinery. The single place where this error can occur is a heap_delete() -
>> deleting already deleted tuple.
>
> Like you, I would not expect gin_alone_cleanup-4.patch to cause such an error.
> I get the impression Jeff has a test case that he had run in many iterations
> against the unpatched baseline.  I also get the impression that a similar or
> smaller number of its iterations against gin_alone_cleanup-4.patch triggered
> these two errors (once apiece, or multiple times?).  Jeff, is that right?

Because the unpatched baseline suffers from the bug which was the
original topic of this thread, I haven't been able to test against the
original baseline.  It would fail from that other bug before it ran
long enough to hit these ones.  Both errors start within a few minutes
of each other, but do not appear to be related other than that.  Once
they start happening, they occur repeatedly.


> Could you describe the test case in sufficient detail for Teodor to reproduce
> your results?

I spawn a swarm of processes to update a counter in a randomly chosen
row, selected via the gin index.  They do this as fast as possible
until the server intentionally crashes.  When it recovers, they
compare notes and see if the results are consistent.

But in this case, the problem is not inconsistent results, but rather
errors during the updating stage.

The patch introduces a mechanism to crash the server, a mechanism to
fast-forward the XID, and some additional logging that I sometimes
find useful (which I haven't been using in this case, but don't want
to rip it out)

The perl script implements the core of the test harness.

The shell script sets up the server (using hard coded paths for the
data and the binaries, so will need to be changed), and then calls the
perl script in a loop.

Running on an 8 core system, I've never seen it have a problem in less
than 9 hours of run time.

This produces copious amounts of logging to stdout.  This is how I
look through the log files to find this particular problem:

sh do.sh >& do.err &

tail -f do.err | fgrep ERROR

The more I think about it, the more I think gin is just an innocent
bystander, for which I just happen to have a particularly demanding
test.  I think something about snapshots and wrap-around may be
broken.

Cheers,

Jeff


count.pl
Description: Binary data


crash_REL9_6.patch
Description: Binary data


do.sh
Description: Bourne shell script

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Inline initial comparisons in TestForOldSnapshot()

2016-04-22 Thread Kevin Grittner
On Thu, Apr 21, 2016 at 1:45 PM, Tom Lane  wrote:

> PGDLLIMPORT is what's needed on any backend global variable that's to
> be referenced by extensions.  I already pushed a fix before noticing
> this thread.

Thanks!

Since I failed to find anything in our docs or C comments, and very
scant clues in the Wiki and list archives, about when to use
PGDLLIMPORT and PGDLLEXPORT I figured it might be helpful to
clarify here, and maybe add something near one of the definitions.

Based on your fix and the meager clues found elsewhere, along with
randomly looking at a few points of existing usage, I infer that
these should be specified from the perspective of loadable modules
which we might want to build for Windows.  That is, an extension
would use PGDLLEXPORT for symbols it wanted to have the backends
find, and core code should use PGDLLIMPORT for symbols that
extensions or other loadable modules might need to find.  These are
used for both data locations and function names.

Close?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] "parallel= " information is not coming in pg_dumpall for create aggregate

2016-04-22 Thread Fabrízio de Royes Mello
On Thu, Apr 21, 2016 at 12:06 AM, Robert Haas  wrote:
>
> On Mon, Apr 18, 2016 at 10:47 AM, Fabrízio de Royes Mello
>  wrote:
> >> I checked in PG 9.6 , if we create an aggregate function with saying -
> >> parallel=safe/restricted/unsafe and then take
> >> a pg_dumpall of the entire cluster , "parallel= " is missing from create
> >> aggregate syntax
> >>
> >> Steps to reproduce -
> >>
> >> .)connect to psql terminal and create an aggregate function
> >>
> >> postgres=# CREATE AGGREGATE unsafe_sum100 (float8)
> >> (
> >> stype = float8,
> >> sfunc = float8pl,
> >> mstype = float8,
> >> msfunc = float8pl,
> >> minvfunc = float8mi,
> >> parallel=safe);
> >> CREATE AGGREGATE
> >>
> >> .)perform pg_dumpall against that cluster
> >>
> >> .)check the content of create aggregate unsafe_sum100 in the file
> >>
> >> "
> >> -
> >> -- Name: unsafe_sum100(double precision); Type: AGGREGATE; Schema: public;
> >> Owner: centos
> >> --
> >>
> >> CREATE AGGREGATE unsafe_sum100(double precision) (
> >> SFUNC = float8pl,
> >> STYPE = double precision,
> >> MSFUNC = float8pl,
> >> MINVFUNC = float8mi,
> >> MSTYPE = double precision
> >> );
> >>
> >> "
> >
> > You're correct... try the attached patch to fix it.
>
> Nice catch, Tushar.  Thanks for the patch, Fabrízio.  Committed.
>

You're welcome!

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello


Re: [HACKERS] if (!superuser) checks

2016-04-22 Thread Stephen Frost
All,

* Stephen Frost (sfr...@snowman.net) wrote:
> In particular, the pg_logical_* functions have superuser checks and
> those checks also allow roles who have the replication role attribute.
> That isn't something we can represent with the GRANT system currently.

I chatted with Andres a bit at PGConf.NY regarding the pg_logical_* and
the REPLICATION role attribute in general.

This is all 9.7 material, of course, but I wanted to jot down my
recollection of the discussion before it goes out of mind.

The idea we came up with is to have a pg_replication default role which
essentially replaces the REPLICATION role attribute.  Andres didn't see
it as being terribly valuable to disallow a role with the REPLICATION
attribute from logging into the database directly, if it's allowed to do
so by pg_hba.conf.  If the administrator doesn't wish to allow that,
then they can configure pg_hba.conf to not allow it.  Further, with the
permissions handled by the GRANT system, the administrator could have
independent roles for connecting to the replication protocol and for
running the pg_logical_* functions, again, if they wish to configure
their system that way.

Lastly, Andres felt that we could keep the syntax for
setting/unsetting the role attribute and just convert those into
GRANT/REVOKE statements for the pg_replication role.  I'm not thrilled
with that idea, as it feels a bit unnecessary at this point, given the
relatively few users of pg_logical_* functions by tools, and because it
would introduce a one-off hack in tools such as pgAdmin.  Those tools
would have to query pg_auth_members to get the current setting of the
REPLICATION role attribute, even if the code for granting/revoking that
role attribute doesn't change, unless we also hack up the views and
tables which contain role attributes today to reflect membership in
pg_replication as having (or not) that role attribute.  Basically, for
my 2c, it seems like an awful lot of extra code on the backend side for
relatively small gain.  pgAdmin and similar tools are going to want to
provide support for default roles in any case, along with their role
attributes support, and it doesn't seem useful to try and conflate the
two.

Andres, please correct me if any of the above is incorrect.

Thoughts and comments are welcome, of course.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-22 Thread Robert Haas
On Fri, Apr 22, 2016 at 2:20 PM, Jeff Janes  wrote:
>> Check my reasoning: In version 4 I added a remembering of tail of pending
>> list into blknoFinish variable. And when we read page which was a tail on
>> cleanup start then we set cleanupFinish variable and after cleaning that
>> page we will stop further cleanup. Any insert caused during cleanup will be
>> placed after blknoFinish (corner case: in that page), so, vacuum should not
>> miss tuples marked as deleted.
>
> Yes, I agree with the correctness of v4.  But I do wonder if we should
> use that early stopping for vacuum and gin_clean_pending_list, rather
> than just using it for user backends.  While I think correctness
> allows it to stop early, since these routines are explicitly about
> cleaning things up it seems like they should volunteer to clean the
> whole thing.

Not to hold anyone's feet to the fire, but we are running out of time
before beta1, and this seems to be one of the more serious outstanding
issues.  I'd rather not release beta1 with known data corruption bugs,
but that means either fixing whatever is broken here or reverting the
commit that caused or revealed the problem pretty soon.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Queries and PostGIS

2016-04-22 Thread Stephen Frost
Paul,

* Paul Ramsey (pram...@cleverelephant.ca) wrote:
> On Mon, Mar 28, 2016 at 9:45 AM, Stephen Frost  wrote:
> > Would you agree that it'd be helpful to have for making the st_union()
> > work better in parallel?
> 
> For our particular situation w/ ST_Union, yes, it would be ideal to be
> able to run a worker-side combine function as well as the master-side
> one. Although the cascaded union would be less effective spread out
> over N nodes, doing it only once per worker, rather than every N
> records would minimize the loss of effectiveness.

I chatted with Robert a bit about this and he had an interesting
suggestion.  I'm not sure that it would work for you, but the
serialize/deserialize functions are used to transfer the results from
the worker process to the main process.  You could possibly do the
per-worker finalize work in the serialize function to get the benefit of
running that in parallel.

You'll need to mark the aggtranstype as 'internal' to have the
serialize/deserialize code called.  Hopefully that's not too much of an
issue.
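
A sketch of what that could look like on the extension side (everything
prefixed my_/geom_ below is hypothetical, not an actual PostGIS or core
function; only the V1 calling convention and the bytea return are the real
requirements):

#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

/* hypothetical helpers provided elsewhere in the extension */
extern void *geom_union_state(void *state);		/* collapse this worker's inputs once */
extern bytea *geom_flatten_state(void *state);	/* flatten for transfer to the leader */

PG_FUNCTION_INFO_V1(my_union_serialize);

Datum
my_union_serialize(PG_FUNCTION_ARGS)
{
	/* the 'internal' transition state accumulated by this worker */
	void	   *state = PG_GETARG_POINTER(0);

	/* do the expensive per-worker work here, so it runs in parallel */
	state = geom_union_state(state);

	/* the leader deserializes this and combines it with other workers' states */
	PG_RETURN_BYTEA_P(geom_flatten_state(state));
}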

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] EXPLAIN VERBOSE with parallel Aggregate

2016-04-22 Thread Robert Haas
On Thu, Apr 21, 2016 at 8:57 PM, David Rowley
 wrote:
> OK, so here's my thoughts. Currently, as mentioned above, I've
> included a PARTIAL prefix for partial aggregates. This is
> syntactically incorrect for the dumping of views etc, but that should
> not matter as partial Aggrefs never come from the parser, they're only
> perhaps generated later in the planner. Same goes for combine
> aggregates too.

Yeah.  As I've said a few times, I would like to have SQL syntax that
emits the unfinalized (but serialized, if type internal) values, so
that postgres_fdw can use that to push partial aggregation to the remote
side.  Maybe we should consider what would be reasonable syntax for
this; but that's a second-order consideration for now.

> The attached still does not get the output the way Robert would
> have liked, but I still stand by my dislike of pretending the combine
> aggregate is a normal aggregate. It's not all that clear if FILTER
> should be displayed in the combine agg. Combine Aggs don't do FILTER.

I am rather confused by this, for two reasons:

1. The "Output" line is supposed to display the columns that the plan
node is producing.  And a FinalizeAggregate had darned well better be
producing the same results as an Aggregate, so it's entirely
reasonable for it to produce the same output that an Aggregate would
have given us.

2. You're using the term "combine agg", but as far as the EXPLAIN
ANALYZE output is concerned, that's not a thing.  There is
PartialAggregate, Aggregate, and FinalizeAggregate.  I think you mean
FinalizeAggregate when you say "combine aggregate", since we identify
a node as a FinalizeAggregate by observing that combineStates = true.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-22 Thread Jeff Janes
On Mon, Apr 18, 2016 at 7:48 AM, Teodor Sigaev  wrote:
>>> Added, see attached patch (based on v3.1)
>>
>>
>> With this applied, I am getting a couple errors I have not seen before
>> after extensive crash recovery testing:
>> ERROR:  attempted to delete invisible tuple
>> ERROR:  unexpected chunk number 1 (expected 2) for toast value
>> 100338365 in pg_toast_16425
>
> Huh, seems, it's not related to GIN at all... Indexes don't play with toast
> machinery. The single place where this error can occur is a heap_delete() -
> deleting already deleted tuple.

Those are two independent errors.  The delete invisible tuple error
doesn't occur on toast tables.

The actual statement triggering the error is an update statement.
Since it is showing up in the delete path, I assume it must be an
update where the new tuple goes to a different page.  But, if the
soon-to-be-old tuple is not visible, why is the update trying to
update it in the first place?  It seems like the different parts of
the code disagree on what is visible.

update foo set count=count+1,text_array=$1 where text_array @> $2

I agree it might not have anything to do with gin indexes, but I
didn't see it in testing anything else.  It might be a wrap-around
problem which for some reason only the gin test is efficient at
evoking. What I've done now is apply your v4 patch directly to
e95680832854cf300e64c1 and I am trying to see if it also has the
problem.  If that is clean, then it is probably an independently
introduced bug which is just getting exercised by the gin index stress
test.  If that is the case I'll try to git bisect forward, but that
could take weeks given the runtimes involved.  If that is dirty, then
maybe the FSM vacuuming patch introduced/uncovered more than one bug,
and should be reverted.


>> I've restarted the test harness with intentional crashes turned off,
>> to see if the problems are related to crash recovery or are more
>> generic than that.

I do not see the problem when there is no crash-recovery cycling involved.

I also do not see the problem when compiled under --enable-cassert,
but that could just be because compiling that way makes it too slow to
get in sufficient testing to hit the bug before I gave up.

>>
>> I've never seen these particular problems before, so don't have much
>> insight into what might be going on or how to debug it.
>
> Check my reasoning: In version 4 I added a remembering of tail of pending
> list into blknoFinish variable. And when we read page which was a tail on
> cleanup start then we set cleanupFinish variable and after cleaning that
> page we will stop further cleanup. Any insert caused during cleanup will be
> placed after blknoFinish (corner case: in that page), so, vacuum should not
> miss tuples marked as deleted.

Yes, I agree with the correctness of v4.  But I do wonder if we should
use that early stopping for vacuum and gin_clean_pending_list, rather
than just using it for user backends.  While I think correctness
allows it to stop early, since these routines are explicitly about
cleaning things up it seems like they should volunteer to clean the
whole thing.

Cheers,

Jeff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] Breakage with VACUUM ANALYSE + partitions

2016-04-22 Thread Andres Freund
On 2016-04-14 11:09:29 +0530, Abhijit Menon-Sen wrote:
> At 2016-04-12 09:00:57 -0400, robertmh...@gmail.com wrote:
> >
> > On Mon, Apr 11, 2016 at 1:17 PM, Andres Freund  wrote:
> > >
> > > 3) Actually handle the case of the last open segment not being
> > >RELSEG_SIZE properly in _mdfd_getseg() - mdnblocks() does so.
> > 
> > #3 seems like it's probably about 15 years overdue, so let's do that
> > anyway.
> 
> I don't have anything useful to contribute, I'm just trying to
> understand this fix.
> 
> _mdfd_getseg() looks like this:
> 
>   targetseg = blkno / ((BlockNumber) RELSEG_SIZE);
>   for (nextsegno = 1; nextsegno <= targetseg; nextsegno++)
>   {
>   Assert(nextsegno == v->mdfd_segno + 1);
> 
>   if (v->mdfd_chain == NULL)
>   {
>   /*
>* Normally we will create new segments only if 
> authorized by the
>* caller (i.e., we are doing mdextend()). […]
>*/
>   if (behavior == EXTENSION_CREATE || InRecovery)
>   {
>   if (_mdnblocks(reln, forknum, v) < RELSEG_SIZE)
> /* mdextend */
>   v->mdfd_chain = _mdfd_openseg(reln, forknum, 
> +nextsegno, O_CREAT);
>   }
>   else
>   {
>   /* We won't create segment if not existent */
>   v->mdfd_chain = _mdfd_openseg(reln, forknum, 
> nextsegno, 0);
>   }
>   if (v->mdfd_chain == NULL)
> /* return NULL or ERROR */
>   }
>   v = v->mdfd_chain;
>   }
> 
> Do I understand correctly that the case of the "last open segment"
> (i.e., the one for which mdfd_chain == NULL) not being RELSEG_SIZE
> (i.e., _mdnblocks(reln, forknum, v) < RELSEG_SIZE) should not call
> _mfdf_openseg on nextsegno if behaviour is not EXTENSION_CREATE or
> InRecovery?
>
> And that "We won't create segment if not existent" should happen, but
> doesn't only because the next segment file wasn't removed earlier, so
> we have to add an extra check for that case?
> 
> In other words, is something like the following what's needed here, or
> is there more to it?

It's a bit more complicated than that, but that's the principle. Thanks
for the patch, and sorry for my late response. See my attached version
for a more fleshed out version of this.

Looking at the code around this I found:
* _mdfd_getseg(), in contrast to mdnblocks(), doesn't check the segment size:
  it neither handles continuing with a too-small segment, nor errors out on a
  too-big one
* For, imo pretty bad, reasons in mdsync() we currently rely on
  EXTENSION_RETURN_NULL to extend the file in recovery, and to set errno
  to ENOENT. Brrr.
* we currently FATAL if a segment is too big - I've copied that
  behaviour, but why not just ERROR out?

The attached patch basically adds the segment size checks to
_mdfd_getseg(), and doesn't perform extension, even in recovery, if
EXTENSION_REALLY_RETURN_NULL is passed.

This fixes the issue for me, both in the originally reported variant,
and in some more rudely setup version (i.e. rm'ing files).


I'm seeing some weird behaviour (even with writeback flushing disabled)
with smgr invals and recovery/standbys, but I don't want to dive into it
before 
http://www.postgresql.org/message-id/CAD21AoDpZ6Xjg=gFrGPnSn4oTRRcwK1EBrWCq9OqOHuAcMMC=w...@mail.gmail.com
is addressed; it's too likely to be related.

Greetings,

Andres Freund
>From ea6bd2b96e9d8c499b7c926f5f4b05b39dfbc540 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Fri, 22 Apr 2016 09:45:26 -0700
Subject: [PATCH] Don't open formally non-existent segments in _mdfd_getseg().

Discussion: CAA-aLv6Dp_ZsV-44QA-2zgkqWKQq=GedBX2dRSrWpxqovXK=p...@mail.gmail.com
Fixes: 428b1d6b29ca599c5700d4bc4f4ce4c5880369bf
---
 src/backend/storage/smgr/md.c | 86 ++-
 1 file changed, 60 insertions(+), 26 deletions(-)

diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 578276d..89a20d1 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -165,9 +165,14 @@ static CycleCtr mdckpt_cycle_ctr = 0;
 
 typedef enum	/* behavior for mdopen & _mdfd_getseg */
 {
-	EXTENSION_FAIL,/* ereport if segment not present */
-	EXTENSION_RETURN_NULL,		/* return NULL if not present */
-	EXTENSION_CREATE			/* create new segments as needed */
+	/* ereport if segment not present, create in recovery */
+	EXTENSION_FAIL,
+	/* return NULL if not present, create in recovery */
+	EXTENSION_RETURN_NULL,
+	/* return NULL if not present */
+	EXTENSION_REALLY_RETURN_NULL,
+	/* create new segments as needed */
+	EXTENSION_CREATE
 } ExtensionBehavior;
 
 /* local routines */
@@ -591,7 +596,8 @@ mdopen(SMgrRelation reln, ForkNumb

Re: [HACKERS] kqueue

2016-04-22 Thread Andres Freund
On 2016-04-22 20:39:27 +1200, Thomas Munro wrote:
> I vote to leave this patch in the next commitfest where it is, and
> reconsider if someone shows up with a relevant problem report on large
> systems.

Sounds good!


> Here's a new version of the patch that fixes some stupid bugs.  I have
> run regression tests and some basic sanity checks on OSX 10.11.4,
> FreeBSD 10.3, NetBSD 7.0 and OpenBSD 5.8.  There is still room to make
> an improvement that would drop the syscall from AddWaitEventToSet and
> ModifyWaitEvent, compressing wait set modifications and waiting into a
> single syscall (kqueue's claimed advantage over the competition).

I find that not to be particularly interesting, and would rather want to
avoid adding complexity for it.


> While doing that I discovered that unpatched master doesn't actually
> build on recent NetBSD systems because our static function strtoi
> clashes with a non-standard libc function of the same name[1] declared
> in inttypes.h.  Maybe we should rename it, like in the attached?

Yuck. That's a new function they introduced? That code hasn't changed in
a while

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump dump catalog ACLs

2016-04-22 Thread Stephen Frost
* Noah Misch (n...@leadboat.com) wrote:
> On Wed, Apr 20, 2016 at 10:50:21PM -0400, Stephen Frost wrote:
> > I'm certainly open to improving these issues now if we agree that they
> > should be fixed for 9.6.  If we don't want to include such changes in 9.6
> > then I will propose then for post-9.6.
> 
> Folks run clusters with ~1000 databases; we previously accepted at least one
> complex performance improvement[1] based on that use case.  On the faster of
> the two machines I tested, the present thread's commits slowed "pg_dumpall
> --schema-only --binary-upgrade" by 1-2s per database.  That doubles pg_dump
> runtime against the installcheck regression database.  A run against a cluster
> of one hundred empty databases slowed fifteen-fold, from 8.6s to 131s.
> "pg_upgrade -j50" probably will keep things tolerable for the 1000-database
> case, but the performance regression remains jarring.  I think we should not
> release 9.6 with pg_dump performance as it stands today.

After looking through the code a bit, I realized that there are a lot of
object types which don't have ACLs at all but which exist in pg_catalog
and were being analyzed because the bitmask for pg_catalog included ACLs
and therefore was non-zero.

Clearing that bit for object types which don't have ACLs improved the
performance for empty databases quite a bit (from about 3s to a bit
under 1s on my laptop).  That's a 42-line patch, with comment lines
being half of that, which I'll push once I've looked into the other
concerns which were brought up on this thread.

Much of the remaining inefficiency is how we query for column
information one table at a time (that appears to be around 300ms of the
900ms or so total time).  I'm certainly interested in improving that but
that would require adding more complex data structures to pg_dump than
what we use currently (we'd want to grab all of the columns we care
about in an entire schema and store it locally and then provide a way to
look up that information, etc), so I'm not sure it'd be appropriate to
do now.
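
(As a rough sketch of that idea, not code from any patch and omitting most
of the columns pg_dump actually needs, the per-schema lookup could be a
single catalog query along these lines, with the result cached locally and
consulted table by table:)

SELECT c.relname, a.attname, a.attnum, a.attnotnull
FROM pg_catalog.pg_class c
     JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
     JOIN pg_catalog.pg_attribute a ON a.attrelid = c.oid
WHERE n.nspname = 'public'
  AND c.relkind = 'r'
  AND a.attnum > 0
  AND NOT a.attisdropped
ORDER BY c.relname, a.attnum;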

I'll look into it again once I've addressed the rest of the issues and
add the TAP-based tests which I've also been working on to actually get
direct testing of pg_dump in the regression suite.

Thanks!

Stephen




Re: [HACKERS] max_parallel_degree > 0 for 9.6 beta

2016-04-22 Thread Robert Haas
On Fri, Apr 22, 2016 at 10:07 AM, Joshua D. Drake  
wrote:
> This is the problem right here.
>
> We should be shipping for a reasonable production configuration. It is not
> reasonable to assume that someone is going to be running on a Raspberry Pi 2.
> Yes, we can effectively run on that platform, but that doesn't mean it should be
> our default target configuration. Consider that a 5.00/mo Digital Ocean VM
> is going to outperform a Raspberry Pi.

I don't disagree with that, and I think there is a considerable amount
of work that could be done to create a saner "out of the box"
configuration.  But I don't think that the two weeks before beta is
the right time to start building a consensus around what that might
look like.

> True, but isn't that also what context switching and (possibly)
> hyperthreading are for?

Sure.  What you should expect, though, is that overall system
throughput will be higher if the system is not oversubscribed.  You
can use parallel query selectively to speed up certain queries even if
that takes you above the number of CPUs you have; if those queries are
on a deadline, finishing them sooner may be worth whatever you lose in
overall throughput.

> I think your argument sounds more like a production solution, not a Beta
> solution. We should be pushing it a little bit in Beta.

Shipping with max_parallel_workers=2 *is* pushing it a little bit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] VS 2015 support in src/tools/msvc

2016-04-22 Thread Tom Lane
Michael Paquier  writes:
> On Fri, Apr 22, 2016 at 10:49 PM, Tom Lane  wrote:
>> How far back are we thinking of supporting VS2015, anyway?  I can check
>> and push this as a separate patch.

> My guess is 9.5: HEAD + last stable branch. That's what has been done
> when support for VS2012 or VS2013 has been added.

OK.  Pushed with some adjustments --- mainly, I undid the conversion
of the non-bool-returning functions, since those aren't broken (yet)
and I think it's still valuable to test V0 as much as we can here.

Also, I static-ified some functions that weren't SQL-visible, and
thereby discovered that some of those were actually dead code ...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Updated backup APIs for non-exclusive backups

2016-04-22 Thread Robert Haas
On Wed, Apr 20, 2016 at 1:32 PM, Magnus Hagander  wrote:
> Note that we have not marked them as deprecated. We're just giving warnings
> that they will be deprecated.

But I think what is being said here is that maybe they won't be
deprecated, at least not any time soon.  And therefore maybe we
shouldn't say so.

Frankly, I think that's right.  It is one thing to say that the new
method is preferred - +1 for that.  But the old method is going to
continue to be used by many people for a long time, and in some cases
will be superior.  That's not something we can deprecate, unless I'm
misunderstanding the situation.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] VS 2015 support in src/tools/msvc

2016-04-22 Thread Andrew Dunstan



On 04/22/2016 02:46 AM, Michael Paquier wrote:



Progress report:

1. My VS 2015 installations (I now have several) all generate solution file
with:
Microsoft Visual Studio Solution File, Format Version 12.00
so I propose to set that as the solution file version.

I am wondering why it happens this way for you..



It's not just me. See the reply at 
 
and notice that in both cases the Solution file format is version 12.00.


cheers

andrew




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] max_parallel_degree > 0 for 9.6 beta

2016-04-22 Thread Joshua D. Drake

On 04/22/2016 06:47 AM, Robert Haas wrote:

On Fri, Apr 22, 2016 at 9:33 AM, Tom Lane  wrote:

Robert Haas  writes:

On Thu, Apr 21, 2016 at 7:20 PM, Tom Lane  wrote:

Is that because max_worker_processes is only 8 by default?  Maybe we
need to raise that, at least for beta purposes?



I'm not really in favor of that.  I mean, almost all of our default
settings are optimized for running PostgreSQL on, for example, a
Raspberry Pi 2, so it would seem odd to suddenly swing the other
direction and assume that there are more than 8 unused CPU cores.


This is the problem right here.

We should be shipping for a reasonable production configuration. It is 
not reasonable to assume that someone is going to be running on a 
Raspberry Pi 2. Yes, we can effectively run on that platform, but that doesn't 
mean it should be our default target configuration. Consider that a 
5.00/mo Digital Ocean VM is going to outperform a Raspberry Pi.




It is much less likely to be true for parallel workers.  The reason
why those processes aren't contending for the CPU at the same time is
generally that most of the connections are in fact idle.  But a
parallel worker is never idle.  It is launched when it is needed to
run a query and exits immediately afterward.  If it's not contending
for the CPU, it will be contending for I/O bandwidth, or a lock.



True, but isn't that also what context switching and (possibly) 
hyperthreading are for?




So what I'm concerned about for beta purposes is that we have a setup that
can exercise cases like, say, varying orders in which different workers
return tuples, or potential deadlocks between sibling workers.  We'd get
no coverage of that behavioral space at max_parallel_degree=1.  I'm not
really convinced that we'll get adequate coverage at
max_parallel_degree=2.


The right solution to that is for people who have the right hardware
to raise the settings, not to unleash a ridiculous set of defaults on
everyone.  I really hope that some people do serious destruction
testing of parallel query and try to break it.  For example, you could
use the parallel_degree reloption to force 100 parallel workers to
scan the same relation.   That's likely to be dog slow, but it might
well turn up some bugs.


I think your argument sounds more like a production solution, not a Beta 
solution. We should be pushing it a little bit in Beta.


JD









--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] VS 2015 support in src/tools/msvc

2016-04-22 Thread Michael Paquier
On Fri, Apr 22, 2016 at 10:49 PM, Tom Lane  wrote:
> How far back are we thinking of supporting VS2015, anyway?  I can check
> and push this as a separate patch.

My guess is 9.5: HEAD + last stable branch. That's what has been done
when support for VS2012 or VS2013 has been added.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] VS 2015 support in src/tools/msvc

2016-04-22 Thread Tom Lane
Michael Paquier  writes:
> On Fri, Apr 22, 2016 at 1:30 PM, Tom Lane  wrote:
>> If we assume that oldstyle functions returning integer are still okay,
>> which the success of the regression test case involving oldstyle_length()
>> seems to prove, then indeed seg's bool-returning functions are the only
>> hazard.

> Your assumption is right. With the patch attached for contrib/seg/
> that converts all those functions to use the V1 declaration, I am able
> to make the tests pass. As the internal shape of the functions is not
> changed and that there are no functional changes, I guess that it
> would be fine to backpatch down to where VS2015 is intended to be
> supported. Is anybody here foreseeing any problems for back-branches
> if there is such a change?

It should be fine, since converting a function to V1 makes no difference
at the SQL level --- we don't need an extension script modification.
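
(For readers following along, the shape of such a conversion is roughly the
following; this is a generic sketch with a made-up function name, not the
actual contrib/seg change.)

#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

/* Old-style (V0) coding: the function returns a C bool directly. */
bool
my_bool_func_old(int32 a, int32 b)
{
	return a < b;
}

/* Version-1 coding: arguments and result go through the fmgr macros. */
PG_FUNCTION_INFO_V1(my_bool_func);

Datum
my_bool_func(PG_FUNCTION_ARGS)
{
	int32		a = PG_GETARG_INT32(0);
	int32		b = PG_GETARG_INT32(1);

	PG_RETURN_BOOL(a < b);
}

The CREATE FUNCTION declaration is identical in both cases, which is why no
extension script change is needed.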

How far back are we thinking of supporting VS2015, anyway?  I can check
and push this as a separate patch.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] max_parallel_degree > 0 for 9.6 beta

2016-04-22 Thread Robert Haas
On Fri, Apr 22, 2016 at 9:33 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Thu, Apr 21, 2016 at 7:20 PM, Tom Lane  wrote:
>>> Is that because max_worker_processes is only 8 by default?  Maybe we
>>> need to raise that, at least for beta purposes?
>
>> I'm not really in favor of that.  I mean, almost all of our default
>> settings are optimized for running PostgreSQL on, for example, a
>> Raspberry Pi 2, so it would seem odd to suddenly swing the other
>> direction and assume that there are more than 8 unused CPU cores.
>
> I'm not following why you think that max_worker_processes cannot be
> set higher than the number of cores.  By that argument, it's insane
> that we ship with max_connections = 100.  In practice it's generally
> fine, and people can get away with oversubscribing their core count
> even more than that, because it's seldom that all those processes
> are actually contending for CPU at the same time.  There are enough
> inefficiencies in our parallel-query design that the same will most
> certainly be true for parallel workers.

It is much less likely to be true for parallel workers.  The reason
why those processes aren't contending for the CPU at the same time is
generally that most of the connections are in fact idle.  But a
parallel worker is never idle.  It is launched when it is needed to
run a query and exits immediately afterward.  If it's not contending
for the CPU, it will be contending for I/O bandwidth, or a lock.

> So what I'm concerned about for beta purposes is that we have a setup that
> can exercise cases like, say, varying orders in which different workers
> return tuples, or potential deadlocks between sibling workers.  We'd get
> no coverage of that behavioral space at max_parallel_degree=1.  I'm not
> really convinced that we'll get adequate coverage at
> max_parallel_degree=2.

The right solution to that is for people who have the right hardware
to raise the settings, not to unleash a ridiculous set of defaults on
everyone.  I really hope that some people do serious destruction
testing of parallel query and try to break it.  For example, you could
use the parallel_degree reloption to force 100 parallel workers to
scan the same relation.   That's likely to be dog slow, but it might
well turn up some bugs.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] max_parallel_degree > 0 for 9.6 beta

2016-04-22 Thread Tom Lane
Robert Haas  writes:
> On Thu, Apr 21, 2016 at 7:20 PM, Tom Lane  wrote:
>> Is that because max_worker_processes is only 8 by default?  Maybe we
>> need to raise that, at least for beta purposes?

> I'm not really in favor of that.  I mean, almost all of our default
> settings are optimized for running PostgreSQL on, for example, a
> Raspberry Pi 2, so it would seem odd to suddenly swing the other
> direction and assume that there are more than 8 unused CPU cores.

I'm not following why you think that max_worker_processes cannot be
set higher than the number of cores.  By that argument, it's insane
that we ship with max_connections = 100.  In practice it's generally
fine, and people can get away with oversubscribing their core count
even more than that, because it's seldom that all those processes
are actually contending for CPU at the same time.  There are enough
inefficiencies in our parallel-query design that the same will most
certainly be true for parallel workers.

So what I'm concerned about for beta purposes is that we have a setup that
can exercise cases like, say, varying orders in which different workers
return tuples, or potential deadlocks between sibling workers.  We'd get
no coverage of that behavioral space at max_parallel_degree=1.  I'm not
really convinced that we'll get adequate coverage at
max_parallel_degree=2.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-04-22 Thread Amit Kapila
On Thu, Apr 21, 2016 at 6:38 AM, Ants Aasma  wrote:
>
> On Tue, Apr 19, 2016 at 6:11 PM, Kevin Grittner  wrote:
> > On Tue, Apr 19, 2016 at 9:57 AM, Amit Kapila 
wrote:
> >> On Sun, Apr 17, 2016 at 2:26 AM, Andres Freund 
wrote:
> >>>
> >>> FWIW, I could be kinda convinced that it's temporarily ok, if there'd
be
> >>> a clear proposal on the table how to solve the scalability issue
around
> >>> MaintainOldSnapshotTimeMapping().
> >>
> >> It seems that for read-only workloads, MaintainOldSnapshotTimeMapping()
> >> takes EXCLUSIVE LWLock which seems to be a probable reason for a
performance
> >> regression.  Now, here the question is do we need to acquire that lock
if
> >> xmin is not changed since the last time value of
> >> oldSnapshotControl->latest_xmin is updated or xmin is lesser than
equal to
> >> oldSnapshotControl->latest_xmin?
> >> If we don't need it for above cases, I think it can address the
performance
> >> regression to a good degree for read-only workloads when the feature is
> >> enabled.
> >
> > Thanks, Amit -- I think something along those lines is the right
> > solution to the scaling issues when the feature is enabled.  For
> > now I'm focusing on the back-patching issues and the performance
> > regression when the feature is disabled, but I'll shift focus to
> > this once the "killer" issues are in hand.
>
> I had an idea I wanted to test out. The gist of it is to effectively
> have the last slot of timestamp to xid map stored in the latest_xmin
> field and only update the mapping when slot boundaries are crossed.
> See attached WIP patch for details. This way the exclusive lock only
> needs to be acquired once per minute.
>

Why at all do we need to acquire Exclusive lock if xmin is not changing at
all?  Also, I think your proposed patch can affect the update of xids for
existing mappings.  In particular, I am talking about below code:

	else if (ts <= (oldSnapshotControl->head_timestamp +
					((oldSnapshotControl->count_used - 1) * USECS_PER_MINUTE)))
	{
		/* existing mapping; advance xid if possible */

After your patch, it might skip the update to existing mappings which
doesn't seem to be a good thing.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Truncating/vacuuming relations on full tablespaces

2016-04-22 Thread Asif Naeem
On Wed, Apr 6, 2016 at 9:06 PM, Robert Haas  wrote:

> On Wed, Apr 6, 2016 at 3:32 AM, Asif Naeem  wrote:
> >> Oh, I see.  I think it's probably not a good idea to skip truncating
> >> those maps, but perhaps the option could be defined as making no new
> >> entries, rather than ignoring them altogether, so that they never
> >> grow.  It seems that both of those are defined in such a way that if
> >> page Y follows page X in the heap, the entries for page Y in the
> >> relation fork will follow the one for page X.  So truncating them
> >> should be OK; it's just growing them that we need to avoid.
> >
> > Thank you Robert. PFA basic patch, it introduces EMERGENCY option to
> VACUUM
> > that forces it to avoid extending any entries in the VM or FSM. It seems
> working
> > fine in simple test scenarios e.g.
> >
> >> postgres=# create table test1 as (select generate_series(1,10));
> >> SELECT 10
> >> postgres=# vacuum  EMERGENCY test1;
> >> VACUUM
> >> postgres=# select pg_relation_filepath('test1');
> >>  pg_relation_filepath
> >> --
> >>  base/13250/16384
> >> (1 row)
> >> [asif@centos66 inst_96]$ find . | grep base/13250/16384
> >> ./data/base/13250/16384
> >> postgres=# vacuum test1;
> >> VACUUM
> >> [asif@centos66 inst_96]$ find . | grep base/13250/16384
> >> ./data/base/13250/16384
> >> ./data/base/13250/16384_fsm
> >> ./data/base/13250/16384_vm
> >
> >
> > Please do let me know if I missed something or more information is
> required.
> > Thanks.
>
> This is too late for 9.6 at this point and certainly requires
> discussion anyway, so please add it to the next CommitFest.  But in
> the meantime, here are a few quick comments:
>

Sure. Sorry for delay caused.


> You should only support EMERGENCY using the new-style, parenthesized
> options syntax.  That way, you won't need to make EMERGENCY a keyword.
> We don't want to do that, and we especially don't want it to be
> partially reserved, as you have done here.
>

Sure. I have removed the EMERGENCY keyword; it made the code a lot smaller, i.e.

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
> index b9aeb31..89c4ee0 100644
> --- a/src/backend/parser/gram.y
> +++ b/src/backend/parser/gram.y
> @@ -9298,6 +9298,20 @@ vacuum_option_elem:
>  			| VERBOSE		{ $$ = VACOPT_VERBOSE; }
>  			| FREEZE		{ $$ = VACOPT_FREEZE; }
>  			| FULL			{ $$ = VACOPT_FULL; }
> +			| IDENT
> +				{
> +					/*
> +					 * We handle identifiers that aren't parser keywords with
> +					 * the following special-case codes.
> +					 */
> +					if (strcmp($1, "emergency") == 0)
> +						$$ = VACOPT_EMERGENCY;
> +					else
> +						ereport(ERROR,
> +								(errcode(ERRCODE_SYNTAX_ERROR),
> +								 errmsg("unrecognized vacuum option \"%s\"", $1),
> +								 parser_errposition(@1)));
> +				}
> 		;
> 
>  AnalyzeStmt:
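
With a change along these lines, the option is then written in the
new-style parenthesized form, for example (hypothetical, since EMERGENCY
exists only in this proposed patch):

postgres=# vacuum (emergency) test1;
VACUUM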


Passing the EMERGENCY flag around in a global variable is probably not
> a good idea; use parameter lists.  That's what they are for.  Also,
> calling the variable that decides whether EMERGENCY was set
> Extend_VM_FSM is confusing.
>

Sure. I adopted this approach after finding other similar cases in the same
source code file, i.e.

src/backend/commands/vacuumlazy.c

> /* A few variables that don't seem worth passing around as parameters */
> static int elevel = -1;
> static TransactionId OldestXmin;
> static TransactionId FreezeLimit;
> static MultiXactId MultiXactCutoff;
> static BufferAccessStrategy vac_strategy;


Should I modify the code to use parameter lists for these variables too?

I see why you changed the calling convention for visibilitymap_pin()
> and RecordPageWithFreeSpace(), but that's awfully invasive.  I wonder
> if there's a better way to do that, like maybe having vacuumlazy.c ask
> the VM and FSM for their length in pages and then not trying to use
> those functions for block numbers that are too large.
>
> Don't forget to update the documentation.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] max_parallel_degree > 0 for 9.6 beta

2016-04-22 Thread Andreas Joseph Krogh
On Friday, 22 April 2016 at 14:56:33, Robert Haas (robertmh...@gmail.com) wrote:
On Thu, Apr 21, 2016 at 7:20 PM, Tom Lane  wrote:
 > Robert Haas  writes:
 >> On Thu, Apr 21, 2016 at 4:01 PM, Gavin Flower
 >>  wrote:
 >>> Why not 4?  As most processors now have at least 4 physical cores, & 
surely
 >>> it be more likely to flush out race conditions.
 >
 >> Because if we did that, then it's extremely likely that people would
 >> end up writing queries that are faster only if workers are present,
 >> and then not get any workers.
 >
 > Is that because max_worker_processes is only 8 by default?  Maybe we
 > need to raise that, at least for beta purposes?

 I'm not really in favor of that.  I mean, almost all of our default
 settings are optimized for running PostgreSQL on, for example, a
 Raspberry Pi 2, so it would seem odd to suddenly swing the other
 direction and assume that there are more than 8 unused CPU cores.  It
 doesn't make sense to me to roll out settings in beta that we wouldn't
 be willing to release with if they work out.  That's why, honestly, I
 would prefer max_parallel_degree=1, which I think would be practical
 for many real-world deployments.  max_parallel_degree=2 is OK.  Beyond
 that, we're just setting people up to fail, I think.  Higher settings
 should probably only be used on substantial hardware, and not
 everybody has that.
 
Maybe it's time to ask the question if the settings should be optimized more 
for high-end HW and not some matchstick-box? I mean, most of the people I know 
who are responsible for databases run them on HW closer to high-end than 
low-end. I'm not sure why optimizing for low-end is such a great choice.
 
-- Andreas Joseph Krogh


 


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-04-22 Thread Amit Kapila
On Wed, Apr 20, 2016 at 7:39 PM, Andres Freund  wrote:
>
> On 2016-04-19 20:27:31 +0530, Amit Kapila wrote:
> > On Sun, Apr 17, 2016 at 2:26 AM, Andres Freund 
wrote:
> > >
> > > On 2016-04-16 16:44:52 -0400, Noah Misch wrote:
> > > > That is more controversial than the potential ~2% regression for
> > > > old_snapshot_threshold=-1.  Alvaro[2] and Robert[3] are okay
releasing
> > > > that way, and Andres[4] is not.
> > >
> > > FWIW, I could be kinda convinced that it's temporarily ok, if there'd
be
> > > a clear proposal on the table how to solve the scalability issue
around
> > > MaintainOldSnapshotTimeMapping().
> > >
> >
> > It seems that for read-only workloads, MaintainOldSnapshotTimeMapping()
> > takes EXCLUSIVE LWLock which seems to be a probable reason for a
> > performance regression.
>
> Yes, that's the major problem.
>
>
> > Now, here the question is do we need to acquire that lock if xmin is
> > not changed since the last time value of
> > oldSnapshotControl->latest_xmin is updated or xmin is lesser than
> > equal to oldSnapshotControl->latest_xmin?  If we don't need it for
> > above cases, I think it can address the performance regression to a
> > good degree for read-only workloads when the feature is enabled.
>
> I think the more fundamental issue is that the time->xid mapping is
> built at GetSnapshotData() time (via MaintainOldSnapshotTimeMapping()),
> and not when xids are assigned. Snapshots are created a lot more
> frequently in nearly all use-cases than xids are assigned.  That's what
> forces the exclusive lock to be in the read path, rather than the write
> path.
>
> What's the reason for this?
>

I don't see any particular reason for doing so, but I am not sure it will be
beneficial in all kinds of cases to build that mapping when xids are
assigned.  As an example, consider the case where a couple of write
transactions start at the same time and a read statement is executed
immediately afterwards: each of those write transactions would need to take
the Exclusive lock to build an old-snapshot entry, whereas with the
optimization suggested above, the Exclusive lock needs to be taken just once,
for the read statement.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-04-22 Thread Kevin Grittner
On Thu, Apr 21, 2016 at 4:13 PM, Kevin Grittner  wrote:

> I have your test case running, and it is not immediately
> clear why old rows are not being vacuumed away.

I have not found the reason that the vacuuming is not as aggressive
as it should be with this old_snapshot_threshold, but I left your
test case running overnight and found that it eventually did kick
in.  So the question is why it was not nearly as aggressive as one
would expect.

From the server log:

kgrittn@Kevin-Desktop:~/pg/master$ grep -B2 -A3 'tuples: [1-9]'
Debug/data/pg_log/postgresql.log
[2016-04-21 16:21:29.658 CDT] LOG:  automatic vacuum of table
"kgrittn.public.high_throughput": index scans: 1
pages: 0 removed, 2759 remain, 0 skipped due to pins, 0 skipped frozen
tuples: 94 removed, 159928 remain, 158935 are dead but not yet removable
buffer usage: 6005 hits, 0 misses, 8 dirtied
avg read rate: 0.000 MB/s, avg write rate: 0.090 MB/s
system usage: CPU 0.00s/0.08u sec elapsed 0.69 sec
--
[2016-04-21 16:55:31.971 CDT] LOG:  automatic vacuum of table
"kgrittn.pg_catalog.pg_statistic": index scans: 1
pages: 0 removed, 23 remain, 0 skipped due to pins, 0 skipped frozen
tuples: 2 removed, 515 remain, 128 are dead but not yet removable
buffer usage: 50 hits, 11 misses, 14 dirtied
avg read rate: 4.048 MB/s, avg write rate: 5.152 MB/s
system usage: CPU 0.00s/0.00u sec elapsed 0.02 sec
--
[2016-04-22 00:33:11.978 CDT] LOG:  automatic vacuum of table
"kgrittn.pg_catalog.pg_statistic": index scans: 1
pages: 0 removed, 68 remain, 0 skipped due to pins, 0 skipped frozen
tuples: 1016 removed, 409 remain, 22 are dead but not yet removable
buffer usage: 89 hits, 127 misses, 111 dirtied
avg read rate: 1.478 MB/s, avg write rate: 1.292 MB/s
system usage: CPU 0.00s/0.00u sec elapsed 0.67 sec
[2016-04-22 00:33:18.572 CDT] LOG:  automatic vacuum of table
"kgrittn.public.high_throughput": index scans: 1
pages: 0 removed, 20196 remain, 0 skipped due to pins, 0 skipped frozen
tuples: 292030 removed, 3941 remain, 3553 are dead but not yet removable
buffer usage: 68541 hits, 14415 misses, 20638 dirtied
avg read rate: 1.674 MB/s, avg write rate: 2.396 MB/s
system usage: CPU 0.23s/1.09u sec elapsed 67.29 sec
--
[2016-04-22 00:52:13.013 CDT] LOG:  automatic vacuum of table
"kgrittn.public.high_throughput": index scans: 1
pages: 0 removed, 20233 remain, 0 skipped due to pins, 19575
skipped frozen
tuples: 8463 removed, 30533 remain, 28564 are dead but not yet removable
buffer usage: 8136 hits, 4 misses, 158 dirtied
avg read rate: 0.027 MB/s, avg write rate: 1.065 MB/s
system usage: CPU 0.00s/0.03u sec elapsed 1.15 sec
--
[2016-04-22 01:28:22.812 CDT] LOG:  automatic vacuum of table
"kgrittn.pg_catalog.pg_statistic": index scans: 1
pages: 0 removed, 68 remain, 0 skipped due to pins, 44 skipped frozen
tuples: 26 removed, 760 remain, 108 are dead but not yet removable
buffer usage: 37 hits, 27 misses, 12 dirtied
avg read rate: 4.963 MB/s, avg write rate: 2.206 MB/s
system usage: CPU 0.00s/0.00u sec elapsed 0.04 sec
--
[2016-04-22 06:51:23.042 CDT] LOG:  automatic vacuum of table
"kgrittn.pg_catalog.pg_statistic": index scans: 1
pages: 0 removed, 68 remain, 0 skipped due to pins, 0 skipped frozen
tuples: 692 removed, 403 remain, 16 are dead but not yet removable
buffer usage: 90 hits, 109 misses, 76 dirtied
avg read rate: 1.646 MB/s, avg write rate: 1.148 MB/s
system usage: CPU 0.00s/0.00u sec elapsed 0.51 sec
[2016-04-22 06:52:45.174 CDT] LOG:  automatic vacuum of table
"kgrittn.public.high_throughput": index scans: 1
pages: 0 removed, 28152 remain, 0 skipped due to pins, 0 skipped frozen
tuples: 928116 removed, 14021 remain, 14021 are dead but not
yet removable
buffer usage: 88738 hits, 33068 misses, 45857 dirtied
avg read rate: 1.811 MB/s, avg write rate: 2.511 MB/s
system usage: CPU 0.43s/1.93u sec elapsed 142.68 sec
--
[2016-04-22 06:53:23.665 CDT] LOG:  automatic vacuum of table
"kgrittn.public.high_throughput": index scans: 1
pages: 0 removed, 28313 remain, 0 skipped due to pins, 27853
skipped frozen
tuples: 43 removed, 9002 remain, 8001 are dead but not yet removable
buffer usage: 9795 hits, 22 misses, 28 dirtied
avg read rate: 0.159 MB/s, avg write rate: 0.202 MB/s
system usage: CPU 0.00s/0.03u sec elapsed 1.08 sec
--
[2016-04-22 07:22:25.240 CDT] LOG:  automatic vacuum of table
"kgrittn.public.high_throughput": index scans: 1
pages: 0 removed, 28313 remain, 0 skipped due to pins, 25627
skipped frozen
tuples: 51 removed, 149639 remain, 147733 are dead but not yet removable
buffer usage: 14227 hits, 18 misses, 15 dirtied
avg read rate: 0.089 MB/s, avg write rate: 0.074 MB/s

Re: [HACKERS] max_parallel_degree > 0 for 9.6 beta

2016-04-22 Thread Robert Haas
On Thu, Apr 21, 2016 at 7:20 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Thu, Apr 21, 2016 at 4:01 PM, Gavin Flower
>>  wrote:
>>> Why not 4?  As most processors now have at least 4 physical cores, & surely
>>> it be more likely to flush out race conditions.
>
>> Because if we did that, then it's extremely likely that people would
>> end up writing queries that are faster only if workers are present,
>> and then not get any workers.
>
> Is that because max_worker_processes is only 8 by default?  Maybe we
> need to raise that, at least for beta purposes?

I'm not really in favor of that.  I mean, almost all of our default
settings are optimized for running PostgreSQL on, for example, a
Raspberry Pi 2, so it would seem odd to suddenly swing the other
direction and assume that there are more than 8 unused CPU cores.  It
doesn't make sense to me to roll out settings in beta that we wouldn't
be willing to release with if they work out.  That's why, honestly, I
would prefer max_parallel_degree=1, which I think would be practical
for many real-world deployments.  max_parallel_degree=2 is OK.  Beyond
that, we're just setting people up to fail, I think.  Higher settings
should probably only be used on substantial hardware, and not
everybody has that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw : Not able to update foreign table referring to a local table's view when use_remote_estimate = true

2016-04-22 Thread Robert Haas
On Fri, Apr 22, 2016 at 8:44 AM, Rajkumar Raghuwanshi
 wrote:
> I observed below in postgres_fdw.
>
> Observation: Updating a foreign table which refers to a local table's
> view (with use_remote_estimate = true) fails with the error below.
> ERROR:  column "ctid" does not exist
> CONTEXT:  Remote SQL command: EXPLAIN SELECT c1, ctid
> FROM public.lt_view FOR UPDATE
>
> create extension postgres_fdw;
> create server link_server foreign data wrapper postgres_fdw options (host
> 'localhost',dbname 'postgres', port '5447');
> create user mapping for public server link_server;
>
> create table lt (c1 integer, c2 integer);
> insert into lt values (1,null);
> create view lt_view as select * from lt;
> create foreign table ft (c1 integer,c2 integer) server link_server options
> (table_name 'lt_view');
>
> --alter server with use_remote_estimate 'false'
> alter server link_server options (add use_remote_estimate 'false');
> --update foreign table referring to local view -- able to update
> update ft set c2 = c1;
> UPDATE 1
>
> --alter server with use_remote_estimate 'true'
> alter server link_server options (SET use_remote_estimate 'true');
> --update foreign table referring to local view -- fail, throwing error
> update ft set c2 = c1;
> psql:/home/edb/Desktop/edb_work/Postgres_Fdw/dml_pushdown_35882/observation_view.sql:24:
> ERROR:  column "ctid" does not exist
> CONTEXT:  Remote SQL command: EXPLAIN SELECT c1, ctid FROM public.lt_view
> FOR UPDATE

Hmm, interesting.  Offhand, I don't really see how to make that case
work: postgres_fdw's UPDATE support supposes that the remote relation
has CTIDs.  If it doesn't, we're out of luck.  The "direct update"
mode might work if we can get that far, but here we're bombing out
during the planning phase, so we never have a chance to try it.

I wouldn't say this is a bug, exactly; more like an unsupported case.
It would be nice to make it work, though, if someone can figure out
how.
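
(For comparison, and using the same objects as the report: pointing the
foreign table at the underlying table rather than the view works even with
use_remote_estimate = true, because a plain remote table does expose ctid.)

create foreign table ft_tab (c1 integer, c2 integer) server link_server
  options (table_name 'lt');
update ft_tab set c2 = c1;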

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] postgres_fdw : Not able to update foreign table referring to a local table's view when use_remote_estimate = true

2016-04-22 Thread Rajkumar Raghuwanshi
Hi,

I observed below in postgres_fdw.

*Observation:* Updating a foreign table which refers to a local table's
view (with use_remote_estimate = true) fails with the error below.
ERROR:  column "ctid" does not exist
CONTEXT:  Remote SQL command: EXPLAIN SELECT c1, ctid
FROM public.lt_view FOR UPDATE

create extension postgres_fdw;
create server link_server foreign data wrapper postgres_fdw options (host
'localhost',dbname 'postgres', port '5447');
create user mapping for public server link_server;

create table lt (c1 integer, c2 integer);
insert into lt values (1,null);
create view lt_view as select * from lt;
create foreign table ft (c1 integer,c2 integer) server link_server options
(table_name 'lt_view');

--alter server with use_remote_estimate 'false'
alter server link_server options (add use_remote_estimate 'false');
--update foreign table referring to local view -- able to update
update ft set c2 = c1;
UPDATE 1

--alter server with use_remote_estimate 'true'
alter server link_server options (SET use_remote_estimate 'true');
--update foreign table referring to local view -- fail, throwing error
update ft set c2 = c1;
psql:/home/edb/Desktop/edb_work/Postgres_Fdw/dml_pushdown_35882/observation_view.sql:24:
ERROR:  column "ctid" does not exist
CONTEXT:  Remote SQL command: EXPLAIN SELECT c1, ctid FROM public.lt_view
FOR UPDATE

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation

>


Re: [HACKERS] Missing lookup in msvcr120 for pgwin32_putenv

2016-04-22 Thread Magnus Hagander
On Thu, Apr 21, 2016 at 11:16 PM, Michael Paquier  wrote:

> Hi all,
>
> While looking at e545281 I bumped into the following thing that has
> visibly been forgotten since VS2013 support has been added:
> --- a/src/port/win32env.c
> +++ b/src/port/win32env.c
> @@ -67,6 +67,9 @@ pgwin32_putenv(const char *envval)
> "msvcr110", 0, NULL
> },  /* Visual Studio 2012 */
> {
> +   "msvcr120", 0, NULL
> +   },  /* Visual Studio 2013 */
> +   {
> NULL, 0, NULL
> }
> };
>
> Attached is a patch. This should be backpatched to 9.3 where, if I
> recall correctly, support for VS2013 has been added.
>
>
Applied and backpatched. Thanks!


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] kqueue

2016-04-22 Thread Thomas Munro
On Fri, Apr 22, 2016 at 12:21 PM, Andres Freund  wrote:
> On 2016-04-21 14:25:06 -0400, Robert Haas wrote:
>> On Thu, Apr 21, 2016 at 2:22 PM, Andres Freund  wrote:
>> > On 2016-04-21 14:15:53 -0400, Robert Haas wrote:
>> >> On Tue, Mar 29, 2016 at 7:53 PM, Thomas Munro
>> >>  wrote:
>> >> > On the WaitEventSet thread I posted a small patch to add kqueue
>> >> > support[1].  Since then I peeked at how some other software[2]
>> >> > interacts with kqueue and discovered that there are platforms
>> >> > including NetBSD where kevent.udata is an intptr_t instead of a void
>> >> > *.  Here's a version which should compile there.  Would any NetBSD
>> >> > user be interested in testing this?  (An alternative would be to make
>> >> > configure to test for this with some kind of AC_COMPILE_IFELSE
>> >> > incantation but the steamroller cast is simpler.)
>> >>
>> >> Did you code this up blind or do you have a NetBSD machine yourself?
>> >
>> > RMT, what do you think, should we try to get this into 9.6? It's
>> > feasible that the performance problem 98a64d0bd713c addressed is also
>> > present on free/netbsd.
>>
>> My personal opinion is that it would be a reasonable thing to do if
>> somebody can demonstrate that it actually solves a real problem.
>> Absent that, I don't think we should rush it in.
>
> On linux you needed a 2 socket machine to demonstrate the problem, but
> both old ones (my 2009 workstation) and new ones were sufficient. I'd be
> surprised if the situation on freebsd is any better, except that you
> might hit another scalability bottleneck earlier.
>
> I doubt there's many real postgres instances operating on bigger
> hardware on freebsd, with sufficient throughput to show the problem. So
> I think the argument for including is more along trying to be "nice" to
> more niche-y OSs.

What has BSD ever done for us?!  (Joke...)

I vote to leave this patch in the next commitfest where it is, and
reconsider if someone shows up with a relevant problem report on large
systems.  I can't see any measurable performance difference on a 4
core laptop running FreeBSD 10.3.  Maybe kqueue will make more
difference even on smaller systems in future releases if we start
using big wait sets for distributed/asynchronous work, in-core
pooling/admission control etc.

Here's a new version of the patch that fixes some stupid bugs.  I have
run regression tests and some basic sanity checks on OSX 10.11.4,
FreeBSD 10.3, NetBSD 7.0 and OpenBSD 5.8.  There is still room to make
an improvement that would drop the syscall from AddWaitEventToSet and
ModifyWaitEvent, compressing wait set modifications and waiting into a
single syscall (kqueue's claimed advantage over the competition).
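
(For anyone unfamiliar with the API, the "single syscall" point is that
kevent() takes both a changelist and an output event buffer, so registering
interest and waiting can happen in one call. A minimal standalone sketch,
illustration only and not code from the patch:)

#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	struct kevent change;
	struct kevent event;
	int			kq = kqueue();

	if (kq < 0)
	{
		perror("kqueue");
		return 1;
	}

	/*
	 * Changelist entry: watch stdin for readability.  udata is left as 0
	 * because NetBSD declares it as intptr_t rather than void *.
	 */
	EV_SET(&change, STDIN_FILENO, EVFILT_READ, EV_ADD, 0, 0, 0);

	/* one kevent() call both applies the change and waits for an event */
	if (kevent(kq, &change, 1, &event, 1, NULL) < 0)
	{
		perror("kevent");
		return 1;
	}

	printf("fd %d readable, %ld bytes available\n",
		   (int) event.ident, (long) event.data);
	return 0;
}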

While doing that I discovered that unpatched master doesn't actually
build on recent NetBSD systems because our static function strtoi
clashes with a non-standard libc function of the same name[1] declared
in inttypes.h.  Maybe we should rename it, like in the attached?

[1] http://netbsd.gw.com/cgi-bin/man-cgi?strtoi++NetBSD-current

-- 
Thomas Munro
http://www.enterprisedb.com


rename-strtoi.patch
Description: Binary data


kqueue-v3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Wire protocol compression

2016-04-22 Thread Craig Ringer
On 21 April 2016 at 22:21, Andreas Karlsson  wrote:


> Wouldn't such a solution be just as vulnerable to CRIME as TLS is? I
> thought the reason for removing compression from TLS is to discourage
> people from writing applications which are vulnerable to compression based
> attacks by not providing an easy way for people to just compress everything.
> 
>

Probably... but you could then use compression without encryption without
hacks like forcing a noop cipher. If you're running traffic over a VPN WAN
link or something that could be a pretty sensible option. OTOH, such a VPN
may have its own compression, rendering compression by Pg unnecessary. The
downside is that the VPN overheads can be considerable as can the general
performance impact.

Personally I admit I just don't care that much about the CRIME attack for
most of the deployments I'm interested in. It requires the attacker be able
to make alterations on the server.

"The attacker then observes the change in size of the compressed request
payload, which contains both the secret cookie that is sent by the browser
only to the target site, and variable content created by the attacker, as
the variable content is altered."

https://en.wikipedia.org/wiki/CRIME

I'm not especially concerned that authorized user 'unpriv-1' can hijack
user 'super' 's connections if unpriv-1 can somehow modify tables super is
reading *and* snoop super's traffic, doing it all on a tight schedule.
We've probably got bigger security issues than that.

Our auth salting helps a lot anyway, and while there are situations where a
privileged user could be fetching some slow-changing short and
security-critical content repeatedly from a table as part of a join the
unprivileged user can modify data in, I'm again just not that concerned.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Fix of doc for synchronous_standby_names.

2016-04-22 Thread Amit Langote

Horiguchi-san,

On 2016/04/22 14:21, Kyotaro HORIGUCHI wrote:
> I came to think that both of you are misunderstanding how
> synchronous standbys are chosen, so I'd like to clarify the
> behavior.

I certainly had a different (and/or wrong) idea in mind about how this
works.  Thanks a lot for clarifying.  I'm still a little confused, so
please see below.

> Kyotaro HORIGUCHI  wrote:
 But this particular sentence seems to be talking
 about what's the case for any given slot.
>>>
>>> Right, that's my reading also.
> 
> In SyncRepInitConfig, every walsender sets sync_standby_priority
> by itself. The priority value is the index of its
> application_name in s_s_names list (1 based).
> 
> When a walsender receives a feedback from walreceiver, it may
> release backends waiting for certain LSN to be secured.
> 
> First, SyncRepGetSyncStandbys collects active standbys. Then it
> loops from the highest priority value to pick up all of the
> active standbys for each priority value until all of the seats
> are occupied. Then SyncRepOldestSyncRepPtr calculates the oldest
> LSNs only among the standbys SyncRepGetSyncStandbys
> returned. Finally, it releases backends using the LSNs.
> 
> In short, every 'slot' in s_s_names can correspond to two or
> more *synchronous* standbys.
> 
> The resulting behavior is as the following.
> 
>> I certainly don't understand what the 'sync slot' means. If it
>> means a name in a replication set description, that is, 'nameN'
>> in the following setting of s_s_names.
>>
>> '2(name1, name2, name3)'
>>
>> There may be two or more duplicates even in the
>> single-sync-age. But only one synchronous standby was allowed so
>> any 'sync slot' may have at least one matching synchronous
>> standby in the single-sync-age. This is what I see in the
>> sentence. Is this wrong?
>>
>> Now, we can have multiple synchronous standbys so, for example,
>> if there are three standbys with the name 'name1', two of them are chosen
>> as synchronous. This is a new behavior in the multi-sync-age and
>> syncrep.c has been changed so as to do so.
>>
>> As a supplement, consider the following case.
>>
>> '5(name1, name2, name3)'
>>
>> and the following standbys
>>
>> (name1, name1, name2, name2, name3, name3)
> 
> This was a bad example,
> 
> (name1, name1, name2, name2, name2, name2, name3)
> 
> For this case, the following are chosen as synchronous standbys.
> 
> (name1, name1, name2, name2, name2)
> 
> Three of the four name2s are chosen, but which ones are picked is an
> implementation matter.
> 
>> # However, 5 for three names causes a warning..

OK, I see.  I tried to understand it (by carrying it out) using the
following example.

synchronous_standby_names: '3 (aa, aa, bb, bb, cc)'

Individual names get assigned the following priorities based on their
position in the list:

aa: 1, bb: 3, cc: 5

Then standbys connect one-by-one, say the following (host and
application_name):

host1: aa
host2: aa
host3: bb
host4: bb
host5: cc

syncrep.c will assign following priorities (also shown is their sync status)

host1: aa: 1 (sync)
host2: aa: 1 (sync)
host3: bb: 3 (sync)
host4: bb: 3 (potential)
host5: cc: 5 (potential)

As also evident in pg_stat_replication:

SELECT application_name, sync_priority, sync_state FROM
pg_stat_replication ORDER BY 2;
 application_name | sync_priority | sync_state
--+---+
 aa   | 1 | sync
 aa   | 1 | sync
 bb   | 3 | sync
 bb   | 3 | potential
 cc   | 5 | potential
(5 rows)

Hm, I see (though I wondered if one of the 'aa's should have been assigned
priority 2 and likewise for 'bb').

Following is the documentation paragraph under scrutiny:

"""
The name of a standby server for this purpose is the
application_name setting of the standby, as set in the
primary_conninfo of the standby's WAL receiver.  There is
no mechanism to enforce uniqueness. In case of duplicates one of the
matching standbys will be considered as higher priority, though
exactly which one is indeterminate.
"""

Consider the name 'aa', it turns out that there are in fact 2 connected
standbys with that name - so duplicate.  Although, both got assigned
priority 1 (IOW, neither of them is considered higher priority than the
other as the sentence above seems to suggest).  2/3 sync spots are now
occupied.

Then let's look at 'bb'.  Here (maybe) it makes sense.  Two standbys named
'bb' show up, but only 1/3 sync spot is now left.  So, one of them is
counted as sync whereas the other becomes a potential sync.  So, what is
indeterminate is which (standby) becomes which (sync/potential).  However,
it isn't which one gets *higher priority* that is indeterminate as the
concerned sentence says (both got the same priority viz. 3).

I'm kind of confused.  Should the sentence say something else altogether?

By the way, I see the following in comment header of SyncRepGetSyncStandbys()

 *
 * If there 

Re: [HACKERS] Parser extensions (maybe for 10?)

2016-04-22 Thread Aleksander Alekseev
> > And the above is called an ad-hominem.
> 
> An "ad hominem" attack means against the person rather than on the
> topic of the issue, but I don't think Aleksander did that.  I'm not
> sure why you think what he wrote was out of line.  It reads OK to me.

Frankly when I re-read my own e-mails sometimes I find them a little
bit dry to say the least. But it's not intentional. I'm just having some
difficulties expressing myself in my second language. I should probably
use more words like "please" and "thank you" to smooth this effect. My
sincere apologies to anyone who was offended in any way. 

-- 
Best regards,
Aleksander Alekseev
http://eax.me/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Wire protocol compression

2016-04-22 Thread Craig Ringer
On 21 April 2016 at 22:07, Tom Lane  wrote:


>
> Yeah.  I think this should definitely be in the list of things we want
> to add when we do the fabled 4.0 protocol revision (and, indeed, it's
> been in the above-cited list for awhile).  Whether we've yet gotten to
> the point of having critical mass for a revision ... meh, I doubt it.
>
>
Well, the critical mass is determined as much as anything by who's willing
to put the large amount of work into implementing the protocol rev, testing
clients, verifying performance, etc etc etc. So far I don't think anyone's
leaping to volunteer.

We could use various hacks to enable wire compression without a protocol
rev, like sending a gzip header before the startup packet so a server that
doesn't understand it just sees a nonsensical startup packet and borks. But
they all either boil down to uglier versions of genuine protocol bump or
are potentially subject to the issues you raised earlier with
GUC-controlled protocol changes.


In retrospect it's a huge pity there was no set of client-library-only
key/value fields defined in the startup packet in addition to GUCs that
libpq lets clients set via PGOPTIONS etc. That would've let us extend the
protocol much more easily by sending capabilities like "I understand
protocol compression".

Personally I think we should do that anyway - allow a startup-only GUC like
proto_compression=gzip .

To help protect against abuse via

  PGOPTIONS="-c proto_compression=gzip" ./exploitable-old-libpq-based-binary

we could do something like sending an ErrorResponse with a newly defined
SQLSTATE like 08P02 "Client must support gzip compression". A client that
supports gzip may ignore this error and proceed normally. A client that
doesn't will not recognise the SQLSTATE, and since it's in a fatal category,
08 (or rather, not one of the non-fatal categories 00, 01 and 02), it'll die
and won't see any upsetting new protocol messages. Clients that don't
support it will even get a nice error message:

psql: FATAL:  unrecognized configuration parameter "proto_compression"

It's really just a hack around bumping the protocol to add capability
negotiation though, and it'd scale very poorly if it was then used for more
things. Lots of reconnections needed because we can't do a two-way
negotiation.

I think retrofitting protocol compression might be compelling enough to
justify that. But it's not really significantly better than just sending a
v4 protoversion field and reconnecting if the server gets upset with that.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Why doesn't src/backend/port/win32/socket.c implement bind()?

2016-04-22 Thread Michael Paquier
On Thu, Apr 21, 2016 at 11:46 PM, Tom Lane  wrote:
> I wrote:
>> Michael Paquier  writes:
>>> And this gives the patch attached, just took the time to hack it.
>
>> I think this is a good idea, but (1) I'm inclined not to restrict it to
>> Windows, and (2) I think we should hold off applying it until we've seen
>> a failure or two more, and can confirm whether d1b7d4877 does anything
>> useful for the error messages.
>
> OK, we now have failures from both bowerbird and jacana with the error
> reporting patch applied:
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2016-04-21%2012%3A03%3A02
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2016-04-19%2021%3A00%3A39
>
> and they both boil down to this:
>
> pg_ctl: could not start server
> Examine the log output.
> # pg_ctl failed; logfile:
> LOG:  could not bind IPv4 socket: Permission denied
> HINT:  Is another postmaster already running on port 60200? If not, wait a 
> few seconds and retry.
> WARNING:  could not create listen socket for "127.0.0.1"
> FATAL:  could not create any TCP/IP sockets
> LOG:  database system is shut down
>
> So "permission denied" is certainly more useful than "no error", which
> makes me feel that d1b7d4877+22989a8e3 are doing what they intended to
> and should get back-patched --- any objections?

+1. That's useful in itself.

> However,
> [...]
>
> So theory A is that some other program is binding random high port numbers
> with SO_EXCLUSIVEADDRUSE.  Theory B is that this is the handiwork of
> Windows antivirus software doing what Windows antivirus software typically
> does, ie inject random permissions failures depending on the phase of the
> moon.  It's not very clear that a test along the lines described (that is,
> attempt to connect to, not bind to, the target port) would pre-detect
> either type of error.  Under theory A, a connect() test would recognize
> the problem only if the other program were using the port to listen rather
> than make an outbound connection; and the latter seems much more likely.
> (Possibly we could detect the latter case by checking the error code
> returned by connect(), but Michael's proposed patch does no such thing.)

Perl's connect() can be made more chatty. $! returns the error string,
$!+0 the errno. With the patch I sent previously, we'd need to change
this portion:
+   socket(SOCK, PF_INET, SOCK_STREAM, $proto) or die;
+   $found = 0 if connect(SOCK, $paddr);
+   close(SOCK);
Basically, it would be something like this, which would still be better
than nothing, I think:
if (!connect(SOCK, $paddr))
{
    print "connect error = $!\n";
}
Honestly, I think even if we will never reach perfection here,
something like my previous patch would still allow us to make the
tests more reliable on a platform where services listen to localhost.

> Under theory B, we're pretty much screwed, we don't know what will happen.

Indeed. If things are completely random, nothing guarantees that a
port found free by a failed connect() at instant T will not be taken
at instant T+1: there is always a window between checking that the
port is free and postgres actually binding it. Freeing up the port
just before starting Postgres would shrink that failure window, but
it cannot be reduced to zero.
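To make that window concrete, here is a minimal C sketch (not PostgreSQL or
test-suite code) of the check-then-use pattern: ask the kernel for a free
port, release it, and hope nothing grabs it before the server binds it. The
gap after close() is exactly the window that cannot be closed.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static int
pick_probably_free_port(void)
{
	struct sockaddr_in addr;
	socklen_t	len = sizeof(addr);
	int			sock = socket(AF_INET, SOCK_STREAM, 0);
	int			port;

	if (sock < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;			/* let the kernel pick a free port */

	if (bind(sock, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
		getsockname(sock, (struct sockaddr *) &addr, &len) < 0)
	{
		close(sock);
		return -1;
	}

	port = ntohs(addr.sin_port);
	close(sock);				/* from here on, any process can grab the port again */
	return port;
}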

> BTW, if Windows *had* returned WSAEADDRINUSE, TranslateSocketError would
> have failed to translate it --- surely that's an oversight?

Yes, and I can see you fixed that with 125ad53 already.
-- 
Michael




Re: [HACKERS] pg_dump dump catalog ACLs

2016-04-22 Thread Peter Geoghegan
On Fri, Apr 22, 2016 at 12:25 AM, Noah Misch  wrote:
> Folks run clusters with ~1000 databases; we previously accepted at least one
> complex performance improvement[1] based on that use case.  On the faster of
> the two machines I tested, the present thread's commits slowed "pg_dumpall
> --schema-only --binary-upgrade" by 1-2s per database.  That doubles pg_dump
> runtime against the installcheck regression database.  A run against a cluster
> of one hundred empty databases slowed fifteen-fold, from 8.6s to 131s.
> "pg_upgrade -j50" probably will keep things tolerable for the 1000-database
> case, but the performance regression remains jarring.  I think we should not
> release 9.6 with pg_dump performance as it stands today.

As someone who is responsible for many such clusters, I strongly agree.

-- 
Peter Geoghegan




Re: [HACKERS] pg_dump dump catalog ACLs

2016-04-22 Thread Noah Misch
On Wed, Apr 20, 2016 at 10:50:21PM -0400, Stephen Frost wrote:
> On Wednesday, April 20, 2016, Noah Misch  wrote:
> > On Wed, Apr 20, 2016 at 11:12:44AM -0400, Stephen Frost wrote:
> > > > On Sun, Apr 17, 2016 at 11:02:28PM -0400, Noah Misch wrote:
> > > > > (3) pg_dumpall became much slower around the time of these commits.  
> > > > > On one
> > > > > machine (POWER7 3.55 GHz), a pg_dumpall just after initdb slowed from 
> > > > > 0.25s at
> > > > > commit 6c268df^ to 4.0s at commit 7a54270.  On a slower machine 
> > > > > (Opteron
> > > > > 1210), pg_dumpall now takes 19s against such a fresh cluster.

> > > the additional time for
> > > pg_dump is due to the queries looking at the catalog objects and is
> > > therefore relatively fixed; it is primarily a large fraction of the
> > > total time only when dumping databases which are mostly empty.
> >
> > Do you think it would be okay to release 9.6 with pg_dump still adding that
> > amount of time per database?
> 
> For my 2c, the answer is "yes". I've actually looked at how this could be
> improved using a bit of caching in pg_dump for certain things, but I didn't
> think those would be appropriate to include in this patch and would be a
> general pg_dump performance improvement.
> 
> I'm certainly open to improving these issues now if we agree that they
> should be fixed for 9.6.  If we don't want to include such changes in 9.6
> then I will propose them for post-9.6.

Folks run clusters with ~1000 databases; we previously accepted at least one
complex performance improvement[1] based on that use case.  On the faster of
the two machines I tested, the present thread's commits slowed "pg_dumpall
--schema-only --binary-upgrade" by 1-2s per database.  That doubles pg_dump
runtime against the installcheck regression database.  A run against a cluster
of one hundred empty databases slowed fifteen-fold, from 8.6s to 131s.
"pg_upgrade -j50" probably will keep things tolerable for the 1000-database
case, but the performance regression remains jarring.  I think we should not
release 9.6 with pg_dump performance as it stands today.

[1] 
http://www.postgresql.org/message-id/flat/1718942738eb65c8407fcd864883f...@fuzzy.cz




Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-22 Thread Peter Geoghegan
On Thu, Nov 5, 2015 at 2:44 PM, Jeff Janes  wrote:
> The bug theoretically exists in 9.5, but it wasn't until 9.6 (commit
> e95680832854cf300e64c) that free pages were recycled aggressively
> enough that it actually becomes likely to be hit.

In other words: The bug could be in 9.5, but that hasn't been
conclusively demonstrated. Fair?

I'm not an expert on GIN at all; I know far more about B-Tree. But
commit e95680832854cf300e64c seems a bit odd to me. I don't see any
argument for why it's okay that the recycling of pages can happen
immediately for the pending list, rather than requiring it to happen
at some later time with a safe interlock (something like B-Tree's use
of RecentGlobalXmin). The GIN README discusses a few such issues, but it
wasn't updated by the commit I mentioned, which I would have expected.
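To be concrete about the kind of interlock I mean, here is a rough sketch,
not the actual GIN or nbtree code: DeletedPageOpaque, deleted_xid and
page_is_recyclable() are made-up names, and the extern declarations stand in
for the server's own RecentGlobalXmin and TransactionIdPrecedes(). The idea
is that a deleted page remembers the XID that deleted it, and only becomes
recyclable once that XID precedes RecentGlobalXmin, i.e. once no snapshot
can still be following a pointer to it.

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Illustrative per-page bookkeeping for a deleted page. */
typedef struct DeletedPageOpaque
{
	TransactionId deleted_xid;	/* XID that marked the page deleted */
} DeletedPageOpaque;

/* Declarations standing in for the server's own definitions. */
extern TransactionId RecentGlobalXmin;
extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);

static bool
page_is_recyclable(const DeletedPageOpaque *opaque)
{
	/*
	 * Only hand the page back for reuse once the deleting XID is older
	 * than every snapshot still running, so no scan can land on a page
	 * that has been recycled under it.
	 */
	return TransactionIdPrecedes(opaque->deleted_xid, RecentGlobalXmin);
}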

OTOH, after all of 10 minutes I can't see what's special about
ginvacuumcleanup() that makes its long-established
RecordFreeIndexPage() call fundamentally safer; if it really isn't
safer, that would be a surprisingly obvious defect to have gone
unfixed for all these years. This is what you yourself said about it,
I think. I need to look at it again with fresh eyes, but offhand,
having no safe interlock around GIN's well-established
RecordFreeIndexPage() call seems like an implausibly obvious bug.

-- 
Peter Geoghegan

