Re: [HACKERS] Multi-xacts and our process problem

2015-05-11 Thread Noah Misch
On Mon, May 11, 2015 at 05:33:04PM -0400, Bruce Momjian wrote:
> On Tue, May 12, 2015 at 12:29:56AM +0300, Heikki Linnakangas wrote:
> > On 05/12/2015 12:00 AM, Bruce Momjian wrote:
> > >Multi-xacts were made durable in Postgres 9.3 (released 2013-09-09) to
> > >allow primary-key-column-only locks.  1.7 years later, we are still
> > >dealing with bugs related to this feature.  Obviously, something is
> > >wrong.
> > >
> > >There were many 9.3 minor releases containing multi-xacts fixes, and
> > >these fixes have extended into 9.4.  After the first few bug-fix
> > >releases, I questioned whether we needed to revert or rework the
> > >feature, but got no positive response.  Only in the past few weeks have
> > >we got additional people involved.
> > 
> > The "revert or rework" ship had already sailed at that point. I
> 
> True.
> 
> > don't think we had much choice than just soldier through the bugs
> > after the release.
> 
> The problem is we "soldiered on" without adding any resources to the
> problem or doing a systematic review once it became clear one was
> necessary.

In both this latest emergency and the Nov-Dec 2013 one, several people showed
up and helped.  We did well in that respect, but your idea that we should have
started a systematic post-commit review is a good one.  Hoping other parts of
the change were fine, the first team dispersed.


There's a lot we might try, but I'll focus on just a couple of points.

The adversarial relationship between author and committer is an important
guarantor of quality.  For the toughest patches, the principal author should
seek out an independent committer rather than self-commit the patch.

When I want to keep a soon-to-be-committed patch's bugs out of PostgreSQL, I
can review it to find as many bugs as possible, or I can express nonspecific
mistrust.  That first option is expensive; if I did full-time patch review, I
might cover 1/4 of the changes.  That second option boils down to me telling a
committer that he is practicing bad judgment, which is painful for both of us
and unlikely to modify the patch's fate.  At the PgCon 2014 Developer Meeting,
it came out that most people had identified fklocks as the highest-risk 9.3
patch.  Here's an idea.  Shortly after the 9.5 release notes are drafted, let's take
a secret ballot to identify the changes threatening the most damage through
undiscovered bugs.  (Let's say the electorate consists of every committer and
every person who reviewed at least one patch during the release cycle.)
Publish the three top vote totals.  This serves a few purposes.  It sends a
message to each original committer that the community doubts his handling of
the change.  The secret ballot helps voters be honest, and seven votes against
your commit is hard to ignore.  It's a hint to that committer to drum up more
reviews and testing, to pick a simpler project next time, or even to revert.
The poll results would also help target beta testing and post-commit reviews.
For example, I would plan to complete a full post-commit review of one patch
in the list.

Thanks,
nm




[HACKERS] Leftovers after dcae5fac (improve speed of make check-world) in git tree with check-world

2015-05-11 Thread Michael Paquier
Hi all,

Since dcae5fac, we use a central path ($GIT_ROOT/tmp_install) to store
the installation deliverables during tests. Each module uses a
.gitignore with more or less those entries:
/log/
/results/
/tmp_check/

First, note that /log/ is not used anymore; it has been replaced by
/tmp_check/log/, so we could simply remove that entry. Also, since we
deliberately do not ignore regression.diffs and regression.out so that it
is easy to see what has failed, what about removing /tmp_check/ from those
.gitignore files as well? It now contains pgdata/ and log/, so it seems
useful to me to have git status show it, to help track failures too.
Thoughts? Attached is a patch with all those files updated.
-- 
Michael
diff --git a/contrib/btree_gin/.gitignore b/contrib/btree_gin/.gitignore
index 5dcb3ff..19b6c5b 100644
--- a/contrib/btree_gin/.gitignore
+++ b/contrib/btree_gin/.gitignore
@@ -1,4 +1,2 @@
 # Generated subdirectories
-/log/
 /results/
-/tmp_check/
diff --git a/contrib/btree_gist/.gitignore b/contrib/btree_gist/.gitignore
index 5dcb3ff..19b6c5b 100644
--- a/contrib/btree_gist/.gitignore
+++ b/contrib/btree_gist/.gitignore
@@ -1,4 +1,2 @@
 # Generated subdirectories
-/log/
 /results/
-/tmp_check/
diff --git a/contrib/citext/.gitignore b/contrib/citext/.gitignore
index 5dcb3ff..19b6c5b 100644
--- a/contrib/citext/.gitignore
+++ b/contrib/citext/.gitignore
@@ -1,4 +1,2 @@
 # Generated subdirectories
-/log/
 /results/
-/tmp_check/
diff --git a/contrib/cube/.gitignore b/contrib/cube/.gitignore
index cb4c989..a6484a0 100644
--- a/contrib/cube/.gitignore
+++ b/contrib/cube/.gitignore
@@ -1,6 +1,4 @@
 /cubeparse.c
 /cubescan.c
 # Generated subdirectories
-/log/
 /results/
-/tmp_check/
diff --git a/contrib/dblink/.gitignore b/contrib/dblink/.gitignore
index 5dcb3ff..19b6c5b 100644
--- a/contrib/dblink/.gitignore
+++ b/contrib/dblink/.gitignore
@@ -1,4 +1,2 @@
 # Generated subdirectories
-/log/
 /results/
-/tmp_check/
diff --git a/contrib/dict_int/.gitignore b/contrib/dict_int/.gitignore
index 5dcb3ff..19b6c5b 100644
--- a/contrib/dict_int/.gitignore
+++ b/contrib/dict_int/.gitignore
@@ -1,4 +1,2 @@
 # Generated subdirectories
-/log/
 /results/
-/tmp_check/
diff --git a/contrib/dict_xsyn/.gitignore b/contrib/dict_xsyn/.gitignore
index 5dcb3ff..19b6c5b 100644
--- a/contrib/dict_xsyn/.gitignore
+++ b/contrib/dict_xsyn/.gitignore
@@ -1,4 +1,2 @@
 # Generated subdirectories
-/log/
 /results/
-/tmp_check/
diff --git a/contrib/earthdistance/.gitignore b/contrib/earthdistance/.gitignore
index 5dcb3ff..19b6c5b 100644
--- a/contrib/earthdistance/.gitignore
+++ b/contrib/earthdistance/.gitignore
@@ -1,4 +1,2 @@
 # Generated subdirectories
-/log/
 /results/
-/tmp_check/
diff --git a/contrib/file_fdw/.gitignore b/contrib/file_fdw/.gitignore
index 5dcb3ff..19b6c5b 100644
--- a/contrib/file_fdw/.gitignore
+++ b/contrib/file_fdw/.gitignore
@@ -1,4 +1,2 @@
 # Generated subdirectories
-/log/
 /results/
-/tmp_check/
diff --git a/contrib/hstore/.gitignore b/contrib/hstore/.gitignore
index 5dcb3ff..19b6c5b 100644
--- a/contrib/hstore/.gitignore
+++ b/contrib/hstore/.gitignore
@@ -1,4 +1,2 @@
 # Generated subdirectories
-/log/
 /results/
-/tmp_check/
diff --git a/contrib/hstore_plperl/.gitignore b/contrib/hstore_plperl/.gitignore
index 5dcb3ff..19b6c5b 100644
--- a/contrib/hstore_plperl/.gitignore
+++ b/contrib/hstore_plperl/.gitignore
@@ -1,4 +1,2 @@
 # Generated subdirectories
-/log/
 /results/
-/tmp_check/
diff --git a/contrib/hstore_plpython/.gitignore b/contrib/hstore_plpython/.gitignore
index ce6fab9..351dd41 100644
--- a/contrib/hstore_plpython/.gitignore
+++ b/contrib/hstore_plpython/.gitignore
@@ -1,6 +1,4 @@
 # Generated subdirectories
 /expected/python3/
-/log/
 /results/
 /sql/python3/
-/tmp_check/
diff --git a/contrib/intarray/.gitignore b/contrib/intarray/.gitignore
index 5dcb3ff..19b6c5b 100644
--- a/contrib/intarray/.gitignore
+++ b/contrib/intarray/.gitignore
@@ -1,4 +1,2 @@
 # Generated subdirectories
-/log/
 /results/
-/tmp_check/
diff --git a/contrib/ltree/.gitignore b/contrib/ltree/.gitignore
index 5dcb3ff..19b6c5b 100644
--- a/contrib/ltree/.gitignore
+++ b/contrib/ltree/.gitignore
@@ -1,4 +1,2 @@
 # Generated subdirectories
-/log/
 /results/
-/tmp_check/
diff --git a/contrib/ltree_plpython/.gitignore b/contrib/ltree_plpython/.gitignore
index ce6fab9..351dd41 100644
--- a/contrib/ltree_plpython/.gitignore
+++ b/contrib/ltree_plpython/.gitignore
@@ -1,6 +1,4 @@
 # Generated subdirectories
 /expected/python3/
-/log/
 /results/
 /sql/python3/
-/tmp_check/
diff --git a/contrib/pg_trgm/.gitignore b/contrib/pg_trgm/.gitignore
index 5dcb3ff..19b6c5b 100644
--- a/contrib/pg_trgm/.gitignore
+++ b/contrib/pg_trgm/.gitignore
@@ -1,4 +1,2 @@
 # Generated subdirectories
-/log/
 /results/
-/tmp_check/
diff --git a/contrib/pgcrypto/.gitignore b/contrib/pgcrypto/.gitignore
index 5dcb3ff..19b6c5b 100644
--- a/contrib/pgcrypto/.gitignore
+++ b/contrib/pgcrypto/.gitignore
@@ -1,4 +1,2 @@
 # Generated subdirectories
-/log/
 /results/
-/tmp_check/
diff --git a/con

[HACKERS] RFC: Non-user-resettable SET SESSION AUTHORISATION

2015-05-11 Thread Craig Ringer
Hi all

For some time I've wanted a way to "SET SESSION AUTHORISATION" or "SET
ROLE" in a way that cannot simply be RESET, so that a connection may be
handed to a less-trusted service or application to do some work with.

This is most useful for connection pools, where it's currently necessary to
have a per-user pool, to trust users not to do anything naughty, or to
filter the functions and commands they can run through a whitelist to stop
them trying to escalate rights to the pooler user.

In the short term I'd like to:

* Add a WITH COOKIE option to "SET SESSION AUTHORIZATION", which takes an
app-generated code (much like PREPARE TRANSACTION does).

* Make DISCARD ALL, RESET SESSION AUTHORIZATION, etc, also take WITH
COOKIE. If session authorization was originally set with a cookie value,
the same cookie value must be passed or an ERROR will be raised when RESET
is attempted.

* A second SET SESSION AUTHORIZATION without a prior RESET would be
rejected with an ERROR if the first SET used a cookie value.

This can be done without protocol-level changes and with no backward
compatibility impact to existing applications. Any objections?
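To make the intended flow concrete, a pooler hand-off might look like this
(a sketch only; the WITH COOKIE syntax is the proposal above, and nothing
here exists today):

-- Pooler (privileged connection) pins the session to app_user,
-- guarding the switch with a transient, pooler-generated cookie.
SET SESSION AUTHORIZATION app_user WITH COOKIE 'c3f1a9d2';

-- Untrusted client code now runs as app_user.  A bare RESET fails,
-- as would a second SET SESSION AUTHORIZATION without a prior RESET:
RESET SESSION AUTHORIZATION;  -- ERROR

-- Only the pooler, which knows the cookie, can reclaim the session:
RESET SESSION AUTHORIZATION WITH COOKIE 'c3f1a9d2';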


These commands will appear in the logs if log_statement = 'all', but the
codes are transient cookie values, not passwords. They'll be visible in
pg_stat_activity but only to the privileged user. It'll probably be
necessary to clear the last command string when executing SET SESSION
AUTHORIZATION so the new user can't snoop the cookie value from a
concurrent connection, but that should be simple enough.

As is currently the case, poolers will still have to use a superuser
connection if they want to pool across users.




In the longer term I want to add a protocol-level equivalent that lets a
session switch session authorization or role, for efficiency and log-spam
reasons.

I'm also interested in a way to allow SET SESSION AUTHORIZATION to a list
of permitted roles when run as a non-superuser, for connection pool use.
SET ROLE might do, but it's more visible to apps, whereas SET SESSION
AUTHORIZATION really makes the connection appear to "become" the target
user.

That's later though - first,


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


[HACKERS] Let PostgreSQL's On Schedule checkpoint write buffer smooth spread cycle by tuning IsCheckpointOnSchedule?

2015-05-11 Thread digoal zhou
   PostgreSQL (<=9.4) tries to spread checkpoint buffer writes smoothly
across checkpoint_completion_target (relative to checkpoint_timeout or
checkpoint_segments), but when we use synchronous_commit=off there is a
little problem with the checkpoint_segments target: xlog is written very
fast (because of the full-page writes that follow each checkpoint), so the
checkpointer cannot sleep and the buffer writes are not smooth.
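(As a rough cross-check without SystemTap, one can sample the standard
pg_stat_bgwriter view once per second during a checkpoint; a large
per-second jump in buffers_checkpoint means the checkpointer is writing
in bursts.  This is only a sketch, separate from the probe-based test
below.)

SELECT now() AS sampled_at,
       checkpoints_timed,      -- checkpoints started by timeout
       checkpoints_req,        -- checkpoints forced by xlog volume
       buffers_checkpoint,     -- cumulative buffers written by checkpoints
       checkpoint_write_time   -- cumulative ms spent in the write phase
FROM pg_stat_bgwriter;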
   Here is a test:
# stap -DMAXSKIPPED=10 -v 1 -e '
global s_var, e_var, stat_var;

/* probe smgr__md__read__start(ForkNumber, BlockNumber, Oid, Oid, Oid,
int); */
probe process("/opt/pgsql/bin/postgres").mark("smgr__md__read__start") {
  s_var[pid(),1] = gettimeofday_us()
}

/* probe smgr__md__read__done(ForkNumber, BlockNumber, Oid, Oid, Oid, int,
int, int); */
probe process("/opt/pgsql/bin/postgres").mark("smgr__md__read__done") {
  e_var[pid(),1] = gettimeofday_us()
  if ( s_var[pid(),1] > 0 )
stat_var[pid(),1] <<< e_var[pid(),1] - s_var[pid(),1]
}

/* probe smgr__md__write__start(ForkNumber, BlockNumber, Oid, Oid, Oid,
int); */
probe process("/opt/pgsql/bin/postgres").mark("smgr__md__write__start") {
  s_var[pid(),2] = gettimeofday_us()
}

/* probe smgr__md__write__done(ForkNumber, BlockNumber, Oid, Oid, Oid, int,
int, int); */
probe process("/opt/pgsql/bin/postgres").mark("smgr__md__write__done") {
  e_var[pid(),2] = gettimeofday_us()
  if ( s_var[pid(),2] > 0 )
stat_var[pid(),2] <<< e_var[pid(),2] - s_var[pid(),2]
}

probe process("/opt/pgsql/bin/postgres").mark("buffer__sync__start") {
  printf("buffer__sync__start num_buffers: %d, dirty_buffers: %d\n",
$NBuffers, $num_to_write)
}

probe process("/opt/pgsql/bin/postgres").mark("checkpoint__start") {
  printf("checkpoint start\n")
}

probe process("/opt/pgsql/bin/postgres").mark("checkpoint__done") {
  printf("checkpoint done\n")
}

probe timer.s(1) {
  foreach ([v1,v2] in stat_var +) {
if ( @count(stat_var[v1,v2]) >0 ) {
  printf("r1_or_w2 %d, pid: %d, min: %d, max: %d, avg: %d, sum: %d,
count: %d\n", v2, v1, @min(stat_var[v1,v2]), @max(stat_var[v1,v2]),
@avg(stat_var[v1,v2]), @sum(stat_var[v1,v2]), @count(stat_var[v1,v2]))
}
  }

printf("--end-\n")
  delete s_var
  delete e_var
  delete stat_var
}'



Use the test table and data:
create table tbl(id int primary key, info text, crt_time timestamp);
insert into tbl select generate_series(1,5000),now(),now();


Use pgbench to test it.
$ vi test.sql
\setrandom id 1 5000
update tbl set crt_time=now() where id = :id ;


$ pgbench -M prepared -n -r -f ./test.sql -P 1 -c 28 -j 28 -T 1
When an on-schedule checkpoint occurs, the tps:
progress: 255.0 s, 58152.2 tps, lat 0.462 ms stddev 0.504
progress: 256.0 s, 31382.8 tps, lat 0.844 ms stddev 2.331
progress: 257.0 s, 14615.5 tps, lat 1.863 ms stddev 4.554
progress: 258.0 s, 16258.4 tps, lat 1.652 ms stddev 4.139
progress: 259.0 s, 17814.7 tps, lat 1.526 ms stddev 4.035
progress: 260.0 s, 14573.8 tps, lat 1.825 ms stddev 5.592
progress: 261.0 s, 16736.6 tps, lat 1.600 ms stddev 5.018
progress: 262.0 s, 19060.5 tps, lat 1.448 ms stddev 4.818
progress: 263.0 s, 20553.2 tps, lat 1.290 ms stddev 4.146
progress: 264.0 s, 26223.0 tps, lat 1.042 ms stddev 3.711
progress: 265.0 s, 31953.0 tps, lat 0.836 ms stddev 2.837
progress: 266.0 s, 43396.1 tps, lat 0.627 ms stddev 1.615
progress: 267.0 s, 50487.8 tps, lat 0.533 ms stddev 0.647
progress: 268.0 s, 53537.7 tps, lat 0.502 ms stddev 0.598
progress: 269.0 s, 54259.3 tps, lat 0.496 ms stddev 0.624
progress: 270.0 s, 56139.8 tps, lat 0.479 ms stddev 0.524

The parameters for the on-schedule checkpoint:
checkpoint_segments = 512
checkpoint_timeout = 5min
checkpoint_completion_target = 0.9

stap's output:
There are 156467 dirty blocks.  Watching the writes per second shows that
the buffer writes are not smoothed against the time target, only against
the xlog target.  A smooth rate would be 156467 / (300 s * 0.9) ≈ 579.5
writes per second.


checkpoint start
buffer__sync__start num_buffers: 262144, dirty_buffers: 156467
r1_or_w2 2, pid: 19848, min: 41, max: 1471, avg: 49, sum: 425291, count: 8596
--end-
r1_or_w2 2, pid: 19848, min: 41, max: 153, avg: 49, sum: 450597, count: 9078
--end-
r1_or_w2 2, pid: 19848, min: 41, max: 643, avg: 51, sum: 429193, count: 8397
--end-
r1_or_w2 2, pid: 19848, min: 41, max: 1042, avg: 55, sum: 449091, count: 8097
--end-
r1_or_w2 2, pid: 19848, min: 41, max: 254, avg: 52, sum: 296668, count: 5617
--end-
r1_or_w2 2, pid: 19848, min: 39, max: 171, avg: 54, sum: 321027, count: 5851
--end-
r1_or_w2 2, pid: 19848, min: 41, max: 138, avg: 60, sum: 300056, count: 4953
--end

Re: [HACKERS] Multi-xacts and our process problem

2015-05-11 Thread Alvaro Herrera
Robert Haas wrote:
> On Mon, May 11, 2015 at 7:04 PM, Tom Lane  wrote:
> > I think there's nobody, or at least very few people, who are getting
> > paid to find/fix bugs rather than write cool new features.  This is
> > problematic.  It doesn't help when key committers are overwhelmed by
> > trying to process other peoples' patches.  (And no, I'm not sure that
> > "appoint more committers" would improve matters.  What we've got is
> > too many barely-good-enough patches.  Tweaking the process to let those
> > into the tree faster will not result in better quality.)
> 
> I agree, although generally I think committers are responsible for
> fixing what they commit, and I've certainly dropped everything a few
> times to do so.  And people who will someday become committers are
> generally the sorts of people who do that, too.  Perhaps we've relied
> overmuch on that in some cases - e.g. I really haven't paid much
> attention to the multixact stuff until lately, because I assumed that
> it was Alvaro's problem.  And maybe that's not right.  But I know that
> when a serious bug is found in something I committed, I expect that if
> anyone else fixes it, that's a bonus.

For the record, I share the principle of responsibility over committed
items, and I adhere to it as fully as possible.  Whenever possible I try
to enlist the submitter's help for a fix, but if they do not respond I
consider whatever fix is needed to be on me.  (I have dropped everything
to get fixes done, on several occasions.)

As for multixacts, since that is what brought up this thread, many of you
realize that the amount of time I have spent fixing issues after the fact
is enormous.  If I had had a glimpse of the effort the bugfixing would
cost, I would certainly have dropped the patch -- spending more time on
it before commit was out of the question.  I appreciate the involvement
of others in the fixes that became necessary.

One lesson I have learned from all this is to try to limit the additional
complexity introduced by any individual patch.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Multi-xacts and our process problem

2015-05-11 Thread Amit Kapila
On Tue, May 12, 2015 at 4:55 AM, Andres Freund  wrote:
>
> On 2015-05-11 19:04:32 -0400, Tom Lane wrote:
> > I think there's nobody, or at least very few people, who are getting
> > paid to find/fix bugs rather than write cool new features.  This is
> > problematic.  It doesn't help when key committers are overwhelmed by
> > trying to process other peoples' patches.  (And no, I'm not sure that
> > "appoint more committers" would improve matters.  What we've got is
> > too many barely-good-enough patches.  Tweaking the process to let those
> > into the tree faster will not result in better quality.)
>
> +many
>
> Except perhaps that I'd expand "find/fix bugs" to include "review and
> integrate patches". Because I think few people are paid to do that
> either.

Well said.  Another thing to add to your point is helping to support
other people's ideas, by providing use cases and/or a more robust design
that the community can accept.
I think one of the reasons for this is that there is no reasonable
guarantee that if a person spends a good amount of time on review, on
helping other patches through the design phase, and on fixing bugs, his
or her feature patches will be given more priority; that makes it
difficult to bargain with one's manager or company for more time to spend
on such activities.  If the current development process included some
form of prioritization of feature patches from people who spend more time
helping other patches and doing maintenance, it could improve the
situation.  Currently, the CF process does suggest that a person should
review patches equal in number and complexity to those he or she submits,
but I am not sure that is followed strictly, or that it is sufficient.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Final Patch for GROUPING SETS

2015-05-11 Thread Andres Freund

>I think the problem is "just" that for each variable, in each grouping
>set - a very large number in a large cube - we're recursing through the
>whole ChainAggregate tree, as each Var just points to a var one level
>lower.

For small values of "very large", that is.  I had a little thinko there.  It's
still the fault of recursing down all these levels, doing nontrivial work
each time.

-- 
Please excuse brevity and formatting - I am writing this on my mobile phone.

Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Final Patch for GROUPING SETS

2015-05-11 Thread Andres Freund
On 2015-04-30 05:35:26 +0100, Andrew Gierth wrote:
> > "Andres" == Andres Freund  writes:
>
>  Andres> This is not a real review.  I'm just scanning through the
>  Andres> patch, without reading the thread, to understand if I see
>  Andres> something "worthy" of controversy. While scanning I might have
>  Andres> a couple observations or questions.
>
>  >> + * A list of grouping sets which is structurally equivalent to a ROLLUP
>  >> + * clause (e.g. (a,b,c), (a,b), (a)) can be processed in a single pass over
>  >> + * ordered data.  We do this by keeping a separate set of transition values
>  >> + * for each grouping set being concurrently processed; for each input tuple
>  >> + * we update them all, and on group boundaries we reset some initial subset
>  >> + * of the states (the list of grouping sets is ordered from most specific to
>  >> + * least specific).  One AGG_SORTED node thus handles any number of grouping
>  >> + * sets as long as they share a sort order.
>
>  Andres> Found "initial subset" not very clear, even if I probably
>  Andres> guessed the right meaning.
>
> How about:
>
>  *  [...], and on group boundaries we reset those states
>  *  (starting at the front of the list) whose grouping values have
>  *  changed (the list of grouping sets is ordered from most specific to
>  *  least specific).  One AGG_SORTED node thus handles any number [...]

sounds good.
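To make that concrete, here is the kind of query the comment is about,
using the syntax this patch adds (table and column names are made up):

-- These three grouping sets are structurally a ROLLUP, so a single
-- AGG_SORTED node can process all of them in one pass over input
-- sorted by (a, b, c):
SELECT a, b, c, sum(v)
FROM t
GROUP BY GROUPING SETS ((a, b, c), (a, b), (a));
-- equivalent to: GROUP BY ROLLUP (a, b, c)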

>  >> + * To handle multiple grouping sets that _don't_ share a sort order, we use
>  >> + * a different strategy.  An AGG_CHAINED node receives rows in sorted order
>  >> + * and returns them unchanged, but computes transition values for its own
>  >> + * list of grouping sets.  At group boundaries, rather than returning the
>  >> + * aggregated row (which is incompatible with the input rows), it writes it
>  >> + * to a side-channel in the form of a tuplestore.  Thus, a number of
>  >> + * AGG_CHAINED nodes are associated with a single AGG_SORTED node (the
>  >> + * "chain head"), which creates the side channel and, when it has returned
>  >> + * all of its own data, returns the tuples from the tuplestore to its own
>  >> + * caller.
>
>  Andres> This paragraph deserves to be expanded imo.
>
> OK, but what in particular needs clarifying?

I'm not sure ;). I obviously was a bit tired...

>  Andres> Are you intending to resolve this before an eventual commit?
...
>  Andres> Possibly after the 'structural' issues are resolved? Or do you
>  Andres> think this can safely be put off for another release?
>
> I think the feature is useful even without AGG_HASHED, even though that
> means it can sometimes be beaten on performance by using UNION ALL of
> many separate GROUP BYs; but I'd defer to the opinions of others on that
> point.

I agree.

>  Andres> * Arguably this converts the execution *tree* into a DAG. Tom
>  Andres> seems to be rather uncomfortable with that. I am wondering
>  Andres> whether this really is a big deal - essentially this only
>  Andres> happens in a relatively 'isolated' part of the tree right?
>  Andres> I.e. if those chained together nodes were considered one node,
>  Andres> there would not be any loops?  Additionally, the way
>  Andres> parameterized scans work already essentially "violates" the
>  Andres> tree paradigm somewhat.
>
> The major downsides as I see them with the current approach are:
>
> 1. It makes plans (and hence explain output) nest very deeply if you
> have complex grouping sets (especially cubes with high dimensionality).

That doesn't concern me overly much. If we feel the need to fudge the
explain output we certainly can.

> 2. A union-based approach would have a chance of including AGG_HASHED
> support without any significant code changes, just by using one HashAgg
> node per qualifying grouping set. However, this would be potentially
> significantly slower than teaching HashAgg to do multiple grouping sets,
> and memory usage would be an issue.

Your "however" imo pretty much disqualifies that as an argument.

> The obvious alternative is this:
>
>   -> CTE x
>  -> entire input subplan here
>   -> Append
>  -> GroupAggregate
> -> Sort
>-> CTE Scan x
>  -> GroupAggregate
> -> Sort
>-> CTE Scan x
>  -> HashAggregate
> -> CTE Scan x
>  [...]
>
> which was basically what we expected to do originally. But all of the
> existing code to deal with CTE / CTEScan is based on the assumption that
> each CTE has a rangetable entry before planning starts, and it is
> completely non-obvious how to arrange to generate such CTEs on the fly
> while planning.  Tom offered in December to look into that aspect for
> us, and of course we've heard nothing about it since.

I find Noah's argument about this kind of structure pretty
convincing. We'd eit

Re: [HACKERS] Minor ON CONFLICT related fixes

2015-05-11 Thread Peter Geoghegan
On Mon, May 11, 2015 at 7:34 PM, Andres Freund  wrote:
> You should try to understand why it's failing. Just prohibiting the
> rules, without understanding what's actually going on, could very well
> hide a real bug.

It's not as if I have no idea. ReplaceVarsFromTargetList() is probably
quite confused by all this, because the passed nomatch_varno argument
is often rt_index -- but what about EXCLUDED.*? adjustJoinTreeList()
does not know anything about EXCLUDED.* either. I see little practical
reason to make the rewriter do any better.

When I debugged the problem of the optimizer raising that "target
lists" error for a rule whose action used EXCLUDED.* (within
setrefs.c's fix_join_expr_mutator()), it looked like an off-by-one
issue here:

/* If it's for acceptable_rel, adjust and return it */
if (var->varno == context->acceptable_rel)
{
var = copyVar(var);
var->varno += context->rtoffset;
if (var->varnoold > 0)
var->varnoold += context->rtoffset;
return (Node *) var;
}

-- 
Peter Geoghegan




Re: [HACKERS] pg_basebackup vs. Windows and tablespaces

2015-05-11 Thread Amit Kapila
On Tue, May 12, 2015 at 5:50 AM, Andrew Dunstan  wrote:
>
>
> On 05/11/2015 02:02 AM, Amit Kapila wrote:
>>
>> On Sun, May 10, 2015 at 6:01 AM, Andrew Dunstan wrote:
>> >
>> >
>> >
>> > This generally looks good, but I have a couple of questions before I
commit it.
>> >
>> > First, why is the new option for the BASE_BACKUP command of the
Streaming Replication protocol "TAR"? It seems rather misleading. Shouldn't
it be something like "TABLESPACEMAP"?
>> >
>>
>> The reason to keep new option's name as TAR was that tablespace_map
>> was generated for that format type, but I agree with you that something
>> like "TABLESPACEMAP" suits better, so I have changed it to
>> "TABLESPACE_MAP".  Putting '_' in name makes it somewhat consistent
>> with other names and filename it generates with this new option.
>>
>>
>> > Second, these lines in xlog.c seem wrong:
>> >
>> > else if ((ch == '\n' || ch == '\r') && prev_ch == '\\')
>> > str[i-1] = '\n';
>> >
>> > It looks to me like we should be putting ch in the string, not
arbitrarily transforming \r into \n.
>> >
>>
>> You are right, I have changed it as per your suggestion.
>>
>>
>
>
> OK, I have cleaned this up a bit - I had already started so I didn't take
your latest patch but instead applied relevant changes to my changeset.
Here is my latest version.
>
> In testing I notice that now "pg_basebackup -F t" leaves it completely up
to the user on all platforms to create the relevant links in pg_tblspc/. It
includes the tablespace_map file in base.tar, but that's really just
informational.
>

Sorry, but I am not able to follow your point.  Users don't need to create
the relevant links; they will be created from the backup during first
startup (during recovery).  I have tested this and it works on both
Windows and Linux.

Refer to the below code from the patch in xlog.c:

+
+ /* read the tablespace_map file if present and create symlinks. */
+ if (read_tablespace_map(&tablespaces))
+ {
..

> I think we need to add something to the pg_basebackup docs about that, at
the very least (and it will also need to be a release note item.)
>

Currently, the below lines in the patch state that this file is required
for recovery.  Do you expect more information to be added?

+These files are not merely for your information; their presence and
+contents are critical to the proper operation of the system's recovery
+process.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] EvalPlanQual behaves oddly for FDW queries involving system columns

2015-05-11 Thread Etsuro Fujita
On 2015/05/12 7:42, Tom Lane wrote:
> On further consideration ...

Thanks for the consideration!

> I don't much like the division of labor between LockForeignRow and
> FetchForeignRow.  In principle that would lead to not one but two
> extra remote accesses per locked row in SELECT FOR UPDATE, at least
> in the case that an EvalPlanQual recheck is required.  (I see that
> in your prototype patch for postgres_fdw you attempt to avoid that
> by saving a retrieved tuple in LockForeignRow and then returning it
> in FetchForeignRow, but that seems extremely ugly and bug-prone,
> since there is nothing in the API spec insisting that those calls match
> up one-to-one.)  The fact that locking and fetching a tuple are separate
> operations in the heapam API is a historical artifact that probably
> doesn't make sense to duplicate in the FDW API.

I understand your concern about the postgres_fdw patch.  However, I
think we should split this into two routines, as the core patch does,
because that would allow an FDW author to improve efficiency for
scenarios that seldom need fetching, by not retrieving and saving locked
tuples in LockForeignRow.

> Another problem is that we fail to detect whether an EvalPlanQual recheck
> is required during a SELECT FOR UPDATE on a remote table, which we
> certainly ought to do if the objective is to make postgres_fdw semantics
> match the local ones.

I think that is interesting in theory, but I'm not sure that is worth
much in practice.

Best regards,
Etsuro Fujita




[HACKERS] Improving regression tests to check LOCK TABLE and table permissions

2015-05-11 Thread Michael Paquier
Hi all,

As mentioned in this thread, it would be good to have regression tests
to test the interactions with permissions and LOCK TABLE:
http://www.postgresql.org/message-id/20150511195335.ge30...@tamriel.snowman.net
Attached is a patch achieving that.
Note that it checks the modes ACCESS SHARE, ROW EXCLUSIVE and
ACCESS EXCLUSIVE so as to cover all the code paths of
LockTableAclCheck@lockcmds.c.

I'll add an entry in the next CF to keep track of it.
Regards,
-- 
Michael
diff --git a/src/test/regress/expected/privileges.out b/src/test/regress/expected/privileges.out
index 64a9330..c0cd9fa 100644
--- a/src/test/regress/expected/privileges.out
+++ b/src/test/regress/expected/privileges.out
@@ -1569,3 +1569,86 @@ DROP USER regressuser4;
 DROP USER regressuser5;
 DROP USER regressuser6;
 ERROR:  role "regressuser6" does not exist
+-- permissions with LOCK TABLE
+CREATE USER locktable_user;
+CREATE TABLE lock_table (a int);
+-- LOCK TABLE and SELECT permission
+GRANT SELECT ON lock_table TO locktable_user;
+SET SESSION AUTHORIZATION locktable_user;
+BEGIN;
+LOCK TABLE lock_table IN ROW EXCLUSIVE MODE; -- should fail
+ERROR:  permission denied for relation lock_table
+ROLLBACK;
+BEGIN;
+LOCK TABLE lock_table IN ACCESS SHARE MODE; -- should pass
+COMMIT;
+BEGIN;
+LOCK TABLE lock_table IN ACCESS EXCLUSIVE MODE; -- should fail
+ERROR:  permission denied for relation lock_table
+ROLLBACK;
+\c
+REVOKE SELECT ON lock_table FROM locktable_user;
+-- LOCK TABLE and INSERT permission
+GRANT INSERT ON lock_table TO locktable_user;
+SET SESSION AUTHORIZATION locktable_user;
+BEGIN;
+LOCK TABLE lock_table IN ROW EXCLUSIVE MODE; -- should pass
+COMMIT;
+BEGIN;
+LOCK TABLE lock_table IN ACCESS SHARE MODE; -- should fail
+ERROR:  permission denied for relation lock_table
+ROLLBACK;
+BEGIN;
+LOCK TABLE lock_table IN ACCESS EXCLUSIVE MODE; -- should fail
+ERROR:  permission denied for relation lock_table
+ROLLBACK;
+\c
+REVOKE INSERT ON lock_table FROM locktable_user;
+-- LOCK TABLE and UPDATE permission
+GRANT UPDATE ON lock_table TO locktable_user;
+SET SESSION AUTHORIZATION locktable_user;
+BEGIN;
+LOCK TABLE lock_table IN ROW EXCLUSIVE MODE; -- should pass
+COMMIT;
+BEGIN;
+LOCK TABLE lock_table IN ACCESS SHARE MODE; -- should fail
+ERROR:  permission denied for relation lock_table
+ROLLBACK;
+BEGIN;
+LOCK TABLE lock_table IN ACCESS EXCLUSIVE MODE; -- should pass
+COMMIT;
+\c
+REVOKE UPDATE ON lock_table FROM locktable_user;
+-- LOCK TABLE and DELETE permission
+GRANT DELETE ON lock_table TO locktable_user;
+SET SESSION AUTHORIZATION locktable_user;
+BEGIN;
+LOCK TABLE lock_table IN ROW EXCLUSIVE MODE; -- should pass
+COMMIT;
+BEGIN;
+LOCK TABLE lock_table IN ACCESS SHARE MODE; -- should fail
+ERROR:  permission denied for relation lock_table
+ROLLBACK;
+BEGIN;
+LOCK TABLE lock_table IN ACCESS EXCLUSIVE MODE; -- should pass
+COMMIT;
+\c
+REVOKE DELETE ON lock_table FROM locktable_user;
+-- LOCK TABLE and TRUNCATE permission
+GRANT TRUNCATE ON lock_table TO locktable_user;
+SET SESSION AUTHORIZATION locktable_user;
+BEGIN;
+LOCK TABLE lock_table IN ROW EXCLUSIVE MODE; -- should pass
+COMMIT;
+BEGIN;
+LOCK TABLE lock_table IN ACCESS SHARE MODE; -- should fail
+ERROR:  permission denied for relation lock_table
+ROLLBACK;
+BEGIN;
+LOCK TABLE lock_table IN ACCESS EXCLUSIVE MODE; -- should pass
+COMMIT;
+\c
+REVOKE TRUNCATE ON lock_table FROM locktable_user;
+-- clean up
+DROP TABLE lock_table;
+DROP USER locktable_user;
diff --git a/src/test/regress/sql/privileges.sql b/src/test/regress/sql/privileges.sql
index 22b54a2..c1837c4 100644
--- a/src/test/regress/sql/privileges.sql
+++ b/src/test/regress/sql/privileges.sql
@@ -975,3 +975,87 @@ DROP USER regressuser3;
 DROP USER regressuser4;
 DROP USER regressuser5;
 DROP USER regressuser6;
+
+
+-- permissions with LOCK TABLE
+CREATE USER locktable_user;
+CREATE TABLE lock_table (a int);
+
+-- LOCK TABLE and SELECT permission
+GRANT SELECT ON lock_table TO locktable_user;
+SET SESSION AUTHORIZATION locktable_user;
+BEGIN;
+LOCK TABLE lock_table IN ROW EXCLUSIVE MODE; -- should fail
+ROLLBACK;
+BEGIN;
+LOCK TABLE lock_table IN ACCESS SHARE MODE; -- should pass
+COMMIT;
+BEGIN;
+LOCK TABLE lock_table IN ACCESS EXCLUSIVE MODE; -- should fail
+ROLLBACK;
+\c
+REVOKE SELECT ON lock_table FROM locktable_user;
+
+-- LOCK TABLE and INSERT permission
+GRANT INSERT ON lock_table TO locktable_user;
+SET SESSION AUTHORIZATION locktable_user;
+BEGIN;
+LOCK TABLE lock_table IN ROW EXCLUSIVE MODE; -- should pass
+COMMIT;
+BEGIN;
+LOCK TABLE lock_table IN ACCESS SHARE MODE; -- should fail
+ROLLBACK;
+BEGIN;
+LOCK TABLE lock_table IN ACCESS EXCLUSIVE MODE; -- should fail
+ROLLBACK;
+\c
+REVOKE INSERT ON lock_table FROM locktable_user;
+
+-- LOCK TABLE and UPDATE permission
+GRANT UPDATE ON lock_table TO locktable_user;
+SET SESSION AUTHORIZATION locktable_user;
+BEGIN;
+LOCK TABLE lock_table IN ROW EXCLUSIVE MODE; -- should pass
+COMMIT;
+BEGIN;
+LOCK TABLE lock_table IN ACCESS SHARE MODE; -- shou

Re: [HACKERS] LOCK TABLE Permissions

2015-05-11 Thread Michael Paquier
On Tue, May 12, 2015 at 4:53 AM, Stephen Frost wrote:
> Yeah, it might not be bad to have tests for all the different lock types
> and make sure that the permissions match up.  I'd probably put those
> tests into 'permissions.sql' instead though.

You mean privileges.sql, right? I just wrote a patch with that. I'll
create a new thread with the patch.
-- 
Michael




Re: [HACKERS] Minor ON CONFLICT related fixes

2015-05-11 Thread Andres Freund
On 2015-05-11 19:33:02 -0700, Peter Geoghegan wrote:
> On Mon, May 11, 2015 at 7:22 PM, Peter Geoghegan  wrote:
> > You just get an error within the
> > optimizer following rewriting of a query involving the application of
> > a rule that specifies an ON CONFLICT DO UPDATE alternative/DO INSTEAD
> > action. I found it would say: "variable not found in subplan target
> > lists".
> 
> BTW, that error was only raised when the EXCLUDED.* pseudo-alias was
> actually used, which is why our previous testing missed it.

You should try to understand why it's failing. Just prohibiting the
rules, without understanding what's actually going on, could very well
hide a real bug.

Greetings,

Andres Freund




Re: [HACKERS] Minor ON CONFLICT related fixes

2015-05-11 Thread Peter Geoghegan
On Mon, May 11, 2015 at 7:22 PM, Peter Geoghegan  wrote:
> You just get an error within the
> optimizer following rewriting of a query involving the application of
> a rule that specifies an ON CONFLICT DO UPDATE alternative/DO INSTEAD
> action. I found it would say: "variable not found in subplan target
> lists".

BTW, that error was only raised when the EXCLUDED.* pseudo-alias was
actually used, which is why our previous testing missed it.

-- 
Peter Geoghegan




Re: [HACKERS] Minor ON CONFLICT related fixes

2015-05-11 Thread Peter Geoghegan
On Mon, May 11, 2015 at 7:11 PM, Andres Freund  wrote:
> Do I understand correctly that you cut this out because there
> essentially was a ruleutils bug with EXCLUDED? If so, I don't find
> that acceptable. I'm pretty strictly convinced that, independent of rule
> support, we shouldn't add things to insert queries that get_query_def
> can't deparse.

No, that wasn't the reason -- deparsing itself works fine. That code
remains within ruleutils.c because I agree with this principle of
yours.

I tested the deparsing logic before making the newly added case fail as
unsupported. If you remove the new ereport() call that relates to the
non-support of ON CONFLICT DO UPDATE as a rule action, then the
CREATE RULE still succeeds, and you can deparse the query just fine
(typically by querying pg_rules). You just get an error within the
optimizer following the rewriting of a query that applies a rule
specifying an ON CONFLICT DO UPDATE alternative/DO INSTEAD action. I
found it would say: "variable not found in subplan target lists".

I can't say that I explored the question very thoroughly, but having
more than one relation involved in a Query that undergoes rewriting
seems bothersome.
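For reference, the deparse path mentioned above can be exercised through
the pg_rules view (table and rule names here are hypothetical):

SELECT definition
FROM pg_rules
WHERE tablename = 'tbl' AND rulename = 'ignore_dups';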

-- 
Peter Geoghegan




Re: [HACKERS] Minor ON CONFLICT related fixes

2015-05-11 Thread Andres Freund
Hi,

Don't have time to go through this in depth.

On 2015-05-11 18:53:22 -0700, Peter Geoghegan wrote:
> Note that the patch proposes to de-support CREATE RULE with an
> alternative action involving ON CONFLICT DO UPDATE. Such a rule seems
> particularly questionable to me, and I'm pretty sure it was understood
> that only ON CONFLICT DO NOTHING should be supported as an action for
> a rule (recall that INSERT statements with ON CONFLICT can, in
> general, never target relations with rules). At least, I believe
> Heikki said that.

> Deparsing with an inference clause is now correctly supported.  However,
> user-defined rules with ON CONFLICT DO UPDATE are now formally
> disallowed/unsupported.  It seemed there would be significant complexity
> involved in making this work correctly with the EXCLUDED.*
> pseudo-relation, which was not deemed worthwhile.  Such a user-defined
> rule seems very questionable anyway.

Do I understand correctly that you cut this out because there
essentially was a ruleutils bug with EXCLUDED? If so, I don't find
that acceptable. I'm pretty strictly convinced that, independent of rule
support, we shouldn't add things to insert queries that get_query_def
can't deparse.

Greetings,

Andres Freund




Re: [HACKERS] Minor ON CONFLICT related fixes

2015-05-11 Thread Peter Geoghegan
On Mon, May 11, 2015 at 6:53 PM, Peter Geoghegan  wrote:
> Attached patch fixes some issues with ON CONFLICT DO UPDATE/DO
> NOTHING. There is a commit message which explains the changes at a
> high level.

I just realized that there is a silly bug here. Attached is a fix that
applies on top of the original.


-- 
Peter Geoghegan
From 1b32558d188762eb5c7214ea5ae042897e7d004f Mon Sep 17 00:00:00 2001
From: Peter Geoghegan 
Date: Mon, 11 May 2015 19:02:12 -0700
Subject: [PATCH 2/2] Add closing bracket

---
 src/backend/utils/adt/ruleutils.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 355acc9..aa21693 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -7768,6 +7768,9 @@ get_rule_expr(Node *node, deparse_context *context,
 get_rule_expr((Node *) iexpr->expr,
 			  context, showimplicit);
 
+if (need_parens)
+	appendStringInfoChar(buf, ')');
+
 context->varprefix = varprefix;
 
 if (iexpr->infercollid)
@@ -7782,8 +7785,6 @@ get_rule_expr(Node *node, deparse_context *context,
 
 	get_opclass_name(inferopclass, inferopcinputtype, buf);
 }
-if (need_parens)
-	appendStringInfoChar(buf, ')');
 			}
 			break;
 
-- 
1.9.1




[HACKERS] Minor ON CONFLICT related fixes

2015-05-11 Thread Peter Geoghegan
Attached patch fixes some issues with ON CONFLICT DO UPDATE/DO
NOTHING. There is a commit message which explains the changes at a
high level. The only real bug fix is around deparsing by ruleutils.c.

Note that the patch proposes to de-support CREATE RULE with an
alternative action involving ON CONFLICT DO UPDATE. Such a rule seems
particularly questionable to me, and I'm pretty sure it was understood
that only ON CONFLICT DO NOTHING should be supported as an action for
a rule (recall that INSERT statements with ON CONFLICT can, in
general, never target relations with rules). At least, I believe
Heikki said that.

Thoughts?
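As a minimal sketch of the distinction (tables and rules are made up):

-- Still allowed: ON CONFLICT DO NOTHING as a rule action.
CREATE RULE ins_nothing AS ON INSERT TO shadow
    DO INSTEAD
    INSERT INTO t1 VALUES (NEW.id, NEW.val) ON CONFLICT DO NOTHING;

-- De-supported by the patch: DO UPDATE as a rule action, since the
-- rewriter has no sensible handling for the EXCLUDED.* pseudo-relation.
CREATE RULE ins_update AS ON INSERT TO shadow
    DO INSTEAD
    INSERT INTO t1 VALUES (NEW.id, NEW.val)
    ON CONFLICT (id) DO UPDATE SET val = EXCLUDED.val;  -- now an ERROR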
-- 
Peter Geoghegan
From 349a0a1fee6d86de8c5cc4120474ddc48aeb43e0 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan 
Date: Mon, 11 May 2015 15:37:54 -0700
Subject: [PATCH] Fixes to a variety of minor ON CONFLICT issues

Deparsing with an inference clause is now correctly supported.  However,
user-defined rules with ON CONFLICT DO UPDATE are now formally
disallowed/unsupported.  It seemed there would be significant complexity
involved in making this work correctly with the EXCLUDED.*
pseudo-relation, which was not deemed worthwhile.  Such a user-defined
rule seems very questionable anyway.

In passing, re-factor InferenceElem representation of opclass, to defer
opfamily lookup for Relation index matching until index inference proper
(i.e., within the optimizer).  This is done for the convenience of the
new ruleutils.c code, but independently makes sense.

Finally, fix a few typos, and rename a variable -- "candidates" seemed
like a misnomer for the return value of infer_arbiter_indexes().
---
 contrib/pg_stat_statements/pg_stat_statements.c |  3 +-
 doc/src/sgml/ref/create_rule.sgml   |  5 ++
 src/backend/executor/nodeModifyTable.c  |  2 +-
 src/backend/nodes/copyfuncs.c   |  3 +-
 src/backend/nodes/equalfuncs.c  |  3 +-
 src/backend/nodes/outfuncs.c|  3 +-
 src/backend/nodes/readfuncs.c   |  3 +-
 src/backend/optimizer/plan/setrefs.c|  3 +-
 src/backend/optimizer/util/plancat.c| 34 +++
 src/backend/parser/parse_clause.c   | 16 ++
 src/backend/parser/parse_utilcmd.c  |  5 ++
 src/backend/utils/adt/ruleutils.c   | 75 -
 src/include/nodes/primnodes.h   |  3 +-
 src/test/regress/expected/insert_conflict.out   |  2 +-
 src/test/regress/expected/rules.out | 67 --
 src/test/regress/sql/rules.sql  | 32 +++
 16 files changed, 177 insertions(+), 82 deletions(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 6abe3f0..f97cc2c 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2637,8 +2637,7 @@ JumbleExpr(pgssJumbleState *jstate, Node *node)
 InferenceElem *ie = (InferenceElem *) node;
 
 APP_JUMB(ie->infercollid);
-APP_JUMB(ie->inferopfamily);
-APP_JUMB(ie->inferopcinputtype);
+APP_JUMB(ie->inferopclass);
 JumbleExpr(jstate, ie->expr);
 			}
 			break;
diff --git a/doc/src/sgml/ref/create_rule.sgml b/doc/src/sgml/ref/create_rule.sgml
index 53fdf56..46a8a7c 100644
--- a/doc/src/sgml/ref/create_rule.sgml
+++ b/doc/src/sgml/ref/create_rule.sgml
@@ -281,6 +281,11 @@ UPDATE mytable SET name = 'foo' WHERE id = 42;
match the condition id = 42.  This is an
implementation restriction that might be fixed in future releases.
   
+  
+   Presently, a rule alternative action cannot contain ON
+   CONFLICT DO UPDATE.  This is an implementation
+   restriction that might be fixed in future releases.
+  
  
 
  
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 34435c7..55a1cc7 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -414,7 +414,7 @@ ExecInsert(ModifyTableState *mtstate,
    estate, true, &specConflict,
    arbiterIndexes);
 
-			/* adjust the tuple's state accordingly */
+			/* adjust the tuple's state */
 			if (!specConflict)
 heap_finish_speculative(resultRelationDesc, tuple);
 			else
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 76b63af..957c2bc 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -1803,8 +1803,7 @@ _copyInferenceElem(const InferenceElem *from)
 
 	COPY_NODE_FIELD(expr);
 	COPY_SCALAR_FIELD(infercollid);
-	COPY_SCALAR_FIELD(inferopfamily);
-	COPY_SCALAR_FIELD(inferopcinputtype);
+	COPY_SCALAR_FIELD(inferopclass);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index e032142..f26626e 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -687,8 +687,7 @@ _equalInferenceElem(const InferenceElem *a, co

Re: [HACKERS] Multi-xacts and our process problem

2015-05-11 Thread Robert Haas
On Mon, May 11, 2015 at 7:04 PM, Tom Lane  wrote:
> I think there's nobody, or at least very few people, who are getting
> paid to find/fix bugs rather than write cool new features.  This is
> problematic.  It doesn't help when key committers are overwhelmed by
> trying to process other peoples' patches.  (And no, I'm not sure that
> "appoint more committers" would improve matters.  What we've got is
> too many barely-good-enough patches.  Tweaking the process to let those
> into the tree faster will not result in better quality.)

I agree, although generally I think committers are responsible for
fixing what they commit, and I've certainly dropped everything a few
times to do so.  And people who will someday become committers are
generally the sorts of people who do that, too.  Perhaps we've relied
overmuch on that in some cases - e.g. I really haven't paid much
attention to the multixact stuff until lately, because I assumed that
it was Alvaro's problem.  And maybe that's not right.  But I know that
when a serious bug is found in something I committed, I expect that if
anyone else fixes it, that's a bonus.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-05-11 Thread Kouhei Kaigai
Hi,

Let me get back to the original concern and show the possible options we
can take here.  At least, the latest master cannot be used to implement
custom-join logic without one of these options.

option-1)
Revert create_plan_recurse() to a non-static function for extensions.
It is the simplest solution; however, it is still a gray area which
functions should be public and whether we treat non-static functions
as a stable API or not.
IMO, we shouldn't treat non-static functions as stable APIs, even if
they can be called by extensions and not only by code in other source
files.  In fact, we usually change the definitions of non-static
functions even when we know extensions use them.  It is the extension's
role to follow the feature across major versions.


option-2)
Tom's suggestion.  Add a new list field of Path nodes on CustomPath;
create_customscan_plan() then calls the static create_plan_recurse()
function to construct the child plan nodes.
The attached patch gives an image of this enhancement, though it is not
tested yet, of course.  Once we adopt this approach, I'll adjust my
PG-Strom code to the new interface within 2 weeks and report any
problems.


option-3)
Force authors of custom-scan providers to copy and paste createplan.c.
I really don't want this option, and nobody would be happy with it.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 


> -Original Message-
> From: Kaigai Kouhei(海外 浩平)
> Sent: Monday, May 11, 2015 12:48 PM
> To: 'Andres Freund'; Robert Haas
> Cc: Tom Lane; Kohei KaiGai; Thom Brown; Shigeru Hanada;
> pgsql-hackers@postgreSQL.org
> Subject: RE: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
> 
> > On 2015-05-10 21:26:26 -0400, Robert Haas wrote:
> > > On Sun, May 10, 2015 at 8:41 PM, Tom Lane  wrote:
> > > > > This commit reverts create_plan_recurse() as static function.
> > > > Yes.  I am not convinced that external callers should be calling that,
> > > > and would prefer not to enlarge createplan.c's API footprint without a
> > > > demonstration that this is right and useful.  (This is one of many
> > > > ways in which this patch is suffering from having gotten committed
> > > > without submitted use-cases.)
> >
> > Wasn't there a submitted use case? IIRC Kaigai had referenced some
> > pg-strom (?) code using it?
> >
> > I'm failing to see how create_plan_recurse() being exposed externally is
> > related to "having gotten committed without submitted use-cases".  Even
> > if submitted, presumably as simple as possible code, doesn't use it,
> > that's not a proof that less simple code does not need it.
> >
> Yes, the PG-Strom code uses create_plan_recurse() to construct the child
> plan nodes of its GPU-accelerated custom-join logic, once that path is
> chosen.  There is nothing special here: it calls create_plan_recurse()
> on the underlying inner/outer paths, just as the built-in join paths do.
> It would not be difficult to submit it as a working example; however,
> its total code size (excluding GPU code) is 25K lines at this moment.
> 
> I'm not certain it would qualify as a simple example.
> 
> > > Your unwillingness to make functions global or to stick PGDLLIMPORT
> > > markings on variables that people want access to is hugely
> > > handicapping extension authors.  Many people have complained about
> > > that on multiple occasions.  Frankly, I find it obstructionist and
> > > petty.
> >
> > While I don't find the tone of the characterization super helpful, I do
> > tend to agree that we're *far* too conservative on that end.  I've now
> > seen a significant number of extension that copied large swathes of code
> > just to cope with individual functions not being available.  And even
> > cases where that lead to minor forks with such details changed.
> >
> I may have to join the members?
> 
> > I know that I'm "fearful" of asking for functions being made
> > public. Because it'll invariably get into a discussion of merits that's
> > completely out of proportion with the size of the change.  And if I, who
> > has been on the list for a while now, am "afraid" in that way, you can
> > be sure that others won't even dare to ask, lest argue their way
> > through.
> >
> > I think the problem is that during development the default often is to
> > create function as static if they're used only in one file. Which is
> > fine. But it really doesn't work if it's a larger battle to change
> > single incidences.  Besides the pain of having to wait for the next
> > major release...
> >
> Thanks,
> --
> NEC Business Creation Division / PG-Strom Project
> KaiGai Kohei 



custom-join-children.patch
Description: custom-join-children.patch



Re: [HACKERS] Multi-xacts and our process problem

2015-05-11 Thread Joshua D. Drake


On 05/11/2015 04:18 PM, Simon Riggs wrote:

On 11 May 2015 at 23:47, Bruce Momjian wrote:

On Mon, May 11, 2015 at 03:42:26PM -0700, Joshua Drake wrote:
> >The releases themselves are not the problem, but rather the volume of
> >bugs and our slowness in getting additional people involved to remove
> >data corruption bugs more quickly and systematically.  Our reputation
> >for reliability has been harmed by this inactivity.
> >
>
> What I am arguing is that the release cycle is at least a big part
> of the problem. We are trying to get so many new features that bugs
> are increasing and quality is decreasing.

Now that is an interesting observation --- are we too focused on patches
and features to realize when we need to seriously revisit an issue?


I think we are unused to bugs. We have a much lower bug rate than any
other system.


True, we are used to having extremely high-quality releases, but if you
look at the release notes since, say, 9.2, we are seeing a much larger
increase in bug rates.


It is true that, generally speaking, our bug rate is low in comparison to
other databases.  That said, per my previous paragraph, I think we are
also resting on some laurels here.




I think we seriously need to review our policy of adding major new
features that are enabled by default with no parameter to disable
them. In the early years of PostgreSQL everything had an off switch,
e.g. stats, bgwriter and even autovacuum defaulted to off for many years.


That's interesting, although I am unsure of the cost of such a thing.

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't
control your own emotions, so everyone else should do it for you.




Re: [HACKERS] Multi-xacts and our process problem

2015-05-11 Thread Joshua D. Drake


On 05/11/2015 04:04 PM, Tom Lane wrote:

Bruce Momjian  writes:

On Mon, May 11, 2015 at 03:42:26PM -0700, Joshua Drake wrote:

What I am arguing is that the release cycle is at least a big part
of the problem. We are trying to get so many new features that bugs
are increasing and quality is decreasing.



Now that is an interesting observation --- are we too focused on patches
and features to realize when we need to seriously revisit an issue?


I think there's nobody, or at least very few people, who are getting
paid to find/fix bugs rather than write cool new features.  This is
problematic.  It doesn't help when key committers are overwhelmed by
trying to process other peoples' patches.  (And no, I'm not sure that
"appoint more committers" would improve matters.  What we've got is
too many barely-good-enough patches.  Tweaking the process to let those
into the tree faster will not result in better quality.)


Exactly correct.

JD


--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't
control your own emotions, so everyone else should do it for you.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch to improve a few appendStringInfo* calls

2015-05-11 Thread Peter Eisentraut
On 4/11/15 6:25 AM, David Rowley wrote:
> I've attached a small patch which just fixes a few appendStringInfo*
> calls that are not quite doing things the way that it was intended. 

done



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fixing busted citext function declarations

2015-05-11 Thread David E. Wheeler
On May 11, 2015, at 5:01 PM, Tom Lane  wrote:

> Me too.  Something fell through the cracks rather badly there :-(.
> Would you check your commit history to see if anything else got missed?

Let’s see…

In 
https://github.com/theory/citext/commit/4030b4e1ad9fd9f994a6cdca1126a903682acae4
 I copied your use of specifying the full path to the pg_catalog function, which is 
still in core.

In 
https://github.com/theory/citext/commit/c24132c098a822f5a8669ed522e747e01e1c0835,
 I made some tweaks based on changes you made to some version of my patch. 
Most are minor, or just for functions needed for 8.4 and not later versions.

In 
https://github.com/theory/citext/commit/2c7e997fd60e2b708d06c128e5fd2db51c7a9f33,
 I added a cast to bpchar, which is in core.

In 
https://github.com/theory/citext/commit/cf988024d18a6ddd9a8146ab8cabfe6e0167ba26
 and 
https://github.com/theory/citext/commit/22f91a0d50003a0c1c27d1fbf0bb5c0a1e3a3cad
 I switched from VARSIZE_ANY_EXHDR() to strlen() at your suggestion. Also still 
there.


Anyway, those are all from 2008 and pretty much just copy changes you made to 
core. The return value of regexp_matches() is the only significant change since 
then. So I think we’re good.

Best,

David

smime.p7s
Description: S/MIME cryptographic signature


Re: [HACKERS] pg_basebackup vs. Windows and tablespaces

2015-05-11 Thread Andrew Dunstan


On 05/11/2015 02:02 AM, Amit Kapila wrote:
On Sun, May 10, 2015 at 6:01 AM, Andrew Dunstan  wrote:

>
>
>
> This generally looks good, but I have a couple of questions before I 
commit it.

>
> First, why is the new option for the BASE_BACKUP command of the 
Streaming Replication protocol "TAR"? It seems rather misleading. 
Shouldn't it be something like "TABLESPACEMAP"?

>

The reason to keep the new option's name as TAR was that tablespace_map
was generated for that format type, but I agree with you that something
like "TABLESPACEMAP" suits better, so I have changed it to
"TABLESPACE_MAP".  Putting '_' in the name makes it somewhat consistent
with other names and with the filename it generates with this new option.


> Second, these lines in xlog.c seem wrong:
>
> else if ((ch == '\n' || ch == '\r') && prev_ch == '\\')
> str[i-1] = '\n';
>
> It looks to me like we should be putting ch in the string, not 
arbitrarily transforming \r into \n.

>

You are right, I have changed it as per your suggestion.
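
For reference, the corrected branch would look roughly like this (a
sketch of the suggested fix, not necessarily the exact committed code):

    /* Unescape a backslash-escaped newline or carriage return by
     * keeping the actual character, instead of forcing \r to \n
     * as the earlier version did. */
    else if ((ch == '\n' || ch == '\r') && prev_ch == '\\')
        str[i - 1] = ch;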





OK, I have cleaned this up a bit - I had already started so I didn't 
take your latest patch but instead applied relevant changes to my 
changeset. Here is my latest version.


In testing I notice that now "pg_basebackup -F t" leaves it completely up 
to the user on all platforms to create the relevant links in pg_tblspc/. 
It includes the tablespace_map file in base.tar, but that's really just 
informational. I think we need to add something to the pg_basebackup 
docs about that, at the very least (and it will also need to be a 
release note item.)


cheers

andrew


diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index e25e0d0..def43a2 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -836,8 +836,11 @@ SELECT pg_start_backup('label');
  pg_start_backup creates a backup label file,
  called backup_label, in the cluster directory with
  information about your backup, including the start time and label
- string.  The file is critical to the integrity of the backup, should
- you need to restore from it.
+ string.  The function also creates a tablespace map file,
+ called tablespace_map, in the cluster directory with
+ information about tablespace symbolic links in pg_tblspc/
+ if one or more such link is present.  Both files are critical to the
+ integrity of the backup, should you need to restore from it.
 
 
 
@@ -965,17 +968,20 @@ SELECT pg_stop_backup();
 

 It's also worth noting that the pg_start_backup function
-makes a file named backup_label in the database cluster
-directory, which is removed by pg_stop_backup.
-This file will of course be archived as a part of your backup dump file.
-The backup label file includes the label string you gave to
-pg_start_backup, as well as the time at which
-pg_start_backup was run, and the name of the starting WAL
-file.  In case of confusion it is therefore possible to look inside a
-backup dump file and determine exactly which backup session the dump file
-came from.  However, this file is not merely for your information; its
-presence and contents are critical to the proper operation of the system's
-recovery process.
+makes files named backup_label and
+tablespace_map in the database cluster directory,
+which are removed by pg_stop_backup.  These files will of
+course be archived as a part of your backup dump file.  The backup label
+file includes the label string you gave to pg_start_backup,
+as well as the time at which pg_start_backup was run, and
+the name of the starting WAL file.  In case of confusion it is therefore
+possible to look inside a backup dump file and determine exactly which
+backup session the dump file came from.  The tablespace map file includes
+the symbolic link names as they exist in the directory
+pg_tblspc/ and the full path of each symbolic link.
+These files are not merely for your information; their presence and
+contents are critical to the proper operation of the system's recovery
+process.

 

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index fb39731..24d43d9 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -16591,11 +16591,12 @@ SELECT set_config('log_statement_stats', 'off', false);
 pg_start_backup accepts an
 arbitrary user-defined label for the backup.  (Typically this would be
 the name under which the backup dump file will be stored.)  The function
-writes a backup label file (backup_label) into the
-database cluster's data directory, performs a checkpoint,
-and then returns the backup's starting transaction log location as text.
-The user can ignore this result value, but it is
-provided in case it is useful.
+writes a backup label file (backup_label) and, if there
+are any links in the pg_tblspc/ directory, a tablespace map
+file (tablespace_map) into the database cluster's data
+directory, performs a checkpoint, and then returns the backup's
+starting transaction log location as text.  The user can ignore
+this result value, but it is provided in case it is useful.

Re: [HACKERS] Fixing busted citext function declarations

2015-05-11 Thread Tom Lane
"David E. Wheeler"  writes:
> On May 5, 2015, at 9:40 AM, Tom Lane  wrote:
>> In
>> http://www.postgresql.org/message-id/bn1pr04mb37467aa1d412223b3d4a595df...@bn1pr04mb374.namprd04.prod.outlook.com
>> it's revealed that the citext extension misdeclares its versions of
>> regexp_matches(): they should return SETOF text[] but they're marked
>> as returning just text[].

> I wanted to make sure my backport was fixed for this, but it turns out it was 
> already fixed as of this commit:

>   https://github.com/theory/citext/commit/99c925f

> Note that I credited you for the spot --- way back in October 2009! Pretty 
> confused how the same change wasn’t made to the core contrib module back 
> then.

Me too.  Something fell through the cracks rather badly there :-(.
Would you check your commit history to see if anything else got missed?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Multi-xacts and our process problem

2015-05-11 Thread Andres Freund
On 2015-05-11 19:04:32 -0400, Tom Lane wrote:
> I think there's nobody, or at least very few people, who are getting
> paid to find/fix bugs rather than write cool new features.  This is
> problematic.  It doesn't help when key committers are overwhelmed by
> trying to process other peoples' patches.  (And no, I'm not sure that
> "appoint more committers" would improve matters.  What we've got is
> too many barely-good-enough patches.  Tweaking the process to let those
> into the tree faster will not result in better quality.)

+many

Except perhaps that I'd expand "find/fix bugs" to include "review and
integrate patches". Because I think few people are paid to do that
either.  I now partially am (which obviously isn't sufficient). There's
no way it's possible to e.g. work on integrating something like upsert
in a reasonable timeframe otherwise.

The lack of paid time to integrate stuff properly also leads to part of
the quality problem, besides delaying stuff.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fixing busted citext function declarations

2015-05-11 Thread David E. Wheeler
Tom,

On May 5, 2015, at 9:40 AM, Tom Lane  wrote:

> In
> http://www.postgresql.org/message-id/bn1pr04mb37467aa1d412223b3d4a595df...@bn1pr04mb374.namprd04.prod.outlook.com
> it's revealed that the citext extension misdeclares its versions of
> regexp_matches(): they should return SETOF text[] but they're marked
> as returning just text[].

I wanted to make sure my backport was fixed for this, but it turns out it was 
already fixed as of this commit:

  https://github.com/theory/citext/commit/99c925f

Note that I credited you for the spot --- way back in October 2009! Pretty 
confused how the same change wasn’t made to the core contrib module back then.

Best,

David



smime.p7s
Description: S/MIME cryptographic signature


Re: [HACKERS] Multi-xacts and our process problem

2015-05-11 Thread Simon Riggs
On 11 May 2015 at 23:47, Bruce Momjian  wrote:

> On Mon, May 11, 2015 at 03:42:26PM -0700, Joshua Drake wrote:
> > >The releases themselves are not the problem, but rather the volume of
> > >bugs and our slowness in getting additional people involved to remove
> > >data corruption bugs more quickly and systematically.  Our reputation
> > >for reliability has been harmed by this inactivity.
> > >
> >
> > What I am arguing is that the release cycle is at least a big part
> > of the problem. We are trying to get so many new features that bugs
> > are increasing and quality is decreasing.
>
> Now that is an interesting observation --- are we too focused on patches
> and features to realize when we need to seriously revisit an issue?
>

I think we are unused to bugs. We have a much lower bug rate than any other
system.

I think we seriously need to review our policy of adding major new features
and having them enabled by default with no parameter to disable them. In the
early years of PostgreSQL everything had an off switch, e.g. stats,
bgwriter and even autovacuum defaulted to off for many years.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Multi-xacts and our process problem

2015-05-11 Thread Tom Lane
Bruce Momjian  writes:
> On Mon, May 11, 2015 at 03:42:26PM -0700, Joshua Drake wrote:
>> What I am arguing is that the release cycle is at least a big part
>> of the problem. We are trying to get so many new features that bugs
>> are increasing and quality is decreasing.

> Now that is an interesting observation --- are we too focused on patches
> and features to realize when we need to seriously revisit an issue?

I think there's nobody, or at least very few people, who are getting
paid to find/fix bugs rather than write cool new features.  This is
problematic.  It doesn't help when key committers are overwhelmed by
trying to process other peoples' patches.  (And no, I'm not sure that
"appoint more committers" would improve matters.  What we've got is
too many barely-good-enough patches.  Tweaking the process to let those
into the tree faster will not result in better quality.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Multixid hindsight design

2015-05-11 Thread Tom Lane
Heikki Linnakangas  writes:
> So the lesson here is that having a permanent pg_multixact is not nice, 
> and we should get rid of it. Here's how to do that:

That would be cool, but ...

> Looking at the tuple header, the CID and CTID fields are only needed, 
> when either xmin or xmax is running. Almost: in a HOT-updated tuple, 
> CTID is required even after xmax has committed, but since it's a HOT 
> update, the new tuple is always on the same page so you only need the 
> offsetnumber part.

I think this is totally wrong.  HOT update or not, you need the forward
link represented by ctid not just until xmin/xmax commit, but until
they're in the past according to all active snapshots.  That's so that
other transactions can chain forward to the "current" version of a tuple
which they found according to their snapshots.

It might be you can salvage the idea anyway, since it's still true that
the forward links wouldn't be needed after a crash and restart.  But the
argument as stated is wrong.

(There's also the question of forensic requirements, although I'm aware
that it's barely worth bringing that up since nobody else here seems to
put much weight on that.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Multi-xacts and our process problem

2015-05-11 Thread Bruce Momjian
On Mon, May 11, 2015 at 03:42:26PM -0700, Joshua Drake wrote:
> >The releases themselves are not the problem, but rather the volume of
> >bugs and our slowness in getting additional people involved to remove
> >data corruption bugs more quickly and systematically.  Our reputation
> >for reliability has been harmed by this inactivity.
> >
> 
> What I am arguing is that the release cycle is at least a big part
> of the problem. We are trying to get so many new features that bugs
> are increasing and quality is decreasing.

Now that is an interesting observation --- are we too focused on patches
and features to realize when we need to seriously revisit an issue?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Multi-xacts and our process problem

2015-05-11 Thread Joshua D. Drake


On 05/11/2015 02:15 PM, Bruce Momjian wrote:


On Mon, May 11, 2015 at 02:11:48PM -0700, Joshua Drake wrote:





Here are some options

Slow down the release cycle
The shortness of the release cycle puts a preference on adding
features versus providing a more mature outcome.

or

Increase the release cycle
Moving to a Ubuntu style release cycle would allow us to have a
window to scratch the itch but with the eventual (and known) release
of something that is LTS.


The releases themselves are not the problem, but rather the volume of
bugs and our slowness in getting additional people involved to remove
data corruption bugs more quickly and systematically.  Our reputation
for reliability has been harmed by this inactivity.



What I am arguing is that the release cycle is at least a big part of 
the problem. We are trying to get so many new features that bugs are 
increasing and quality is decreasing.


If we change the release cycle, it will encourage an increase in eyeballs 
on code we are developing because people aren't going to be in such a 
rush to "get this feature done for this release".


Sincerely,

Joshua D. Drake


--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't
control your own emotions, so everyone else should do it for you.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] EvalPlanQual behaves oddly for FDW queries involving system columns

2015-05-11 Thread Tom Lane
On further consideration ...

I don't much like the division of labor between LockForeignRow and
FetchForeignRow.  In principle that would lead to not one but two
extra remote accesses per locked row in SELECT FOR UPDATE, at least
in the case that an EvalPlanQual recheck is required.  (I see that
in your prototype patch for postgres_fdw you attempt to avoid that
by saving a retrieved tuple in LockForeignRow and then returning it
in FetchForeignRow, but that seems extremely ugly and bug-prone,
since there is nothing in the API spec insisting that those calls match
up one-to-one.)  The fact that locking and fetching a tuple are separate
operations in the heapam API is a historical artifact that probably
doesn't make sense to duplicate in the FDW API.

Another problem is that we fail to detect whether an EvalPlanQual recheck
is required during a SELECT FOR UPDATE on a remote table, which we
certainly ought to do if the objective is to make postgres_fdw semantics
match the local ones.

What I'm inclined to do is merge LockForeignRow and FetchForeignRow
into one operation, which would have the semantics of returning a
palloc'd tuple, or NULL on a SKIP LOCKED failure, and it would be expected
to acquire a lock according to erm->markType (in particular, no lock just
a re-fetch for ROW_MARK_REFERENCE).  In addition it needs some way of
reporting that the returned row is an updated version rather than the
original.  Probably just an extra output boolean would do for that.
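
For concreteness, the merged callback's signature might look something
like this (a sketch only; the name and details are illustrative, not a
settled API):

    /* Re-fetch, and lock per erm->markType, one remote row in a single
     * operation.  Returns a palloc'd tuple, or NULL if the row could
     * not be locked under SKIP LOCKED.  *updated is set to true when
     * the returned tuple is an updated version rather than the
     * original, telling the caller an EvalPlanQual recheck is needed. */
    HeapTuple
    RefetchForeignRow(EState *estate, ExecRowMark *erm,
                      Datum rowid, bool *updated);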

This'd require some refactoring in nodeLockRows, because we'd want to
be able to hang onto the returned tuple without necessarily provoking
an EvalPlanQual cycle, but it doesn't look too bad.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] deparsing utility commands

2015-05-11 Thread Alvaro Herrera
David Steele wrote:

> I have reviewed and tested this patch and everything looks good to me.
> It also looks like you added better coverage for schema DDL, which is a
> welcome addition.

Thanks -- I have pushed this now.

My hope is to get this test module extended quite a bit, not only to
cover existing commands, but also so that future changes fail unless
command collection is considered.  (In a previous
version of this patch, there was a test mode that ran everything in the
serial_schedule of regular regression tests.  That was IMV a good way to
ensure that new commands were also tested to run under command
collection.  That hasn't been enabled in the new test module, and I
think it's necessary.)

If anyone wants to contribute to the test module so that more is
covered, that would be much appreciated.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Multi-xacts and our process problem

2015-05-11 Thread Bruce Momjian
On Tue, May 12, 2015 at 12:29:56AM +0300, Heikki Linnakangas wrote:
> On 05/12/2015 12:00 AM, Bruce Momjian wrote:
> >Multi-xacts were made durable in Postgres 9.3 (released 2013-09-09) to
> >allow primary-key-column-only locks.  1.7 years later, we are still
> >dealing with bugs related to this feature.  Obviously, something is
> >wrong.
> >
> >There were many 9.3 minor releases containing multi-xacts fixes, and
> >these fixes have extended into 9.4.  After the first few bug-fix
> >releases, I questioned whether we needed to revert or rework the
> >feature, but got no positive response.  Only in the past few weeks have
> >we got additional people involved.
> 
> The "revert or rework" ship had already sailed at that point. I

True.

> don't think we had much choice than just soldier through the bugs
> after the release.

The problem is we "soldiered on" without adding any resources to the
problem or doing a systematic review once it became clear one was
necessary.

> >I think we now know that our inaction didn't serve us well.  The
> >question is how can we identify chronic problems and get resources
> >involved sooner.  I feel we have been "asleep at the wheel" to some
> >extent on this.
> 
> Yeah. I think the problem was that no-one realized that this was a
> significant change to the on-disk format. It was deceptively
> backwards-compatible. When it comes to permanent on-disk structures,
> we should all be more vigilant in the review.

Yes, and the size/age of the patch helped mask problems too.  Are these
the lessons we need to learn?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Multi-xacts and our process problem

2015-05-11 Thread Heikki Linnakangas

On 05/12/2015 12:00 AM, Bruce Momjian wrote:

Multi-xacts were made durable in Postgres 9.3 (released 2013-09-09) to
allow primary-key-column-only locks.  1.7 years later, we are still
dealing with bugs related to this feature.  Obviously, something is
wrong.

There were many 9.3 minor releases containing multi-xacts fixes, and
these fixes have extended into 9.4.  After the first few bug-fix
releases, I questioned whether we needed to revert or rework the
feature, but got no positive response.  Only in the past few weeks have
we got additional people involved.


The "revert or rework" ship had already sailed at that point. I don't 
think we had much choice than just soldier through the bugs after the 
release.



I think we now know that our inaction didn't serve us well.  The
question is how can we identify chronic problems and get resources
involved sooner.  I feel we have been "asleep at the wheel" to some
extent on this.


Yeah. I think the problem was that no-one realized that this was a 
significant change to the on-disk format. It was deceptively 
backwards-compatible. When it comes to permanent on-disk structures, we 
should all be more vigilant in the review.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Multixid hindsight design

2015-05-11 Thread Heikki Linnakangas
I'd like to discuss how we should've implemented the infamous 9.3 
multixid/row-locking stuff, and perhaps still should in 9.6. Hindsight 
is always 20/20 - I'll readily admit that I didn't understand the 
problems until well after the release - so this isn't meant to bash 
what's been done. Rather, let's think of the future.


The main problem with the infamous multixid changes was that it made 
pg_multixact a permanent, critical, piece of data. Without it, you 
cannot decipher whether some rows have been deleted or not. The 9.3 
changes uncovered pre-existing issues with vacuuming and wraparound, but 
the fact that multixids are now critical turned those the otherwise 
relatively harmless bugs into data loss.


We have pg_clog, which is a similar critical data structure. That's a 
pain too - you need VACUUM and you can't easily move tables from one 
cluster to another for example - but we've learned to live with it. But 
we certainly don't need any more such data structures.


So the lesson here is that having a permanent pg_multixact is not nice, 
and we should get rid of it. Here's how to do that:



Looking at the tuple header, the CID and CTID fields are only needed, 
when either xmin or xmax is running. Almost: in a HOT-updated tuple, 
CTID is required even after xmax has committed, but since it's a HOT 
update, the new tuple is always on the same page so you only need the 
offsetnumber part. That leaves us with 8 bytes that are always available 
for storing "ephemeral" information. By ephemeral, I mean that it is 
only needed when xmin or xmax is in-progress. After that, e.g. after a 
shutdown, it's never looked at.


Let's add a new SLRU, called Tuple Ephemeral Data (TED). It is addressed 
by a 64-bit pointer, which means that it never wraps around. That 64-bit 
pointer is stored in the tuple header, in those 8 ephemeral bytes 
currently used for CID and CTID. Whenever a tuple is deleted/updated and 
locked at the same time, a TED entry is created for it, in the new SLRU, 
and the pointer to the entry is put on the tuple. In the TED entry, we 
can use as many bytes as we need to store the ephemeral data. It would 
include the CID (or possibly both CMIN and CMAX separately, now that we 
have the space), CTID, and the locking XIDs. The list of locking XIDs 
could be stored there directly, replacing multixids completely, or we 
could store a multixid there, and use the current pg_multixact system to 
decode them. Or we could store the multixact offset in the TED, 
replacing the multixact offset SLRU, but keep the multixact member SLRU 
as is.
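
To make that concrete, a TED entry might look something like this (a
sketch under the assumptions above; the field names are illustrative):

    /* Ephemeral per-tuple data, addressed by the 64-bit TED pointer
     * stored in the header bytes currently used for CID and CTID.
     * Only meaningful while xmin or xmax is in progress, so the whole
     * SLRU can be zapped at restart. */
    typedef struct TedEntry
    {
        CommandId       cmin;       /* inserting command */
        CommandId       cmax;       /* deleting command */
        ItemPointerData ctid;       /* forward link to updated version */
        int             nlockers;   /* number of locking XIDs below */
        TransactionId   lockers[FLEXIBLE_ARRAY_MEMBER];  /* no multixids */
    } TedEntry;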


The XMAX stored on the tuple header would always be a real transaction 
ID, not a multixid. Hence locked-only tuples don't need to be frozen 
afterwards.


The beauty of this would be that the TED entries can be zapped at 
restart, just like pg_subtrans, and pg_multixact before 9.3. It doesn't 
need to be WAL-logged, and we are free to change its on-disk layout even 
in a minor release.


Further optimizations are possible. If the TED entry fits in 8 bytes, it 
can be stored directly in the tuple header. Like today, if a tuple is 
locked but not deleted/updated, you only need to store the locker XID, 
and you can store the locking XID directly on the tuple. Or if it's 
deleted and locked, CTID is not needed, only CID and locker XID, so you 
can store those direcly on the tuple. Plus some spare bits to indicate 
what is stored. And if the XMIN is older than global-xmin, you could 
also steal the XMIN field for storing TED data, making it possible to 
store 12 bytes directly in the tuple header. Plus some spare bits again 
to indicate that you've done that.



Now, given where we are, how do we get there? Upgrade is a pain, because 
even if we no longer generate any new multixids, we'll have to be able 
to decode them after pg_upgrade. Perhaps condense pg_multixact into a 
simpler pg_clog-style bitmap at pg_upgrade, to make it small and simple 
to read, but it would nevertheless be a fair amount of code just to deal 
with pg_upgraded databases.


I think this is worth doing, even after we've fixed all the acute 
multixid bugs, because this would be more robust in the long run. It 
would also remove the need to do anti-wraparound multixid vacuums, and 
the newly-added tuning knobs related to that.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Multi-xacts and our process problem

2015-05-11 Thread Bruce Momjian
On Mon, May 11, 2015 at 02:11:48PM -0700, Joshua Drake wrote:
> 
> On 05/11/2015 02:00 PM, Bruce Momjian wrote:
> 
> >I think we now know that our inaction didn't serve us well.  The
> >question is how can we identify chronic problems and get resources
> >involved sooner.  I feel we have been "asleep at the wheel" to some
> >extent on this.
> 
> Here are some options
> 
> Slow down the release cycle
>   The shortness of the release cycle puts a preference on adding
> features versus providing a more mature outcome.
> 
> or
> 
> Increase the release cycle
>   Moving to a Ubuntu style release cycle would allow us to have a
> window to scratch the itch but with the eventual (and known) release
> of something that is LTS.

The releases themselves are not the problem, but rather the volume of
bugs and our slowness in getting additional people involved to remove
data corruption bugs more quickly and systematically.  Our reputation
for reliability has been harmed by this inactivity.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Multi-xacts and our process problem

2015-05-11 Thread Joshua D. Drake


On 05/11/2015 02:00 PM, Bruce Momjian wrote:


I think we now know that our inaction didn't serve us well.  The
question is how can we identify chronic problems and get resources
involved sooner.  I feel we have been "asleep at the wheel" to some
extent on this.


Here are some options

Slow down the release cycle
	The shortness of the release cycle puts a preference on adding features 
versus providing a more mature outcome.


or

Increase the release cycle
	Moving to a Ubuntu style release cycle would allow us to have a window 
to scratch the itch but with the eventual (and known) release of 
something that is LTS.


JD  



--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't
control your own emotions, so everyone else should do it for you.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Multi-xacts and our process problem

2015-05-11 Thread Bruce Momjian
Multi-xacts were made durable in Postgres 9.3 (released 2013-09-09) to
allow primary-key-column-only locks.  1.7 years later, we are still
dealing with bugs related to this feature.  Obviously, something is
wrong.

There were many 9.3 minor releases containing multi-xacts fixes, and
these fixes have extended into 9.4.  After the first few bug-fix
releases, I questioned whether we needed to revert or rework the
feature, but got no positive response.  Only in the past few weeks have
we got additional people involved.

I think we now know that our inaction didn't serve us well.  The
question is how can we identify chronic problems and get resources
involved sooner.  I feel we have been "asleep at the wheel" to some
extent on this.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] multixacts woes

2015-05-11 Thread Joshua D. Drake


On 05/11/2015 10:24 AM, Josh Berkus wrote:


In terms of adding a new GUC in 9.5: can't we take a stab at auto-tuning
this instead of adding a new GUC?  We already have a bunch of freezing
GUCs which fewer than 1% of our user base has any idea how to set.


That is a documentation problem, not a user problem. Although I agree 
that yet another GUC for an obscure "feature" that should be internally 
intelligent is likely the wrong direction.


JD

--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't
control your own emotions, so everyone else should do it for you.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LOCK TABLE Permissions

2015-05-11 Thread Stephen Frost
* Michael Paquier (michael.paqu...@gmail.com) wrote:
> On Tue, May 12, 2015 at 4:01 AM, Stephen Frost  wrote:
> > * Michael Paquier (michael.paqu...@gmail.com) wrote:
> >> On Mon, May 11, 2015 at 10:32 PM, Stephen Frost  wrote:
> >> > Now for a blast from the past...  This came up again on IRC recently and
> >> > reminded me that I ran into the same issue a couple years back.  Updated
> >> > patch includes the refactoring suggested and includes documentation.
> >> >
> >> > Barring objections, I'll push this later today.
> >>
> >> Small suggestion: a test case in src/test/isolation?
> >
> > This is entirely a permissions-related change and src/test/isolation is
> > for testing concurrent behavior, not about testing permissions.
> >
> > I'm not saying that we shouldn't have more tests there, but it'd not
> > be appropriate for this particular patch.
> 
> Perhaps. Note that we could have tests of this type though in lock.sql:
> create role foo login;
> create table aa (a int);
> grant insert on aa TO foo;
> \c postgres foo
> begin;
> insert into aa values (1);
> lock table aa in row exclusive mode; -- should pass

Yeah, it might not be bad to have tests for all the different lock types
and make sure that the permissions match up.  I'd probably put those
tests into 'permissions.sql' instead though.

> Just mentioning it for the sake of the archives, I cannot work on that for 
> now.

Ditto.  I'm trying to work through the postgres_fdw UPDATE push-down
patch now..

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Disabling trust/ident authentication configure option

2015-05-11 Thread Robert Haas
On Thu, May 7, 2015 at 4:57 PM, Stephen Frost  wrote:
> * Robert Haas (robertmh...@gmail.com) wrote:
>> On Thu, May 7, 2015 at 11:02 AM, Stephen Frost  wrote:
>> > I realize it's not going to be popular, but I'd love to have 'trust'
>> > only allowed if a command-line option is passed to the postmaster or
>> > something along those lines.  It's really got no business being an
>> > option for a network service like PG.
>>
>> I disagree wholeheartedly.  There is such a thing as a trusted network.
>
> Likely a good topic of conversation to be had in Ottawa. :)  I agree
> that there are trusted networks, but the ones that I work with still
> expect network services to require authentication and authorization.
> Perhaps they're not really "trusted" then, from your perspective.  On
> the other hand, I suppose if you use pg_hba to limit which accounts can
> be logged into with 'trust' then you might be able to have, say, a
> "read-only" user/database that anyone could see.  That's a pretty narrow
> case though and I'd rather we figure out how to address it directly and
> more specifically (no-password login roles?) than the broad
> disable-all-authentication "trust" method.

Let's suppose that you have an application server and a DB server
running on the same node.  That turns out to be too much load, so you
move the application server to a separate machine and connect the two
machines with a crossover cable, or a VLAN that has nothing else on
it.  To me, it's quite sane to want connections on that network to
proceed without authentication or authorization.  If you've got to
open up the database more than that then, yes, you need authentication
and authorization.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LOCK TABLE Permissions

2015-05-11 Thread Michael Paquier
On Tue, May 12, 2015 at 4:01 AM, Stephen Frost  wrote:
> * Michael Paquier (michael.paqu...@gmail.com) wrote:
>> On Mon, May 11, 2015 at 10:32 PM, Stephen Frost  wrote:
>> > Now for a blast from the past...  This came up again on IRC recently and
>> > reminded me that I ran into the same issue a couple years back.  Updated
>> > patch includes the refactoring suggested and includes documentation.
>> >
>> > Barring objections, I'll push this later today.
>>
>> Small suggestion: a test case in src/test/isolation?
>
> This is entirely a permissions-related change and src/test/isolation is
> for testing concurrent behavior, not about testing permissions.
>
> I'm not saying that we shouldn't have more tests there, but it'd not
> be appropriate for this particular patch.

Perhaps. Note that we could have tests of this type though in lock.sql:
create role foo login;
create table aa (a int);
grant insert on aa TO foo;
\c postgres foo
begin;
insert into aa values (1);
lock table aa in row exclusive mode; -- should pass

Just mentioning it for the sake of the archives, I cannot work on that for now.
Regards,
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LOCK TABLE Permissions

2015-05-11 Thread Stephen Frost
All,

* Stephen Frost (sfr...@snowman.net) wrote:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > Stephen Frost  writes:
> > > if (lockmode == AccessShareLock)
> > > aclresult = pg_class_aclcheck(reloid, GetUserId(),
> > >   ACL_SELECT);
> > > +   else if (lockmode == RowExclusiveLock)
> > > +   aclresult = pg_class_aclcheck(reloid, GetUserId(),
> > > +ACL_INSERT | ACL_UPDATE | ACL_DELETE | 
> > > ACL_TRUNCATE);
> > > else
> > > aclresult = pg_class_aclcheck(reloid, GetUserId(),
> > >   ACL_UPDATE | ACL_DELETE | 
> > > ACL_TRUNCATE);
> > 
> > Perhaps it would be better to refactor with a local variable for the
> > aclmask and just one instance of the pg_class_aclcheck call.  Also, I'm
> > pretty sure that the documentation work needed is more extensive
> > than the actual patch ;-).  Otherwise, I don't see a problem with this.
> 
> Now for a blast from the past...  This came up again on IRC recently and
> reminded me that I ran into the same issue a couple years back.  Updated
> patch includes the refactoring suggested and includes documentation.
> 
> Not going to be back-patched, as discussed with Robert.
> 
> Barring objections, I'll push this later today.

Done, finally.
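
For the archives, the refactored check ends up looking roughly like
this (a sketch of the shape Tom suggested, not necessarily the
committed code):

    AclMode     aclmask;

    /* Compute the required privilege mask once, check it once. */
    if (lockmode == AccessShareLock)
        aclmask = ACL_SELECT;
    else if (lockmode == RowExclusiveLock)
        aclmask = ACL_INSERT | ACL_UPDATE | ACL_DELETE | ACL_TRUNCATE;
    else
        aclmask = ACL_UPDATE | ACL_DELETE | ACL_TRUNCATE;

    aclresult = pg_class_aclcheck(reloid, GetUserId(), aclmask);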

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] deparsing utility commands

2015-05-11 Thread David Steele
On 5/8/15 3:29 PM, Alvaro Herrera wrote:
>> Here is a complete version.  Barring serious problems, I intend to
>> commit this on Monday.

I have reviewed and tested this patch and everything looks good to me.
It also looks like you added better coverage for schema DDL, which is a
welcome addition.

-- 
- David Steele
da...@pgmasters.net



signature.asc
Description: OpenPGP digital signature


[HACKERS] Re: [BUGS] BUG #13148: Unexpected deferred EXCLUDE constraint violation on derived table

2015-05-11 Thread Peter Geoghegan
On Mon, May 11, 2015 at 9:47 AM, Andres Freund  wrote:
> On 2015-05-10 16:01:53 -0400, Tom Lane wrote:
>> The cause of the problem seems to be that the UPDATE performs a HOT update
>> of the new tuple, leaving in this case a dead tuple at (0,2) that is HOT
>> updated by (0,3).  When unique_key_recheck() is invoked for (0,2), it
>> believes, correctly, that it has to perform the recheck anyway ... but it
>> tells check_exclusion_constraint that the check is being performed for
>> (0,2).  So the index search inside check_exclusion_constraint finds the
>> live tuple at (0,3) and thinks that is a conflict.
>
> Heh, it's curious that this wasn't found up until now. I also wonder if
> this might be related to the spurious violations Peter G. has been
> observing...

I don't think so. Speculative insertion relies on the assumption that
the speculatively inserted tuple isn't MVCC visible to other sessions.
I actually prototyped an implementation that avoided the historic
"unprincipled deadlocks" of exclusion constraints (a known limitation
since they were added), by making *UPDATE* also do a speculative
insertion, and by making even non-UPSERT INSERTs insert speculatively.

This almost worked, but when time came to back out of a speculative
insertion on an UPDATE due to a conflict from a concurrent session,
the implementation couldn't handle it - it was just a mess to try and
figure out how that was supposed to work with heap_update(), and so
that prototype was scrapped.

For the benefit of those not closely involved in the ON CONFLICT
project, I should point out that ON CONFLICT will not accept a
deferred index as an arbiter index.
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] a fast bloat measurement tool (was Re: Measuring relation free space)

2015-05-11 Thread Abhijit Menon-Sen
At 2015-05-11 19:15:47 +0200, and...@anarazel.de wrote:
>
> TBH, I'd rather not touch unrelated things right now. We're pretty
> badly behind...

OK. That patch is independent; just ignore it.

> I don't really care how it's named, as long as it makes clear that
> it's not an exact measurement.

Not having heard any better suggestions, I picked "pgstatapprox" as a
compromise between length and familiarity/consistency with pgstattuple.

> > Should I count the space it would have free if it were initialised,
> > but leave the page alone for VACUUM to deal with?

And this is what the attached patch does.

I also cleaned up a few things that I didn't like but had left alone to
make the code look similar to pgstattuple. In particular, build_tuple()
now does nothing but build a tuple from values calculated earlier in
pgstatapprox_heap().

Thank you.

-- Abhijit

P.S. What, if anything, should be done about the complicated and likely
not very useful skip-only-min#-blocks logic in lazy_scan_heap?
From 0a99fc61d36e3a3ff4003db95e5c299a5f07a850 Mon Sep 17 00:00:00 2001
From: Abhijit Menon-Sen 
Date: Fri, 26 Dec 2014 12:37:13 +0530
Subject: Add pgstatapprox to pgstattuple

---
 contrib/pgstattuple/Makefile   |   4 +-
 contrib/pgstattuple/pgstatapprox.c | 277 +
 contrib/pgstattuple/pgstattuple--1.2--1.3.sql  |  18 ++
 .../{pgstattuple--1.2.sql => pgstattuple--1.3.sql} |  18 +-
 contrib/pgstattuple/pgstattuple.control|   2 +-
 doc/src/sgml/pgstattuple.sgml  | 135 ++
 6 files changed, 450 insertions(+), 4 deletions(-)
 create mode 100644 contrib/pgstattuple/pgstatapprox.c
 create mode 100644 contrib/pgstattuple/pgstattuple--1.2--1.3.sql
 rename contrib/pgstattuple/{pgstattuple--1.2.sql => pgstattuple--1.3.sql} (72%)

diff --git a/contrib/pgstattuple/Makefile b/contrib/pgstattuple/Makefile
index 862585c..6083dab 100644
--- a/contrib/pgstattuple/Makefile
+++ b/contrib/pgstattuple/Makefile
@@ -1,10 +1,10 @@
 # contrib/pgstattuple/Makefile
 
 MODULE_big	= pgstattuple
-OBJS		= pgstattuple.o pgstatindex.o $(WIN32RES)
+OBJS		= pgstattuple.o pgstatindex.o pgstatapprox.o $(WIN32RES)
 
 EXTENSION = pgstattuple
-DATA = pgstattuple--1.2.sql pgstattuple--1.1--1.2.sql pgstattuple--1.0--1.1.sql pgstattuple--unpackaged--1.0.sql
+DATA = pgstattuple--1.3.sql pgstattuple--1.2--1.3.sql pgstattuple--1.1--1.2.sql pgstattuple--1.0--1.1.sql pgstattuple--unpackaged--1.0.sql
 PGFILEDESC = "pgstattuple - tuple-level statistics"
 
 REGRESS = pgstattuple
diff --git a/contrib/pgstattuple/pgstatapprox.c b/contrib/pgstattuple/pgstatapprox.c
new file mode 100644
index 000..46f5bc0
--- /dev/null
+++ b/contrib/pgstattuple/pgstatapprox.c
@@ -0,0 +1,277 @@
+/*
+ * contrib/pgstattuple/pgstatapprox.c
+ * Fast bloat estimation functions
+ */
+
+#include "postgres.h"
+
+#include "access/visibilitymap.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "access/multixact.h"
+#include "access/htup_details.h"
+#include "catalog/namespace.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "storage/bufmgr.h"
+#include "storage/freespace.h"
+#include "storage/procarray.h"
+#include "storage/lmgr.h"
+#include "utils/builtins.h"
+#include "utils/tqual.h"
+#include "commands/vacuum.h"
+
+PG_FUNCTION_INFO_V1(pgstatapprox);
+
+typedef struct output_type
+{
+	uint64		table_len;
+	uint64		scanned_percent;
+	uint64		tuple_count;
+	uint64		tuple_len;
+	double		tuple_percent;
+	uint64		dead_tuple_count;
+	uint64		dead_tuple_len;
+	double		dead_tuple_percent;
+	uint64		free_space;
+	double		free_percent;
+} output_type;
+
+#define NUM_OUTPUT_COLUMNS 10
+
+/*
+ * This function takes an already open relation and scans its pages,
+ * skipping those that have the corresponding visibility map bit set.
+ * For pages we skip, we find the free space from the free space map
+ * and approximate tuple_len on that basis. For the others, we count
+ * the exact number of dead tuples etc.
+ *
+ * This scan is loosely based on vacuumlazy.c:lazy_scan_heap(), but
+ * we do not try to avoid skipping single pages.
+ */
+
+static void
+pgstatapprox_heap(Relation rel, output_type *stat)
+{
+	BlockNumber scanned,
+nblocks,
+blkno;
+	Buffer		vmbuffer = InvalidBuffer;
+	BufferAccessStrategy bstrategy;
+	TransactionId OldestXmin;
+	uint64		misc_count = 0;
+
+	OldestXmin = GetOldestXmin(rel, true);
+	bstrategy = GetAccessStrategy(BAS_BULKREAD);
+
+	nblocks = RelationGetNumberOfBlocks(rel);
+	scanned = 0;
+
+	for (blkno = 0; blkno < nblocks; blkno++)
+	{
+		Buffer		buf;
+		Page		page;
+		OffsetNumber offnum,
+	maxoff;
+		Size		freespace;
+
+		CHECK_FOR_INTERRUPTS();
+
+		/*
+		 * If the page has only visible tuples, then we can find out the
+		 * free space from the FSM and move on.
+		 */
+
+		if (visibilitymap_test(rel, blkno, &vmbuffer))
+		{
+			freespace = GetRecordedFreeSpace(rel, blkno);
+			stat->tuple_len += BLCKSZ - freespace;
			stat->free_space += freespace;

Re: [HACKERS] LOCK TABLE Permissions

2015-05-11 Thread Stephen Frost
* Michael Paquier (michael.paqu...@gmail.com) wrote:
> On Mon, May 11, 2015 at 10:32 PM, Stephen Frost  wrote:
> > Now for a blast from the past...  This came up again on IRC recently and
> > reminded me that I ran into the same issue a couple years back.  Updated
> > patch includes the refactoring suggested and includes documentation.
> >
> > Barring objections, I'll push this later today.
> 
> Small suggestion: a test case in src/test/isolation?

This is entirely a permissions-related change and src/test/isolation is
for testing concurrent behavior, not about testing permissions.

I'm not saying that we shouldn't have more tests there, but it'd not
be appropriate for this particular patch.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] LOCK TABLE Permissions

2015-05-11 Thread Michael Paquier
On Mon, May 11, 2015 at 10:32 PM, Stephen Frost  wrote:
> Now for a blast from the past...  This came up again on IRC recently and
> reminded me that I ran into the same issue a couple years back.  Updated
> patch includes the refactoring suggested and includes documentation.
>
> Barring objections, I'll push this later today.

Small suggestion: a test case in src/test/isolation?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Postgres GSSAPI Encryption

2015-05-11 Thread Robbie Harwood
Stephen Frost  writes:

> Robbie,
>
> * Robbie Harwood (rharw...@redhat.com) wrote:
>
>> We'd I think also want a new kind of HBA entry (probably something along
>> the lines of `hostgss` to contrast with `hostssl`), but I'm not sure
>> what we'd want to do for the counterpart of `hostnossl` (`hostnogss`?
>> But then do we need `hostnogssnossl`?  Is this even a useful setting to
>> have?), so that will probably require broader discussion.
>
> Are you intending to support GSSAPI encryption *without* using the
> GSSAPI authentication mechanism?  If not, maybe we can come up with a
> way to have an option to the GSSAPI auth mechanism that behaves the way
> we want for GSSAPI encryption.  Perhaps something like:
>
> encryption = (none | request | require)
>
> If you need an option for it to still be able to fall-thru to the next
> pg_hba line, that'd be more difficult, but is that really a sensible
> use-case?

That's a good point.  I don't see GSSAPI encryption without GSSAPI
authentication as a particularly compelling use case, so a setting like
that (with the default presumably request, for backward compatibility)
seems perfect.

As for fall-through on failure, I don't really know what's reasonable
here.  My understanding of the way it works currently suggests that it
would be *expected* to fall-through based on the way other things
behave, though it's certainly less effort on my part if it does not.

>> Finally, I think there are a couple different ways the protocol could
>> look.  I'd ideally like to opportunistically encrypt all
>> GSSAPI-authenticated connections and fallback to non-encrypted when the
>> other end doesn't support it (possibly dropping the connection if this
>> causes it to not match HBA), but I'm not sure if this will work with the
>> existing code.  A (less-nice) option could be to add a new
>> authentication method for gss->encryption, which feels aesthetically
>> misplaced.  The approach used for SSL today could probably be adopted as
>> a third approach, but I don't really see a gain from doing it this way
>> since encryption happens after authentication (opposite of SSL) in
>> GSSAPI.
>
> I'd definitely like to see us opportunistically encrypt all GSSAPI
> authenticated connections also.  The main case to consider is how to
> support migrating users who are currently using GSSAPI + SSL to GSSAPI
> auth+encryption, and how to support them if they want to continue to use
> GSSAPI just for user auth and use SSL for encryption.

So if we go with the "encryption" option, we might not need a specific
entry for GSS hosts.  For the SSL encryption/GSS auth case, we'd want it
to work the way it does now where you specify "hostssl" and then "gss"
as the method.  Then for the GSS for auth and encryption, one would use
a normal "host" entry, then specify gss as the method, and then set
encryption=require.
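
In pg_hba.conf terms, the two setups might look like this (hypothetical
entries: the "encryption" option is only proposed here, and the network
address is a placeholder):

    # GSSAPI authentication, SSL for transport encryption (works today):
    hostssl  all  all  192.0.2.0/24  gss

    # GSSAPI for both authentication and encryption (proposed):
    host     all  all  192.0.2.0/24  gss  encryption=require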

One consequence of this would be that you can do "double encryption" by
specifying "hostssl", then "gss" with "encrypt = require".  I don't
think this is something more desirable than just one or the other, but
unless it's actually a problem (and I don't think it is) to have around,
I think the setup would work nicely.  We could also default "encrypt" to
"none" when hostssl is specified if it is a problem.

Thanks!
--Robbie


signature.asc
Description: PGP signature


Re: [HACKERS] Use outerPlanState() consistently in executor code

2015-05-11 Thread Qingqing Zhou
On Mon, May 4, 2015 at 1:23 PM, Robert Haas  wrote:
> I fixed several whitespace errors, reverted the permissions changes
> you included

Sorry about the permission changes - didn't notice that bit.

Thanks,
Qingqing


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] multixacts woes

2015-05-11 Thread Alvaro Herrera
Robert Haas wrote:

> OK, I have made this change.  Barring further trouble reports, this
> completes the multixact work I plan to do for the next release.

Many thanks for all the effort here -- much appreciated.

> 2. The recent changes adjust things - for good reason - so that the
> safe threshold for multixact member creation is advanced only at
> checkpoint time.  This means it's theoretically possible to have a
> situation where autovacuum has done all it can, but because no
> checkpoint has happened yet, the user can't create any more
> multixacts.  Thanks to some good work by Thomas, autovacuum will
> realize this and avoid spinning uselessly over every table in the
> system, which is good, but you're still stuck with errors until the
> next checkpoint.  Essentially, we're hoping that autovacuum will clean
> things up far enough in advance of hitting the threshold where we have
> to throw an error that a checkpoint will intervene before the error
> starts happening.  It's possible we could improve this further, but I
> think it would be unwise to mess with it right now.  It may be that
> there is no real-world problem here.

See my response to Josh.  I think much of the current rube-goldbergian
design is due to the fact that pg_control cannot be changed in back
branches.  Going forward, I think a better plan is to include more info
in pg_control, WAL-log more operations, remove checkpoint from the loop
and have everything happen at vacuum time.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] multixacts woes

2015-05-11 Thread Alvaro Herrera
Josh Berkus wrote:

> In terms of adding a new GUC in 9.5: can't we take a stab at auto-tuning
> this instead of adding a new GUC?  We already have a bunch of freezing
> GUCs which fewer than 1% of our user base has any idea how to set.

If you have development resources to pour onto 9.5, I think it would be
better spent changing multixact usage tracking so that oldestOffset is
included in pg_control; also make pg_multixact truncation be WAL-logged.
With those changes, the need for a lot of pretty complicated code would
go away.  The fact that truncation is done by both vacuum and checkpoint
causes a lot of the mess we were in (and from which Robert and Thomas
got us out --- thanks guys!).  Such a change is the first step towards
auto-tuning, I think.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] multixacts woes

2015-05-11 Thread Josh Berkus
On 05/11/2015 09:54 AM, Robert Haas wrote:
> OK, I have made this change.  Barring further trouble reports, this
> completes the multixact work I plan to do for the next release.  Here
> is what is outstanding:
> 
> 1. We might want to introduce a GUC to control the point at which
> member offset utilization begins clamping
> autovacuum_multixact_freeze_max_age.  It doesn't seem wise to do
> anything about this before pushing a minor release out.  It's not
> entirely trivial, and it may be helpful to learn more about how the
> changes already made work out in practice before proceeding.  Also, we
> might not back-patch this anyway.

-1 on back-patching a new GUC.  People don't know what to do with the
existing multixact GUCs, and without an age(multixact) function
built-in, any adjustments a user tries to make are likely to do more
harm than good.

In terms of adding a new GUC in 9.5: can't we take a stab at auto-tuning
this instead of adding a new GUC?  We already have a bunch of freezing
GUCs which fewer than 1% of our user base has any idea how to set.

> 2. The recent changes adjust things - for good reason - so that the
> safe threshold for multixact member creation is advanced only at
> checkpoint time.  This means it's theoretically possible to have a
> situation where autovacuum has done all it can, but because no
> checkpoint has happened yet, the user can't create any more
> multixacts.  Thanks to some good work by Thomas, autovacuum will
> realize this and avoid spinning uselessly over every table in the
> system, which is good, but you're still stuck with errors until the
> next checkpoint.  Essentially, we're hoping that autovacuum will clean
> things up far enough in advance of hitting the threshold where we have
> to throw an error that a checkpoint will intervene before the error
> starts happening.  It's possible we could improve this further, but I
> think it would be unwise to mess with it right now.  It may be that
> there is no real-world problem here.

Given that our longest possible checkpoint timeout is an hour, is it
even hypothetically possible that we would hit a limit in that time?
How many mxact members are we talking about?
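
Back-of-the-envelope, assuming the 2^32 member address space: that is
about 4.3 billion entries, so chewing through even a quarter of it,
roughly 1.07 billion members, inside a one-hour checkpoint interval
would take a sustained ~300,000 member insertions per second.  Possible,
but only under pretty extreme row-locking traffic.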

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: [HACKERS] a fast bloat measurement tool (was Re: Measuring relation free space)

2015-05-11 Thread Andres Freund
Hi,

On 2015-05-11 16:57:08 +0530, Abhijit Menon-Sen wrote:
> I've attached an updated patch for pgstatbloat, as well as a patch to
> replace two uses of BuildTupleFromCStrings() elsewhere in pgstattuple.

TBH, I'd rather not touch unrelated things right now. We're pretty badly
behind...

> I've made the changes you mentioned in your earlier mail, except that I
> have not changed the name pending further suggestions about what would
> be the best name.

I don't really care how it's named, as long as it makes clear that it's
not an exact measurement.

> > > I haven't checked, but I'm not sure that it's safe/meaningful to
> > > call PageGetHeapFreeSpace() on a new page.
> > 
> > OK, I'll check and fix if necessary.
> 
> You're right, PageGetHeapFreeSpace() isn't safe on a new page. I've
> added a guard to that call in the attached patch, but I'm not sure
> that's the right thing to do.

> Should I copy the orphaned new-page handling from lazy_scan_heap? What
> about for empty pages? Neither feels like a very good thing to do, but
> then neither does skipping the new page silently. Should I count the
> space it would have free if it were initialised, but leave the page
> alone for VACUUM to deal with? Or just leave it as it is?

I think there's absolutely no need for pgstattuple to do anything
here. It's not even desirable.

> + LockBuffer(buf, BUFFER_LOCK_SHARE);
> +
> + page = BufferGetPage(buf);
> +
> + if (!PageIsNew(page))
> + stat->free_space += PageGetHeapFreeSpace(page);
> +
> + if (PageIsNew(page) || PageIsEmpty(page))
> + {
> + UnlockReleaseBuffer(buf);
> + continue;
> + }

Shouldn't new pages be counted as being fully free or at least bloat?
Just disregarding them seems wrong.
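
Something like this, perhaps (a sketch, assuming we want to credit the
whole usable page for uninitialized pages while still leaving them for
VACUUM to repair):

	page = BufferGetPage(buf);

	if (PageIsNew(page))
		stat->free_space += BLCKSZ - SizeOfPageHeaderData;
	else
		stat->free_space += PageGetHeapFreeSpace(page);

	UnlockReleaseBuffer(buf);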

Greetings,

Andres Freund


Re: [HACKERS] pg_upgrade: quote directory names in delete_old_cluster script

2015-05-11 Thread Bruce Momjian
On Wed, Apr 29, 2015 at 10:52:45PM -0400, Bruce Momjian wrote:
> On Mon, Feb 16, 2015 at 05:03:45PM -0500, Bruce Momjian wrote:
> > > All of our makefiles use single quotes, so effectively the only
> > > character we don't support in directory paths is the single quote itself.
> > 
> > This seems to say that Windows batch files don't have any special
> > meaning for single quotes:
> > 
> > 
> > http://stackoverflow.com/questions/24173825/what-does-single-quote-do-in-windows-batch-files
> > 
> > http://stackoverflow.com/questions/10737283/single-quotes-and-double-quotes-how-to-have-the-same-behaviour-in-unix-and-wind
> > 
> > Again, is it worth doing something platform-specific?  I can certainly
> > define a platform-specific macro for " or ' as and use that.  Good idea?
> 
> I have developed the attached patch to use platform-specific quoting of
> path names.

Applied.
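
For the archives, the committed approach boils down to something like
the sketch below (macro and call site shown from memory, so treat the
names as approximate):

#ifndef WIN32
#define PATH_QUOTE	'\''
#else
#define PATH_QUOTE	'"'
#endif

	/* emit the delete command with platform-appropriate quoting */
	fprintf(script, RMDIR_CMD " %c%s%c\n",
			PATH_QUOTE, fix_path_separator(old_cluster.pgdata), PATH_QUOTE);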

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


Re: [HACKERS] multixacts woes

2015-05-11 Thread Robert Haas
On Mon, May 11, 2015 at 10:11 AM, Noah Misch  wrote:
> On Mon, May 11, 2015 at 08:29:05AM -0400, Robert Haas wrote:
>> Given your concerns, and the need to get a fix for this out the door
>> quickly, what I'm inclined to do for the present is go bump the
>> threshold from 25% of MaxMultiXact to 50% of MaxMultiXact without
>> changing anything else.
>
> +1
>
>> Your analysis shows that this is more in line
>> with the existing policy for multixact IDs than what I did, and it
>> will reduce the threat of frequent wraparound scans.  Now, it will
>> also increase the chances of somebody hitting the wall before
>> autovacuum can bail them out.  But maybe not that much.  If we need
>> 75% of the multixact member space to complete one cycle of
>> anti-wraparound vacuums, we're actually very close to the point where
>> the system just cannot work.  If that's one big table, we're done.
>
> Agreed.

OK, I have made this change.  Barring further trouble reports, this
completes the multixact work I plan to do for the next release.  Here
is what is outstanding:

1. We might want to introduce a GUC to control the point at which
member offset utilization begins clamping
autovacuum_multixact_freeze_max_age.  It doesn't seem wise to do
anything about this before pushing a minor release out.  It's not
entirely trivial, and it may be helpful to learn more about how the
changes already made work out in practice before proceeding.  Also, we
might not back-patch this anyway.

2. The recent changes adjust things - for good reason - so that the
safe threshold for multixact member creation is advanced only at
checkpoint time.  This means it's theoretically possible to have a
situation where autovacuum has done all it can, but because no
checkpoint has happened yet, the user can't create any more
multixacts.  Thanks to some good work by Thomas, autovacuum will
realize this and avoid spinning uselessly over every table in the
system, which is good, but you're still stuck with errors until the
next checkpoint.  Essentially, we're hoping that autovacuum will clean
things up far enough in advance of hitting the threshold where we have
to throw an error that a checkpoint will intervene before the error
starts happening.  It's possible we could improve this further, but I
think it would be unwise to mess with it right now.  It may be that
there is no real-world problem here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


[HACKERS] Re: [BUGS] BUG #13148: Unexpected deferred EXCLUDE constraint violation on derived table

2015-05-11 Thread Andres Freund
Hi,

On 2015-05-10 16:01:53 -0400, Tom Lane wrote:
> The cause of the problem seems to be that the UPDATE performs a HOT update
> of the new tuple, leaving in this case a dead tuple at (0,2) that is HOT
> updated by (0,3).  When unique_key_recheck() is invoked for (0,2), it
> believes, correctly, that it has to perform the recheck anyway ... but it
> tells check_exclusion_constraint that the check is being performed for
> (0,2).  So the index search inside check_exclusion_constraint finds the
> live tuple at (0,3) and thinks that is a conflict.

Heh, it's curious that this wasn't found up until now. I also wonder if
this might be related to the spurious violations Peter G. has been
observing...

> The attached patch seems to fix the problem without breaking any existing
> regression tests, but I wonder if anyone can see a hole in it.

Looks good to me.

Greetings,

Andres Freund


Re: [HACKERS] initdb start server recommendation

2015-05-11 Thread Bruce Momjian
On Fri, May  1, 2015 at 10:14:13AM -0400, Bruce Momjian wrote:
> Currently initdb outputs suggested text on starting the server:
> 
>   Success. You can now start the database server using:
>   
>   /u/pgsql/bin/postgres -D /u/pgsql/data
>   or
>   /u/pgsql/bin/pg_ctl -D /u/pgsql/data -l logfile start
> 
> I am now thinking pg_ctl should be recommended first.  At the time this
> text was written pg_ctl was new.

I have applied a patch to initdb to only recommend the pg_ctl method.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


Re: [HACKERS] Streaming replication and WAL archive interactions

2015-05-11 Thread Heikki Linnakangas

On 05/08/2015 04:21 PM, Heikki Linnakangas wrote:
> On 04/22/2015 10:07 AM, Michael Paquier wrote:
>> On Wed, Apr 22, 2015 at 3:38 PM, Heikki Linnakangas  wrote:
>>> I feel that the best approach is to archive the last, partial segment, but
>>> with the .partial suffix. I don't see any plausible real-world setup where
>>> the current behavior would be better. I don't really see much need to
>>> archive the partial segment at all, but there's also no harm in doing it, as
>>> long as it's clearly marked with the .partial suffix.
>>
>> Well, as long as it is clearly archived at promotion, even with a
>> suffix, I guess that I am fine... This will need some tweaking on
>> restore_command for existing applications, but as long as it is
>> clearly documented I am fine. Shouldn't this be a different patch
>> though?
>
> Ok, I came up with the attached, which adds the .partial suffix to the
> partial WAL segment that's archived after promotion. I couldn't find any
> natural place to talk about it in the docs, though. I think after the
> docs changes from the main patch are applied, it would be natural to
> mention this in the "Continuous archiving in standby" section, so I
> think I'll add that later.
>
> Barring objections, I'll push this later tonight.

Applied that part.

> Now that we got this last-partial-segment problem out of the way, I'm
> going to try fixing the problem you (Michael) pointed out about relying
> on pgstat file. Meanwhile, I'd love to get more feedback on the rest of
> the patch, and the documentation.

And here is a new version of the patch. I kept the approach of using
pgstat, but it now only polls pgstat every 10 seconds, and doesn't block
to wait for updated stats.

- Heikki

From 08ca3cc7b9824503b793e149247ea9c6d3a7f323 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Thu, 16 Apr 2015 14:40:24 +0300
Subject: [PATCH v3 1/1] Make WAL archival behave more sensibly in standby
 mode.

This adds two new archive_modes, 'shared' and 'always', to indicate whether
the WAL archive is shared between the primary and standby, or not. In
shared mode, the standby tracks which files have been archived by the
primary. The standby refrains from recycling files that the primary has not
yet archived, and at failover, the standby archives all those files too
from the old timeline. In 'always' mode, the standby's WAL archive is taken
to be separate from the primary's, and the standby independently archives
all files it receives from the primary.

This adds a new "archival status" message to the protocol. WAL sender sends
one automatically, when the last archived WAL file, as reported in pgstat,
changes. (Or rather, some time after it changes. We're not in a hurry, the
standby doesn't need an up-to-the-second status)

Fujii Masao and me.
---
 doc/src/sgml/config.sgml  |  12 +-
 doc/src/sgml/high-availability.sgml   |  48 +++
 doc/src/sgml/protocol.sgml|  31 +
 src/backend/access/transam/xlog.c |  29 +++-
 src/backend/postmaster/pgstat.c   |  44 ++
 src/backend/postmaster/postmaster.c   |  37 +++--
 src/backend/replication/walreceiver.c | 172 +++-
 src/backend/replication/walsender.c   | 186 ++
 src/backend/utils/misc/guc.c  |  21 +--
 src/backend/utils/misc/postgresql.conf.sample |   2 +-
 src/include/access/xlog.h |  14 +-
 src/include/pgstat.h  |   2 +
 12 files changed, 513 insertions(+), 85 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 0d8624a..ac845e0 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2521,7 +2521,7 @@ include_dir 'conf.d'
 
 
  
-  archive_mode (boolean)
+  archive_mode (enum)
   
archive_mode configuration parameter
   
@@ -2530,7 +2530,15 @@ include_dir 'conf.d'

 When archive_mode is enabled, completed WAL segments
 are sent to archive storage by setting
-.
+. In addition to off,
+to disable, there are three modes: on, shared,
+and always. During normal operation, there is no
+difference between the three modes, but in archive recovery or
+standby mode, it indicates whether the WAL archive is shared between
+the primary and the standby server or not. See
+ for details.
+ 
+   
 archive_mode and archive_command are
 separate variables so that archive_command can be
 changed without leaving archiving mode.
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index a17f555..62f7c75 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -1220,6 +1220,54 @@ primary_slot_name = 'node_a_slot'
 

   
+
+  
+   Continuous archiving in standby
+
+   
+ continuous archiving
+ in standby
+   
+
+   
+ When continuous WAL archiving is used in

Re: [HACKERS] EvalPlanQual behaves oddly for FDW queries involving system columns

2015-05-11 Thread Tom Lane
I wrote:
> Etsuro Fujita  writes:
>> On 2015/05/11 8:50, Tom Lane wrote:
>>> I wonder if we should instead add a "ScanState*" field and expect the
>>> core code to set that up (ExecOpenScanRelation could do it with minor
>>> API changes...).

>> Sorry, I don't understand clearly what you mean, but that (the idea of
>> expecting the core to set it up) sounds inconsistent with your comment
>> on the earlier version of the API "BeginForeignFetch" [1].

> Well, the other way that it could work would be for the FDW's
> BeginForeignScan routine to search for a relevant ExecRowMark entry
> (which, per that previous discussion, it'd likely need to do anyway) and
> then plug a back-link to the ForeignScanState into the ExecRowMark.
> But I don't see any very good reason to make every FDW that's concerned
> with this do that, rather than doing it once in the core code.  I'm also
> thinking that having a link to the source scan node there might be useful
> someday for regular tables as well as FDWs.

And on the third hand ... in view of the custom/foreign join pushdown
stuff that just went in, it would be a mistake to assume that there
necessarily is a distinct source scan node associated with each
ExecRowMark.  The FDW can presumably find all the ExecRowMarks associated
with the rels that a join ForeignScan is scanning, but we can't assume
that ExecOpenScanRelation will be able to set up those links, and the FDW
might not want a simple link to the join scan node anyway.  So it's
probably best to leave it as an unspecified void* instead of trying to
nail down the meaning more precisely.

I still don't much like calling it "fdw_state" though, because I don't
think it should be documented as only for the use of FDWs.  (Custom scans
might have a use for a pointer field here too, for example.)  Maybe just
call it "void *extra" and document it as being for the use of whatever
plan node is sourcing the relation's tuples.
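
Concretely, the suggestion amounts to one opaque field at the end of
ExecRowMark (final name up for bikeshedding, of course):

	void	   *ermExtra;		/* available for use by whatever plan
								 * node sources this relation's tuples */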

regards, tom lane


Re: [HACKERS] EvalPlanQual behaves oddly for FDW queries involving system columns

2015-05-11 Thread Tom Lane
Etsuro Fujita  writes:
> On 2015/05/11 8:50, Tom Lane wrote:
>> In particular, I find the addition of "void *fdw_state" to ExecRowMark
>> to be pretty questionable.  That does not seem like a good place to keep
>> random state.  (I realize that WHERE CURRENT OF keeps some state in
>> ExecRowMark, but that's a crock not something to emulate.)  ISTM that in
>> most scenarios, the state that LockForeignRow/FetchForeignRow would like
>> to get at is probably the FDW state associated with the ForeignScanState
>> that the tuple came from.  Which this API doesn't help with particularly.
>> I wonder if we should instead add a "ScanState*" field and expect the
>> core code to set that up (ExecOpenScanRelation could do it with minor
>> API changes...).

> Sorry, I don't understand clearly what you mean, but that (the idea of
> expecting the core to set it up) sounds inconsistent with your comment
> on the earlier version of the API "BeginForeignFetch" [1].

Well, the other way that it could work would be for the FDW's
BeginForeignScan routine to search for a relevant ExecRowMark entry
(which, per that previous discussion, it'd likely need to do anyway) and
then plug a back-link to the ForeignScanState into the ExecRowMark.
But I don't see any very good reason to make every FDW that's concerned
with this do that, rather than doing it once in the core code.  I'm also
thinking that having a link to the source scan node there might be useful
someday for regular tables as well as FDWs.

>> Also, as I mentioned, I'd be a whole lot happier if we had a way to test
>> this...

> Attached is a postgres_fdw patch that I used for the testing.  If you
> try it, edit postgresGetForeignRowMarkType as necessary.  I have to
> confess that I did the testing only in the normal conditions by the patch.

Thanks, this is helpful.  However, it pretty much proves my point about
wanting the back-link --- it seems like init_foreign_fetch_state, for
example, is uselessly repeating a lot of the setup already done for
the scan node itself.

regards, tom lane


Re: [HACKERS] tzdata and 9.4.2, etc.

2015-05-11 Thread Robert Haas
On Mon, May 11, 2015 at 10:13 AM, Thom Brown  wrote:
>> We are now three tzdata changes behind.  There are bugs in pg_dump
>> which create real restore errors for people using PostGIS, one of our
>> most popular extensions.
>>
>> Can we please wrap and ship 9.4.2, etc., and do it soon?
>
> +1

I'm more concerned about the latest round of multixact bugs than I am
about the tzdata, but I think that's at a point now where we can go
ahead and schedule a release.  And I think we should.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] tzdata and 9.4.2, etc.

2015-05-11 Thread Thom Brown
On 4 May 2015 at 22:42, David Fetter  wrote:
> Folks,
>
> We are now three tzdata changes behind.  There are bugs in pg_dump
> which create real restore errors for people using PostGIS, one of our
> most popular extensions.
>
> Can we please wrap and ship 9.4.2, etc., and do it soon?

+1

-- 
Thom


Re: [HACKERS] multixacts woes

2015-05-11 Thread Noah Misch
On Mon, May 11, 2015 at 08:29:05AM -0400, Robert Haas wrote:
> Given your concerns, and the need to get a fix for this out the door
> quickly, what I'm inclined to do for the present is go bump the
> threshold from 25% of MaxMultiXact to 50% of MaxMultiXact without
> changing anything else.

+1

> Your analysis shows that this is more in line
> with the existing policy for multixact IDs than what I did, and it
> will reduce the threat of frequent wraparound scans.  Now, it will
> also increase the chances of somebody hitting the wall before
> autovacuum can bail them out.  But maybe not that much.  If we need
> 75% of the multixact member space to complete one cycle of
> anti-wraparound vacuums, we're actually very close to the point where
> the system just cannot work.  If that's one big table, we're done.

Agreed.


Re: [HACKERS] LOCK TABLE Permissions

2015-05-11 Thread Stephen Frost
All,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > if (lockmode == AccessShareLock)
> > aclresult = pg_class_aclcheck(reloid, GetUserId(),
> >   ACL_SELECT);
> > +   else if (lockmode == RowExclusiveLock)
> > +   aclresult = pg_class_aclcheck(reloid, GetUserId(),
> > +ACL_INSERT | ACL_UPDATE | ACL_DELETE | 
> > ACL_TRUNCATE);
> > else
> > aclresult = pg_class_aclcheck(reloid, GetUserId(),
> >   ACL_UPDATE | ACL_DELETE | 
> > ACL_TRUNCATE);
> 
> Perhaps it would be better to refactor with a local variable for the
> aclmask and just one instance of the pg_class_aclcheck call.  Also, I'm
> pretty sure that the documentation work needed is more extensive
> than the actual patch ;-).  Otherwise, I don't see a problem with this.

Now for a blast from the past...  This came up again on IRC recently and
reminded me that I ran into the same issue a couple years back.  Updated
patch includes the refactoring suggested and includes documentation.

Not going to be back-patched, as discussed with Robert.

Barring objections, I'll push this later today.

Thanks!

Stephen
From d2b0fbc9fd8c0783f870fef3c901f7f8891c6e90 Mon Sep 17 00:00:00 2001
From: Stephen Frost 
Date: Mon, 11 May 2015 09:14:49 -0400
Subject: [PATCH] Allow LOCK TABLE .. ROW EXCLUSIVE MODE with INSERT

INSERT acquires RowExclusiveLock during normal operation and therefore
it makes sense to allow LOCK TABLE .. ROW EXCLUSIVE MODE to be executed
by users who have INSERT rights on a table (even if they don't have
UPDATE or DELETE).

Not back-patching this as it's a behavior change which, strictly
speaking, loosens security restrictions.

Per discussion with Tom and Robert (circa 2013).
---
 doc/src/sgml/ref/lock.sgml  |  8 +---
 src/backend/commands/lockcmds.c | 12 
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/ref/lock.sgml b/doc/src/sgml/ref/lock.sgml
index 913afe7..b946eab 100644
--- a/doc/src/sgml/ref/lock.sgml
+++ b/doc/src/sgml/ref/lock.sgml
@@ -161,9 +161,11 @@ LOCK [ TABLE ] [ ONLY ] name [ * ]
 

 LOCK TABLE ... IN ACCESS SHARE MODE requires SELECT
-privileges on the target table.  All other forms of LOCK
-require table-level UPDATE, DELETE, or
-TRUNCATE privileges.
+privileges on the target table.  LOCK TABLE ... IN ROW EXCLUSIVE
+MODE requires INSERT, UPDATE, DELETE,
+or TRUNCATE privileges on the target table.  All other forms of
+LOCK require table-level UPDATE, DELETE,
+or TRUNCATE privileges.

 

diff --git a/src/backend/commands/lockcmds.c b/src/backend/commands/lockcmds.c
index bdec2ff..a167082 100644
--- a/src/backend/commands/lockcmds.c
+++ b/src/backend/commands/lockcmds.c
@@ -169,13 +169,17 @@ static AclResult
 LockTableAclCheck(Oid reloid, LOCKMODE lockmode)
 {
 	AclResult	aclresult;
+	AclMode		aclmask;
 
 	/* Verify adequate privilege */
 	if (lockmode == AccessShareLock)
-		aclresult = pg_class_aclcheck(reloid, GetUserId(),
-	  ACL_SELECT);
+		aclmask = ACL_SELECT;
+	else if (lockmode == RowExclusiveLock)
+		aclmask = ACL_INSERT | ACL_UPDATE | ACL_DELETE | ACL_TRUNCATE;
 	else
-		aclresult = pg_class_aclcheck(reloid, GetUserId(),
-	  ACL_UPDATE | ACL_DELETE | ACL_TRUNCATE);
+		aclmask = ACL_UPDATE | ACL_DELETE | ACL_TRUNCATE;
+
+	aclresult = pg_class_aclcheck(reloid, GetUserId(), aclmask);
+
 	return aclresult;
 }
-- 
1.9.1



Re: [HACKERS] EvalPlanQual behaves oddly for FDW queries involving system columns

2015-05-11 Thread Etsuro Fujita
On 2015/05/11 8:50, Tom Lane wrote:
> Etsuro Fujita  writes:
>> [ EvalPlanQual-v6.patch ]
> 
> I've started to study this in a little more detail, and I'm not terribly
> happy with some of the API decisions in it.

Thanks for taking the time to review the patch!

> In particular, I find the addition of "void *fdw_state" to ExecRowMark
> to be pretty questionable.  That does not seem like a good place to keep
> random state.  (I realize that WHERE CURRENT OF keeps some state in
> ExecRowMark, but that's a crock not something to emulate.)  ISTM that in
> most scenarios, the state that LockForeignRow/FetchForeignRow would like
> to get at is probably the FDW state associated with the ForeignScanState
> that the tuple came from.  Which this API doesn't help with particularly.
> I wonder if we should instead add a "ScanState*" field and expect the
> core code to set that up (ExecOpenScanRelation could do it with minor
> API changes...).

Sorry, I don't understand clearly what you mean, but that (the idea of
expecting the core to set it up) sounds inconsistent with your comment
on the earlier version of the API "BeginForeignFetch" [1].

> I'm also a bit tempted to pass the TIDs to LockForeignRow and
> FetchForeignRow as Datums not ItemPointers.  We have the Datum format
> available already at the call sites, so this is free as far as the core
> code is concerned, and would only cost another line or so for the FDWs.
> This is by no means sufficient to allow FDWs to use some other type than
> "tid" for row identifiers; but it would be a down payment on that problem,
> and at least would avoid nailing the rowids-are-tids assumption into yet
> another global API.

That is a good idea.
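
For concreteness, the conversion on the FDW side would then be just
something like

	ItemPointer tupleid = (ItemPointer) DatumGetPointer(rowid);

where "rowid" is whatever Datum the core code hands over.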

> Also, as I mentioned, I'd be a whole lot happier if we had a way to test
> this...

Attached is a postgres_fdw patch that I used for the testing.  If you
try it, edit postgresGetForeignRowMarkType as necessary.  I have to
confess that I did the testing only in the normal conditions by the patch.

Sorry for the delay.  I took a vacation until yesterday.

Best regards,
Etsuro Fujita

[1] http://www.postgresql.org/message-id/14504.1428446...@sss.pgh.pa.us
*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
***
*** 88,93  typedef struct PgFdwRelationInfo
--- 88,95 
   *
   * 1) SELECT statement text to be sent to the remote server
   * 2) Integer list of attribute numbers retrieved by the SELECT
+  * 3) SELECT statement text to be sent to the remote server
+  * 4) Integer list of attribute numbers retrieved by the SELECT
   *
   * These items are indexed with the enum FdwScanPrivateIndex, so an item
   * can be fetched with list_nth().  For example, to get the SELECT statement:
***
*** 98,104  enum FdwScanPrivateIndex
  	/* SQL statement to execute remotely (as a String node) */
  	FdwScanPrivateSelectSql,
  	/* Integer list of attribute numbers retrieved by the SELECT */
! 	FdwScanPrivateRetrievedAttrs
  };
  
  /*
--- 100,110 
  	/* SQL statement to execute remotely (as a String node) */
  	FdwScanPrivateSelectSql,
  	/* Integer list of attribute numbers retrieved by the SELECT */
! 	FdwScanPrivateRetrievedAttrs,
! 	/* SQL statement to execute remotely (as a String node) */
! 	FdwScanPrivateSelectSql2,
! 	/* Integer list of attribute numbers retrieved by SELECT */
! 	FdwScanPrivateRetrievedAttrs2
  };
  
  /*
***
*** 186,191  typedef struct PgFdwModifyState
--- 192,223 
  } PgFdwModifyState;
  
  /*
+  * Execution state for fetching/locking foreign rows.
+  */
+ typedef struct PgFdwFetchState
+ {
+ 	Relation	rel;			/* relcache entry for the foreign table */
+ 	AttInMetadata *attinmeta;	/* attribute datatype conversion metadata */
+ 
+ 	/* for remote query execution */
+ 	PGconn	   *conn;			/* connection for the fetch */
+ 	char	   *p_name;			/* name of prepared statement, if created */
+ 
+ 	/* extracted fdw_private data */
+ 	char	   *query;			/* text of SELECT command */
+ 	List	   *retrieved_attrs;	/* attr numbers retrieved by SELECT */
+ 
+ 	/* info about parameters for prepared statement */
+ 	int			p_nums;			/* number of parameters to transmit */
+ 	FmgrInfo   *p_flinfo;		/* output conversion functions for them */
+ 
+ 	HeapTuple	locked_tuple;
+ 
+ 	/* working memory context */
+ 	MemoryContext temp_cxt;		/* context for per-tuple temporary data */
+ } PgFdwFetchState;
+ 
+ /*
   * Workspace for analyzing a foreign table.
   */
  typedef struct PgFdwAnalyzeState
***
*** 276,281  static TupleTableSlot *postgresExecForeignDelete(EState *estate,
--- 308,320 
  static void postgresEndForeignModify(EState *estate,
  		 ResultRelInfo *resultRelInfo);
  static int	postgresIsForeignRelUpdatable(Relation rel);
+ static RowMarkType postgresGetForeignRowMarkType(LockClauseStrength strength);
+ static bool postgresLockForeignRow(EState *estate,
+    ExecRowMark *erm,
+    ItemPointer tupleid);
+ static HeapTuple

Re: [HACKERS] "unaddressable bytes" in BRIN

2015-05-11 Thread Stephen Frost
* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
> What I think this means is that during an index build
> brin_minmax_add_value() called numeric_lt() which detoasted one of its
> input values; later, brin_doinsert() inserts a tuple containing the
> value, but it tries to use more bytes than were allocated.  I haven't
> had time to actually study what is going on here, but wanted to archive
> this publicly.  (Value detoasting evidently plays a role here, but I
> don't know how.)

I went ahead and added this to the 9.5 open items list.

Thanks!

Stephen


Re: [HACKERS] multixacts woes

2015-05-11 Thread Robert Haas
On Mon, May 11, 2015 at 12:56 AM, Noah Misch  wrote:
> On Sun, May 10, 2015 at 09:17:58PM -0400, Robert Haas wrote:
>> On Sun, May 10, 2015 at 1:40 PM, Noah Misch  wrote:
>> > I don't know whether this deserves prompt remediation, but if it does, I 
>> > would
>> > look no further than the hard-coded 25% figure.  We permit users to operate
>> > close to XID wraparound design limits.  GUC maximums force an 
>> > anti-wraparound
>> > vacuum at no later than 93.1% of design capacity.  XID assignment warns at
>> > 99.5%, then stops at 99.95%.  PostgreSQL mandates a larger cushion for
>> > pg_multixact/offsets, with anti-wraparound VACUUM by 46.6% and a stop at
>> > 50.0%.  Commit 53bb309d2d5a9432d2602c93ed18e58bd2924e15 introduced the
>> > bulkiest mandatory cushion yet, an anti-wraparound vacuum when
>> > pg_multixact/members is just 25% full.
>>
>> That's certainly one possible approach.  I had discounted it because
>> you can't really get more than a small multiple out of it, but getting
>> 2-3x more room might indeed be enough to help some people quite a bit.
>> Just raising the threshold from 25% to say 40% would buy back a
>> healthy amount.
>
> Right.  It's fair to assume that the new VACUUM burden would be discountable
> at a 90+% threshold, because the installations that could possibly find it
> expensive are precisely those experiencing corruption today.  These reports
> took eighteen months to appear, whereas some corruption originating in commit
> 0ac5ad5 saw reports within three months.  Therefore, sites burning
> pg_multixact/members proportionally faster than both pg_multixact/offsets and
> XIDs must be unusual.  Bottom line: if we do need to reduce VACUUM burden
> caused by the commits you cited upthread, we almost certainly don't need more
> than a 4x improvement.

I looked into the approach of adding a GUC called
autovacuum_multixact_freeze_max_members to set the threshold.  I
thought to declare it this way:

+	{
+		{"autovacuum_multixact_freeze_max_members", PGC_POSTMASTER, AUTOVACUUM,
+			gettext_noop("# of multixact members at which autovacuum is "
+						 "forced to prevent multixact member wraparound."),
+			NULL
+		},
+		&autovacuum_multixact_freeze_max_members,
+		2000000000, 1000, 4000000000,
+		NULL, NULL, NULL
+	},

Regrettably, I think that's not going to work, because 4000000000
overflows int.  We will evidently need to denote this GUC in some
other units, unless we want to introduce config_int64.

Given your concerns, and the need to get a fix for this out the door
quickly, what I'm inclined to do for the present is go bump the
threshold from 25% of MaxMultiXact to 50% of MaxMultiXact without
changing anything else.  Your analysis shows that this is more in line
with the existing policy for multixact IDs than what I did, and it
will reduce the threat of frequent wraparound scans.  Now, it will
also increase the chances of somebody hitting the wall before
autovacuum can bail them out.  But maybe not that much.  If we need
75% of the multixact member space to complete one cycle of
anti-wraparound vacuums, we're actually very close to the point where
the system just cannot work.  If that's one big table, we're done.

Also, if somebody does have a workload where the auto-clamping doesn't
provide them with enough headroom, they can still improve things by
reducing autovacuum_multixact_freeze_max_age to a value less than the
value to which we're auto-clamping it.  If they need an effective
value of less than 10 million they are out of luck, but if that is the
case then there is a good chance that they are hosed anyway - an
anti-wraparound vacuum every 10 million multixacts sounds awfully
painful.
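
For illustration, the clamping being discussed has roughly this shape (a
simplified sketch, not the exact committed logic):

#define MULTIXACT_MEMBER_SAFE_THRESHOLD		(MaxMultiXactOffset / 2)

static int
effective_freeze_max_age(MultiXactOffset members)
{
	double		fraction;

	if (members <= MULTIXACT_MEMBER_SAFE_THRESHOLD)
		return autovacuum_multixact_freeze_max_age;

	/* scale the freeze age down as remaining member headroom is consumed */
	fraction = (double) (members - MULTIXACT_MEMBER_SAFE_THRESHOLD) /
		(double) (MaxMultiXactOffset - MULTIXACT_MEMBER_SAFE_THRESHOLD);

	return (int) (autovacuum_multixact_freeze_max_age * (1.0 - fraction));
}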

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-05-11 Thread Robert Haas
On Sun, May 10, 2015 at 11:07 PM, Andres Freund  wrote:
> On 2015-05-10 22:51:33 -0400, Robert Haas wrote:
>> > And there's definitely some things
>> > around that pretty much only still exist because changing them would
>> > break too much stuff.
>>
>> Such as what?
>
> Without even thinking about it:
> * linitial vs lfirst vs lnext. That thing still induces an impedance
>   mismatch when reading code for me, and I believe a good number of
>   other people.
> * Two 'string buffer' APIs with essentially only minor differences.
> * A whole bunch of libpq APIs. Admittedly that's a bit more exposed than
>   lots of backend only things.
> * The whole V0 calling convention that makes it so much easier to get
>   odd crashes.
>
> Admittedly that's all I could come up without having to think. But I do
> vaguely remember a lot of things we did not do because of bwcompat
> concerns.

I see your point, but I don't think it really detracts from mine.  The
fact that we have a few inconsistently-named list functions is not
preventing any core development project that would otherwise get
completed to instead not get completed.  Nor is any of that other
stuff, except maybe the libpq API, but that's a lot (not just a bit)
more exposed.

Also, I'd actually be in favor of looking for a way to unify the
StringInfo and PQexpBuffer stuff - and of partially deprecating the V0
calling convention.
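
For anyone who hasn't hit it: a V0 function is a bare C function whose
signature fmgr simply has to trust, whereas V1 uses a uniform signature
plus a macro-generated info block that fmgr can validate.  A sketch:

/* V0: any argument-count or type mismatch crashes at runtime */
int		add_one(int arg);

/* V1: uniform signature, checkable by fmgr */
PG_FUNCTION_INFO_V1(add_one);

Datum
add_one(PG_FUNCTION_ARGS)
{
	int32	arg = PG_GETARG_INT32(0);

	PG_RETURN_INT32(arg + 1);
}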

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] a fast bloat measurement tool (was Re: Measuring relation free space)

2015-05-11 Thread Abhijit Menon-Sen
Hi Andres.

I've attached an updated patch for pgstatbloat, as well as a patch to
replace two uses of BuildTupleFromCStrings() elsewhere in pgstattuple.

I've made the changes you mentioned in your earlier mail, except that I
have not changed the name pending further suggestions about what would
be the best name.

Also:

At 2015-05-09 15:36:49 +0530, a...@2ndquadrant.com wrote:
>
> At 2015-05-09 02:20:51 +0200, and...@anarazel.de wrote:
> >
> > I haven't checked, but I'm not sure that it's safe/meaningful to
> > call PageGetHeapFreeSpace() on a new page.
> 
> OK, I'll check and fix if necessary.

You're right, PageGetHeapFreeSpace() isn't safe on a new page. I've
added a guard to that call in the attached patch, but I'm not sure
that's the right thing to do.

Should I copy the orphaned new-page handling from lazy_scan_heap? What
about for empty pages? Neither feels like a very good thing to do, but
then neither does skipping the new page silently. Should I count the
space it would have free if it were initialised, but leave the page
alone for VACUUM to deal with? Or just leave it as it is?

-- Abhijit
>From 421267bba8394255feed8f9b9424d25d0be9f979 Mon Sep 17 00:00:00 2001
From: Abhijit Menon-Sen 
Date: Mon, 11 May 2015 15:54:48 +0530
Subject: Make pgstattuple use heap_form_tuple instead of
 BuildTupleFromCStrings

---
 contrib/pgstattuple/pgstatindex.c | 43 ++---
 contrib/pgstattuple/pgstattuple.c | 45 ++-
 2 files changed, 37 insertions(+), 51 deletions(-)

diff --git a/contrib/pgstattuple/pgstatindex.c b/contrib/pgstattuple/pgstatindex.c
index a2ea5d7..608f729 100644
--- a/contrib/pgstattuple/pgstatindex.c
+++ b/contrib/pgstattuple/pgstatindex.c
@@ -257,7 +257,8 @@ pgstatindex_impl(Relation rel, FunctionCallInfo fcinfo)
 	{
 		TupleDesc	tupleDesc;
 		int			j;
-		char	   *values[10];
+		Datum		values[10];
+		bool		nulls[10];
 		HeapTuple	tuple;
 
 		/* Build a tuple descriptor for our result type */
@@ -265,33 +266,31 @@ pgstatindex_impl(Relation rel, FunctionCallInfo fcinfo)
 			elog(ERROR, "return type must be a row type");
 
 		j = 0;
-		values[j++] = psprintf("%d", indexStat.version);
-		values[j++] = psprintf("%d", indexStat.level);
-		values[j++] = psprintf(INT64_FORMAT,
-			   (indexStat.root_pages +
-indexStat.leaf_pages +
-indexStat.internal_pages +
-indexStat.deleted_pages +
-indexStat.empty_pages) * BLCKSZ);
-		values[j++] = psprintf("%u", indexStat.root_blkno);
-		values[j++] = psprintf(INT64_FORMAT, indexStat.internal_pages);
-		values[j++] = psprintf(INT64_FORMAT, indexStat.leaf_pages);
-		values[j++] = psprintf(INT64_FORMAT, indexStat.empty_pages);
-		values[j++] = psprintf(INT64_FORMAT, indexStat.deleted_pages);
+		values[j++] = Int32GetDatum(indexStat.version);
+		values[j++] = Int32GetDatum(indexStat.level);
+		values[j++] = Int64GetDatum((indexStat.root_pages +
+	 indexStat.leaf_pages +
+	 indexStat.internal_pages +
+	 indexStat.deleted_pages +
+	 indexStat.empty_pages) * BLCKSZ);
+		values[j++] = Int32GetDatum(indexStat.root_blkno);
+		values[j++] = Int64GetDatum(indexStat.internal_pages);
+		values[j++] = Int64GetDatum(indexStat.leaf_pages);
+		values[j++] = Int64GetDatum(indexStat.empty_pages);
+		values[j++] = Int64GetDatum(indexStat.deleted_pages);
 		if (indexStat.max_avail > 0)
-			values[j++] = psprintf("%.2f",
-   100.0 - (double) indexStat.free_space / (double) indexStat.max_avail * 100.0);
+			values[j++] = Float8GetDatum(100.0 - (double) indexStat.free_space / (double) indexStat.max_avail * 100.0);
 		else
-			values[j++] = pstrdup("NaN");
+			values[j++] = Float8GetDatum(get_float8_nan());
 		if (indexStat.leaf_pages > 0)
-			values[j++] = psprintf("%.2f",
-   (double) indexStat.fragments / (double) indexStat.leaf_pages * 100.0);
+			values[j++] = Float8GetDatum((double) indexStat.fragments / (double) indexStat.leaf_pages * 100.0);
 		else
-			values[j++] = pstrdup("NaN");
+			values[j++] = Float8GetDatum(get_float8_nan());
 
-		tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
-	   values);
+		for (j = 0; j < 10; j++)
+			nulls[j] = false;
 
+		tuple = heap_form_tuple(tupleDesc, values, nulls);
 		result = HeapTupleGetDatum(tuple);
 	}
 
diff --git a/contrib/pgstattuple/pgstattuple.c b/contrib/pgstattuple/pgstattuple.c
index c3a8b1d..552f188 100644
--- a/contrib/pgstattuple/pgstattuple.c
+++ b/contrib/pgstattuple/pgstattuple.c
@@ -85,28 +85,20 @@ static Datum
 build_pgstattuple_type(pgstattuple_type *stat, FunctionCallInfo fcinfo)
 {
 #define NCOLUMNS	9
-#define NCHARS		32
 
 	HeapTuple	tuple;
-	char	   *values[NCOLUMNS];
-	char		values_buf[NCOLUMNS][NCHARS];
+	Datum		values[NCOLUMNS];
+	bool		nulls[NCOLUMNS];
 	int			i;
 	double		tuple_percent;
 	double		dead_tuple_percent;
 	double		free_percent;	/* free/reusable space in % */
 	TupleDesc	tupdesc;
-	AttInMetadata *attinmeta;
 
 	/* Build a tuple descriptor for our