Re: [HACKERS] Determine operator from its function

2015-07-03 Thread Tom Lane
Jim Nasby  writes:
> On 7/3/15 2:33 AM, Heikki Linnakangas wrote:
>> On 07/03/2015 01:20 AM, Jim Nasby wrote:
>>> Is there a way to determine the operator that resulted in calling the
>>> operator function? I thought fcinfo->flinfo->fn_expr might get set to
>>> the OpExpr, but seems it can be a FuncExpr even when called via an
>>> operator...

>> Don't think there is. Why do you need to know?

> I'd like to support arbitrary operators in variant.

Why would you expect there to be multiple operators pointing at the same
function?  If there were multiple operators pointing at the same function,
why would you need to distinguish them?  ISTM that such a situation would
necessarily mean that there was no distinction worthy of notice.
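
For example (a sketch; the names are invented):

-- Two operator spellings backed by one function: fn_expr can only point
-- at the shared function, so the call cannot tell the spellings apart.
CREATE FUNCTION text_lt_demo(text, text) RETURNS boolean
    LANGUAGE sql IMMUTABLE AS 'SELECT $1 < $2';
CREATE OPERATOR <<< (LEFTARG = text, RIGHTARG = text, PROCEDURE = text_lt_demo);
CREATE OPERATOR <<# (LEFTARG = text, RIGHTARG = text, PROCEDURE = text_lt_demo);
SELECT 'a' <<< 'b', 'a' <<# 'b';   -- both calls reach text_lt_demo()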

(The particular situation you are bitching about comes from the fact that
eval_const_expressions's simplify_functions code deliberately ignores any
distinction between operators and functions.  But for its purposes, that
is *correct*, and I will strongly resist any claim that it isn't.  If you
are unhappy then you defined your operators wrongly.)

regards, tom lane




Re: [HACKERS] More logging for autovacuum

2015-07-03 Thread Amit Kapila
On Thu, Jul 2, 2015 at 4:41 AM, Gurjeet Singh  wrote:
>
> On Fri, Aug 17, 2007 at 3:14 PM, Alvaro Herrera <alvhe...@commandprompt.com> wrote:
>>
>> Gregory Stark wrote:
>> >
>> > I'm having trouble following what's going on with autovacuum and I'm
>> > finding the existing logging insufficient. In particular that it's only
>> > logging vacuum runs *after* the vacuum finishes makes it hard to see
>> > what vacuums are running at any given time. Also, I want to see what is
>> > making autovacuum decide to forgo vacuuming a table and the log with
>> > that information is at DEBUG2.
>>
>> So did this idea go anywhere?
>
>
> Assuming the thread stopped here, I'd like to rekindle the proposal.
>
> log_min_messages acts as a single gate for everything headed for the
> server logs; controls for per-background-process logging do not exist. If
> one wants to see DEBUG/INFO messages for just the autovacuum (or
> checkpointer, bgwriter, etc.), they have to set log_min_messages to that
> level, but the result would be a lot of clutter from other processes to
> grovel through, to see the messages of interest.
>

I think that will be quite helpful.  During development of the parallel
sequential scan patch, it was quite useful to see the LOG messages from
bgworkers; however, one of the recent commits (91118f1a) changed those to
DEBUG1, so if I now have to enable all DEBUG1 messages, what I need will
be difficult to find among all the log output.
Having separate logging control for background tasks would serve such a
purpose.

> The facilities don't yet exist, but it'd be nice if such parameters, when
> unset (i.e. NULL), picked up the value of log_min_messages. So by default,
> users will get the same behaviour as today, but can choose to tweak
> per-background-process logging when needed.
>

Will this proposal allow the user to see all the messages from all the
background workers, or will it be per background task?  For example, in
some cases the user might want to see the messages from all bgworkers,
and in other cases one might want to see just the messages from
autovacuum and its workers.

I think designing the user interface here is an important piece of this
work, so if others also agree that such an option is useful, could you
please elaborate your proposal?
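
To sketch the kind of interface I am imagining (the GUC names below are
invented for illustration; no such settings exist today):

ALTER SYSTEM SET autovacuum_log_min_messages = 'debug2';  -- hypothetical GUC
ALTER SYSTEM SET bgworker_log_min_messages = 'debug1';    -- hypothetical GUC
SELECT pg_reload_conf();  -- ALTER SYSTEM and pg_reload_conf() are real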


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Idea: closing the loop for "pg_ctl reload"

2015-07-03 Thread Jan de Visser
On July 3, 2015 06:21:09 PM Tom Lane wrote:
> I wonder whether we should consider inventing similar views for
> pg_hba.conf and pg_ident.conf.

(Apologies for the flurry of emails).

Was there not an attempt at a view for pg_hba.conf which ended in a lot of 
bikeshedding and no decisions?




Re: [HACKERS] Idea: closing the loop for "pg_ctl reload"

2015-07-03 Thread Jan de Visser
On July 3, 2015 09:24:36 PM Jan de Visser wrote:
> On July 3, 2015 06:21:09 PM Tom Lane wrote:
> > BTW, this version of this patch neglects to update the comments in
> > miscadmin.h, and it makes the return convention for
> > ProcessConfigFileInternal completely unintelligible IMO; the inaccuracy
> > and inconsistency in the comments is a symptom of that.  I didn't read it
> > in enough detail to say whether there are other problems.
> 
> OK, miscadmin.h. I'll go and look what that's all about. And would it make
> sense to find a better solution for the ProcessConfigFileInternal return
> value (which is convoluted, I agree - I went for the solution with the
> least impact on existing code), or should I improve documentation?
> 

Heh. I actually touched that file. I completely missed those comments (or saw 
them, thought that I should update them, and then forgot about them - just as 
likely). I'll obviously fix this if we carry this to completion.




Re: [HACKERS] Idea: closing the loop for "pg_ctl reload"

2015-07-03 Thread Jan de Visser
On July 3, 2015 06:21:09 PM Tom Lane wrote:
> Jan de Visser  writes:
> > Attached a new patch, rebased against the current head. Errors in
> > pg_hba.conf and pg_ident.conf are now also noticed.
> > 
> > I checked the documentation for pg_ctl reload, and the only place where
> > it's explained seems to be runtime.sgml and that description is so
> > high-level that adding this new bit of functionality wouldn't make much
> > sense.
> 
> BTW, it's probably worth pointing out that the recent work on the
> pg_file_settings view has taken away a large part of the use-case for
> this, in that you can find out more with less risk by inspecting
> pg_file_settings before issuing SIGHUP, instead of SIGHUP'ing and
> hoping you didn't break anything too nastily.  Also, you can use
> pg_file_settings remotely, unlike pg_ctl (though admittedly you
> still need contrib/adminpack or something to allow uploading a
> new config file if you're doing remote admin).
> 
> I wonder whether we should consider inventing similar views for
> pg_hba.conf and pg_ident.conf.
> 
> While that's not necessarily a reason not to adopt this patch anyway,
> it does mean that we should think twice about whether we need to add
> a couple of hundred lines for a facility that's less useful than
> what we already have.

Since you were the one proposing the feature, I'm going to leave it to you 
whether or not I should continue with this. I have no use for this feature; 
for me it just seemed like a nice bite-sized feature to get myself acquainted 
with the code base and the development process. As far as I'm concerned that 
goal has already been achieved, even though finalizing a patch towards commit 
(and having my name in the release notes ha ha) would be the icing on the 
cake.

> 
> BTW, this version of this patch neglects to update the comments in
> miscadmin.h, and it makes the return convention for
> ProcessConfigFileInternal completely unintelligible IMO; the inaccuracy
> and inconsistency in the comments is a symptom of that.  I didn't read it
> in enough detail to say whether there are other problems.

OK, miscadmin.h. I'll go and look what that's all about. And would it make 
sense to find a better solution for the ProcessConfigFileInternal return value 
(which is convoluted, I agree - I went for the solution with the least impact 
on existing code), or should I improve documentation?

> 
>   regards, tom lane





Re: [HACKERS] Synch failover WAS: Support for N synchronous standby servers - take 2

2015-07-03 Thread Michael Paquier
On Sat, Jul 4, 2015 at 2:44 AM, Andres Freund wrote:
> This subthread is getting absurd, stopping here.

Yeah, I agree with Andres here, we are making a mountain of nothing
(Frenglish?). I'll send to the other thread some additional ideas soon
using a JSON structure.
-- 
Michael




Re: [HACKERS] Solaris testers wanted for strxfrm() behavior

2015-07-03 Thread Noah Misch
On Wed, Jul 01, 2015 at 03:22:33AM -0400, Noah Misch wrote:
> On Tue, Jun 30, 2015 at 09:45:08AM -0400, Tom Lane wrote:
> > Noah Misch  writes:
> > > On Sun, Jun 28, 2015 at 07:00:14PM -0400, Tom Lane wrote:
> > >> Another idea would be to make a test during postmaster start to see
> > >> if this bug exists, and fail if so.  I'm generally on board with the
> > >> thought that we don't need to work on systems with such a bad bug,
> > >> but it would be a good thing if the failure was clean and produced
> > >> a helpful error message, rather than looking like a Postgres bug.
> > 
> > > Failing cleanly on unpatched Solaris is adequate, agreed.  A check at
> > > postmaster start isn't enough, because the postmaster might run in the C
> > > locale while individual databases or collations use problem locales.  The
> > > safest thing is to test after every setlocale(LC_COLLATE) and
> > > newlocale(LC_COLLATE).  That's once at backend start and once per backend 
> > > per
> > > collation used, more frequent than I would like.  Hmm.

Solaris does not have locale_t or strxfrm_l(), so per-collation checks aren't
relevant after all.  Checking at postmaster start and backend start seems
cheap enough, hence this patch.
commit 17d0abf (HEAD)
Author: Noah Misch 
AuthorDate: Fri Jul 3 19:59:50 2015 -0400
Commit: Noah Misch 
CommitDate: Fri Jul 3 19:59:50 2015 -0400

Revoke support for strxfrm() that writes past the specified array length.

This formalizes a decision implicit in commit
4ea51cdfe85ceef8afabceb03c446574daa0ac23 and adds clean detection of
affected systems.  Vendor updates are available for each such known bug.
Back-patch to 9.5, where the aforementioned commit first appeared.

diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index 2ecadd6..4fad6f3 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -149,6 +149,8 @@ main(int argc, char *argv[])
 	 */
 	unsetenv("LC_ALL");
 
+	check_strxfrm_bug();
+
 	/*
 	 * Catch standard options before doing much else, in particular before we
 	 * insist on not being root.
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 84215e0..d91959e 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -855,6 +855,64 @@ IsoLocaleName(const char *winlocname)
 
 
 /*
+ * Detect aging strxfrm() implementations that, in a subset of locales, write
+ * past the specified buffer length.  Affected users must update OS packages
+ * before using PostgreSQL 9.5 or later.
+ *
+ * Assume that the bug can come and go from one postmaster startup to another
+ * due to physical replication among diverse machines.  Assume that the bug's
+ * presence will not change during the life of a particular postmaster.  Given
+ * those assumptions, call this no less than once per postmaster startup per
+ * LC_COLLATE setting used.  No known-affected system offers strxfrm_l(), so
+ * there is no need to consider pg_collation locales.
+ */
+void
+check_strxfrm_bug(void)
+{
+	char		buf[32];
+	const int	canary = 0x7F;
+	bool		ok = true;
+
+	/*
+	 * Given a two-byte ASCII string and length limit 7, 8 or 9, Solaris 10
+	 * 05/08 returns 18 and modifies 10 bytes.  It respects limits above or
+	 * below that range.
+	 *
+	 * The bug is present in Solaris 8 as well; it is absent in Solaris 10
+	 * 01/13 and Solaris 11.2.  Affected locales include is_IS.ISO8859-1,
+	 * en_US.UTF-8, en_US.ISO8859-1, and ru_RU.KOI8-R.  Unaffected locales
+	 * include de_DE.UTF-8, de_DE.ISO8859-1, zh_TW.UTF-8, and C.
+	 */
+	buf[7] = canary;
+	(void) strxfrm(buf, "ab", 7);
+	if (buf[7] != canary)
+		ok = false;
+
+	/*
+	 * illumos bug #1594 was present in the source tree from 2010-10-11 to
+	 * 2012-02-01.  Given an ASCII string of any length and length limit 1,
+	 * affected systems ignore the length limit and modify a number of bytes
+	 * one less than the return value.  The problem inputs for this bug do not
+	 * overlap those for the Solaris bug, hence a distinct test.
+	 *
+	 * Affected systems include smartos-20110926T021612Z.  Affected locales
+	 * include en_US.ISO8859-1 and en_US.UTF-8.  Unaffected locales include C.
+	 */
+	buf[1] = canary;
+	(void) strxfrm(buf, "a", 1);
+	if (buf[1] != canary)
+		ok = false;
+
+	if (!ok)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYSTEM_ERROR),
+				 errmsg_internal("strxfrm(), in locale \"%s\", writes past the specified array length",
+								 setlocale(LC_COLLATE, NULL)),
+				 errhint("Apply system library package updates.")));
+}
+
+
+/*
  * Cach

[HACKERS] More work on SortSupport for text - strcoll() and strxfrm() caching

2015-07-03 Thread Peter Geoghegan
Since apparently we're back to development work, I thought it was time
to share a patch implementing a few additional simple tricks to make
sorting text under a non-C locale even faster than in 9.5. These
techniques are mostly effective when values are physically clustered
together. This might be because there is a physical/logical
correlation, but cases involving any kind of clustering of values are
helped significantly.

Caching
==

The basic idea is that we cache strxfrm() blobs. Separately, we
exploit temporal locality and clustering of values by caching the
result of the most recent strcoll()-resolved comparison performed. The
strxfrm() technique helps a lot with low cardinality single attribute
sorts if we can avoid most strxfrm() work. On the other hand,
strcoll() comparison caching particularly helps with multi-attribute
sorts where there is a low to moderate cardinality secondary attribute
and low cardinality leading attribute. The master branch will still
opportunistically take the equality memcmp() fastpath plenty of times
for that second attribute, but there are no abbreviated keys to help
when that doesn't work out (because it isn't the leading attribute).

Regressions
==

The patch still helps with strcoll() avoidance when the ordering of a
moderate cardinality attribute is totally random, but it helps much
less there. I have not seen a regression for any case. I'm expecting
someone to ask me to do something with the program I wrote last year,
to prove the opportunistic memcmp() equality fastpath for text is
"free" [1]. This patch has exactly the same tension as last year's
memcmp() equality one [2]: I add something opportunistic, that in
general might consistently not work out at all in some cases, and on
the face of it implies extra costs for those cases -- costs which must
be paid every single time. So as with the opportunistic memcmp()
equality thing, the *actual* overhead for cases that do not benefit
must be virtually zero for the patch to be worthwhile. That is the
standard that I expect that this patch will be held to, too.

Benchmark
=

The query that I've been trying this out with is a typical rollup
query, using my "cities" sample data [3] (this is somewhat, although
not perfectly correlated on (country, province) before sorting):

postgres=# select country, province, count(*) from cities group by
rollup (country, province);
   country   |      province       | count
-------------+---------------------+--------
 Afghanistan | Badaẖšan            |      5
 Afghanistan | Bādgīs              |      2
 Afghanistan | Baġlān              |      5
 Afghanistan | Balẖ                |      6
 Afghanistan | Bāmiyān             |      3
 Afghanistan | Farāh               |      3
 Afghanistan | Fāryāb              |      4
 Afghanistan | Ġawr                |      3

*** SNIP ***
...

 Zimbabwe    | Manicaland          |     22
 Zimbabwe    | Mashonaland Central |     13
 Zimbabwe    | Mashonaland East    |      9
 Zimbabwe    | Mashonaland West    |     21
 Zimbabwe    | Masvingo            |     11
 Zimbabwe    | Matabeleland North  |      8
 Zimbabwe    | Matabeleland South  |     14
 Zimbabwe    | Midlands            |     14
 Zimbabwe    | [null]              |    116
 [null]      | [null]              | 317102
(3529 rows)

With master, this takes about 525ms when it stabilizes after a few
runs on my laptop. With the patch, it takes about 405ms. That's almost
a 25% reduction in total run time. If I perform a more direct test of
sort performance against this data with minimal non-sorting overhead,
I see a reduction of as much as 30% in total query runtime (I chose
this rollup query because it is obviously representative of the real
world).

If this data is *perfectly* correlated (e.g. because someone ran
CLUSTER) and the sort can use the dubious "bubble sort best case"
path [4] that we added to qsort back in 2006, the improvement still
holds up at ~20%, I've found.

Performance of the "C" locale
---

For this particular rollup query, my 25% improvement leaves the
collated text sort perhaps marginally faster than an equivalent query
that uses the "C" locale (with or without the patch applied). It's
hard to be sure that that effect is real -- many trials are needed --
but it's reasonable to speculate that it's possible to sometimes beat
the "C" locale because of factors like final abbreviated key
cardinality.

It's easy to *contrive* a case where the "C" locale is beaten even
with 9.5 -- just 

Re: [HACKERS] Determine operator from its function

2015-07-03 Thread Jim Nasby

On 7/3/15 2:33 AM, Heikki Linnakangas wrote:
> On 07/03/2015 01:20 AM, Jim Nasby wrote:
>> Is there a way to determine the operator that resulted in calling the
>> operator function? I thought fcinfo->flinfo->fn_expr might get set to
>> the OpExpr, but seems it can be a FuncExpr even when called via an
>> operator...
>
> Don't think there is. Why do you need to know?

I'd like to support arbitrary operators in variant. I did initial
testing and it looked like I was getting an OpExpr in fn_expr, but I
think that's because I was using a real table to test with. When I do
something like 'a' < 'b' it looks like the operator gets written out of
the plan. If that's indeed what's happening, is there a hook I could use
to change that behavior?
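
For example, the fold is visible in the plan (a sketch):

-- eval_const_expressions() reduces the immutable comparison at plan time,
-- so nothing operator-shaped survives into the executor:
EXPLAIN (VERBOSE, COSTS OFF) SELECT 'a'::text < 'b'::text;
--  Result
--    Output: true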

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] PATCH: do not set Win32 server-side socket buffer size on windows 2012

2015-07-03 Thread David Rowley
On 3 July 2015 at 20:49, David Rowley  wrote:

> On 3 July 2015 at 20:06, Andres Freund  wrote:
>
> I've tested the patch just connecting to a database running on localhost
> and I'm not getting much of a speedup. Perhaps 1%, if that's not noise. I
> don't have enough hardware here to have client and server on separate
> machines, at least not with a stable network that goes through copper.
>
> Here's the results.

I forgot to mention that I was running Windows 8.1 64-bit directly on
hardware. No VM.

--
 David Rowley   http://www.2ndQuadrant.com/

 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] WAL logging problem in 9.4.3?

2015-07-03 Thread Andres Freund
On 2015-07-03 18:38:37 -0400, Tom Lane wrote:
> Why exactly?  The first truncation in the (sub)xact would have assigned a
> new relfilenode, why do we need another one?  The file in question will
> go away on crash/rollback in any case, and no other transaction can see
> it yet.

Consider:

BEGIN;
CREATE TABLE;
INSERT largeval;
TRUNCATE;
INSERT 1;
COPY;
INSERT 2;
COMMIT;

INSERT 1 is going to be WAL logged. For that to work correctly TRUNCATE
has to be WAL logged, as otherwise there'll be conflicting/overlapping
tuples on the target page.

But:

The truncation itself is not fully wal logged, neither is the COPY. Both
rely on heap_sync()/immedsync(). For that to be correct the current
relfilenode's truncation may *not* be wal-logged, because the contents
of the COPY or the truncation itself will only be on-disk, not in the
WAL.

Only being on-disk but not in the WAL is a problem if we crash and
replay the truncate record.

> I'm prepared to believe that some bit of logic is doing the wrong
> thing in this state, but I do not agree that truncate-in-place is
> unworkable.

Unless we're prepared to make everything that potentially WAL logs
something do the rel->rd_createSubid == mySubid && dance, I can't see
that working.




Re: [HACKERS] WAL logging problem in 9.4.3?

2015-07-03 Thread Tom Lane
Andres Freund  writes:
> On 2015-07-03 19:26:05 +0200, Andres Freund wrote:
>> commit cab9a0656c36739f59277b34fea8ab9438395869
>> Author: Tom Lane 
>> Date:   Sun Aug 23 19:23:41 2009 +0000
>> 
>> Make TRUNCATE do truncate-in-place when processing a relation that was created
>> or previously truncated in the current (sub)transaction.  This is safe since
>> if the (sub)transaction later rolls back, we'd just discard the rel's current
>> physical file anyway.  This avoids unreasonable growth in the number of
>> transient files when a relation is repeatedly truncated.  Per a performance
>> gripe a couple weeks ago from Todd Cook.
>> 
>> to me the reasoning here looks flawed.

> It looks to me we need to renege on this a bit. I think we can still be
> more efficient than the general codepath: We can drop the old
> relfilenode immediately. But pg_class.relfilenode has to differ from the
> old after the truncation.

Why exactly?  The first truncation in the (sub)xact would have assigned a
new relfilenode, why do we need another one?  The file in question will
go away on crash/rollback in any case, and no other transaction can see
it yet.

I'm prepared to believe that some bit of logic is doing the wrong thing in
this state, but I do not agree that truncate-in-place is unworkable.

regards, tom lane




Re: [HACKERS] WAL logging problem in 9.4.3?

2015-07-03 Thread Andres Freund
On 2015-07-03 19:26:05 +0200, Andres Freund wrote:
> On 2015-07-03 19:02:29 +0200, Andres Freund wrote:
> > Maybe I'm just daft right now (35C outside, 32 inside, so ...), but I'm
> > right now missing how the whole "skip wal logging if relation has just
> > been truncated" optimization can ever actually be crashsafe unless we
> > use a new relfilenode (which we don't!).
> 
> We actually used to use a different relfilenode, but optimized that
> away: cab9a0656c36739f59277b34fea8ab9438395869
> 
> commit cab9a0656c36739f59277b34fea8ab9438395869
> Author: Tom Lane 
> Date:   Sun Aug 23 19:23:41 2009 +0000
> 
> Make TRUNCATE do truncate-in-place when processing a relation that was
> created or previously truncated in the current (sub)transaction.  This is
> safe since if the (sub)transaction later rolls back, we'd just discard the
> rel's current physical file anyway.  This avoids unreasonable growth in the
> number of transient files when a relation is repeatedly truncated.  Per a
> performance gripe a couple weeks ago from Todd Cook.
> 
> to me the reasoning here looks flawed.

It looks to me we need to renege on this a bit. I think we can still be
more efficient than the general codepath: We can drop the old
relfilenode immediately. But pg_class.relfilenode has to differ from the
old after the truncation.




Re: [HACKERS] Idea: closing the loop for "pg_ctl reload"

2015-07-03 Thread Tom Lane
Jan de Visser  writes:
> Attached a new patch, rebased against the current head. Errors in
> pg_hba.conf and pg_ident.conf are now also noticed.

> I checked the documentation for pg_ctl reload, and the only place where
> it's explained seems to be runtime.sgml and that description is so
> high-level that adding this new bit of functionality wouldn't make much
> sense.

BTW, it's probably worth pointing out that the recent work on the
pg_file_settings view has taken away a large part of the use-case for
this, in that you can find out more with less risk by inspecting
pg_file_settings before issuing SIGHUP, instead of SIGHUP'ing and
hoping you didn't break anything too nastily.  Also, you can use
pg_file_settings remotely, unlike pg_ctl (though admittedly you
still need contrib/adminpack or something to allow uploading a
new config file if you're doing remote admin).
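
For instance, something like this (using the view's 9.5 columns) shows
any lines that would fail to apply, without touching the running server:

SELECT sourcefile, sourceline, name, setting, applied, error
FROM pg_file_settings
WHERE error IS NOT NULL;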

I wonder whether we should consider inventing similar views for
pg_hba.conf and pg_ident.conf.

While that's not necessarily a reason not to adopt this patch anyway,
it does mean that we should think twice about whether we need to add
a couple of hundred lines for a facility that's less useful than
what we already have.

BTW, this version of this patch neglects to update the comments in
miscadmin.h, and it makes the return convention for
ProcessConfigFileInternal completely unintelligible IMO; the inaccuracy
and inconsistency in the comments is a symptom of that.  I didn't read it
in enough detail to say whether there are other problems.

regards, tom lane




Re: [HACKERS] patch : Allow toast tables to be moved to a different tablespace

2015-07-03 Thread Tom Lane
Andreas Karlsson  writes:
> On 03/21/2015 01:19 PM, Julien Tachoires wrote:
>>> I am confused by your fix. Wouldn't cleaner fix be to use
>>> tbinfo->reltablespace rather than tbinfo->reltoasttablespace when
>>> calling ArchiveEntry()?

>> Yes, doing this that way is cleaner. Here is a new version including
>> your fix. Thanks.

> I am now satisfied with how the patch looks. Thanks for your work. I 
> will mark this as ready for committeer now but not expect any committer 
> to look at it until the commitfest starts.

I have just looked through this thread, and TBH I think we should reject
this patch altogether --- not RWF, but "no we don't want this".  The
use-case remains hypothetical: no performance numbers showing a real-world
benefit have been exhibited AFAICS.  And allowing a toast table to be in a
different tablespace from its parent opens up a host of corner cases that,
despite the many revisions of the patch so far, probably haven't all been
addressed yet.  (For instance, I wonder whether pg_upgrade copes with
toast tables in non-parent tablespaces.)

I regret the idea of wasting all the work that's been poured into this,
but I think pushing this patch forward will just waste even more time,
now and in the future, for benefit that will be at most marginal.

If any committers nonetheless want to take this up, be advised that it's
far from committable as-is.  Here are some notes just from looking at
the pg_dump part of the patch:

* Addition in pg_backup_archiver.c seems pretty dubious; we don't
handle --no-tablespace that way for other cases.

* Why is getTables() collecting toast tablespace only from latest-model
servers?  This will likely lead to doing the wrong thing (ie, dumping
incorrect "ALTER SET TOAST TABLESPACE pg_default" commands) when dumping
from an older server.

* dumpTOASTTablespace (man, that's an ugly choice of name ... camel
case combined with all-upper-case-words is a bad idea) neglects the
possibility that it needs to quote the tablespace name.

* Assorted random whitespace inconsistencies, only some of which would
be cleaned up by pgindent.

I've not studied the rest of the patch, but it would clearly need to
be gone over very carefully to get to a committable state.

regards, tom lane




Re: [HACKERS] Rounding to even for numeric data type

2015-07-03 Thread Tom Lane
Michael Paquier  writes:
> On Sat, May 2, 2015 at 9:53 PM, Fabien COELHO wrote:
>> Quick review: patches applies, make check is fine, all is well.

> Thanks for the feedback, Fabien!

>> All the casting tests could be put in "numeric.sql", as there are all
>> related to numeric and that would avoid duplicating the values lists.

> Not sure about that, the tests are placed here to be consistent with
> what is done for float8.

>> For the documentation, I would also add 3.5 so that rounding to even is even
>> clearer:-)

> Good idea. I reworked the example in the docs.

Pushed with minor adjustments --- you missed updating
int8-exp-three-digits.out, and I thought the documentation wording
could be better.

regards, tom lane




Re: [HACKERS] Idea: closing the loop for "pg_ctl reload"

2015-07-03 Thread Jan de Visser
On Fri, Jul 3, 2015 at 4:47 PM, I wrote:

> Attached a new patch, rebased against the current head. Errors in
> pg_hba.conf and pg_ident.conf are now also noticed.
>
> I checked the documentation for pg_ctl reload, and the only place where
> it's explained seems to be runtime.sgml and that description is so
> high-level that adding this new bit of functionality wouldn't make much
> sense.
>

Um. Make that config.sgml.


Re: [HACKERS] Idea: closing the loop for "pg_ctl reload"

2015-07-03 Thread Jan de Visser
Attached a new patch, rebased against the current head. Errors in
pg_hba.conf and pg_ident.conf are now also noticed.

I checked the documentation for pg_ctl reload, and the only place where
it's explained seems to be runtime.sgml and that description is so
high-level that adding this new bit of functionality wouldn't make much
sense.

On Sat, Apr 25, 2015 at 10:03 AM, Jan de Visser  wrote:

> On April 22, 2015 06:04:42 PM Payal Singh wrote:
> > The following review has been posted through the commitfest application:
> > make installcheck-world:  not tested
> > Implements feature:   tested, failed
> > Spec compliant:   not tested
> > Documentation:tested, failed
> >
> > Error in postgresql.conf gives the expected result on pg_ctl reload,
> > although errors in pg_hba.conf file don't. Like before, reload completes
> > fine without any information that pg_hba failed to load and only
> > information is present in the log file. I'm assuming pg_ctl reload should
> > prompt user if file fails to load irrespective of which file it is -
> > postgresql.conf or pg_hba.conf.
>
> Will fix. Not hard, just move the code that writes the .pid file to after
> the pg_hba reload.
>
> >
> > There is no documentation change so far, but I guess that's not yet
> > necessary.
>
> I will update the page for pg_ctl. At least the behaviour of -w/-W in
> relation to the reload command needs to be explained.
>
> >
> > gmake check passed all tests.
> >
> > The new status of this patch is: Waiting on Author
>
> jan
>
>
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index df8037b..909a078 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1235,6 +1235,15 @@ PostmasterMain(int argc, char *argv[])
 #endif
 
 	/*
+	 * Update postmaster.pid with startup time as the last reload time:
+	 */
+	{
+		char last_reload_info[32];
+		snprintf(last_reload_info, 32, "%ld %d", (long) MyStartTime, 1);
+		AddToDataDirLockFile(LOCK_FILE_LINE_LAST_RELOAD, last_reload_info);
+	}
+
+	/*
 	 * Remember postmaster startup time
 	 */
 	PgStartTime = GetCurrentTimestamp();
@@ -2349,6 +2358,8 @@ static void
 SIGHUP_handler(SIGNAL_ARGS)
 {
 	int			save_errno = errno;
+	bool		reload_success;
+	char		last_reload_info[32];
 
 	PG_SETMASK(&BlockSig);
 
@@ -2356,7 +2367,8 @@ SIGHUP_handler(SIGNAL_ARGS)
 	{
 		ereport(LOG,
 (errmsg("received SIGHUP, reloading configuration files")));
-		ProcessConfigFile(PGC_SIGHUP);
+		reload_success = ProcessConfigFile(PGC_SIGHUP);
+
 		SignalChildren(SIGHUP);
 		SignalUnconnectedWorkers(SIGHUP);
 		if (StartupPID != 0)
@@ -2379,13 +2391,25 @@ SIGHUP_handler(SIGNAL_ARGS)
 			signal_child(PgStatPID, SIGHUP);
 
 		/* Reload authentication config files too */
-		if (!load_hba())
+		if (!load_hba()) {
+			reload_success = 0; 
 			ereport(WARNING,
 	(errmsg("pg_hba.conf not reloaded")));
+		}
 
-		if (!load_ident())
+		if (!load_ident()) {
+			reload_success = 0;
 			ereport(WARNING,
 	(errmsg("pg_ident.conf not reloaded")));
+		}
+
+		/*
+		 * Write the current time and the result of the reload to the
+		 * postmaster.pid file.
+		 */
+		snprintf(last_reload_info, 32, "%ld %d",
+				 (long) time(NULL), reload_success);
+		AddToDataDirLockFile(LOCK_FILE_LINE_LAST_RELOAD, last_reload_info);
 
 #ifdef EXEC_BACKEND
 		/* Update the starting-point file for future children */
diff --git a/src/backend/utils/misc/guc-file.l b/src/backend/utils/misc/guc-file.l
index 5b5846c..2f5537d 100644
--- a/src/backend/utils/misc/guc-file.l
+++ b/src/backend/utils/misc/guc-file.l
@@ -117,12 +117,13 @@ STRING			\'([^'\\\n]|\\.|\'\')*\'
  * If a hard error occurs, no values will be changed.  (There can also be
  * errors that prevent just one value from being changed.)
  */
-void
+bool
 ProcessConfigFile(GucContext context)
 {
 	int			elevel;
 	MemoryContext config_cxt;
 	MemoryContext caller_cxt;
+	bool		ok;
 
 	/*
 	 * Config files are processed on startup (by the postmaster only) and on
@@ -153,16 +154,19 @@ ProcessConfigFile(GucContext context)
 	/*
 	 * Read and apply the config file.  We don't need to examine the result.
 	 */
-	(void) ProcessConfigFileInternal(context, true, elevel);
+	ok = ProcessConfigFileInternal(context, true, elevel) != NULL;
 
 	/* Clean up */
 	MemoryContextSwitchTo(caller_cxt);
 	MemoryContextDelete(config_cxt);
+	return ok;
 }
 
 /*
  * This function handles both actual config file (re)loads and execution of
- * show_all_file_settings() (i.e., the pg_file_settings view).  In the latter
+ * show_all_file_settings() (i.e., the pg_file_settings view).  In the former
+ * case, the settings are applied and this function returns the ConfigVariable
+ * list when this is successful, or NULL when it is not.  In the latter
  * case we don't apply any of the settings, but we make all the usual validity
  * checks, and we return the ConfigVariable list so that it can be printed out
 * by show_all_file_settings().

Re: [HACKERS] Asynchronous execution on FDW

2015-07-03 Thread Heikki Linnakangas

On 07/02/2015 08:48 AM, Kyotaro HORIGUCHI wrote:
> - It was a problem when to give the first kick for async exec. It
>   is not in ExecInit phase, and ExecProc phase does not fit,
>   too. An extra phase ExecPreProc or something is too
>   invasive. So I tried "pre-exec callback".
>
>   Any init-node can register callbacks on their turn, then the
>   registered callbacks are called just before ExecProc phase in
>   executor. The first patch adds functions and structs to enable
>   this.


At a quick glance, I think this has all the same problems as starting 
the execution at ExecInit phase. The correct way to do this is to kick 
off the queries in the first IterateForeignScan() call. You said that 
"ExecProc phase does not fit" - why not?


- Heikki





Re: [HACKERS] WAL logging problem in 9.4.3?

2015-07-03 Thread Tom Lane
Martijn van Oosterhout  writes:
> With inserts the WAL records look as follows (relfilenodes changed):
> ...
> And amazingly, the database cluster successfuly recovers and there's no
> error now.  So the problem is *only* because there is no data in the
> table at commit time.  Which indicates that it's the 'newroot" record
> that saves the day normally.  And it's apparently generated by the
> first insert.

Yeah, because the correct "empty" state of a btree index is to have a
metapage but no root page, so the first insert forces creation of a root
page.  And, by chance, btree_xlog_newroot restores the metapage from
scratch, so this works even if the metapage had been missing or corrupt.

However, things would still break if the first access to the index was
a read attempt rather than an insert.
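
For example (sizes assume the default 8kB block size):

CREATE TABLE t (a int);
CREATE INDEX t_a_idx ON t (a);
SELECT pg_relation_size('t_a_idx');   -- 8192: metapage only, no root page
INSERT INTO t VALUES (1);             -- first insert emits the newroot record
SELECT pg_relation_size('t_a_idx');   -- 16384: metapage plus the new root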

regards, tom lane




Re: [HACKERS] WAL logging problem in 9.4.3?

2015-07-03 Thread Martijn van Oosterhout
On Fri, Jul 03, 2015 at 07:21:21PM +0200, Andres Freund wrote:
> On 2015-07-03 19:14:26 +0200, Martijn van Oosterhout wrote:
> > Am I missing something. ISTM that if the truncate record was simply not
> > logged at all everything would work fine. The whole point is that the
> > table was created in this transaction and so if it exists the table on
> > disk must be the correct representation.
> 
> That'd not work either. Consider:
> 
> BEGIN;
> CREATE TABLE ...
> INSERT;
> TRUNCATE;
> INSERT;
> COMMIT;
> 
> If you replay that without a truncation wal record the second INSERT
> will try to add stuff to already occupied space. And they can have
> different lengths and stuff, so you cannot just ignore that fact.

I was about to disagree with you by suggesting that if the table was
created in this transaction then WAL logging is skipped. But testing
shows that inserts are indeed logged, as you point out.

With inserts the WAL records look as follows (relfilenodes changed):

martijn@martijn-jessie:~/git/ctm/docker$ sudo /usr/lib/postgresql/9.4/bin/pg_xlogdump -p /tmp/pgtest/postgres/pg_xlog/ 00010001 | grep -wE '16386|16384|16390'
rmgr: Storage  len (rec/tot): 16/48,   tx: 0,   lsn: 0/016A79C8, prev 0/016A79A0, bkp: , desc: file create: base/12139/16384
rmgr: Sequence len (rec/tot): 158/190, tx: 683, lsn: 0/016B4258, prev 0/016B2508, bkp: , desc: log: rel 1663/12139/16384
rmgr: Storage  len (rec/tot): 16/48,   tx: 683, lsn: 0/016B4318, prev 0/016B4258, bkp: , desc: file create: base/12139/16386
rmgr: Storage  len (rec/tot): 16/48,   tx: 683, lsn: 0/016B9468, prev 0/016B9418, bkp: , desc: file create: base/12139/16390
rmgr: Sequence len (rec/tot): 158/190, tx: 683, lsn: 0/016BC938, prev 0/016BC880, bkp: , desc: log: rel 1663/12139/16384
rmgr: Sequence len (rec/tot): 158/190, tx: 683, lsn: 0/016BCAF0, prev 0/016BCAA0, bkp: , desc: log: rel 1663/12139/16384
rmgr: Heap     len (rec/tot): 35/67,   tx: 683, lsn: 0/016BCBB0, prev 0/016BCAF0, bkp: , desc: insert(init): rel 1663/12139/16386; tid 0/1
rmgr: Btree    len (rec/tot): 20/52,   tx: 683, lsn: 0/016BCBF8, prev 0/016BCBB0, bkp: , desc: newroot: rel 1663/12139/16390; root 1 lev 0
rmgr: Btree    len (rec/tot): 34/66,   tx: 683, lsn: 0/016BCC30, prev 0/016BCBF8, bkp: , desc: insert: rel 1663/12139/16390; tid 1/1
rmgr: Storage  len (rec/tot): 16/48,   tx: 683, lsn: 0/016BCC78, prev 0/016BCC30, bkp: , desc: file truncate: base/12139/16386 to 0 blocks
rmgr: Storage  len (rec/tot): 16/48,   tx: 683, lsn: 0/016BCCA8, prev 0/016BCC78, bkp: , desc: file truncate: base/12139/16390 to 0 blocks
rmgr: Heap     len (rec/tot): 35/67,   tx: 683, lsn: 0/016BCCD8, prev 0/016BCCA8, bkp: , desc: insert(init): rel 1663/12139/16386; tid 0/1
rmgr: Btree    len (rec/tot): 20/52,   tx: 683, lsn: 0/016BCD20, prev 0/016BCCD8, bkp: , desc: newroot: rel 1663/12139/16390; root 1 lev 0
rmgr: Btree    len (rec/tot): 34/66,   tx: 683, lsn: 0/016BCD58, prev 0/016BCD20, bkp: , desc: insert: rel 1663/12139/16390; tid 1/1
 
   relname   | relfilenode 
-------------+-------------
 test        |       16386
 test_id_seq |       16384
 test_pkey   |       16390
(3 rows)

And amazingly, the database cluster successfuly recovers and there's no
error now.  So the problem is *only* because there is no data in the
table at commit time.  Which indicates that it's the 'newroot" record
that saves the day normally.  And it's apparently generated by the
first insert.

> Agreed. I think the problem is something else though. Namely that we
> reuse the relfilenode for heap_truncate_one_rel(). That's just entirely
> broken afaics. We need to allocate a new relfilenode and write stuff
> into that. Then we can forgo WAL logging the truncation record.

Would that properly initialise the index though?

Anyway, this is way outside my expertise, so I'll bow out now. Let me
know if I can be of more assistance.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
   -- Arthur Schopenhauer




Re: [HACKERS] psql :: support for \ev viewname and \sv viewname

2015-07-03 Thread Tom Lane
Petr Korobeinikov  writes:
> Fixed. Now both \ev and \sv number lines starting with "1". New version
> attached.

Applied with a fair amount of mostly-cosmetic adjustment.

> As I've already noticed, pg_get_viewdef() does not support the full syntax
> for creating or replacing views.

Oh?  If that were true, pg_dump wouldn't work on such views.  It is kind
of a PITA for this purpose that it doesn't include the CREATE text for
you, but we're surely not changing that behavior now.
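
That is, you get only the body and have to supply the header yourself:

-- pg_get_viewdef() returns just the view's SELECT, pretty-printed;
-- psql's \ev/\sv and pg_dump prepend the CREATE [OR REPLACE] VIEW text.
SELECT pg_get_viewdef('pg_stat_activity'::regclass, true);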

regards, tom lane




Re: [HACKERS] Fix broken Install.bat when target directory contains a space

2015-07-03 Thread Heikki Linnakangas

On 04/21/2015 04:02 PM, Michael Paquier wrote:

> On Tue, Apr 21, 2015 at 4:33 PM, Asif Naeem  wrote:
>> The v2 patch looks good to me, just a minor concern on usage message i.e.
>>
>> C:\PG\postgresql\src\tools\msvc>install
>>
>> Invalid command line options.
>> Usage: "install.bat  [installtype]"
>> installtype: client
>>
>> It seems that there are two install options i.e. client, all (any other
>> string other than client is being considered or treated as all), the
>> following install command works i.e.
>>
>> install C:\PG\postgresql\inst option_does_not_exist
>>
>> As your patch affects this area of code, I thought to share these findings
>> with you. BTW, it is a minor thing that can be handled in another patch.
>
> Well, that's the same behavior that this script has been having for ages.
> Let's just update the usage message to mention both "all" and "client". I
> see no point in breaking a behavior that has been like that for ages, and
> the main point of this patch is to fix the install path issue.


Hmm. Why is install.bat not like build.bat, i.e. just a thin wrapper 
that just calls install.pl, passing all arguments?


- Heikki




Re: [HACKERS] psql :: support for \ev viewname and \sv viewname

2015-07-03 Thread Petr Korobeinikov
On Fri, Jul 3, 2015 at 19:30, Tom Lane wrote:

> So AFAICS, \ev and \sv should just number lines straightforwardly, with
> "1" being the first line of the CREATE command text.  Am I missing
> something?
>

Fixed. Now both \ev and \sv number lines starting with "1". New version
attached.

As I've already noticed, pg_get_viewdef() does not support the full syntax
for creating or replacing views. In my opinion, psql's source code isn't the
place for such formatting hacks. So, can you give me an idea how to produce
already-formatted output supporting the "WITH" clause without breaking
backward compatibility of pg_get_viewdef() internals?
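
For example, assuming the "WITH" syntax I mean is the view options clause:

CREATE VIEW v WITH (security_barrier = true) AS SELECT 1 AS x;
SELECT pg_get_viewdef('v'::regclass, true);
-- returns only "SELECT 1 AS x;"; the WITH (security_barrier = true) part
-- would have to be reconstructed separately (from pg_class.reloptions).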


psql-ev-sv-support-v6.diff
Description: Binary data



Re: [HACKERS] Synch failover WAS: Support for N synchronous standby servers - take 2

2015-07-03 Thread Andres Freund
On 2015-07-03 10:27:05 -0700, Josh Berkus wrote:
> On 07/03/2015 03:12 AM, Sawada Masahiko wrote:
> > Thanks. So we can choose the next master server by checking the
> > progress of each server, if hot standby is enabled.
> > And such a procedure is needed even with today's replication.
> > 
> > I think that the #2 problem which is Josh pointed out seems to be solved;
> > 1. I need to ensure that data is replicated to X places.
> > 2. I need to *know* which places data was synchronously replicated
> > to when the master goes down.
> > And we can address #1 problem using quorum commit.
> 
> It's not solved. I still have zero ways of knowing if a replica was in
> sync or not at the time the master went down.

What?

You pick the standby that's furthest ahead. And you use a high enough
quorum so that given your tolerance for failures you'll always be able
to reach at least one of the synchronous replicas. Then you promote the
one with the highest LSN. Done.

This is something that gets *easier* by quorum, not harder.

> I forked the subject line because I think that the inability to
> identify synch replicas under failover conditions is a serious problem
> with synch rep *today*, and pretending that it doesn't exist doesn't
> help us even if we don't fix it in 9.6.

That's just not how failovers can sanely work. And again, you *have* the
information you can have on the standbys already. You *know* what/from
when the last replayed xact is.

> Let me give you three cases where our lack of information on the replica
> side about whether it thinks it's in sync or not causes synch rep to
> fail to protect data.  The first case is one I've actually seen in
> production, and the other two are hypothetical but entirely plausible.
> 
> Case #1: two synchronous replica servers have the application name
> "synchreplica".  An admin uses the wrong Chef template, and deploys a
> server which was supposed to be an async replica with the same
> recovery.conf template, and it ends up in the "synchreplica" group as
> well. Due to restarts (pushing out an update release), the new server
> ends up seizing and keeping sync. Then the master dies.  Because the new
> server wasn't supposed to be a sync replica in the first place, it is
> not checked; they just fail over to the furthest ahead of the two
> original synch replicas, neither of which was actually in synch.

Nobody can protect you against such configuration errors. We can make it
harder to misconfigure, sure, but it doesn't have anything to do with
the topic at hand.

> Case #2: "2 { local, london, nyc }" setup.  At 2am, the links between
> data centers become unreliable, such that the on-call sysadmin disables
> synch rep because commits on the master are intolerably slow.  Then, at
> 10am, the links between data centers fail entirely.  The day shift, not
> knowing that the night shift disabled sync, fail over to London thinking
> that they can do so with zero data loss.

As I said earlier, you can check against that today by checking the last
replayed timestamp. SELECT pg_last_xact_replay_timestamp();

You don't have to pick the one that used to be a sync replica. You pick
the one with the most data received.
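
E.g. run this on each reachable standby and compare (these functions all
exist today):

SELECT pg_last_xlog_receive_location(),
       pg_last_xlog_replay_location(),
       pg_last_xact_replay_timestamp();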


If the day shift doesn't bother to check the standbys now, they'd not
check either if they had some way to check whether a node was the chosen
sync replica.

> Case #3 "1 { london, frankfurt }, 1 { sydney, tokyo }" multi-group
> priority setup.  We lose communication with everything but Europe.  How
> can we decide whether to wait to get sydney back, or to promote London
> immedately?

You normally don't continue automatically at all in that situation. To
avoid/minimize data loss you want to have a majority election system to
select the new primary. That requires reaching the majority of the
nodes. This isn't something specific to postgres, if you look at any
solution out there, they're also doing it that way.

Statically choosing which of the replicas in a group is the current sync
one is a *bad* idea. You want to ensure that at least one node in a group
has received the data, and stop waiting as soon as that's the case.

> It's an issue *now* that the only data we have about the state of sync
> rep is on the master, and dies with the master.   And it severely limits
> the actual utility of our synch rep.  People implement synch rep in the
> first place because the "best effort" of asynch rep isn't good enough
> for them, and yet when it comes to failover we're just telling them
> "give it your best effort".

We don't tell them that, but apparently you do.


This subthread is getting absurd, stopping here.




Re: [HACKERS] Synch failover WAS: Support for N synchronous standby servers - take 2

2015-07-03 Thread Josh Berkus
On 07/03/2015 03:12 AM, Sawada Masahiko wrote:
> Thanks. So we can choose the next master server by checking the
> progress of each server, if hot standby is enabled.
> And such a procedure is needed even with today's replication.
> 
> I think that the #2 problem which is Josh pointed out seems to be solved;
> 1. I need to ensure that data is replicated to X places.
> 2. I need to *know* which places data was synchronously replicated
> to when the master goes down.
> And we can address #1 problem using quorum commit.

It's not solved. I still have zero ways of knowing if a replica was in
sync or not at the time the master went down.

Now, you and others have argued persuasively that there are valuable use
cases for quorum commit even without solving that particular issue, but
there's a big difference between "we can work around this problem" and
the problem is solved.  I forked the subject line because I think that
the inability to identify synch replicas under failover conditions is a
serious problem with synch rep *today*, and pretending that it doesn't
exist doesn't help us even if we don't fix it in 9.6.

Let me give you three cases where our lack of information on the replica
side about whether it thinks it's in sync or not causes synch rep to
fail to protect data.  The first case is one I've actually seen in
production, and the other two are hypothetical but entirely plausible.

Case #1: two synchronous replica servers have the application name
"synchreplica".  An admin uses the wrong Chef template, and deploys a
server which was supposed to be an async replica with the same
recovery.conf template, and it ends up in the "synchreplica" group as
well. Due to restarts (pushing out an update release), the new server
ends up seizing and keeping sync. Then the master dies.  Because the new
server wasn't supposed to be a sync replica in the first place, it is
not checked; they just fail over to the furthest ahead of the two
original synch replicas, neither of which was actually in synch.

Case #2: "2 { local, london, nyc }" setup.  At 2am, the links between
data centers become unreliable, such that the on-call sysadmin disables
synch rep because commits on the master are intolerably slow.  Then, at
10am, the links between data centers fail entirely.  The day shift, not
knowing that the night shift disabled sync, fail over to London thinking
that they can do so with zero data loss.

Case #3 "1 { london, frankfurt }, 1 { sydney, tokyo }" multi-group
priority setup.  We lose communication with everything but Europe.  How
can we decide whether to wait to get sydney back, or to promote London
immedately?

I could come up with numerous other situations, but all of the three
above completely reasonable cases show how having the knowledge of what
time a replica thought it was last in sync is vital to preventing bad
failovers and data loss, and to knowing the quantity of data loss when
it can't be prevented.

It's an issue *now* that the only data we have about the state of sync
rep is on the master, and dies with the master.   And it severely limits
the actual utility of our synch rep.  People implement synch rep in the
first place because the "best effort" of asynch rep isn't good enough
for them, and yet when it comes to failover we're just telling them
"give it your best effort".

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] WAL logging problem in 9.4.3?

2015-07-03 Thread Andres Freund
On 2015-07-03 19:02:29 +0200, Andres Freund wrote:
> Maybe I'm just daft right now (35C outside, 32 inside, so ...), but I'm
> right now missing how the whole "skip wal logging if relation has just
> been truncated" optimization can ever actually be crashsafe unless we
> use a new relfilenode (which we don't!).

We actually used to use a different relfilenode, but optimized that
away: cab9a0656c36739f59277b34fea8ab9438395869

commit cab9a0656c36739f59277b34fea8ab9438395869
Author: Tom Lane 
Date:   Sun Aug 23 19:23:41 2009 +0000

Make TRUNCATE do truncate-in-place when processing a relation that was
created or previously truncated in the current (sub)transaction.  This is
safe since if the (sub)transaction later rolls back, we'd just discard the
rel's current physical file anyway.  This avoids unreasonable growth in the
number of transient files when a relation is repeatedly truncated.  Per a
performance gripe a couple weeks ago from Todd Cook.

to me the reasoning here looks flawed.




Re: [HACKERS] WAL logging problem in 9.4.3?

2015-07-03 Thread Andres Freund
On 2015-07-03 19:14:26 +0200, Martijn van Oosterhout wrote:
> Am I missing something? ISTM that if the truncate record was simply not
> logged at all everything would work fine. The whole point is that the
> table was created in this transaction and so if it exists the table on
> disk must be the correct representation.

That'd not work either. Consider:

BEGIN;
CREATE TABLE ...
INSERT;
TRUNCATE;
INSERT;
COMMIT;

If you replay that without a truncation wal record the second INSERT
will try to add stuff to already occupied space. And they can have
different lengths and stuff, so you cannot just ignore that fact.

> The broken index is just one symptom.

Agreed. I think the problem is something else though. Namely that we
reuse the relfilenode for heap_truncate_one_rel(). That's just entirely
broken afaics. We need to allocate a new relfilenode and write stuff
into that. Then we can forgo WAL logging the truncation record.

> If you insert a row before commit then after replay the tuple should be
> there still.

The insert would be WAL logged. COPY skips wal logging tho.
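
For reference, the optimized pattern looks like this (and requires
wal_level = minimal):

BEGIN;
CREATE TABLE t (a int, b text);
COPY t FROM '/tmp/t.dat';   -- data not WAL-logged; heap_sync() at commit
COMMIT;                     -- makes the file durable instead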




Re: [HACKERS] Reducing the size of BufferTag & remodeling forks

2015-07-03 Thread Andres Freund
On 2015-07-03 13:59:07 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> 
> > 2) Replace relation forks, with the exception of the init fork which is
> >    special anyway, with separate relfilenodes. Stored in separate
> >    columns in pg_class.
> 
> Different AMs have different fork needs; for heaps you want one main
> fork, one VM, one fsm.  But for indexes, the VM fork is not necessary,
> and some AMs might want different ones.  For instance, GIN would benefit
> from having separate forks to store the internal indexes, and BRIN would
> benefit from a separate fork for the revmap.
> 
> What I'm saying is that I'm not sure it's okay to store the forks in
> pg_class columns

Right. Part of the point of this design is that you could easily add
further forks without system wide knowledge and that it is *not*
required anymore that all relfilenodes are in one column. I think it'd
probably make sense to have at least _vm and _fsm in pg_class, but we
could easily add further ones elsewhere.

> instead perhaps we should have a separate catalog on
> which each relation can have many forks, or perhaps have the pg_class
> entry store an array (ick).  Or perhaps rather than "relvmfork" (the
> pg_class column for the relfilenode for the VM fork) we could store
> relfilenode1, relfilenode2 where the value for each N fork is
> AM-specific. (so for heaps 1 is main, 2 is FSM, 3 is VM; for BRIN 1 is
> main, 2 is revmap; and so forth).

None of these sound particularly pretty to me. An array of relfilenodes
would probably be the least ugly one.

> FWIW the whole idea seems reasonable to me.  I worry about concurrent
> traffic into the pg_relfilenode shared table -- if temp table creation
> is common across many databases, is it going to become a contention
> point?

I don't think it'll be. It's essentially just inserts into a tiny table
with a single index, right? We can do a boatload of inserts into one of
these.

In an *assert enabled* build:
CREATE TABLE pg_relfilenode() WITH OIDS;
ALTER TABLE pg_relfilenode ADD CONSTRAINT pg_relfilenode_pkey PRIMARY KEY(oid);

which is pretty much what pg_relfilenode would look like. Although we'd
not go through the whole executor for the inserts.

pgbench -h localhost -p 5440 postgres -n -f /tmp/pg_relfilenode.sql -j 16 -c 16 -T 20 -P 1
cat /tmp/pg_relfilenode.sql:
INSERT INTO pg_relfilenode DEFAULT VALUES;
progress: 5.0 s, 32168.4 tps, lat 0.495 ms stddev 1.728
progress: 6.0 s, 33719.6 tps, lat 0.473 ms stddev 0.773

andres@awork2:~/src/postgresql$ pgbench -h localhost -p 5440 postgres -n -f /tmp/temptable.sql -j 16 -c 16 -T 20 -P 1
CREATE TEMPORARY TABLE blarg() ON COMMIT DROP;
progress: 6.0 s, 5018.2 tps, lat 3.185 ms stddev 3.671
progress: 7.0 s, 4890.9 tps, lat 3.272 ms stddev 4.346

and that's with zero actual columns. If you instead add some:
CREATE TEMPORARY TABLE blarg(id serial primary key, data text, blarg int) ON COMMIT DROP;
progress: 7.0 s, 974.1 tps, lat 16.462 ms stddev 9.058
progress: 8.0 s, 999.9 tps, lat 16.045 ms stddev 7.011

So no, I don't think this'll be a relevant problem. We do so many
inserts for a single temp table that a single insert into another one
completely vanishes in comparison.




Re: [HACKERS] WAL logging problem in 9.4.3?

2015-07-03 Thread Martijn van Oosterhout
On Fri, Jul 03, 2015 at 12:53:56PM -0400, Tom Lane wrote:
> Fujii Masao  writes:
> > Okay, so probably we need to change WAL replay of TRUNCATE so that
> > the index file is truncated to one containing only meta page instead of
> > empty one. That is, the WAL replay of TRUNCATE would need to call
> > index_build() after smgrtruncate() maybe.
> 
> That seems completely unworkable.  For one thing, index_build would expect
> to be able to do catalog lookups, but we can't assume that the catalogs
> are in a good state yet.
> 
> I think the responsibility has to be on the WAL-writing end to emit WAL
> instructions that lead to a correct on-disk state.  Putting complex
> behavior into the reading side is fundamentally misguided.

Am I missing something? ISTM that if the truncate record was simply not
logged at all everything would work fine. The whole point is that the
table was created in this transaction and so if it exists the table on
disk must be the correct representation.

The broken index is just one symptom. The heap also shouldn't be
truncated at all. If you insert a row before commit then after replay
the tuple should be there still.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
   -- Arthur Schopenhauer




Re: [HACKERS] more RLS oversights

2015-07-03 Thread Noah Misch
On Wed, Feb 25, 2015 at 11:37:24PM -0500, Stephen Frost wrote:
> * Stephen Frost (sfr...@snowman.net) wrote:
> > I agree that it's great that we're catching issues prior to when the
> > feature is released and look forward to anything else you (or anyone
> > else!) finds.
> 
> I've pushed a fix for this.  Please let me know if you see any other
> issues or run into any problems.

(1) CreatePolicy() and AlterPolicy() omit to call assign_expr_collations() on
the node trees.  Test case:

begin;
set row_security = force;
create table t (c) as values ('bar'::text);
create policy p on t using (c < ('foo'::text COLLATE "C"));
alter table t enable row level security;
table pg_policy;  -- note ":inputcollid 0"
select * from t;  -- ERROR:  could not determine which collation ...
rollback;
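
For illustration, a sketch of where the missing call would sit in
CreatePolicy(), assuming the qual is parsed with transformWhereClause() as
the error message implies (variable names assumed; not the actual fix):

/* Sketch: parse the USING expression, then resolve its collations so
 * inputcollid gets filled in instead of staying 0. */
qual = transformWhereClause(pstate, copyObject(stmt->qual),
                            EXPR_KIND_WHERE, "POLICY");
assign_expr_collations(pstate, qual);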


(2) CreatePolicy() and AlterPolicy() omit to create a pg_shdepend entry for
each role in the TO clause.  Test case:

begin;
create role alice;
create table t (c) as values ('bar'::text);
grant select on table t to alice;
create policy p on t to alice using (true);
select refclassid::regclass, refobjid, refobjsubid, deptype
  from pg_depend where classid = 'pg_policy'::regclass;
select refclassid::regclass, refobjid, deptype
  from pg_shdepend where classid = 'pg_policy'::regclass;
savepoint q; drop role alice; rollback to q;
revoke all on table t from alice;
\d t
drop role alice;
\d t
rollback;
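
A sketch of the missing bookkeeping, one pg_shdepend row per role in the TO
clause; the SHARED_DEPENDENCY_POLICY deptype and the variable names are
assumptions, not the committed fix:

/* Sketch: register each role in TO as a shared dependency of the policy,
 * so DROP ROLE either fails or cleans up properly. */
ObjectAddress myself,
              referenced;
ListCell     *cell;

ObjectAddressSet(myself, PolicyRelationId, policy_id);
foreach(cell, role_oids)
{
    ObjectAddressSet(referenced, AuthIdRelationId, lfirst_oid(cell));
    recordSharedDependencyOn(&myself, &referenced, SHARED_DEPENDENCY_POLICY);
}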


> +static void
> +dumpPolicy(Archive *fout, PolicyInfo *polinfo)
...
> + if (polinfo->polqual != NULL)
> + appendPQExpBuffer(query, " USING %s", polinfo->polqual);

(3) The USING clause needs parentheses; a dump+reload failed like so:

pg_restore: [archiver (db)] could not execute query: ERROR:  syntax error at or near "CASE"
LINE 2: CASE
^
Command was: CREATE POLICY "p2" ON "category" FOR ALL TO PUBLIC USING 
CASE
WHEN ("current_user"() = 'rls_regress_user1'::"name") THE...

Add the same parentheses to psql \d output also, keeping that output similar
to the SQL syntax.
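
The fix in dumpPolicy() is presumably just a pair of parentheses around the
qual; a minimal sketch:

if (polinfo->polqual != NULL)
    appendPQExpBuffer(query, " USING (%s)", polinfo->polqual);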

(3a) I found this by hacking the rowsecurity.sql regression test to not drop
its objects, then running the pg_upgrade test suite.  New features that affect
pg_dump should leave objects in the regression database to test the pg_dump
support via that suite.


(4) When DefineQueryRewrite() is about to convert a table to a view, it checks
the table for features unavailable to views.  For example, it rejects tables
having triggers.  It omits to reject tables having relrowsecurity or a
pg_policy record.  Test case:

begin;
set row_security = force;
create table t (c) as select * from generate_series(1,5);
create policy p on t using (c % 2 = 1);
alter table t enable row level security;
table t;
truncate t;
create rule "_RETURN" as on select to t do instead
  select * from generate_series(1,5) t0(c);
table t;
select polrelid::regclass from pg_policy;
select relrowsecurity from pg_class where oid = 't'::regclass;
rollback;
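
A sketch of the missing guard in DefineQueryRewrite(), modeled on its
existing trigger check (error wording and placement assumed):

/* Sketch: refuse to convert a table into a view while row level
 * security is enabled on it. */
if (event_relation->rd_rel->relrowsecurity)
    ereport(ERROR,
            (errcode(ERRCODE_WRONG_OBJECT_TYPE),
             errmsg("could not convert table \"%s\" to a view because it has row security enabled",
                    RelationGetRelationName(event_relation))));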


> +  
> +   Referential integrity checks, such as unique or primary key constraints
> +   and foreign key references, will bypass row security to ensure that
> +   data integrity is maintained.  Care must be taken when developing
> +   schemas and row level policies to avoid a "covert channel" leak of
> +   information through these referential integrity checks.
...
> + /*
> +  * Row-level security should be disabled in the case where a foreign-key
> +  * relation is queried to check existence of tuples that references the
> +  * primary-key being modified.
> +  */
> + temp_sec_context = save_sec_context | SECURITY_LOCAL_USERID_CHANGE;
> + if (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK
> + || qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK_FROM_PK
> + || qkey->constr_queryno == RI_PLAN_RESTRICT_DEL_CHECKREF
> + || qkey->constr_queryno == RI_PLAN_RESTRICT_UPD_CHECKREF)
> + temp_sec_context |= SECURITY_ROW_LEVEL_DISABLED;

(5) This code does not use SECURITY_ROW_LEVEL_DISABLED for foreign keys with
CASCADE, SET NULL or SET DEFAULT actions.  The associated documentation says
nothing about this presumably-important distinction.

Is SECURITY_ROW_LEVEL_DISABLED mode even needed?  We switch users to the owner
of the FROM-clause table before running an RI query.  That means use of this
mode can only matter under row_security=force, right?  I would rest easier if
this mode went away, because it is a security landmine.  If user code managed
to run in this mode, it would bypass every policy in the database.  (I find no
such vulnerability today, because we use the mode only for parse analysis of
ri_triggers.c queries.)


(6) AlterPolicy() calls InvokeObjectPostAlterHook(PolicyRelationId, ...), but
CreatePolicy() and DropPolicy() lack their respective hook invocations.


(7) Using an aggregate function in a policy predicate elicits an inapposite
error message due to use of EXPR_KIND_WHERE for parse analysis.  Need a new
ParseExprKind.  Test case:

begin;
set row

Re: [HACKERS] WAL logging problem in 9.4.3?

2015-07-03 Thread Andres Freund
On 2015-07-03 18:49:31 +0200, Andres Freund wrote:
> But the more interesting question is why that's not happening
> today. RelationTruncateIndexes() does call the index_build() which
> should end up WAL logging the index creation.

So that's because there's an XLogIsNeeded() preventing it.

Maybe I'm just daft right now (35C outside, 32 inside, so ...), but I'm
right now missing how the whole "skip wal logging if relation has just
been truncated" optimization can ever actually be crashsafe unless we
use a new relfilenode (which we don't!).

Sure, we do a heap_sync() at the end of the transaction. That's
nice and all. But it doesn't help if we crash and re-start WAL apply
from a checkpoint before the table was created. Because that'll replay
the truncation.

That's much worse than just the indexes - the rows added by a COPY
without WAL logging will also be truncated away, no?




Re: [HACKERS] Reducing the size of BufferTag & remodeling forks

2015-07-03 Thread Alvaro Herrera
Andres Freund wrote:

> 2) Replace relation forks, with the exception of the init fork which is
>    special anyway, with separate relfilenodes. Stored in separate
>    columns in pg_class.

Different AMs have different fork needs; for heaps you want one main
fork, one VM, one fsm.  But for indexes, the VM fork is not necessary,
and some AMs might want different ones.  For instance, GIN would benefit
from having separate forks to store the internal indexes, and BRIN would
benefit from a separate fork for the revmap.

What I'm saying is that I'm not sure it's okay to store the forks in
pg_class columns; instead perhaps we should have a separate catalog on
which each relation can have many forks, or perhaps have the pg_class
entry store an array (ick).  Or perhaps rather than "relvmfork" (the
pg_class column for the relfilenode for the VM fork) we could store
relfilenode1, relfilenode2 where the value for each N fork is
AM-specific. (so for heaps 1 is main, 2 is FSM, 3 is VM; for BRIN 1 is
main, 2 is revmap; and so forth).

FWIW the whole idea seems reasonable to me.  I worry about concurrent
traffic into the pg_relfilenode shared table -- if temp table creation
is common across many databases, is it going to become a contention
point?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] WAL logging problem in 9.4.3?

2015-07-03 Thread Tom Lane
Fujii Masao  writes:
> Okay, so probably we need to change WAL replay of TRUNCATE so that
> the index file is truncated to one containing only meta page instead of
> empty one. That is, the WAL replay of TRUNCATE would need to call
> index_build() after smgrtruncate() maybe.

That seems completely unworkable.  For one thing, index_build would expect
to be able to do catalog lookups, but we can't assume that the catalogs
are in a good state yet.

I think the responsibility has to be on the WAL-writing end to emit WAL
instructions that lead to a correct on-disk state.  Putting complex
behavior into the reading side is fundamentally misguided.

regards, tom lane




Re: [HACKERS] WAL logging problem in 9.4.3?

2015-07-03 Thread Andres Freund
On 2015-07-04 01:39:42 +0900, Fujii Masao wrote:
> Okay, so probably we need to change WAL replay of TRUNCATE so that
> the index file is truncated to one containing only meta page instead of
> empty one. That is, the WAL replay of TRUNCATE would need to call
> index_build() after smgrtruncate() maybe.
> 
> Then how should we implement that? Invent new WAL record type that
> calls smgrtruncate() and index_build() during WAL replay? Or add the
> special flag to XLOG_SMGR_TRUNCATE record, and make WAL replay
> call index_build() only if the flag is found? Any other good idea?
> Anyway ISTM that we might need to add or modify WAL record.

It's easy enough to log something like a metapage with
log_newpage().
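
Roughly what btbuildempty()-style code does today; a sketch for the btree
case, with placement and the surrounding variables assumed:

/* Sketch: WAL-log a fresh btree metapage so that replay leaves the
 * truncated index with a valid meta block rather than a 0-byte file. */
Page metapage = (Page) palloc(BLCKSZ);

_bt_initmetapage(metapage, P_NONE, 0);
log_newpage(&index->rd_node, MAIN_FORKNUM, BTREE_METAPAGE, metapage, true);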

But the more interesting question is why that's not happening
today. RelationTruncateIndexes() does call the index_build() which
should end up WAL logging the index creation.





Re: [HACKERS] pgbench - allow backslash-continuations in custom scripts

2015-07-03 Thread Fabien COELHO


> (although actually, why wouldn't we want to just implement variable
> substitution exactly like it is in psql?

Pgbench variable substitution is performed when the script is run, not
while the file is being processed for being split, which is when a lexer
would be used. The situation is not the same with psql. The most it could
do would be to keep track of what substitutions are done in queries.

> So this is looking *eminently* doable.

Possibly.  How much more effort would be involved compared to the quick
patch I did, I wonder :-)


--
Fabien.




Re: [HACKERS] WAL logging problem in 9.4.3?

2015-07-03 Thread Fujii Masao
On Fri, Jul 3, 2015 at 11:52 PM, Tom Lane  wrote:
> Fujii Masao  writes:
> > The optimization of TRUNCATE operation that we can use when
>> CREATE TABLE and TRUNCATE are executed in the same transaction block
>> seems to cause the problem. In this case, only index file truncation is
>> logged, and index creation in btbuild() is not logged because wal_level
>> is minimal. Then at the subsequent crash recovery, index file is truncated
> > to 0 bytes... Very simple fix is to log an index creation in that case,
>> but not sure if that's ok to do..
>
>> In 9.2 or before, this problem doesn't occur because no such error is thrown
>> even if an index file size is zero. But in 9.3 or later, since the planner
> > tries to read a meta page of an index to get the height of the btree,
>> an empty index file causes such error. The planner was changed that way by
>> commit 31f38f28, and the problem seems to be an oversight of that commit.
>
> What?  You want to blame the planner for failing because the index was
> left corrupt by broken WAL replay?  A failure would occur anyway at
> execution.

Yep, right. I was not thinking of such an index with file size 0 as
corrupted, because the reported problem didn't happen before that commit
was added. But that's my fault. Such an index can cause an error even in
other code paths.

Okay, so probably we need to change WAL replay of TRUNCATE so that
the index file is truncated to one containing only meta page instead of
empty one. That is, the WAL replay of TRUNCATE would need to call
index_build() after smgrtruncate() maybe.

Then how should we implement that? Invent new WAL record type that
calls smgrtruncate() and index_build() during WAL replay? Or add the
special flag to XLOG_SMGR_TRUNCATE record, and make WAL replay
call index_build() only if the flag is found? Any other good idea?
Anyway ISTM that we might need to add or modify WAL record.
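
To make the second variant concrete, a sketch of how the truncate record
might carry the flag; the field name and layout are assumptions, not actual
code:

/* Sketch: extend the smgr truncate record so replay knows it must
 * rebuild an empty index instead of leaving a 0-byte file behind. */
typedef struct xl_smgr_truncate
{
    BlockNumber blkno;
    RelFileNode rnode;
    bool        rebuild_empty_index;    /* assumed new flag */
} xl_smgr_truncate;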

Regards,

-- 
Fujii Masao




Re: [HACKERS] psql :: support for \ev viewname and \sv viewname

2015-07-03 Thread Tom Lane
Jeevan Chalke  writes:
> Patch looks excellent now. No issues.
> Found a typo which I have fixed in the attached patch.

Starting to look at this ...

The business with numbering lines from SELECT seems to me to be completely
nonsensical.  In the first place, it fails to allow for views containing
WITH clauses.  But really it looks like it was cargo-culted over from
\ef/\sf without understanding why those commands number lines the way
they do.  The reason they do that is that for errors occurring inside a
function definition, the PL will typically report a line number relative
to the function body text, and so we're trying to be helpful about
interpreting line numbers of that kind.  But there's no comparable
behavior in the case of a view.  If you fat-finger a view, you'll get
a line number relative to the text of the whole CREATE command, eg

regression=# create or replace view z as
regression-# select 1/col
regression-# from bar;
ERROR:  relation "bar" does not exist
LINE 3: from bar;
 ^

So AFAICS, \ev and \sv should just number lines straightforwardly, with
"1" being the first line of the CREATE command text.  Am I missing
something?

regards, tom lane




Re: [HACKERS] pgbench - allow backslash-continuations in custom scripts

2015-07-03 Thread Tom Lane
I wrote:
> As it stands, psqlscan.l has some external dependencies on the rest of
> psql, but we could perhaps refactor some of those away, and provide dummy
> implementations to satisfy others (eg pgbench could provide a dummy
> GetVariable() that just always returns NULL).

> So I'm imagining symlinking psqlscan.l into src/bin/pgbench and using it
> as-is (possibly after refactoring in psql).  A possible issue is avoiding
> unnecessary invocations of flex, though.  Maybe symlinking the .c file
> would work better.

A quick experiment with compiling psqlscan inside pgbench yields the
following failures:

pgbench.o: In function `psql_scan_setup':
/home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1239: undefined reference to `pset'
pgbench.o: In function `escape_variable':
/home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1950: undefined reference to `pset'
/home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1950: undefined reference to `GetVariable'
/home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1956: undefined reference to `pset'
/home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1957: undefined reference to `psql_error'
/home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1971: undefined reference to `pset'
/home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1973: undefined reference to `psql_error'
pgbench.o: In function `evaluate_backtick':
/home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1701: undefined reference to `psql_error'
/home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1712: undefined reference to `psql_error'
/home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1722: undefined reference to `psql_error'
/home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1728: undefined reference to `psql_error'
pgbench.o: In function `yylex':
/home/postgres/pgsql/src/bin/pgbench/psqlscan.l:511: undefined reference to `standard_strings'
/home/postgres/pgsql/src/bin/pgbench/psqlscan.l:743: undefined reference to `pset'
/home/postgres/pgsql/src/bin/pgbench/psqlscan.l:743: undefined reference to `GetVariable'
/home/postgres/pgsql/src/bin/pgbench/psqlscan.l:751: undefined reference to `psql_error'
/home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1037: undefined reference to `pset'
/home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1037: undefined reference to `GetVariable'
pgbench.o: In function `psql_scan_slash_option':
/home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1619: undefined reference to `pset'
/home/postgres/pgsql/src/bin/pgbench/psqlscan.l:1628: undefined reference to `psql_error'

The pset references are to pset.encoding, pset.db, or pset.vars.  I'd
think the best way to deal with the encoding and connection are to pass
them as parameters to psql_scan_setup() which'd store them in 
the PsqlScanState.  pset.vars is only passed to GetVariable.  We could
refactor that away somehow (although actually, why wouldn't we want to
just implement variable substitution exactly like it is in psql?  Maybe
the right answer is to import psql/variables.c lock stock n barrel too...)
psql_error() and standard_strings() wouldn't be hard to provide.
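
So the refactored entry point might look something like this; a sketch of
the suggestion above, with the exact parameter list being an assumption:

/* Sketch: psql_scan_setup() receives what it used to dig out of
 * psql's pset, so pgbench can supply its own values. */
PsqlScanState psql_scan_setup(const char *line, int line_len,
                              int encoding,      /* was pset.encoding */
                              bool std_strings); /* was standard_strings() */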

So this is looking *eminently* doable.

regards, tom lane




Re: [HACKERS] [BUGS] BUG #13126: table constraint loses its comment

2015-07-03 Thread Petr Jelinek

On 2015-07-03 15:50, Michael Paquier wrote:
> On Thu, Jul 2, 2015 at 11:16 PM, Petr Jelinek  wrote:
>> I was going through the code and have a few comments:
>> - Why do you change the return value of TryReuseIndex? Can't we reuse
>> the same OidIsValid(stmt->oldNode) check that ATExecAddIndex is doing
>> instead?
>
> As pointed out by Heikki previously, that is actually unnecessary,
> comments are still lost even if the index is reused for constraints.
> So perhaps for simplicity we could just unconditionally recreate the
> comments all the time if they are available.

Ah ok, I missed Heikki's email.

>> - I think the changes to ATPostAlterTypeParse should follow more closely
>> the coding of transformTableLikeClause - namely use the idxcomment
>
> I am not sure I follow here. Could you elaborate?

Well, for indexes you don't really need to add the new AT command, as
IndexStmt has char *idxcomment which it will automatically use as the
comment if not NULL. While I am not a huge fan of idxcomment, it doesn't
seem to be easy to remove it in the future, and it's what
transformTableLikeClause uses, so it might be good to be consistent with
that.
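
That is, roughly what transformTableLikeClause() does today; a sketch, with
the surrounding variables assumed:

/* Sketch: carry the old index's comment inside the IndexStmt itself,
 * rather than queueing a separate re-add-comment command. */
IndexStmt  *index_stmt = generateClonedIndexStmt(cxt, parent_index,
                                                 attmap, attmap_length);

index_stmt->idxcomment = GetComment(parent_index_oid, RelationRelationId, 0);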


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] WAL logging problem in 9.4.3?

2015-07-03 Thread Tom Lane
Fujii Masao  writes:
> The optimization of TRUNCATE operation that we can use when
> CREATE TABLE and TRUNCATE are executed in the same transaction block
> seems to cause the problem. In this case, only index file truncation is
> logged, and index creation in btbuild() is not logged because wal_level
> is minimal. Then at the subsequent crash recovery, index file is truncated
> to 0 bytes... Very simple fix is to log an index creation in that case,
> but not sure if that's ok to do..

> In 9.2 or before, this problem doesn't occur because no such error is thrown
> even if an index file size is zero. But in 9.3 or later, since the planner
> tries to read a meta page of an index to get the height of the btree,
> an empty index file causes such error. The planner was changed that way by
> commit 31f38f28, and the problem seems to be an oversight of that commit.

What?  You want to blame the planner for failing because the index was
left corrupt by broken WAL replay?  A failure would occur anyway at
execution.

regards, tom lane




Re: [HACKERS] PostgreSQL 9.5 Alpha 1 build fail with perl 5.22

2015-07-03 Thread Andrew Dunstan


On 07/03/2015 01:47 AM, Marco Atzeri wrote:

On 7/2/2015 5:16 PM, Dave Page wrote:

The PostgreSQL Global Development Group announces that the alpha
release of PostgreSQL 9.5, the latest version of the world's leading
open source database, is available today.  This release contains
previews of all of the features which will be available in the final
release of version 9.5, although some details will change before then.
Please download, test, and report what you find.

Help Test for Bugs
--




building on cygwin and

$ perl --version

This is perl 5, version 22, subversion 0 (v5.22.0) built for 
cygwin-thread-multi-64int


build fail here, anyone seeing the same on other platforms ?

make -C hstore_plperl all
make[1]: Entering directory '/cygdrive/e/cyg_pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/build/contrib/hstore_plperl'
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement
-Wendif-labels -Wmissing-format-attribute -Wformat-security
-fno-strict-aliasing -fwrapv -fexcess-precision=standard -ggdb -O2 -pipe
-Wimplicit-function-declaration
-I/pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/src/postgresql-9.5alpha1/src/pl/plperl
-I/pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/src/postgresql-9.5alpha1/contrib/hstore
-I. -I/pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/src/postgresql-9.5alpha1/contrib/hstore_plperl
-I../../src/include
-I/pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/src/postgresql-9.5alpha1/src/include
-I/usr/lib/perl5/5.22/i686-cygwin-threads-64int/CORE -c -o hstore_plperl.o
/pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/src/postgresql-9.5alpha1/contrib/hstore_plperl/hstore_plperl.c
In file included from /pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/src/postgresql-9.5alpha1/contrib/hstore_plperl/hstore_plperl.c:6:0:
/pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/src/postgresql-9.5alpha1/contrib/hstore/hstore.h:79:0: warning: "HS_KEY" redefined

 #define HS_KEY(arr_,str_,i_) ((str_) + HSE_OFF((arr_)[2*(i_)]))
 ^
In file included from /usr/lib/perl5/5.22/i686-cygwin-threads-64int/CORE/perl.h:3730:0,
                 from /pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/src/postgresql-9.5alpha1/src/pl/plperl/plperl.h:48,
                 from /pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/src/postgresql-9.5alpha1/contrib/hstore_plperl/hstore_plperl.c:4:
/usr/lib/perl5/5.22/i686-cygwin-threads-64int/CORE/util.h:221:0: note: this is the location of the previous definition

 #  define HS_KEY(setxsubfn, popmark, apiver, xsver) \
 ^
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement
-Wendif-labels -Wmissing-format-attribute -Wformat-security
-fno-strict-aliasing -fwrapv -fexcess-precision=standard -ggdb -O2 -pipe
-Wimplicit-function-declaration -shared -o hstore_plperl.dll
-Wl,--out-implib=libhstore_plperl.a hstore_plperl.o -L../../src/port
-L../../src/common -Wl,-no-undefined -Wl,--allow-multiple-definition
-Wl,--enable-auto-import -L/usr/local/lib -Wl,--as-needed -L../../src/backend
-lpostgres -lpgcommon -lpgport -lintl -lssl -lcrypto -lz -lreadline
-lcrypt -lldap

hstore_plperl.o: In function `hstore_to_plperl':
/pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/src/postgresql-9.5alpha1/contrib/hstore_plperl/hstore_plperl.c:16: undefined reference to `hstoreUpgrade'






That #define clash is annoying. We should probably #undef HS_KEY if
it's defined before including hstore.h.
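
A minimal sketch of that guard, assuming hstore_plperl.c includes plperl.h
(which pulls in perl.h) before hstore.h; not the committed fix:

#include "plperl.h"     /* perl.h defines an unrelated HS_KEY macro */

#ifdef HS_KEY
#undef HS_KEY           /* discard Perl's definition... */
#endif

#include "hstore.h"     /* ...so hstore.h can define its own HS_KEY */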


cheers

andrew




Re: [HACKERS] pgbench - allow backslash-continuations in custom scripts

2015-07-03 Thread Tom Lane
Fabien COELHO  writes:
>> I'm pretty clearly in favor of doing correct lexing. I think we should
>> generalize that and make it reusable. psql has its own hacked-up
>> version already, there seems little point in having variedly good copies
>> around.

> I must admit that I do not know how to share lexer rules but have 
> different actions on them (psql vs sql parser vs ...), as the action code 
> is intrinsically intertwined with expressions.

Obviously this is scope creep of the first magnitude, but ISTM that
it would be possible to share a lexer between psql and pgbench, since
in both of them the basic requirement is "break SQL commands apart and
identify newline-terminated backslash commands".  If we're gonna break
pgbench's backwards compatibility anyway, there would be a whole lot
to be said for just going over to psql's input parsing rules, lock
stock 'n barrel; and this would be a good way to achieve that.

As it stands, psqlscan.l has some external dependencies on the rest of
psql, but we could perhaps refactor some of those away, and provide dummy
implementations to satisfy others (eg pgbench could provide a dummy
GetVariable() that just always returns NULL).

So I'm imagining symlinking psqlscan.l into src/bin/pgbench and using it
as-is (possibly after refactoring in psql).  A possible issue is avoiding
unnecessary invocations of flex, though.  Maybe symlinking the .c file
would work better.

regards, tom lane




Re: [HACKERS] WAL logging problem in 9.4.3?

2015-07-03 Thread Fujii Masao
On Fri, Jul 3, 2015 at 3:01 PM, Martijn van Oosterhout
 wrote:
> On Fri, Jul 03, 2015 at 02:34:44PM +0900, Fujii Masao wrote:
>> > Hmm, for me it is 100% reproducable. Are you familiar with Docker? I
>> > can probably construct a Dockerfile that reproduces it pretty reliably.
>>
>> I could reproduce the problem in the master branch by doing
>> the following steps.
>
> Thank you, I wasn't sure if you could kill the server fast enough
> without containers, but it looks like immediate mode is enough.
>
>> 1. start the PostgreSQL server with wal_level = minimal
>> 2. execute the following SQL statements
>>  begin;
>>  create table test(id serial primary key);
>>  truncate table test;
>>  commit;
>> 3. shutdown the server with immediate mode
>> 4. restart the server (crash recovery occurs)
>> 5. execute the following SQL statement
>> select * from test;
>>
>> The optimization of TRUNCATE operation that we can use when
>> CREATE TABLE and TRUNCATE are executed in the same transaction block
>> seems to cause the problem. In this case, only index file truncation is
>> logged, and index creation in btbuild() is not logged because wal_level
>> is minimal. Then at the subsequent crash recovery, index file is truncated
>> to 0 bytes... Very simple fix is to log an index creation in that case,
>> but not sure if that's ok to do..

In 9.2 or before, this problem doesn't occur because no such error is thrown
even if an index file size is zero. But in 9.3 or later, since the planner
tries to read a meta page of an index to get the height of the btree,
an empty index file causes such error. The planner was changed that way by
commit 31f38f28, and the problem seems to be an oversight of that commit.

I'm not familiar with that change of the planner, but ISTM that we can
simply change _bt_getrootheight() so that 0 is returned if an index file is
empty, i.e., meta page cannot be read, in order to work around the problem.
Thought?
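
A sketch of that workaround, assuming the check goes at the top of
_bt_getrootheight(); not a committed change:

int
_bt_getrootheight(Relation rel)
{
    /* Sketch: an empty index file has no meta page to read, so report
     * height 0 instead of erroring out. */
    if (RelationGetNumberOfBlocks(rel) == 0)
        return 0;

    /* ... existing meta-page lookup continues here ... */
}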

Regards,

-- 
Fujii Masao




Re: [HACKERS] [PATCH] Generalized JSON output functions

2015-07-03 Thread Andrew Dunstan


On 07/03/2015 06:27 AM, Heikki Linnakangas wrote:
> On 05/27/2015 09:51 PM, Andrew Dunstan wrote:
>> On 05/27/2015 02:37 PM, Robert Haas wrote:
>>> On Tue, May 26, 2015 at 2:50 AM, Shulgin, Oleksandr
>>>  wrote:
>>>> Is it reasonable to add this patch to CommitFest now?
>>>
>>> It's always reasonable to add a patch to the CommitFest if you would
>>> like for it to be reviewed and avoid having it get forgotten about.
>>> There seems to be some disagreement about whether we want this, but
>>> don't let that stop you from adding it to the next CommitFest.
>>
>> I'm not dead set against it either. When I have time I will take a
>> closer look.
>
> Andrew, will you have the time to review this? Please add yourself as
> reviewer in the commitfest app if you do.
>
> My 2 cents is that I agree with your initial reaction: This is a lot
> of infrastructure and generalizing things, for little benefit. Let's
> change the current code where we generate JSON to be consistent with
> whitespace, and call it a day.
>
> - Heikki

I'm somewhat on vacation for the next week or so, so I won't claim it,
but I'll try to make time to look at it. Other people (Merlin?) could
also provide reviews.

cheers

andrew





Re: [HACKERS] pgbench - allow backslash-continuations in custom scripts

2015-07-03 Thread Fabien COELHO


> The home-grown lexer is missing e.g. dollar-quoting support, so this is
> not parsed correctly:
>
> do $$
> begin
>  ...
> end;
> $$;
>
> That would be very nice to handle correctly, I've used DO-blocks in pgbench
> scripts many times, and it's a pain to have to write them in a single line.

Attached is a version which does that (I think), and a test script.

 - backslash-commands can be \-continuated
 - sql-commands may include $$-quotes and must end with a ';'
 - double-dash comments and blank lines are skipped.

Obviously it is still a non-lexer hack which can be easily defeated, but
ISTM that it handles non-contrived cases well. Anyway ISTM that dollar
quoting cannot be handled as such simply by a lexer, it is really an
exception mechanism.


--
Fabien.diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 2517a3a..b816673 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -696,11 +696,16 @@ pgbench  options  dbname
   
 
   
-   The format of a script file is one SQL command per line; multiline
-   SQL commands are not supported.  Empty lines and lines beginning with
-   -- are ignored.  Script file lines can also be
-   meta commands, which are interpreted by pgbench
-   itself, as described below.
+   The format of a script file is composed of lines which are each either
+   one SQL command or one meta command interpreted by
+   pgbench itself, as described below.
+   Meta-commands can be spread over multiple lines using backslash
+   (\) continuations, in which case the set of continuated
+   lines is considered as just one line.
+   SQL commands may be spread over several lines and must be
+   ;-terminated, and may contain simple dollar-quoted strings
+   over multiple lines.
+   Empty lines and lines beginning with -- are ignored.
   
 
   
@@ -768,7 +773,9 @@ pgbench  options  dbname
   Examples:
 
 \set ntellers 10 * :scale
-\set aid (1021 * :aid) % (10 * :scale) + 1
+-- update an already defined aid:
+\set aid \
+  (1021 * :aid) % (10 * :scale) + 1
 
 

@@ -931,11 +938,15 @@ pgbench  options  dbname
 \setrandom tid 1 :ntellers
 \setrandom delta -5000 5000
 BEGIN;
-UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
+UPDATE pgbench_accounts
+  SET abalance = abalance + :delta WHERE aid = :aid;
 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
-UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
-UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
-INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
+UPDATE pgbench_tellers
+  SET tbalance = tbalance + :delta WHERE tid = :tid;
+UPDATE pgbench_branches
+  SET bbalance = bbalance + :delta WHERE bid = :bid;
+INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)
+  VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
 END;
 
 
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 95be62c..c10fb29 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -2430,7 +2430,10 @@ process_commands(char *buf, const char *source, const int lineno)
 }
 
 /*
- * Read a line from fd, and return it in a malloc'd buffer.
+ * Read a possibly \-continuated (for backslash commands) or ;-terminated
+ * (for SQL statements) line from fd, and return it in a malloc'd buffer.
+ * Also handle possible $$-quotes.
+ *
  * Return NULL at EOF.
  *
  * The buffer will typically be larger than necessary, but we don't care
@@ -2443,6 +2446,13 @@ read_line_from_file(FILE *fd)
 	char	   *buf;
 	size_t		buflen = BUFSIZ;
 	size_t		used = 0;
+	bool		is_sql_statement = false;
+	bool		is_backslash_command = false;
+	/* simplistic $$-quoting handling */
+	int			ddquote_start = 0;
+	int 		ddquote_length = 0;
+	int 		ddquote_end = 0; /* where the previous $$-quote ended */
+	char	   *ddquote_string = NULL;
 
 	buf = (char *) palloc(buflen);
 	buf[0] = '\0';
@@ -2451,13 +2461,130 @@ read_line_from_file(FILE *fd)
 	{
 		size_t		thislen = strlen(tmpbuf);
 
+		/* coldly skip comments and empty lines */
+		{
+			int i = 0;
+
+			while (i < thislen && isspace(tmpbuf[i]))
+i++;
+
+			if (tmpbuf[i] == '\0') /* blank */
+continue;
+
+			if (tmpbuf[i] == '-' && tmpbuf[i+1] == '-') /* comment */
+continue;
+		}
+
 		/* Append tmpbuf to whatever we had already */
 		memcpy(buf + used, tmpbuf, thislen + 1);
 		used += thislen;
 
-		/* Done if we collected a newline */
-		if (thislen > 0 && tmpbuf[thislen - 1] == '\n')
-			break;
+		/* Determine what the current line is */
+		if (!is_backslash_command && !is_sql_statement)
+		{
+			int i = 0;
+
+			while (i < thislen && isspace(tmpbuf[i]))
+i++;
+
+			if (tmpbuf[i] == '\\')
+is_backslash_command = true;
+			else if (tmpbuf[i] != '\0')
+is_sql_statement = true;
+		}
+
+		/* Handle simple $$-quoting; may not work if there are several quotes on a line.
+		 */
+		if (is_sql_statement && ddquote_stri

Re: [HACKERS] [BUGS] BUG #13126: table constraint loses its comment

2015-07-03 Thread Michael Paquier
On Thu, Jul 2, 2015 at 11:16 PM, Petr Jelinek  wrote:
> I was going through the code and have a few comments:
> - Why do you change the return value of TryReuseIndex? Can't we reuse
> the same OidIsValid(stmt->oldNode) check that ATExecAddIndex is doing
> instead?

As pointed out by Heikki previously, that is actually unnecessary,
comments are still lost even if the index is reused for constraints.
So perhaps for simplicity we could just unconditionally recreate the
comments all the time if they are available.

> - I think the changes to ATPostAlterTypeParse should follow more closely the
> coding of transformTableLikeClause - namely use the idxcomment

I am not sure I follow here. Could you elaborate?

> - Also the changes to ATPostAlterTypeParse move the code somewhat
> uncomfortably over the 80 char width, I don't really see a way to fix that
> except for moving some of the nesting out to another function.

Yes, I did some refactoring here by adding a set of new routines
dedicated to attach generated commands to the correct queue. This way
the code is empty of large switch/case blocks.

Update patch is attached, with the issue reported by Heikki upthread
fixed as well.
Regards,
-- 
Michael
From 2244608d3d06b89202b506f31b00a7d58a57f9c5 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Fri, 3 Jul 2015 22:47:22 +0900
Subject: [PATCH] Ensure COMMENT persistency of indexes and constraint with
 ALTER TABLE

When rewriting a table, in some cases indexes and constraints present
on it need to be recreated from scratch, making any existing comment
entry, also known as a description in pg_description, disappear after
ALTER TABLE.

This commit fixes this issue by tracking the existing constraints,
indexes, and combinations of both when running ALTER TABLE, and by
recreating COMMENT entries when appropriate. A set of regression tests
is added to cover all the new code paths.
---
 src/backend/commands/tablecmds.c  | 287 ++
 src/include/nodes/parsenodes.h|   1 +
 src/test/regress/expected/alter_table.out |  95 ++
 src/test/regress/sql/alter_table.sql  |  37 
 4 files changed, 346 insertions(+), 74 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index d394713..78e6b5c 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -316,6 +316,13 @@ static void ATRewriteTables(AlterTableStmt *parsetree,
 List **wqueue, LOCKMODE lockmode);
 static void ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode);
 static AlteredTableInfo *ATGetQueueEntry(List **wqueue, Relation rel);
+static void ATAttachQueueCommand(Oid oldId, Oid refRelId, List **wqueue,
+Node *stm, Relation rel, bool rewrite);
+static void ATAttachQueueIndexStmt(Oid oldId, List **wqueue,
+IndexStmt *stmt, Relation rel, bool rewrite);
+static void ATAttachQueueAlterTableStmt(Oid oldId, Oid refRelId,
+			List **wqueue, AlterTableStmt *stmt,
+			Relation rel, bool rewrite);
 static void ATSimplePermissions(Relation rel, int allowed_targets);
 static void ATWrongRelkindError(Relation rel, int allowed_targets);
 static void ATSimpleRecursion(List **wqueue, Relation rel,
@@ -386,6 +393,12 @@ static void ATPostAlterTypeCleanup(List **wqueue, AlteredTableInfo *tab,
 static void ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId,
 	 char *cmd, List **wqueue, LOCKMODE lockmode,
 	 bool rewrite);
+static void RebuildObjectComment(AlteredTableInfo *tab,
+	 int cmdidx,
+	 ObjectType objtype,
+	 Oid objid,
+	 Oid classoid,
+	 List *objname);
 static void TryReuseIndex(Oid oldId, IndexStmt *stmt);
 static void TryReuseForeignKey(Oid oldId, Constraint *con);
 static void change_owner_fix_column_acls(Oid relationOid,
@@ -3498,6 +3511,9 @@ ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
 			address = ATExecAddIndex(tab, rel, (IndexStmt *) cmd->def, true,
 	 lockmode);
 			break;
+		case AT_ReAddComment:	/* Re-add existing comment */
+			CommentObject((CommentStmt *) cmd->def);
+			break;
 		case AT_AddConstraint:	/* ADD CONSTRAINT */
 			address =
 ATExecAddConstraint(wqueue, tab, rel, (Constraint *) cmd->def,
@@ -4251,6 +4267,162 @@ ATGetQueueEntry(List **wqueue, Relation rel)
 	return tab;
 }
 
+
+/*
+ * ATAttachQueueCommand
+ *
+ * Attach each generated command to the proper place in the work queue.
+ * Note this could result in creation of entirely new work-queue entries.
+ *
+ * Also note that the command subtypes have to be tweaked, because it
+ * turns out that re-creation of indexes and constraints has to act a bit
+ * differently from initial creation.
+ */
+static void
+ATAttachQueueCommand(Oid oldId, Oid refRelId, List **wqueue,
+	 Node *stm, Relation rel, bool rewrite)
+{
+	switch (nodeTag(stm))
+	{
+		case T_IndexStmt:
+			ATAttachQueueIndexStmt(oldId, wqueue,
+   (IndexStmt *) stm, rel, rewrite);
+			break;
+		case T

Re: [HACKERS] pg_archivecleanup, and backup filename to specify as an argument

2015-07-03 Thread Fujii Masao
On Thu, Jul 2, 2015 at 8:02 PM, Fujii Masao  wrote:
> Hi,
>
> While I'm implementing the patch around pg_archivecleanup, I found
> the following warning about pg_archivecleanup in the documentation.
>
> Note that the .backup file name passed to the program should not
> include the extension.
>
> ISTM that pg_archivecleanup works as expected even if the .backup file
> with the extension is specified as follows. So, isn't this warning already
> obsolete? Or am I missing something?
>
> $ pg_archivecleanup -d -x .zip myarchive
> 00010009.0028.backup.zip
> pg_archivecleanup: keep WAL file "myarchive/00010009" and 
> later
> pg_archivecleanup: removing file "myarchive/00010005.zip"
> pg_archivecleanup: removing file "myarchive/00010003.zip"
> pg_archivecleanup: removing file "myarchive/00010001.zip"
> pg_archivecleanup: removing file "myarchive/00010007.zip"
> pg_archivecleanup: removing file "myarchive/00010006.zip"
> pg_archivecleanup: removing file "myarchive/00010004.zip"
> pg_archivecleanup: removing file "myarchive/00010002.zip"
> pg_archivecleanup: removing file "myarchive/00010008.zip"

Even in 9.2, where the -x option was first added, I confirmed that
I could pass the .backup file name including the extension to
the pg_archivecleanup program. So I'm wondering if the warning in
question was incorrect from the beginning...
Or am I missing something?

In the past thread about the -x option patch, I found the following
comment from Robert. This makes me think that we unfortunately failed
to notice it and ended up adding the unnecessary warning to the
documentation.

-
http://www.postgresql.org/message-id/CA+TgmoZDYD_W7K_S1ZuEnqVNOaRWYCX=eetx+r27vb7akrr...@mail.gmail.com
Also, I'm wondering about this warning in the documentation:

+extension added by the compression program.  Note that the
+.backup file name passed to the program should not
+include the extension.

IIUC, the latest change you made makes that warning obsolete, no?

[rhaas pgsql]$ contrib/pg_archivecleanup/pg_archivecleanup -d -x .gz .
00010010.0020.backup.gz
pg_archivecleanup: keep WAL file "./00010010" and later
-

Regards,

-- 
Fujii Masao




Re: [HACKERS] PostgreSQL 9.5 Alpha 1 build fail with perl 5.22

2015-07-03 Thread Marco Atzeri

On 7/3/2015 8:19 AM, Michael Paquier wrote:
> On Fri, Jul 3, 2015 at 2:47 PM, Marco Atzeri  wrote:
>> On 7/2/2015 5:16 PM, Dave Page wrote:
>>
>> -lldap
>> hstore_plperl.o: In function `hstore_to_plperl':
>> /pub/devel/postgresql/prova/postgresql-9.5alpha1-1.i686/src/postgresql-9.5alpha1/contrib/hstore_plperl/hstore_plperl.c:16: undefined reference to `hstoreUpgrade'
>
> So... dangomushi is able to build it at least. Here are the logs to
> the last build for example:
> http://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=dangomushi&dt=2015-07-02%2019%3A19%3A39&stg=make-contrib
> What is the name of the library generated in hstore? Perhaps there is
> a mismatch here.

OK, thanks for the feedback.
It may be a cygwin-perl-specific issue.
I will investigate.

Regards
Marco





[HACKERS] Re: [HACKERS] GSoC 2015 proposal: Improve the performance of “ALTER TABLE .. SET LOGGED / UNLOGGED” statement

2015-07-03 Thread Ashutosh Bapat
On Fri, Jul 3, 2015 at 4:48 PM, Fabrízio de Royes Mello <fabriziome...@gmail.com> wrote:

>
> On Thu, Apr 2, 2015 at 3:24 PM, Robert Haas  wrote:
> >
> > On Wed, Mar 25, 2015 at 9:46 PM, Fabrízio de Royes Mello
> >  wrote:
> > >
> http://www.postgresql.org/message-id/ca+tgmozm+-0r7h0edpzzjbokvvq+gavkchmno4fypveccw-...@mail.gmail.com
> >
> > I like the idea of the feature a lot, but the proposal to which you
> > refer here mentions some problems, which I'm curious how you think you
> > might solve.  (I don't have any good ideas myself, beyond what I
> > mentioned there.)
> >
>
> You're right, and we have another design (steps 1 and 2 was implemented
> last year):
>
>
> *** ALTER TABLE changes
>
> 1) ATController
> 1.1) Acquire an AccessExclusiveLock (src/backend/commands/tablecmds.c
> - AlterTableGetLockLevel:3023)
>
>
> 2) Prepare to change relpersistence (src/backend/commands/tablecmds.c -
> ATPrepCmd:3249-3270)
> • check temp table (src/backend/commands/tablecmds.c -
> ATPrepChangePersistence:11074)
> • check foreign key constraints (src/backend/commands/tablecmds.c -
> ATPrepChangePersistence:11102)
>
>
> 3) FlushRelationBuffers, DropRelFileNodeBuffers and smgrimmedsync
> (MAIN_FORKNUM, FSM_FORKNUM, VISIBILITYMAP_FORKNUM and INIT_FORKNUM if
> exists)
>
>
> 4) Create a new fork called  "TRANSIENT INIT FORK":
>
> • from Unlogged to Logged  (create _initl fork) (INIT_TO_LOGGED_FORKNUM)
>   ∘ new forkName (src/common/relpath.c) called "_initl"
>   ∘ insert XLog record to drop it if transaction abort
>
> • from Logged to Unlogged (create _initu fork) (INIT_TO_UNLOGGED_FORKUM)
>   ∘ new forkName (src/common/relpath.c) called "_initu"
>   ∘ insert XLog record to drop it if transaction abort
>

AFAIU, while reading WAL, the server doesn't know whether the transaction
that produced that WAL record aborted or committed. It's only when it sees
a COMMIT/ABORT record down the line that it can confirm whether the
transaction committed or aborted. So, one can only "redo" the things that
the WAL says have been done. We cannot "undo" things based on the WAL. We
might record
this fact "somewhere" while reading the WAL and act on it once we know the
status of the transaction, but I do not see that as part of this idea. This
comment applies to all the steps inserting WALs for undoing things.


>
>
> 5) Change the relpersistence in catalog (pg_class->relpersistence) (heap,
> toast, indexes)
>
>
> 6) Remove/Create INIT_FORK
>
> • from Unlogged to Logged
>   ∘ remove the INIT_FORK and INIT_TO_LOGGED_FORK adding to the
> pendingDeletes queue
> • from Logged to Unlogged
>   ∘ remove the INIT_TO_UNLOGGED_FORK adding to the pendingDeletes queue
>   ∘ create the INIT_FORK using "heap_create_init_fork"
>   ∘ insert XLog record to drop init fork if the transaction abort
>
>
>
> *** CRASH RECOVERY changes
>
> 1) During crash recovery
> (src/backend/access/transam/xlog.c:6507:ResetUnloggedRelations)
>
>
This operation is carried out in two phases: one before replaying WAL
records and one after that. Are you sure that the WALs generated for the
unlogged or logged forks, as described above, have been taken care of?


> • if the transient fork "_initl" exists then
>   ∘ drop the transient fork "_initl"
>   ∘ if the init fork doesn't exist then create it
>   ∘ reset relation
> • if the transient fork "_initu" exists then
>   ∘ drop the transient fork "_initu"
>   ∘ if the init fork exists then drop it
>   ∘ don't reset the relation
>
>
Consider the case of converting unlogged to logged. The server crashes after
6th step when both the forks are removed. During crash recovery, it will
not see any of the fork and won't reset the unlogged table. Remember the
table is still unlogged since the transaction was aborted due to the crash.
So, it has to have an init fork to be reset on crash recovery.

Similarly while converting from logged to unlogged. The server crashes in
the 6th step after creating the INIT_FORK, during crash recovery the init
fork will be seen and a supposedly logged table will be trashed.

The ideas in 1 and 2 might be better than having yet another init fork.


1. http://www.postgresql.org/message-id/533d457a.4030...@vmware.com

>
> Regards,
>
> --
> Fabrízio de Royes Mello
> Consultoria/Coaching PostgreSQL
> >> Timbira: http://www.timbira.com.br
> >> Blog: http://fabriziomello.github.io
> >> Linkedin: http://br.linkedin.com/in/fabriziomello
> >> Twitter: http://twitter.com/fabriziomello
> >> Github: http://github.com/fabriziomello
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] pgbench - allow backslash-continuations in custom scripts

2015-07-03 Thread Fabien COELHO



> I'm pretty clearly in favor of doing correct lexing. I think we should
> generalize that and make it reusable. psql has its own hacked-up
> version already, there seems little point in having variedly good copies
> around.

I must admit that I do not know how to share lexer rules but have
different actions on them (psql vs sql parser vs ...), as the action code
is intrinsically intertwined with expressions. Maybe some hack is
possible. Having yet another SQL lexer to maintain seems highly
undesirable, especially just for pgbench.

> I could see including something simpler, in addition to the lexer, to
> allow sending multiple statements in one go.


Currently,

  SELECT 1; SELECT 1;

probably does 2 statements in one go, but it is on one line.

May by allowing both continuations and ";" at the same time:

  -- two statements in one go
  SELECT 1; \
  SELECT 1;
  -- next statement on its own
  SELECT
1;

Which could be reasonably neat, and easy to implement.

--
Fabien.




Re: [HACKERS] assessing parallel-safety

2015-07-03 Thread Amit Kapila
On Thu, Jul 2, 2015 at 8:49 AM, Amit Kapila  wrote:
>
> On Thu, May 21, 2015 at 10:19 PM, Robert Haas 
wrote:
> >
> > On Sat, Mar 21, 2015 at 2:30 PM, Thom Brown  wrote:
> > > Looks like one of the patches I applied is newer than the one in your
list:
> > >
> > > HEAD Commit-id: 13a10c0ccd984643ef88997ac177da7c4b7e46a6
> > > parallel-mode-v9.patch
> > > assess-parallel-safety-v4.patch
> > > parallel-heap-scan.patch
> > > parallel_seqscan_v11.patch
> >
> > This patch hasn't been updated for a while, so here is a rebased
> > version now that we're hopefully mostly done making changes to
> > pg_proc.h for 9.5.  parallel-mode-v10 was mostly committed, and
> > parallel-heap-scan has now been incorporated into the parallel_seqscan
> > patch.  So you should now be able to test this feature by applying
> > just this patch, and the new version of the parallel seqscan patch
> > which I am given to understand that Amit will be posting pretty soon.
> >
>
> This patch again needs rebase.

Attached, find the rebased patch which can be used to review/test the
latest version of the parallel_seqscan patch, which I am going to post
in the parallel seq. scan thread soonish.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


assess-parallel-safety-v6.tar.gz
Description: GNU Zip compressed data



Re: [HACKERS] pgbench - allow backslash-continuations in custom scripts

2015-07-03 Thread Fabien COELHO


Hello Heikki,


>>> I'm not clear on why we'd need a full SQL lexer.
>>
>> Attached is a "without lexer" version which does ;-terminated SQL commands
>> and \-continuated meta commands (may be useful for \shell and long \set
>> expressions).
>
> As Tom pointed out, you need the full lexer to do this correctly. You can
> argue that something that handles the most common cases is enough, but
> realistically, by the time you've handled all the common cases correctly,
> you've just re-invented the lexer.

Sure. I understand that part of Josh's argument is that we are discussing
pgbench test scripts here, not real full-blown applications, and these are
expected to be quite basic, plain mostly-SQL things.

> The home-grown lexer is missing e.g. dollar-quoting support, so this is
> not parsed correctly:
>
> do $$
> begin
>  ...
> end;
> $$;

Hmmm, good one, if indeed you want to use PL/pgSQL or even any arbitrary
language in a pgbench script... I would rather have created functions
(once, outside of pgbench) and called them from the script, so that
would be a simple SELECT.

> That would be very nice to handle correctly, I've used DO-blocks in pgbench
> scripts many times, and it's a pain to have to write them in a single line.

Yep. With some languages I'm not sure that it is even possible.

> Also worth noting that you can currently test so-called multi-statements
> with pgbench, by putting multiple statements on a single line.

Yes indeed, under the hood pgbench expects just one line; otherwise you
have to significantly change the way statements are handled, which is way
beyond my initial intentions on this one, and would mean quite a lot of
changes for more or less corner cases.

> Your patch seems to still do that, but if we went with a full-blown SQL
> lexer, they would probably be split into two statements.
>
> I think we should either bite the bullet and include the full SQL lexer in
> pgbench, or come up with some new syntax for marking the beginning and end
> of a statement. We could do something like bash here-documents or Postgres
> dollar-quoting, for example:
>
> \set ...
> select 1234; -- A statement on a single line, no change here
>
> -- Begin a multi-line statement
> \multi-line-statement END_TOKEN
> select *
>  from complicated;
> END_TOKEN

I do not like the aesthetics of the above much. I really liked the idea of
simply writing SQL queries just as in psql.

So maybe just handling $$-quoting would be enough to handle reasonable
use-cases without troubling pgbench's internals too much? That would
be a simple local change in the line reader.


--
Fabien.




Re: [HACKERS] Synch failover WAS: Support for N synchronous standby servers - take 2

2015-07-03 Thread Andres Freund
On 2015-07-02 14:54:19 -0700, Josh Berkus wrote:
> On 07/02/2015 12:44 PM, Andres Freund wrote:
> > On 2015-07-02 11:50:44 -0700, Josh Berkus wrote:
> >> So there's two parts to this:
> >>
> >> 1. I need to ensure that data is replicated to X places.
> >>
> >> 2. I need to *know* which places data was synchronously replicated to
> >> when the master goes down.
> >>
> >> My entire point is that (1) alone is useless unless you also have (2).
> > 
> > I think there's a good set of usecases where that's really not the case.
> 
> Please share!  My plea for usecases was sincere.  I can't think of any.

"I have important data. I want to survive both a local hardware failure
(it's faster to continue using the local standby) and I want to protect
myself against actual disaster striking the primary datacenter". Pretty
common.

> >> And do note that I'm talking about information on the replica, not on
> >> the master, since in any failure situation we don't have the old
> >> master around to check.
> > 
> > How would you, even theoretically, synchronize that knowledge to all the
> > replicas? Even when they're temporarily disconnected?
> 
> You can't, which is why what we need to know is when the replica thinks
> it was last synced from the replica side.  That is, a sync timestamp and
> lsn from the last time the replica ack'd a sync commit back to the
> master successfully.  Based on that information, I can make an informed
> decision, even if I'm down to one replica.

I think you're mashing together nearly unrelated topics.

Note that we already have the last replayed lsn, and we have the
timestamp of the last replayed transaction.

> > If you want automated failover you need a leader election amongst the
> > surviving nodes. The replay position is all they need to elect the node
> > that's furthest ahead, and that information exists today.
> 
> I can do that already.  If quorum synch commit doesn't help us minimize
> data loss any better than async replication or the current 1-redundant,
> why would we want it?  If it does help us minimize data loss, how?

But it does make us safer against data loss? If your app gets back the
commit, you know that the data has made it both to the local replica and
to one other datacenter. You are now safe against both the loss of the
master's hardware (the most likely scenario) and the loss of the entire
primary datacenter. That you need additional logic to know which other
datacenter to fail over to is just yet another piece (which you *can*
build today).
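
For illustration, using the JSON-ish quorum syntax floated in these threads
(a hypothetical format under discussion, not a committed GUC syntax), such a
setup might be expressed as:

-- hypothetical: require acks from any 2 of the 3 listed standbys
ALTER SYSTEM SET synchronous_standby_names =
  '2 : { "local_replica", "london_server", "nyc_server" }';
SELECT pg_reload_conf();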




[HACKERS] Re: Synch failover WAS: Support for N synchronous standby servers - take 2

2015-07-03 Thread Beena Emerson
Sawada Masahiko wrote:
>
> I think that the #2 problem which Josh pointed out seems to be solved:
> 1. I need to ensure that data is replicated to X places.
> 2. I need to *know* which places data was synchronously replicated
> to when the master goes down.
> And we can address the #1 problem using quorum commit.
> 
> Thoughts?

I agree. The knowledge of which servers were in sync (#2) would not actually
help us determine the new master, and quorum solves #1.





-
Beena Emerson





[HACKERS] Re: [HACKERS] GSoC 2015 proposal: Improve the performance of “ALTER TABLE .. SET LOGGED / UNLOGGED” statement

2015-07-03 Thread Fabrízio de Royes Mello
On Thu, Apr 2, 2015 at 3:24 PM, Robert Haas  wrote:
>
> On Wed, Mar 25, 2015 at 9:46 PM, Fabrízio de Royes Mello
>  wrote:
> >
http://www.postgresql.org/message-id/ca+tgmozm+-0r7h0edpzzjbokvvq+gavkchmno4fypveccw-...@mail.gmail.com
>
> I like the idea of the feature a lot, but the proposal to which you
> refer here mentions some problems, which I'm curious how you think you
> might solve.  (I don't have any good ideas myself, beyond what I
> mentioned there.)
>

You're right, and we have another design (steps 1 and 2 were implemented
last year):


*** ALTER TABLE changes

1) ATController
1.1) Acquire an AccessExclusiveLock (src/backend/commands/tablecmds.c -
AlterTableGetLockLevel:3023)


2) Prepare to change relpersistence (src/backend/commands/tablecmds.c -
ATPrepCmd:3249-3270)
• check temp table (src/backend/commands/tablecmds.c -
ATPrepChangePersistence:11074)
• check foreign key constraints (src/backend/commands/tablecmds.c -
ATPrepChangePersistence:11102)


3) FlushRelationBuffers, DropRelFileNodeBuffers and smgrimmedsync
(MAIN_FORKNUM, FSM_FORKNUM, VISIBILITYMAP_FORKNUM and INIT_FORKNUM if
exists)


4) Create a new fork called  "TRANSIENT INIT FORK":

• from Unlogged to Logged  (create _initl fork) (INIT_TO_LOGGED_FORKNUM)
  ∘ new forkName (src/common/relpath.c) called "_initl"
  ∘ insert XLog record to drop it if the transaction aborts

• from Logged to Unlogged (create _initu fork) (INIT_TO_UNLOGGED_FORKNUM)
  ∘ new forkName (src/common/relpath.c) called "_initu"
  ∘ insert XLog record to drop it if the transaction aborts


5) Change the relpersistence in catalog (pg_class->relpersistence) (heap,
toast, indexes)


6) Remove/Create INIT_FORK

• from Unlogged to Logged
  ∘ remove the INIT_FORK and INIT_TO_LOGGED_FORK by adding them to the
pendingDeletes queue
• from Logged to Unlogged
  ∘ remove the INIT_TO_UNLOGGED_FORK by adding it to the pendingDeletes queue
  ∘ create the INIT_FORK using "heap_create_init_fork"
  ∘ insert XLog record to drop the init fork if the transaction aborts



*** CRASH RECOVERY changes

1) During crash recovery
(src/backend/access/transam/xlog.c:6507:ResetUnloggedRelations)

• if the transient fork "_initl" exists then
  ∘ drop the transient fork "_initl"
  ∘ if the init fork doesn't exist then create it
  ∘ reset relation
• if the transient fork "_initu" exists then
  ∘ drop the transient fork "_initl"
  ∘ if the init fork exists then drop it
  ∘ don't reset the relation
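
For context, these are the statements the design above targets; the syntax
itself already exists in 9.5, but the current implementation rewrites the
whole relation, whereas this design only manipulates init forks and the
catalog:

CREATE UNLOGGED TABLE events (id int, payload text);
ALTER TABLE events SET LOGGED;    -- unlogged -> logged
ALTER TABLE events SET UNLOGGED;  -- logged -> unlogged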


Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello


Re: [HACKERS] pgbench - allow backslash-continuations in custom scripts

2015-07-03 Thread Andres Freund
On 2015-07-03 13:50:02 +0300, Heikki Linnakangas wrote:
> As Tom pointed out, you need the full lexer to do this correctly. You can
> argue that something that handles the most common cases is enough, but
> realistically, by the time you've handled all the common cases correctly,
> you've just re-invented the lexer.

Yes.

> I think we should either bite the bullet and include the full SQL lexer in
> pgbench, or come up with some new syntax for marking the beginning and end
> of a statement.

I'm pretty clearly in favor of doing correct lexing. I think we should
generalize that and make it reusable. psql has its own hacked-up version
already; there seems little point in having copies of varying quality
around.

> We could do something like bash here-documents or Postgres
> dollar-quoting, for example:
> 
> \set ...
> select 1234; -- A statement on a single line, no change here
> 
> -- Begin a multi-line statement
> \multi-line-statement END_TOKEN
> select *
>   from complicated;
> END_TOKEN

Not pretty imo. I could see including something simpler, in addition to
the lexer, to allow sending multiple statements in one go.




Re: [HACKERS] pgbench - allow backslash-continuations in custom scripts

2015-07-03 Thread Heikki Linnakangas

On 06/21/2015 11:12 AM, Fabien COELHO wrote:


Hello Josh,


Add backslash continuations to pgbench custom scripts.
[...]
IMHO this approach is the best compromise.


I don't personally agree.  I believe that it is worth breaking backwards
compatibility to support line breaks in pgbench statements, and that if
we're not going to do that, supporting \ continuations is of little value.

As someone who actively uses pgbench to write custom benchmarks, I need
to write queries which I can test.  \ continuation does NOT work on the
psql command line, so that's useless for testing my queries; I still
have to reformat and troubleshoot.  If we added \ continuation, I
wouldn't use it.

I think we should support line breaks, and require semicolons for
end-of-statement.  Backwards-compatibility in custom pgbench scripts is
not critical; pgbench scripts are neither used in production, nor used
much in automated systems that I know of.

I'm not clear on why we'd need a full SQL lexer.


Attached is a "without lexer" version which does ;-terminated SQL commands
and \-continued meta commands (which may be useful for \shell and long \set
expressions).


As Tom pointed out, you need the full lexer to do this correctly. You 
can argue that something that handles the most common cases is enough, 
but realistically, by the time you've handled all the common cases 
correctly, you've just re-invented the lexer.


The home-grown lexer is missing e.g. dollar-quoting support, so this 
would not be parsed correctly:


do $$
begin
  ...
end;
$$;

That would be very nice to handle correctly. I've used DO-blocks in 
pgbench scripts many times, and it's a pain to have to write them on a 
single line.


Also worth noting that you can currently test so-called multi-statements 
with pgbench, by putting multiple statements on a single line. Your 
patch seems to still do that, but if we went with a full-blown SQL 
lexer, they would probably be split into two statements.


I think we should either bite the bullet and include the full SQL lexer 
in pgbench, or come up with some new syntax for marking the beginning 
and end of a statement. We could do something like bash here-documents 
or Postgres dollar-quoting, for example:


\set ...
select 1234; -- A statement on a single line, no change here

-- Begin a multi-line statement
\multi-line-statement END_TOKEN
select *
  from complicated;
END_TOKEN

- Heikki





Re: [HACKERS] [PATCH] Generalized JSON output functions

2015-07-03 Thread Heikki Linnakangas

On 05/27/2015 09:51 PM, Andrew Dunstan wrote:


On 05/27/2015 02:37 PM, Robert Haas wrote:

On Tue, May 26, 2015 at 2:50 AM, Shulgin, Oleksandr
 wrote:

Is it reasonable to add this patch to CommitFest now?

It's always reasonable to add a patch to the CommitFest if you would
like for it to be reviewed and avoid having it get forgotten about.
There seems to be some disagreement about whether we want this, but
don't let that stop you from adding it to the next CommitFest.


I'm not dead set against it either. When I have time I will take a
closer look.


Andrew, will you have the time to review this? Please add yourself as 
reviewer in the commitfest app if you do.


My 2 cents is that I agree with your initial reaction: this is a lot of 
infrastructure and generalization for little benefit. Let's change the 
places where we currently generate JSON to be consistent about whitespace, 
and call it a day.


- Heikki




Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

2015-07-03 Thread Etsuro Fujita

On 2015/07/03 15:32, Kouhei Kaigai wrote:

On 2015/07/02 23:13, Kouhei Kaigai wrote:

To be honest, ISTM that it's difficult to do that simply and efficiently
with the foreign/custom-join-pushdown API that we have for 9.5.  It's a
little late, but what I have started thinking about is redesigning that API
so that it is called at standard_join_search, as discussed in [2];



The disadvantage is larger than the advantage, sorry.
The reason why we put the foreign/custom-join hook in add_paths_to_joinrel()
is that the source relations (inner/outer) were not obvious otherwise; we
could not reproduce which relations are the source of the join.
So I had to give up when I tried this approach before.



Maybe I'm missing something, but my understanding of this approach is that
if the base relations for a given joinrel are all foreign tables belonging
to the same foreign server, then by calling that API there we consider the
remote join over all the foreign tables, and if not, we give up on the
remote join.



Your understanding is correct, but it misses a point. Once the foreign
tables to be joined are supplied only as a bitmap (joinrel->relids), it is
not obvious to extensions which relations are joined with INNER JOIN and
which with OUTER JOIN.


Can't FDWs get the join information through the root, which I think we 
would pass to the API as an argument?



Also, I no longer want to stick to the assumption that the relations
involved in a remote join are all managed by the same foreign server.
The following two ideas introduce possible enhancements of the remote-join
feature that involve local relations: a replicated table, or a local
relation transformed into a VALUES() clause.

http://www.postgresql.org/message-id/CA+Tgmoai_VUF5h6qVLNLU+FKp0aeBCbnnMT3SCvL-HvOpBR=x...@mail.gmail.com
http://www.postgresql.org/message-id/9a28c8860f777e439aa12e8aea7694f8010f2...@bpxm15gp.gisp.nec.co.jp


Interesting!


I think add_paths_to_joinrel() is the best location for foreign-join as
well as custom-join. Relocating it to standard_join_search() has a larger
disadvantage than advantage.


I agree with you that it's important to ensure the expandability, and my 
question is: could the API call from standard_join_search also realize 
those ideas, if FDWs can get the join information through the root or 
something like that?


Best regards,
Etsuro Fujita




Re: [HACKERS] Synch failover WAS: Support for N synchronous standby servers - take 2

2015-07-03 Thread Sawada Masahiko
On Fri, Jul 3, 2015 at 6:23 PM, Fujii Masao  wrote:
> On Fri, Jul 3, 2015 at 5:59 PM, Sawada Masahiko  wrote:
>> Yeah, quorum commit is helpful for minimizing data loss in comparison
>> with today's replication.
>> But in this case of yours, how can we know which server we should use
>> as the next master after the local data center goes down?
>> If we choose the wrong one, we will lose data.
>
> Check the progress of each server, e.g., by using
> pg_last_xlog_replay_location(),
> and choose the server which is furthest ahead as the new master.
>

Thanks. So we can choose the next master server by checking the
progress of each server, if hot standby is enabled.
And such a procedure is needed even with today's replication.

I think that the #2 problem which Josh pointed out seems to be solved:
1. I need to ensure that data is replicated to X places.
2. I need to *know* which places data was synchronously replicated
to when the master goes down.
And we can address the #1 problem using quorum commit.

Thoughts?

Regards,

--
Sawada Masahiko




Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-07-03 Thread Syed, Rahila
Hello,

>TBH, I think that designing this as a hook-based solution is adding a whole 
>lot of complexity for no value.  The hard parts of the problem are collecting 
>the raw data and making the results visible to users, and both of those 
>require involvement of the core code.  Where is the benefit from pushing some 
>trivial intermediate arithmetic into an external module?  If there's any at 
>all, it's certainly not enough to justify problems such as you mention here.

>So I'd just create a "pgstat_report_percent_done()" type of interface in 
>pgstat.c and then teach VACUUM to call it directly.

Thank you for the suggestion. I agree that adding the code in core will 
reduce code complexity with no additional overhead. 
Going by the consensus, I will update the patch with code to collect and 
store progress information from vacuum in pgstat.c, with a UI using the 
pg_stat_activity view.
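
As a sketch of the intended UI, a user might then watch a running vacuum
like this (the progress column name is purely illustrative, not a committed
interface):

-- hypothetical monitoring query; "progress_percent" is an assumed column
SELECT pid, query, progress_percent
FROM pg_stat_activity
WHERE query ILIKE 'vacuum%';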

Thank you,
Rahila Syed





Re: [HACKERS] Synch failover WAS: Support for N synchronous standby servers - take 2

2015-07-03 Thread Fujii Masao
On Fri, Jul 3, 2015 at 5:59 PM, Sawada Masahiko  wrote:
> On Fri, Jul 3, 2015 at 12:18 PM, Fujii Masao  wrote:
>> On Fri, Jul 3, 2015 at 6:54 AM, Josh Berkus  wrote:
>>> On 07/02/2015 12:44 PM, Andres Freund wrote:
 On 2015-07-02 11:50:44 -0700, Josh Berkus wrote:
> So there's two parts to this:
>
> 1. I need to ensure that data is replicated to X places.
>
> 2. I need to *know* which places data was synchronously replicated to
> when the master goes down.
>
> My entire point is that (1) alone is useless unless you also have (2).

 I think there's a good set of usecases where that's really not the case.
>>>
>>> Please share!  My plea for usecases was sincere.  I can't think of any.
>>>
> And do note that I'm talking about information on the replica, not on
> the master, since in any failure situation we don't have the old
> master around to check.

 How would you, even theoretically, synchronize that knowledge to all the
 replicas? Even when they're temporarily disconnected?
>>>
>>> You can't, which is why what we need to know is when the replica thinks
>>> it was last synced from the replica side.  That is, a sync timestamp and
>>> lsn from the last time the replica ack'd a sync commit back to the
>>> master successfully.  Based on that information, I can make an informed
>>> decision, even if I'm down to one replica.
>>>
> ... because we would know definitively which servers were in sync.  So
> maybe that's the use case we should be supporting?

 If you want automated failover you need a leader election amongst the
 surviving nodes. The replay position is all they need to elect the node
 that's furthest ahead, and that information exists today.
>>>
>>> I can do that already.  If quorum synch commit doesn't help us minimize
>>> data loss any better than async replication or the current 1-redundant,
>>> why would we want it?  If it does help us minimize data loss, how?
>>
>> In your example of "2" : { "local_replica", "london_server", "nyc_server" },
>> if there is nothing like quorum commit, only local_replica is synch
>> and the other two are async. In this case, if the local data center gets
>> destroyed, you need to promote either london_server or nyc_server. But
>> since they are async, they might not have the data which has already been
>> committed on the master. So data loss! Of course, as I said yesterday,
>> they might have all the data, and then no data loss happens at the promotion.
>> But the point is that there is no guarantee that no data loss happens.
>> OTOH, if we use quorum commit, we can guarantee that either london_server
>> or nyc_server has all the data which has been committed on the master.
>>
>> So I think that quorum commit is helpful for minimizing the data loss.
>>
>
> Yeah, quorum commit is helpful for minimizing data loss in comparison
> with today's replication.
> But in this case of yours, how can we know which server we should use
> as the next master after the local data center goes down?
> If we choose the wrong one, we will lose data.

Check the progress of each server, e.g., by using
pg_last_xlog_replay_location(),
and choose the server which is furthest ahead as the new master.
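
In SQL terms, the election amounts to running something like the following
on each surviving standby and promoting the one with the greatest LSN (both
functions exist today):

-- on each surviving standby: how far has WAL replay progressed?
SELECT pg_last_xlog_replay_location();

-- compare two reported positions; a positive result means the first
-- position is ahead of the second
SELECT pg_xlog_location_diff('0/5000060', '0/5000028');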

Regards,

-- 
Fujii Masao




Re: [HACKERS] Synch failover WAS: Support for N synchronous standby servers - take 2

2015-07-03 Thread Sawada Masahiko
On Fri, Jul 3, 2015 at 12:18 PM, Fujii Masao  wrote:
> On Fri, Jul 3, 2015 at 6:54 AM, Josh Berkus  wrote:
>> On 07/02/2015 12:44 PM, Andres Freund wrote:
>>> On 2015-07-02 11:50:44 -0700, Josh Berkus wrote:
 So there's two parts to this:

 1. I need to ensure that data is replicated to X places.

 2. I need to *know* which places data was synchronously replicated to
 when the master goes down.

 My entire point is that (1) alone is useless unless you also have (2).
>>>
>>> I think there's a good set of usecases where that's really not the case.
>>
>> Please share!  My plea for usecases was sincere.  I can't think of any.
>>
 And do note that I'm talking about information on the replica, not on
 the master, since in any failure situation we don't have the old
 master around to check.
>>>
>>> How would you, even theoretically, synchronize that knowledge to all the
>>> replicas? Even when they're temporarily disconnected?
>>
>> You can't, which is why what we need to know is when the replica thinks
>> it was last synced from the replica side.  That is, a sync timestamp and
>> lsn from the last time the replica ack'd a sync commit back to the
>> master successfully.  Based on that information, I can make an informed
>> decision, even if I'm down to one replica.
>>
 ... because we would know definitively which servers were in sync.  So
 maybe that's the use case we should be supporting?
>>>
>>> If you want automated failover you need a leader election amongst the
>>> surviving nodes. The replay position is all they need to elect the node
>>> that's furthest ahead, and that information exists today.
>>
>> I can do that already.  If quorum synch commit doesn't help us minimize
>> data loss any better than async replication or the current 1-redundant,
>> why would we want it?  If it does help us minimize data loss, how?
>
> In your example of "2" : { "local_replica", "london_server", "nyc_server" },
> if there is nothing like quorum commit, only local_replica is synch
> and the other two are async. In this case, if the local data center gets
> destroyed, you need to promote either london_server or nyc_server. But
> since they are async, they might not have the data which has already been
> committed on the master. So data loss! Of course, as I said yesterday,
> they might have all the data, and then no data loss happens at the promotion.
> But the point is that there is no guarantee that no data loss happens.
> OTOH, if we use quorum commit, we can guarantee that either london_server
> or nyc_server has all the data which has been committed on the master.
>
> So I think that quorum commit is helpful for minimizing the data loss.
>

Yeah, quorum commit is helpful for minimizing data loss in comparison
with today's replication.
But in this case of yours, how can we know which server we should use
as the next master after the local data center goes down?
If we choose the wrong one, we will lose data.

Regards,

--
Sawada Masahiko




Re: [HACKERS] PATCH: pgbench - remove thread fork-emulation

2015-07-03 Thread Heikki Linnakangas

On 04/28/2015 02:18 AM, Fabien COELHO wrote:

This patch removes the pgbench thread fork-emulation code and simplifies
things where possible, especially around pthread_create and pthread_join.
The stats collection for the report is done directly instead of using an
intermediate structure.

As a result, if no thread implementation is available, pgbench is
restricted to work with only the main thread (ie "pgbench -j 1 ...").


== Rationale ==

Pgbench currently provides a thread emulation through process forks. This
feature was developed way back, when it may have been common for some
platforms not to support threads. That is now very rare (can you name one
such platform?).

However, the thread fork-emulation feature has drawbacks: processes are not
threads and their memory is not shared, which hinders the simple
implementation of some features, results in some features not being provided
under fork-emulation, or gives them a different behavior under
fork-emulation:

Latency collection (-r) is not supported with fork emulation.

Progress (-P) is reported differently with fork emulation.

For a new feature under discussion, which consists in allowing one log
instead of per-thread logs, supporting fork-emulation requires a (heavy)
post-processing external sort phase, whereas with actual threads all
threads can share and append to the same log file with limited overhead,
which is significantly simpler.


I agree with all that, it's time to let the fork-emulation mode go. 
Committed.


- Heikki





Re: [HACKERS] PATCH:do not set Win32 server-side socket buffer size on windows 2012

2015-07-03 Thread David Rowley
On 3 July 2015 at 20:06, Andres Freund  wrote:

> On 2015-07-02 23:56:16 +0300, Heikki Linnakangas wrote:
> > On 04/10/2015 01:46 PM, chenhj wrote:
> > >Result(execute time):
> > >default(8K),  7.370s
> > >set SO_SNDBUF to 32K, 4.159s(the current implement)
> > >set SO_SNDBUF to 64K, 2.875s
> > >set SO_SNDBUF to 128K,  1.593s
> > >set SO_SNDBUF to 256K,  1.324s
> >
> > I was about to commit the attached, but when I tested this between my
> > Windows 8.1 virtual machine and Linux host, I was not able to see any
> > performance difference. It may be because the case is hobbled by other
> > inefficiencies, in the virtualization or somewhere else, but I wonder if
> > others can reproduce the speedup?
>
> Given that too-small socket buffers incur significantly smaller latency
> bumps in a virtualized environment than on a real network, and hit another
> set of buffers inside the virtualization technology, I'm not particularly
> surprised by that.
>
>
I'm wondering what the original test setup was. I'm assuming psql and
postgres were running on separate Windows machines?

I've tested the patch just connecting to a database running on localhost
and I'm not getting much of a speedup. Perhaps 1%, if that's not noise. I
don't have enough hardware here to have client and server on separate
machines, at least not with a stable network that goes through copper.

Here are the results.

Unpatched:

-- 100MB
Measure-Command { .\psql.exe -d postgres -t -A -c "select
'1'::char(1000),generate_series(1,100000)" > $null }

TotalMilliseconds : 1997.3908
TotalMilliseconds : 2111.4119
TotalMilliseconds : 2040.4415
TotalMilliseconds : 2167.5532
TotalMilliseconds : 2087.6444
TotalMilliseconds : 2117.3759
TotalMilliseconds : 2100.3229
TotalMilliseconds : 2132.3522
TotalMilliseconds : 2129.9487
TotalMilliseconds : 2101.675

Median: 2106.54345
Average: 2098.61165

-- 500MB
Measure-Command { .\psql.exe -d postgres -t -A -c "select
'1'::char(1000),generate_series(1,500000)" > $null }

TotalMilliseconds : 10344.4251
TotalMilliseconds : 10248.3671
TotalMilliseconds : 10370.3856
TotalMilliseconds : 10412.507
TotalMilliseconds : 10469.173
TotalMilliseconds : 10248.8889
TotalMilliseconds : 10331.9476
TotalMilliseconds : 10320.7841
TotalMilliseconds : 10470.3022
TotalMilliseconds : 10333.4203

Median: 10338.9227
Average: 10355.02009


Patched:

-- 100MB
Measure-Command { .\psql.exe -d postgres -t -A -c "select
'1'::char(1000),generate_series(1,100000)" > $null }

TotalMilliseconds : 2066.3701
TotalMilliseconds : 2106.6628
TotalMilliseconds : 2110.2459
TotalMilliseconds : 2047.8337
TotalMilliseconds : 2081.9166
TotalMilliseconds : 2034.7086
TotalMilliseconds : 2082.9072
TotalMilliseconds : 2146.6878
TotalMilliseconds : 2133.351
TotalMilliseconds : 2076.6862

Median: 2082.4119
Average: 2088.73699

-- 500MB
Measure-Command { .\psql.exe -d postgres -t -A -c "select
'1'::char(1000),generate_series(1,500000)" > $null }

TotalMilliseconds : 10217.4794
TotalMilliseconds : 10244.8074
TotalMilliseconds : 10451.7265
TotalMilliseconds : 10162.9862
TotalMilliseconds : 10304.1866
TotalMilliseconds : 10374.7922
TotalMilliseconds : 10227.9632
TotalMilliseconds : 10145.5825
TotalMilliseconds : 10298.7048
TotalMilliseconds : 10170.3754

Median: 10236.3853
Average: 10259.86042


Comparison (Unpatched / Patched)

            100MB      500MB
Median      101.16%    101.00%
Average     100.47%    100.93%

Regards

David Rowley

--
 David Rowley   http://www.2ndQuadrant.com/

 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Freeze avoidance of very large table.

2015-07-03 Thread Sawada Masahiko
On Fri, Jul 3, 2015 at 1:23 AM, Simon Riggs  wrote:
> On 2 July 2015 at 16:30, Sawada Masahiko  wrote:
>
>>
>> Also, the flags of each heap page header might be set to PD_ALL_FROZEN,
>> as well as all-visible
>
>
> Is it possible to have VM bits set to frozen but not visible?
>
> The description makes those two states sound independent of each other.
>
> Are they? Or not? Do we test for an impossible state?
>

It's impossible to have the VM bit set to frozen but not to visible.
These bits are controlled independently, but whenever the all-frozen bit
is set, the all-visible bit is also set.

Regards,

--
Sawada Masahiko




Re: [HACKERS] Fix pgbench --progress report under (very) low rate

2015-07-03 Thread Heikki Linnakangas

On 06/15/2015 09:12 PM, Fabien COELHO wrote:



v3 rebase (after pgbench moved to src/bin) and minor style tweaking.


v4 adds a fix to another progress timing issue:

Currently, if pgbench/postgres gets stuck somewhere, the report catches up
by printing several progress lines in a row, which looks like this:

progress: 10.0 s ...
progress: 11.0 s ... stuck...
progress: 14.2 s catchup for 11.0 -> 14.2
progress: 14.2 s stupid data
progress: 14.2 s stupid data
progress: 15.0 s ...
progress: 16.0 s ...

The correction removes the "stupid data" lines, which compute a report over
a very short interval, including absurd tps figures.

Yet again, shame on me in the first place for this behavior.


Thanks, applied. I chose to also backpatch this, although arguably it is a 
behaviour change that would normally not belong in a minor version. However, 
progress reports are a very user-facing feature, and it's not going to break 
anyone's scripts.


- Heikki





Re: [HACKERS] PATCH:do not set Win32 server-side socket buffer size on windows 2012

2015-07-03 Thread Andres Freund
On 2015-07-02 23:56:16 +0300, Heikki Linnakangas wrote:
> On 04/10/2015 01:46 PM, chenhj wrote:
> >Result(execute time):
> >default(8K),  7.370s
> >set SO_SNDBUF to 32K, 4.159s(the current implement)
> >set SO_SNDBUF to 64K, 2.875s
> >set SO_SNDBUF to 128K,  1.593s
> >set SO_SNDBUF to 256K,  1.324s
> 
> I was about to commit the attached, but when I tested this between my
> Windows 8.1 virtual machine and Linux host, I was not able to see any
> performance difference. It may be because the case is hobbled by other
> inefficiencies, in the virtualization or somewhere else, but I wonder if
> others can reproduce the speedup?

Given that too-small socket buffers incur significantly smaller latency
bumps in a virtualized environment than on a real network, and hit another
set of buffers inside the virtualization technology, I'm not particularly
surprised by that.

Greetings,

Andres Freund




Re: [HACKERS] PATCH:do not set Win32 server-side socket buffer size on windows 2012

2015-07-03 Thread Michael Paquier
On Fri, Jul 3, 2015 at 5:56 AM, Heikki Linnakangas wrote:
> On 04/10/2015 01:46 PM, chenhj wrote:
> I was about to commit the attached, but when I tested this between my
> Windows 8.1 virtual machine and Linux host, I was not able to see any
> performance difference. It may be because the case is hobbled by other
> inefficiencies, in the virtualization or somewhere else, but I wonder if
> others can reproduce the speedup?

I just gave this a shot in a Windows 8.1 VM, but I could not reproduce a
speedup of more than a couple of percent (sorry, no physical servers
available), and even that seemed to be noise.

The approach taken by this patch sounds safe enough to me that it
should be applied. I mean, we know that on Win8/2k12 the default
socket buffer size is 64k, so reducing it to 32k would hurt
performance.

By the way, perhaps the link mentioned in this code should be updated
to this one; apparently Microsoft has changed the URL format a bit:
https://support.microsoft.com/en-us/kb/823764
A refresh would not hurt.
-- 
Michael




Re: [HACKERS] PATCH: remove nclients/nthreads constraint from pgbench

2015-07-03 Thread Fabien COELHO



Indeed. The attached v3 fixes the case where nthreads > nclients.


Ok, committed. Thanks!


Thanks!

--
Fabien.




Re: [HACKERS] Resource Owner reassign Locks

2015-07-03 Thread Andres Freund
On 2015-06-07 13:44:08 -0700, Jeff Janes wrote:
> I'd like to advocate for back-patching this to 9.0, 9.1, and 9.2.  It has
> run without problems for a while now, and it can be considered a bug that
> systems with a very large number of objects cannot be upgraded in a
> reasonable time.

In that case, how about working on a version for <= 9.2 (a single one
should suffice)? That will likely include a bunch of wrapper functions
to avoid changing the API in the back branches.

Greetings,

Andres Freund




Re: [HACKERS] Let PostgreSQL's On Schedule checkpoint write buffer smooth spread cycle by tuning IsCheckpointOnSchedule?

2015-07-03 Thread Fabien COELHO


Hello Andres,


In conclusion, and very egoistically, I would prefer it if this patch could
wait for the checkpoint scheduling patch to be considered, as it would
basically invalidate the X00 hours of performance tests I ran :-)


These two patches target pretty independent mechanics. If your patch were
significantly influenced by this, something would be wrong. It might
decrease the benefit of your patch a mite, but that's not really a
problem.


That is not the issue I see. On the principle of performance testing, it 
really means that I should rerun the tests, even if I expect that the 
overall influence would be pretty small in this case. This is my egoistic 
argument. Well, probably I would just rerun a few cases, not all of them, 
to check that the impact is indeed a "mite", as you said.


Another point is that I'm not sure that this patch is ripe; in particular, 
I'm skeptical about the hardcoded 1.5 without further testing. Maybe it is 
good, maybe 1.3 or 1.6 is better, maybe it depends and it should just be a 
guc with some advice about how to set it. So I really think that it needs 
more performance figures than "it has a positive effect on one load".


Well, this is just my opinion, no need to care too much about it :-)

--
Fabien.




Re: [HACKERS] PATCH: remove nclients/nthreads constraint from pgbench

2015-07-03 Thread Heikki Linnakangas

On 07/03/2015 07:50 AM, Fabien COELHO wrote:

This doesn't behave correctly if you set -j to greater than -c, and also use
the rate limit option:


Indeed. The attached v3 fixes the case where nthreads > nclients.


Ok, committed. Thanks!

- Heikki





Re: [HACKERS] Let PostgreSQL's On Schedule checkpoint write buffer smooth spread cycle by tuning IsCheckpointOnSchedule?

2015-07-03 Thread Fabien COELHO


power 1.5 is almost certainly not right for all cases, but it is simple 
and better.


It is better "in some cases", as I've been told about my own patch. If you 
have a separate disk for WAL writes, the power formula may just degrade 
performance, or maybe not, or not too much, or it really should be a guc.
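
For readers not following that other thread, the role of the constant: as I
understand the proposal, a progress fraction x in [0,1] is passed through

    f(x) = x^1.5,   with x^1.5 <= x for 0 <= x <= 1

before the on-schedule comparison, which discounts progress early in the
cycle; the stated rationale is to compensate for WAL being generated faster
at the start of a checkpoint because of full-page writes. Whether 1.5 is the
right exponent is exactly what is in question here.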


Well, I just think that it needs more performance testing with various 
loads and sizes, really. I'm not against this patch at all.



And easy to remove if something even better arrives.

I don't see the two patches being in conflict.


They are not "in conflict" from a git point of view, and even if they 
were, it would be trivial to resolve.


They are in conflict in the sense that the patch changes the checkpoint 
load significantly, which would mean that my X00 hours of performance 
testing on the checkpoint scheduler should more or less be run again. Ok, 
it is somewhat egoistic, but I'm trying to avoid wasting people's time.


Another point is that I'm not sure I understand the decision process: for 
some patches in an area extensive performance tests are required, while for 
other patches in the same area they are not.


--
Fabien.




Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-03 Thread Beena Emerson
Hello,

This has been registered in the next 2015-09 CF, since the majority are in
favor of adding this multiple-synchronous-replication feature (with
quorum/priority).

A new patch will be submitted once we have reached a consensus on the design.

--
Beena Emerson


Re: [HACKERS] Determine operator from it's function

2015-07-03 Thread Heikki Linnakangas

On 07/03/2015 01:20 AM, Jim Nasby wrote:

Is there a way to determine the operator that resulted in calling the
operator function? I thought fcinfo->flinfo->fn_expr might get set to
the OpExpr, but seems it can be a FuncExpr even when called via an
operator...


Don't think there is. Why do you need to know?

- Heikki




Re: [HACKERS] copy.c handling for RLS is insecure

2015-07-03 Thread Noah Misch
On Tue, Dec 02, 2014 at 11:32:27AM -0500, Stephen Frost wrote:
> * Robert Haas (robertmh...@gmail.com) wrote:
> > On Thu, Nov 27, 2014 at 2:03 AM, Stephen Frost  wrote:
> > > Alright, I've done the change to use the RangeVar from CopyStmt, but
> > > also added a check wherein we verify that the relation's OID returned
> > > from the planned query is the same as the relation's OID that we did the
> > > RLS check on- if they're different, we throw an error.  Please let me
> > > know if there are any remaining concerns.

Here is the check in question (added in commit 143b39c):

    plan = planner(query, 0, NULL);

    /*
     * If we were passed in a relid, make sure we got the same one back
     * after planning out the query.  It's possible that it changed
     * between when we checked the policies on the table and decided to
     * use a query and now.
     */
    if (queryRelId != InvalidOid)
    {
        Oid         relid = linitial_oid(plan->relationOids);

        /*
         * There should only be one relationOid in this case, since we
         * will only get here when we have changed the command for the
         * user from a "COPY relation TO" to "COPY (SELECT * FROM
         * relation) TO", to allow row level security policies to be
         * applied.
         */
        Assert(list_length(plan->relationOids) == 1);

        if (relid != queryRelId)
            ereport(ERROR,
                    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                     errmsg("relation referenced by COPY statement has changed")));
    }

> > That's clearly an improvement, but I'm not sure it's water-tight.
> > What if the name that originally referenced a table ended up
> > referencing a view?  Then you could get
> > list_length(plan->relationOids) != 1.
> 
> I'll test it out and see what happens.  Certainly a good question and
> if there's an issue there then I'll get it addressed.

Yes, it can be made to reference a view and trip the assertion.

> > (And, in that case, I also wonder if you could get
> > eval_const_expressions() to do evil things on your behalf while
> > planning.)
> 
> If it can be made to reference a view then there's an issue as the view
> might include a function call itself which is provided by the attacker..

Indeed.  As the parenthetical remark supposed, the check happens too late to
prevent a security breach.  planner() has run eval_const_expressions(),
executing code of the view owner's choosing.

> Clearly, if we found a relation originally then we need that same
> relation with the same OID after the conversion to a query.

That is necessary but not sufficient.  CREATE RULE can convert a table to a
view without changing the OID, thereby fooling the check.  Test procedure:

-- as superuser (or createrole)
create user blackhat;
create user alice;

-- as blackhat
begin;
create table exploit_rls_copy (c int);
alter table exploit_rls_copy enable row level security;
grant select on exploit_rls_copy to public;
commit;

-- as alice
-- first, set breakpoint on BeginCopy
copy exploit_rls_copy to stdout;

-- as blackhat
begin;
create or replace function leak() returns int immutable as $$begin
raise notice 'in leak()'; return 7; end$$ language plpgsql;
create rule "_RETURN" as on select to exploit_rls_copy do instead
select leak() as c from (values (0)) dummy;
commit;

-- Release breakpoint.  leak() function call happens.  After that, assertion
-- fires if enabled.  ERROR does not fire in any case.

