Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-22 Thread David Rowley
On 20 January 2015 at 17:10, Peter Geoghegan  wrote:

> On Mon, Jan 19, 2015 at 7:47 PM, Michael Paquier
>  wrote:
>
> > With your patch applied, the failure with MSVC disappeared, but there
> > is still a warning showing up:
> > (ClCompile target) ->
> >   src\backend\lib\hyperloglog.c(73): warning C4334: '<<' : result of
> > 32-bit shift implicitly converted to 64 bits (was 64-bit shift
> > intended?)
>
> That seems harmless. I suggest an explicit cast to "Size" here.
>

This caught my eye too.

I agree about casting to Size.

Patch attached.

Regards

David Rowley


hyperloglog_bitshift_fix.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-22 Thread Noah Misch
On Wed, Jan 21, 2015 at 06:51:34PM -0500, Andrew Dunstan wrote:
> The following case has just been brought to my attention (look at the
> differing number of backslashes):
> 
>andrew=# select jsonb '"\\u"';
>   jsonb
>--
>  "\u"
>(1 row)
> 
>andrew=# select jsonb '"\u"';
>   jsonb
>--
>  "\u"
>(1 row)

A mess indeed.  The input is unambiguous, but the jsonb representation can't
distinguish "\u" from "\\u".  Some operations on the original json
type have similar problems, since they use an in-memory binary representation
with the same shortcoming:

[local] test=# select json_array_element_text($$["\u"]$$, 0) =
test-# json_array_element_text($$["\\u"]$$, 0);
 ?column? 
--
 t
(1 row)

> Things get worse, though. On output, '\uabcd' for any four hex digits is
> recognized as a unicode escape, and thus the backslash is not escaped, so
> that we get:
> 
>andrew=# select jsonb '"\\uabcd"';
>   jsonb
>--
>  "\uabcd"
>(1 row)
> 
> 
> We could probably fix this fairly easily for non- U+ cases by having
> jsonb_to_cstring use a different escape_json routine.

Sounds reasonable.  For 9.4.1, before more people upgrade?

> But it's a mess, sadly, and I'm not sure what a good fix for the U+ case
> would look like.

Agreed.  When a string unescape algorithm removes some kinds of backslash
escapes and not others, it's nigh inevitable that two semantically-distinct
inputs can yield the same output.  json_lex_string() fell into that trap by
making an exception for \u.  To fix this, the result needs to be fully
unescaped (\u converted to the NUL byte) or retain all backslash escapes.
(Changing that either way is no fun now that an on-disk format is at stake.)

> Maybe we should detect such input and emit a warning of
> ambiguity? It's likely to be rare enough, but clearly not as rare as we'd
> like, since this is a report from the field.

Perhaps.  Something like "WARNING:  jsonb cannot represent \\u; reading as
\u"?  Alas, but I do prefer that to silent data corruption.

Thanks,
nm




Re: [HACKERS] Proposal: knowing detail of config files via SQL

2015-01-22 Thread Amit Kapila
On Fri, Jan 23, 2015 at 3:49 AM, Tom Lane  wrote:
>
> I know what the proposal is.  What I am questioning is the use-case that
> justifies having us build and support all this extra mechanism.  How often
> does anyone need to know what the "next down" variable value would be?

I would say not that often, since people do not change the settings in
the configuration files all that frequently, but it could still be
meaningful in situations like the ones described upthread by Sawada.

Similarly, pg_reload_conf() and many such things are also used
infrequently; it seems to me the important point to consider here is
whether such a view could serve any purpose for real users.

Multiple values for the same config parameter were possible even before
ALTER SYSTEM, and there didn't seem to be much need/demand for such a
view then, perhaps because users had no command for changing a setting
at the file level.  Now that we provide one, the view could be useful in
certain cases, such as when a user wants to see the previous values
(the values in postgresql.conf).

I understand that usage of such a view would not be high, but it could be
meaningful in some cases.

> And if they do need to know whether a variable is set in postgresql.conf,
> how often wouldn't they just resort to "grep" instead?
>

Does a user/DBA (with superuser privileges) always have access to the
postgresql.conf file?  It seems to me that even if they do, getting the
information via SQL would be more appealing.

> (Among other
> points, grep would succeed at noticing commented-out entries, which this
> mechanism would not.)

> GUC has existed in more or less its current state for about 15 years,
> and I don't recall a lot of complaints that would be solved by this.
> Furthermore, given that ALTER SYSTEM was sold to us as more or less
> obsoleting manual editing of postgresql.conf, I rather doubt that it's
> changed the basis of discussion all that much.
>

By providing such a view I don't mean to recommend that users change
settings by editing postgresql.conf manually and use ALTER SYSTEM only
for other cases; rather, it could be useful in certain cases, such as for
default values, or in the rare case where a user has changed the
parameters manually.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] pg_upgrade and rsync

2015-01-22 Thread David Steele
On 1/22/15 10:05 PM, Stephen Frost wrote:
>> In addition, there is a possible race condition in rsync where a file
>> that is modified in the same second after rsync starts to copy will not
>> be picked up in a subsequent rsync unless --checksum is used.  This is
>> fairly easy to prove and is shown here:
>>
>> https://github.com/pgmasters/backrest/blob/dev/test/lib/BackRestTest/BackupTest.pm#L1667
> Right, though that isn't really an issue in this specific case- we're
> talking about post-pg_upgrade but before the upgraded cluster has
> actually been started, so nothing should be modifying these files.

Indeed.  This was really directed more at what Bruce said:

I am thinking the fix for standbys would be similar to what we recommend
for upgrades with link mode using an rsync-created copy, e.g. use rsync
while the master is running to create a copy of the standby, then shut
down the master and run rsync again.  However, at that point, you might
as well just take a base backup and be done with it.

-- 
- David Steele
da...@pgmasters.net



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] pg_upgrade and rsync

2015-01-22 Thread Stephen Frost
* David Steele (da...@pgmasters.net) wrote:
> On 1/22/15 8:54 PM, Stephen Frost wrote:
> > The problem, as mentioned elsewhere, is that you have to checksum all
> > the files because the timestamps will differ.  You can actually get
> > around that with rsync if you really want though- tell it to only look
> > at file sizes instead of size+time by passing in --size-only.  I have to
> > admit that for *my* taste, at least, that's getting pretty darn
> > optimistic.  It *should* work, but I'd definitely recommend testing it
> > about a billion times in various ways before trusting it or recommending
> > it to anyone else.  I expect you'd need --inplace also, for cases where
> > the sizes are different and rsync wants to modify the file on the
> > destination to match the one on the source.
>
> I would definitely not feel comfortable using --size-only.

Yeah, it also occurs to me that if any of the catalog tables end up
being the same size between the master and the replica that they
wouldn't get copied and that'd make for one very interesting result, and
not a good one.

> In addition, there is a possible race condition in rsync where a file
> that is modified in the same second after rsync starts to copy will not
> be picked up in a subsequent rsync unless --checksum is used.  This is
> fairly easy to prove and is shown here:
> 
> https://github.com/pgmasters/backrest/blob/dev/test/lib/BackRestTest/BackupTest.pm#L1667

Right, though that isn't really an issue in this specific case- we're
talking about post-pg_upgrade but before the upgraded cluster has
actually been started, so nothing should be modifying these files.

> That means the rsync hot, then rsync cold method of updating a standby
> is not *guaranteed* to work unless checksums are used.  This may seem
> like an edge case, but for a small, active database it looks like it
> could be a real issue.

That's certainly a good point though.

Thanks!

Stephen




Re: [HACKERS] Parallel Seq Scan

2015-01-22 Thread Josh Berkus
On 01/22/2015 05:53 AM, Robert Haas wrote:
> Also, I think testing with 2 workers is probably not enough.  I think
> we should test with 8 or even 16.

FWIW, based on my experience there will also be demand to use parallel
query using 4 workers, particularly on AWS.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] pg_upgrade and rsync

2015-01-22 Thread David Steele
On 1/22/15 8:54 PM, Stephen Frost wrote:
> The problem, as mentioned elsewhere, is that you have to checksum all
> the files because the timestamps will differ.  You can actually get
> around that with rsync if you really want though- tell it to only look
> at file sizes instead of size+time by passing in --size-only.  I have to
> admit that for *my* taste, at least, that's getting pretty darn
> optimistic.  It *should* work, but I'd definitely recommend testing it
> about a billion times in various ways before trusting it or recommending
> it to anyone else.  I expect you'd need --inplace also, for cases where
> the sizes are different and rsync wants to modify the file on the
> destination to match the one on the source.
>
I would definitely not feel comfortable using --size-only.

In addition, there is a possible race condition in rsync where a file
that is modified in the same second after rsync starts to copy will not
be picked up in a subsequent rsync unless --checksum is used.  This is
fairly easy to prove and is shown here:

https://github.com/pgmasters/backrest/blob/dev/test/lib/BackRestTest/BackupTest.pm#L1667

That means the rsync hot, then rsync cold method of updating a standby
is not *guaranteed* to work unless checksums are used.  This may seem
like an edge case, but for a small, active database it looks like it
could be a real issue.

--
- David Steele
da...@pgmasters.net




Re: [HACKERS] Using 128-bit integers for sum, avg and statistics aggregates

2015-01-22 Thread Petr Jelinek

On 23/01/15 00:40, Andreas Karlsson wrote:


- Renamed some things from int12 to int128, there are still some places
with int16 which I am not sure what to do with.



I'd vote for renaming them to int128 too; there are enough C functions 
that use int16 for a 16-bit integer that this is going to be confusing 
otherwise.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_upgrade and rsync

2015-01-22 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote:
> On Fri, Jan 23, 2015 at 01:19:33AM +0100, Andres Freund wrote:
> > Or do you - as the text edited in your patch, but not the quote above -
> > mean to run pg_upgrade just on the primary and then rsync?
> 
> No, I was going to run it on both, then rsync.

I'm pretty sure this is all a lot easier than you believe it to be.  If
you want to recreate what pg_upgrade does to a cluster then the simplest
thing to do is rsync before removing any of the hard links.  rsync will
simply recreate the same hard link tree that pg_upgrade created when it
ran, and update files which were actually changed (the catalog tables).

The problem, as mentioned elsewhere, is that you have to checksum all
the files because the timestamps will differ.  You can actually get
around that with rsync if you really want though- tell it to only look
at file sizes instead of size+time by passing in --size-only.  I have to
admit that for *my* taste, at least, that's getting pretty darn
optimistic.  It *should* work, but I'd definitely recommend testing it
about a billion times in various ways before trusting it or recommending
it to anyone else.  I expect you'd need --inplace also, for cases where
the sizes are different and rsync wants to modify the file on the
destination to match the one on the source.

Thanks,

Stephen




Re: [HACKERS] tracking commit timestamps

2015-01-22 Thread Petr Jelinek

On 05/01/15 17:50, Petr Jelinek wrote:

On 05/01/15 16:17, Petr Jelinek wrote:

On 05/01/15 07:28, Fujii Masao wrote:

On Thu, Dec 4, 2014 at 12:08 PM, Fujii Masao 
wrote:

On Wed, Dec 3, 2014 at 11:54 PM, Alvaro Herrera
 wrote:

Pushed with some extra cosmetic tweaks.


I got the following assertion failure when I executed
pg_xact_commit_timestamp()
in the standby server.

=# select pg_xact_commit_timestamp('1000'::xid);
TRAP: FailedAssertion("!(((oldestCommitTs) != ((TransactionId) 0)) ==
((newestCommitTs) != ((TransactionId) 0)))", File: "commit_ts.c",
Line: 315)
server closed the connection unexpectedly
 This probably means the server terminated abnormally
 before or while processing the request.
The connection to the server was lost. Attempting reset: 2014-12-04
12:01:08 JST sby1 LOG:  server process (PID 15545) was terminated by
signal 6: Aborted
2014-12-04 12:01:08 JST sby1 DETAIL:  Failed process was running:
select pg_xact_commit_timestamp('1000'::xid);

The way to reproduce this problem is

#1. set up and start the master and standby servers with
track_commit_timestamp disabled
#2. enable track_commit_timestamp in the master and restart the master
#3. run some write transactions
#4. enable track_commit_timestamp in the standby and restart the
standby
#5. execute "select pg_xact_commit_timestamp('1000'::xid)" in the
standby

BTW, at the step #4, I got the following log messages. This might be
a hint for
this problem.

LOG:  file "pg_commit_ts/" doesn't exist, reading as zeroes
CONTEXT:  xlog redo Transaction/COMMIT: 2014-12-04 12:00:16.428702+09;
inval msgs: catcache 59 catcache 58 catcache 59 catcache 58 catcache
45 catcache 44 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 snapshot 2608
relcache 16384


This problem still happens in the master.

Regards,



Attached patch fixes it, I am not sure how happy I am with the way I did
it though.



Added more comments and made the error message a bit clearer.



Fujii, Alvaro, did one of you have a chance to look at this fix?


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_upgrade and rsync

2015-01-22 Thread Bruce Momjian
On Fri, Jan 23, 2015 at 01:19:33AM +0100, Andres Freund wrote:
> On 2015-01-22 14:20:51 -0500, Bruce Momjian wrote:
> > It is possible to run pg_upgrade on streaming standby servers by
> > making them master servers, running pg_upgrade on them, then shutting
> > down all servers and using rsync to make the standby servers match the
> > real master.
> 
> Isn't that a pretty crazy procedure? If you need to shut down all

Yes, it is crazy, but so is pg_upgrade.  ;-)

> servers anyway, you can just rsync after having run pg_upgrade on the
> master, no? Rsync won't really transfer less just because you ran a
> similar thing on the standby.

Uh, yeah, it will, because the files can get renamed as part of the
upgrade (relfilenode now matches oid), so by running the upgrade, file
names are going to match up.  I didn't think rsync could handle renaming
of files without recopying the entire file.

> Even if this would allow avoiding some traffic for rsync: There's
> absolutely no guarantee that the standby's pg_upgrade results in an all
> that similar data directory. Far from everything in postgres is
> deterministic - it's easy to hit timing differences that result in
> noticeable differences.

Right, some non-deterministic things would change, but I thought
running the upgrade on the standby would help.  However, now that I think
of it, we don't preserve the database directory name and assume
dbs will get the same oid and therefore the same database directory
name on both, but if you use -j, things are going to happen in random
order.  Oops.

Oh well.

> Or do you - as the text edited in your patch, but not the quote above -
> mean to run pg_upgrade just on the primary and then rsync?

No, I was going to run it on both, then rsync.

I am thinking the fix for standbys would be similar to what we recommend
for upgrades with link mode using an rsync-created copy, e.g. use rsync
while the master is running to create a copy of the standby, then shut
down the master and run rsync again.  However, at that point, you might
as well just take a base backup and be done with it.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




[HACKERS] Minor issues with code comments related to abbreviated keys

2015-01-22 Thread Peter Geoghegan
Attached patch fixes minor issues in code comments that relate to
abbreviated keys.

-- 
Peter Geoghegan
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index c79b641..cfa1921 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -2088,7 +2088,7 @@ bttext_abbrev_convert(Datum original, SortSupport ssup)
 	 *
 	 * First, Hash key proper, or a significant fraction of it.  Mix in length
 	 * in order to compensate for cases where differences are past
-	 * CACHE_LINE_SIZE bytes, so as to limit the overhead of hashing.
+	 * PG_CACHE_LINE_SIZE bytes, so as to limit the overhead of hashing.
 	 */
 	hash = hash_any((unsigned char *) authoritative_data,
 	Min(len, PG_CACHE_LINE_SIZE));
diff --git a/src/include/utils/sortsupport.h b/src/include/utils/sortsupport.h
index 62fedfa..44c596f 100644
--- a/src/include/utils/sortsupport.h
+++ b/src/include/utils/sortsupport.h
@@ -165,8 +165,8 @@ typedef struct SortSupportData
 	 * may set it to NULL to indicate abbreviation should not be used (which is
 	 * something sortsupport routines need not concern themselves with).
 	 * However, sortsupport routines must not set it when it is immediately
-	 * established that abbreviation should not proceed (for abbreviation
-	 * calls, or platform-specific impediments to using abbreviation).
+	 * established that abbreviation should not proceed (e.g., for !abbreviate
+	 * calls, or due to platform-specific impediments to using abbreviation).
 	 */
 	Datum			(*abbrev_converter) (Datum original, SortSupport ssup);
 



Re: [HACKERS] pg_upgrade and rsync

2015-01-22 Thread Bruce Momjian
On Thu, Jan 22, 2015 at 06:04:24PM -0600, Jim Nasby wrote:
> On 1/22/15 5:43 PM, Bruce Momjian wrote:
> >This brings up the other problem that the mod times of the files
> >are likely to be different between master and slave --- should I
> >recommend to only use rsync --checksum?
>
> I don't think so. AIUI if the timestamps are different the very next
> thing it does is run the checksum (which is expensive). So --checksum
> is just going to hurt.

Oh, OK, good.

> >I am going to now conclude that rsync is never going to work for
> >this, unless we have pg_upgrade preserve relfilenodes as well.
> >However, I am not even sure that is possible due to conflicts with
> >system table relfilenodes created in the new cluster.
>
> We've previously talked about required steps before an upgrade;
> perhaps we need a way to force an OID/relfilenode change on the old
> server prior to upgrade.

Actually, the idea I had forgotten is that we are not rsyncing between
old and new clusters here, but between two servers that are both new
after running pg_upgrade.  Their relfilenodes match their oids, and the
oids match, so we should be fine.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] pg_upgrade and rsync

2015-01-22 Thread Andres Freund
On 2015-01-22 14:20:51 -0500, Bruce Momjian wrote:
> It is possible to run pg_upgrade on streaming standby servers by
> making them master servers, running pg_upgrade on them, then shutting
> down all servers and using rsync to make the standby servers match the
> real master.

Isn't that a pretty crazy procedure? If you need to shut down all
servers anyway, you can just rsync after having run pg_upgrade on the
master, no? Rsync won't really transfer less just because you ran a
similar thing on the standby.

Even if this would allow avoiding some traffic for rsync: There's
absolutely no guarantee that the standby's pg_upgrade results in an all
that similar data directory. Far from everything in postgres is
deterministic - it's easy to hit timing differences that result in
noticeable differences.

Or do you - as the text edited in your patch, but not the quote above -
mean to run pg_upgrade just on the primary and then rsync?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_upgrade and rsync

2015-01-22 Thread Jim Nasby

On 1/22/15 5:43 PM, Bruce Momjian wrote:

This brings up the other problem that the mod times of the files are
likely to be different between master and slave --- should I recommend
to only use rsync --checksum?


I don't think so. AIUI if the timestamps are different the very next thing it 
does is run the checksum (which is expensive). So --checksum is just going to 
hurt.


I am going to now conclude that rsync is never going to work for this,
unless we have pg_upgrade preserve relfilenodes as well.  However, I am
not even sure that is possible due to conflicts with system table
relfilenodes created in the new cluster.


We've previously talked about required steps before an upgrade; perhaps we need 
a way to force an OID/relfilenode change on the old server prior to upgrade.

Or, thinking outside the box here... could this type of stuff be done in 
postgres itself so we could generate WAL that's shipped to the standbys? That 
would allow doing this as part of the formal upgrade process without the need 
for preliminary steps.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Back-branch update releases scheduled

2015-01-22 Thread Peter Geoghegan
On Thu, Jan 22, 2015 at 3:51 PM, Tom Lane  wrote:
> So let's get on with it.  We're not holding up releases for patches
> that may or may not materialize.


I don't disagree. Just pointing out that it's a consideration.

-- 
Peter Geoghegan




Re: [HACKERS] Back-branch update releases scheduled

2015-01-22 Thread Tom Lane
Bruce Momjian  writes:
> On Thu, Jan 22, 2015 at 03:28:39PM -0800, Peter Geoghegan wrote:
>> On Thu, Jan 22, 2015 at 3:23 PM, Tom Lane  wrote:
>>> We're really rather overdue for updates of the pre-9.4 back branches,
>>> and 9.4 itself has probably baked for long enough to justify a 9.4.1.
>>> Accordingly, the core committee has agreed to wrap releases the week
>>> after next; that's wrap Mon Feb 2 for public announcement Thu Feb 5.

>> I hope that gives us enough time to fix the issue in question in the
>> 9.4.0 problem report thread "hung backends stuck in spinlock heavy
>> endless loop".

> The JSON/JSONB escape thing would be nice to fix too, before the
> existing behavior becomes too baked into applications.

So let's get on with it.  We're not holding up releases for patches
that may or may not materialize.

regards, tom lane




Re: [HACKERS] Back-branch update releases scheduled

2015-01-22 Thread Bruce Momjian
On Thu, Jan 22, 2015 at 03:28:39PM -0800, Peter Geoghegan wrote:
> On Thu, Jan 22, 2015 at 3:23 PM, Tom Lane  wrote:
> > We're really rather overdue for updates of the pre-9.4 back branches,
> > and 9.4 itself has probably baked for long enough to justify a 9.4.1.
> > Accordingly, the core committee has agreed to wrap releases the week
> > after next; that's wrap Mon Feb 2 for public announcement Thu Feb 5.
> 
> I hope that gives us enough time to fix the issue in question in the
> 9.4.0 problem report thread "hung backends stuck in spinlock heavy
> endless loop".

The JSON/JSONB escape thing would be nice to fix too, before the
existing behavior becomes too baked into applications.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] pg_upgrade and rsync

2015-01-22 Thread Bruce Momjian
On Thu, Jan 22, 2015 at 10:48:37PM +0200, Heikki Linnakangas wrote:
> >>>  * If we need to protect hint bit updates from torn writes, 
> >>> WAL-log a
> >>>  * full page image of the page. This full page image is only 
> >>> necessary
> >>>  * if the hint bit update is the first change to the page since 
> >>> the
> >>>  * last checkpoint.
> >>>  *
> >>>  * We don't check full_page_writes here because that logic is 
> >>> included
> >>>  * when we call XLogInsert() since the value changes dynamically.
> >>>  */
> >>> if (XLogHintBitIsNeeded() && (bufHdr->flags & BM_PERMANENT))
> >>> {
> >>> /*
> >>>  * If we're in recovery we cannot dirty a page because of a 
> >>> hint.
> >>>  * We can set the hint, just not dirty the page as a result 
> >>> so the
> >>>  * hint is lost when we evict the page or shutdown.
> >>>  *
> >>>  * See src/backend/storage/page/README for longer discussion.
> >>>  */
> >>> if (RecoveryInProgress())
> >>> return;
> >
> >What if XLogHintBitIsNeeded is false? That would be the case if we're not 
> >wal logging hints *on the standby*.
> 
> Then the page will be updated without writing a WAL record. Just
> like in the master, if wal_log_hints is off. wal_log_hints works the
> same on the master or the standby.

[ see below for why this entire idea might not work ]

OK, I was confused by your previous "no".  It means we do update hint
bits on read-only slave queries --- we just don't WAL log them.  In
fact, we can't update hint bits on the standby if we are wal logging
them --- is that right?

My text was saying:

these differences can be reduced by using a fresh standby and by
enabling . (While
wal_log_hints transfers hint bits from the primary to
standbys, additional hint bits are still set on the standbys by
read-only queries.)

meaning if you don't run any read-only queries on the standby, the files
will be the same on master/standby because the hint bits will be the same,
and rsync will not copy the files.

This brings up the other problem that the mod times of the files are
likely to be different between master and slave --- should I recommend
to only use rsync --checksum?

I would really like to get a way to pg_upgrade the standbys, but we have
never really been able to get a solution.  Ideally we would update just
the system table files, and if the order of pg_upgrade file renames is
exactly the same, everything else would match, but I can't imagine what
such an API would look like.  Have pg_upgrade spit out a list of files
to be copied?

In fact, these are the relfilenodes pg_upgrade preserves:

 *  While pg_class.oid and pg_class.relfilenode are initially the same
 *  in a cluster, they can diverge due to CLUSTER, REINDEX, or VACUUM
 *  FULL.  In the new cluster, pg_class.oid and pg_class.relfilenode will
 *  be the same and will match the old pg_class.oid value.  Because of
 *  this, old/new pg_class.relfilenode values will not match if CLUSTER,
 *  REINDEX, or VACUUM FULL have been performed in the old cluster.
 *
 *  We control all assignments of pg_type.oid because these oids are stored
 *  in user composite type values.
 *
 *  We control all assignments of pg_enum.oid because these oids are stored
 *  in user tables as enum values.
 *
 *  We control all assignments of pg_authid.oid because these oids are stored
 *  in pg_largeobject_metadata.

so if the table/index relfilenodes no longer match the oid on the old
cluster, due to CLUSTER, REINDEX, or VACUUM FULL, the file name will not
match on the new cluster and rsync will copy the entire file.  In fact,
rsync is going to copy it to the wrong file name, and delete the right
file.  

I am going to now conclude that rsync is never going to work for this,
unless we have pg_upgrade preserve relfilenodes as well.  However, I am
not even sure that is possible due to conflicts with system table
relfilenodes created in the new cluster.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Using 128-bit integers for sum, avg and statistics aggregates

2015-01-22 Thread Andreas Karlsson

A new version of the patch is attached, which changes the following:

- Changed from using __int128_t to __int128.
- Actually compiles and runs code in configure to see that int128 works.
- No longer tests for __uint128_t.
- Updated pg_config.h.win32
- Renamed some things from int12 to int128, there are still some places 
with int16 which I am not sure what to do with.


A remaining issue is what estimate we should pick for the size of the 
aggregate state. This patch uses the size of the int128 version for the 
estimate, but I have no strong opinions on which we should use.


--
Andreas Karlsson

diff --git a/configure b/configure
index 8490eb7..d9b0e8f 100755
--- a/configure
+++ b/configure
@@ -13743,6 +13743,49 @@ _ACEOF
 fi
 
 
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking __int128" >&5
+$as_echo_n "checking __int128... " >&6; }
+if test "$cross_compiling" = yes; then :
+  have___int128=no
+else
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+__int128 a = 2001;
+__int128 b = 4005;
+
+int does_int128_work()
+{
+  __int128 c,d;
+
+  c = a * b;
+  d = (c + b) / b;
+  if (d != a+1)
+return 0;
+  return 1;
+}
+main() {
+  exit(! does_int128_work());
+}
+
+_ACEOF
+if ac_fn_c_try_run "$LINENO"; then :
+  have___int128=yes
+else
+  have___int128=no
+fi
+rm -f core *.core core.conftest.* gmon.out bb.out conftest$ac_exeext \
+  conftest.$ac_objext conftest.beam conftest.$ac_ext
+fi
+
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $have___int128" >&5
+$as_echo "$have___int128" >&6; }
+if test x"$have___int128" = xyes ; then
+
+$as_echo "#define HAVE___INT128 1" >>confdefs.h
+
+fi
+
 # We also check for sig_atomic_t, which *should* be defined per ANSI
 # C, but is missing on some old platforms.
 ac_fn_c_check_type "$LINENO" "sig_atomic_t" "ac_cv_type_sig_atomic_t" "#include 
diff --git a/configure.in b/configure.in
index b4bd09e..189e2f0 100644
--- a/configure.in
+++ b/configure.in
@@ -1760,6 +1760,30 @@ AC_DEFINE_UNQUOTED(MAXIMUM_ALIGNOF, $MAX_ALIGNOF, [Define as the maximum alignme
 AC_CHECK_TYPES([int8, uint8, int64, uint64], [], [],
 [#include ])
 
+AC_MSG_CHECKING([__int128])
+AC_TRY_RUN([
+__int128 a = 2001;
+__int128 b = 4005;
+
+int does_int128_work()
+{
+  __int128 c,d;
+
+  c = a * b;
+  d = (c + b) / b;
+  if (d != a+1)
+return 0;
+  return 1;
+}
+main() {
+  exit(! does_int128_work());
+}
+  ], [have___int128=yes], [have___int128=no], [have___int128=no])
+AC_MSG_RESULT([$have___int128])
+if test x"$have___int128" = xyes ; then
+  AC_DEFINE(HAVE___INT128, 1, [Define to 1 if the system has the type `__int128'.])
+fi
+
 # We also check for sig_atomic_t, which *should* be defined per ANSI
 # C, but is missing on some old platforms.
 AC_CHECK_TYPES(sig_atomic_t, [], [], [#include ])
diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index a8609e8..8344343 100644
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -402,6 +402,9 @@ static void apply_typmod(NumericVar *var, int32 typmod);
 static int32 numericvar_to_int4(NumericVar *var);
 static bool numericvar_to_int8(NumericVar *var, int64 *result);
 static void int8_to_numericvar(int64 val, NumericVar *var);
+#ifdef HAVE_INT128
+static void int16_to_numericvar(int128 val, NumericVar *var);
+#endif
 static double numeric_to_double_no_overflow(Numeric num);
 static double numericvar_to_double_no_overflow(NumericVar *var);
 
@@ -2659,6 +2662,9 @@ numeric_float4(PG_FUNCTION_ARGS)
  * Actually, it's a pointer to a NumericAggState allocated in the aggregate
  * context.  The digit buffers for the NumericVars will be there too.
  *
+ * On platforms which support 128-bit integers some aggregates instead use a
+ * 128-bit integer based transition datatype to speed up calculations.
+ *
  * --
  */
 
@@ -2917,6 +2923,65 @@ numeric_accum_inv(PG_FUNCTION_ARGS)
 	PG_RETURN_POINTER(state);
 }
 
+#ifdef HAVE_INT128
+typedef struct Int128AggState
+{
+	bool	calcSumX2;	/* if true, calculate sumX2 */
+	int64	N;			/* count of processed numbers */
+	int128	sumX;		/* sum of processed numbers */
+	int128	sumX2;		/* sum of squares of processed numbers */
+} Int128AggState;
+
+/*
+ * Prepare state data for a 128-bit aggregate function that needs to compute
+ * sum, count and optionally sum of squares of the input.
+ */
+static Int128AggState *
+makeInt128AggState(FunctionCallInfo fcinfo, bool calcSumX2)
+{
+	Int128AggState *state;
+	MemoryContext agg_context;
+	MemoryContext old_context;
+
+	if (!AggCheckCallContext(fcinfo, &agg_context))
+		elog(ERROR, "aggregate function called in non-aggregate context");
+
+	old_context = MemoryContextSwitchTo(agg_context);
+
+	state = (Int128AggState *) palloc0(sizeof(Int128AggState));
+	state->calcSumX2 = calcSumX2;
+
+	MemoryContextSwitchTo(old_context);
+
+	return state;
+}
+
+/*
+ * Accumulate a new input value for 128-bit aggregate functions.
+ 

Re: [HACKERS] Back-branch update releases scheduled

2015-01-22 Thread Peter Geoghegan
On Thu, Jan 22, 2015 at 3:23 PM, Tom Lane  wrote:
> We're really rather overdue for updates of the pre-9.4 back branches,
> and 9.4 itself has probably baked for long enough to justify a 9.4.1.
> Accordingly, the core committee has agreed to wrap releases the week
> after next; that's wrap Mon Feb 2 for public announcement Thu Feb 5.

I hope that gives us enough time to fix the issue in question in the
9.4.0 problem report thread "hung backends stuck in spinlock heavy
endless loop".

-- 
Peter Geoghegan




Re: [HACKERS] Using 128-bit integers for sum, avg and statistics aggregates

2015-01-22 Thread Andreas Karlsson

On 01/11/2015 02:36 AM, Andres Freund wrote:

a) Afaics only __int128/unsigned __int128 is defined. See
https://gcc.gnu.org/onlinedocs/gcc/_005f_005fint128.html

b) I'm doubtful that AC_CHECK_TYPES is a sufficiently good test on all
platforms. IIRC gcc will generate calls to functions to do the actual
arithmetic, and I don't think they're guaranteed to be available on
all platforms. That's how it e.g. behaves for atomic operations. So my
guess is that you'll need to check that a program that does actual
arithmetic still links.

c) Personally I don't see the point of testing __uint128_t. That's just
a pointless test that makes configure run for longer.


The next version will fix all these issues.


Hm. It might be nicer to move the if (!state) elog() outside the ifdef,
and add curly parens inside the ifdef.


Since that change only really works for the *_inv functions I decided to 
leave them all as-is for consistency.



--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -678,6 +678,12 @@
  /* Define to 1 if your compiler understands __VA_ARGS__ in macros. */
  #undef HAVE__VA_ARGS


+/* Define to 1 if the system has the type `__int128_t'. */

+#undef HAVE___INT128_T
+
+/* Define to 1 if the system has the type `__uint128_t'. */
+#undef HAVE___UINT128_T


pg_config.h.win32 should be updated as well.


Will be fixed in the next version.

--
Andreas Karlsson




[HACKERS] Back-branch update releases scheduled

2015-01-22 Thread Tom Lane
We're really rather overdue for updates of the pre-9.4 back branches,
and 9.4 itself has probably baked for long enough to justify a 9.4.1.
Accordingly, the core committee has agreed to wrap releases the week
after next; that's wrap Mon Feb 2 for public announcement Thu Feb 5.

regards, tom lane




Re: [HACKERS] Windows buildfarm animals are still not happy with abbreviated keys patch

2015-01-22 Thread Peter Geoghegan
On Thu, Jan 22, 2015 at 12:34 PM, Robert Haas  wrote:
> This isn't really Windows-specific.  The root of the problem is that
> when LC_COLLATE=C you were trying to use strxfrm() for the abbreviated
> key even though memcmp() is the authoritative comparator in that case.
> Exactly which platforms happened to blow up as a result of that is
> kind of beside the point.

I don't see how that could be a problem. Even if the strxfrm() blob
wasn't identical to the original string (I think it should always be,
except maybe on OS X), it's still reasonable to assume that there is
equivalent behavior across the C locale and locales with collations
like en_US.UTF-8. It wasn't as if I was using strxfrm() within
bttextfastcmp_c() -- just within bttext_abbrev_convert(), to form an
abbreviated key for text that uses the C locale.

The C locale is just another locale - AFAICT, the only reason we have
treated it differently in PostgreSQL is because that tended to avoid
needing to copy and NUL-terminate, which strcoll() requires. It might
be that that doesn't work out for some reason (but not because
strxfrm() will not have behavior equivalent to memcpy() with the C
locale -- I was *not* relying on that).

That having been said, it's clearer to continue to handle each case (C
locale vs other locales) separately within the new
bttext_abbrev_convert() function, just to be consistent, but also to
avoid NUL-terminating the text strings to pass to strxfrm(), which as
you point out is an avoidable cost. So, I'm not asking you to restore
the terser uniform use of strxfrm() we had before, but, for what it's
worth, that was not the real issue. The real issue was that strxfrm()
spuriously used the wrong locale, as you said. Provided strxfrm() had
the correct locale (the C locale), then AFAICT it ought to have been
fine, regardless of whether or not it then behaves exactly equivalently
to memcpy().

-- 
Peter Geoghegan




Re: [HACKERS] Proposal: knowing detail of config files via SQL

2015-01-22 Thread Josh Berkus
On 01/22/2015 02:09 PM, David Johnston wrote:
> ​The proposal provides for SQL access to all possible sources of
> variable value setting and, ideally, a means of ordering them in
> priority order, so that a search for TimeZone would return two records,
> one for postgresql.auto.conf and one for postgresql.conf - which are
> numbered 1 and 2 respectively - so that in looking at that result if the
> postgresql.auto.conf entry were to be removed the user would know that
> what the value is in postgresql.conf that would become active. 
> Furthermore, if postgresql.conf has a setting AND there is a mapping in
> an #included file that information would be accessible via SQL as well.

Wow.  Um, I can't imagine any use for that which would justify the
overhead.  And I'm practically the "settings geek".

Note that a single file can have multiple copies of the same GUC, plus
there's GUCs set interactively, as well as in the user and database
properties.  So you're looking at a lot of different "versions".

I think you're in a position of needing to interest someone else in this
issue enough to produce a patch to argue about.  I'm not seeing a lot of
interest in it here.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Proposal: knowing detail of config files via SQL

2015-01-22 Thread David Johnston
On Thu, Jan 22, 2015 at 3:19 PM, Tom Lane  wrote:

> David Johnston  writes:
> > On Thu, Jan 22, 2015 at 3:04 PM, Tom Lane  wrote:
> >> Is that a requirement, and if so why?  Because this proposal doesn't
> >> guarantee any such knowledge AFAICS.
>
> > ​The proposal provides for SQL access to all possible sources of variable
> > value setting and, ideally, a means of ordering them in priority order,
> so
> > that a search for TimeZone would return two records, one for
> > postgresql.auto.conf and one for postgresql.conf - which are numbered 1
> and
> > 2 respectively - so that in looking at that result if the
> > postgresql.auto.conf entry were to be removed the user would know that
> what
> > the value is in postgresql.conf that would become active.  Furthermore,
> if
> > postgresql.conf has a setting AND there is a mapping in an #included file
> > that information would be accessible via SQL as well.
>
> I know what the proposal is.  What I am questioning is the use-case that
> justifies having us build and support all this extra mechanism.  How often
> does anyone need to know what the "next down" variable value would be?
> And if they do need to know whether a variable is set in postgresql.conf,
> how often wouldn't they just resort to "grep" instead?  (Among other
> points, grep would succeed at noticing commented-out entries, which this
> mechanism would not.)
>
> GUC has existed in more or less its current state for about 15 years,
> and I don't recall a lot of complaints that would be solved by this.
> Furthermore, given that ALTER SYSTEM was sold to us as more or less
> obsoleting manual editing of postgresql.conf, I rather doubt that it's
> changed the basis of discussion all that much.
>
>
I doubt we'd actually see many complaints since, like you say, people are
likely to just use a different technique.  The only thing PostgreSQL itself
needs to provide is a master inventory of seen/known settings; the user
interface can be left up to other layers (psql, pgAdmin, extensions,
etc...).  It falls into making the system more user friendly.  But maybe
the answer for those users is that if your setup is so complex that this
would be helpful, you probably need to rethink your setup.  Then again, if
I only interact with the server via SQL, I can at least issue RESET and
know the end result - after a config reload - without having to log into
the server and grep the config file - or know what the system defaults are
for settings that do not appear explicitly in postgresql.conf; all without
having to refer to documentation as well.

I'm doubtful it is going to interest any of the core hackers to put this in
place but it at least warrants a couple of passes of brain-storming which
can then be memorialized on the Wiki-ToDo.

David J.


Re: [HACKERS] Proposal: knowing detail of config files via SQL

2015-01-22 Thread Tom Lane
David Johnston  writes:
> On Thu, Jan 22, 2015 at 3:04 PM, Tom Lane  wrote:
>> Is that a requirement, and if so why?  Because this proposal doesn't
>> guarantee any such knowledge AFAICS.

> ​The proposal provides for SQL access to all possible sources of variable
> value setting and, ideally, a means of ordering them in priority order, so
> that a search for TimeZone would return two records, one for
> postgresql.auto.conf and one for postgresql.conf - which are numbered 1 and
> 2 respectively - so that in looking at that result if the
> postgresql.auto.conf entry were to be removed the user would know that what
> the value is in postgresql.conf that would become active.  Furthermore, if
> postgresql.conf has a setting AND there is a mapping in an #included file
> that information would be accessible via SQL as well.

I know what the proposal is.  What I am questioning is the use-case that
justifies having us build and support all this extra mechanism.  How often
does anyone need to know what the "next down" variable value would be?
And if they do need to know whether a variable is set in postgresql.conf,
how often wouldn't they just resort to "grep" instead?  (Among other
points, grep would succeed at noticing commented-out entries, which this
mechanism would not.)

GUC has existed in more or less its current state for about 15 years,
and I don't recall a lot of complaints that would be solved by this.
Furthermore, given that ALTER SYSTEM was sold to us as more or less
obsoleting manual editing of postgresql.conf, I rather doubt that it's
changed the basis of discussion all that much.

regards, tom lane




Re: [HACKERS] Proposal: knowing detail of config files via SQL

2015-01-22 Thread David Johnston
On Thu, Jan 22, 2015 at 3:04 PM, Tom Lane  wrote:

> David G Johnston  writes:
> > Tom Lane-2 wrote
> >> regression=# alter system reset timezone;
> >> ALTER SYSTEM
> >> regression=# select pg_reload_conf();
>
> > How does someone know that performing the above commands will result in
> the
> > TimeZone setting being changed from Asia/Shanghai to US/Eastern?
>
> Is that a requirement, and if so why?  Because this proposal doesn't
> guarantee any such knowledge AFAICS.
>
>
>
The proposal provides for SQL access to all possible sources of variable
value setting and, ideally, a means of ordering them in priority order, so
that a search for TimeZone would return two records, one for
postgresql.auto.conf and one for postgresql.conf - which are numbered 1 and
2 respectively - so that in looking at that result if the
postgresql.auto.conf entry were to be removed the user would know that what
the value is in postgresql.conf that would become active.  Furthermore, if
postgresql.conf has a setting AND there is a mapping in an #included file
that information would be accessible via SQL as well.

David J.


Re: [HACKERS] Proposal: knowing detail of config files via SQL

2015-01-22 Thread Tom Lane
David G Johnston  writes:
> Tom Lane-2 wrote
>> regression=# alter system reset timezone;
>> ALTER SYSTEM
>> regression=# select pg_reload_conf();

> How does someone know that performing the above commands will result in the
> TimeZone setting being changed from Asia/Shanghai to US/Eastern?

Is that a requirement, and if so why?  Because this proposal doesn't
guarantee any such knowledge AFAICS.

regards, tom lane




Re: [HACKERS] basebackups during ALTER DATABASE ... SET TABLESPACE ... not safe?

2015-01-22 Thread Andres Freund
On 2015-01-22 16:38:49 -0500, Stephen Frost wrote:
> Andres,
> 
> * Andres Freund (and...@2ndquadrant.com) wrote:
> > 1) Make do_pg_start_backup() acquire a SHARE lock on
> >pg_database. That'll prevent it from starting while a movedb() is
> >still in progress. Then additionally add pg_backup_in_progress()
> >function to xlog.c that checks (XLogCtl->Insert.exclusiveBackup ||
> >XLogCtl->Insert.nonExclusiveBackups != 0). Use that in createdb() and
> >movedb() to error out if a backup is in progress.
> > 
> >Not pretty, but sounds doable.
> > 
> >We've discussed trying to sleep instead of erroring out in movedb(),
> >while a base backup is in progress, but that's nontrivial. I also
> >don't think ALTER DATABASE is ever intentionally run at the time of a
> >base backup.
> > 
> > 2) Make movedb() (and possibly created(), not sure yet) use proper WAL
> >logging and log the whole copied data. I think this is the right long
> >term fix and would end up being much more reliable. But it either
> >requires some uglyness during redo (creating nonexistant database
> >directories on the fly during redo) or new wal records.
> > 
> >Doable, but probably too large/invasive to backpatch.
> > 
> > Thanks for Alvaro and Petr for discussing the problem.
> > 
> > I lean towards doing 1) in all branches and then doing 2) in master.
> 
> I'm trying to figure out why you'd do '2' in master when in discussion
> of '1' you say "I also don't think ALTER DATABASE is ever intentionally
> run at the time of a base backup."  I agree with that sentiment and am
> inclined to say that '1' is good enough throughout.

Because the way it currently works is a major crock. It's more luck than
anything that it actually somewhat works. We normally rely on WAL to
bring us into a consistent state. But around CREATE/MOVE/DROP DATABASE
we've ignored that.

My point about not intentionally running it at the same time isn't that it's
not happening, it's that it's not intended to happen at the same
time. But most sane sites these days actually do use automated
basebackups - making it much harder to be safe against this.



And. Hm. The difficulty of the current method is evidenced by the fact
that so far nobody recognized that 1) as described above isn't actually
safe.  It fails to protect against basebackups on a standby as its
XLogCtl state will obviously not be visible on the master.

For exclusive basebackups (i.e. SELECT pg_start/stop_backup()) we can't
rely on locking because these commands can happen in independent
sessions. But I think we can in the builtin nonexclusive basebackups, as
it's run in one session. So I guess we could rely on recovery conflicts
not allowing the XLOG_DBASE_CREATE/DROP to replicate. It's nasty that a
basebackup can suddenly stop replication from progressing though :(.
Additionally it'd not protect stuff like pgespresso (an extension for
nonexclusive standby basebackups).

> The other question is- what about AT .. ST?  That is, ALTER TABLE .. SET
> TABLESPACE.  Do we do things differently there or is there a similar
> issue?

No issue, because it actually is implemented halfway sanely and
uses WAL logging.  I personally don't like at all that it uses
FlushRelationBuffers() and reads the tables on the smgr level instead of
using the buffer manager and a bulk state, but ...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-22 Thread Merlin Moncure
On Fri, Jan 16, 2015 at 5:20 PM, Peter Geoghegan  wrote:
> On Fri, Jan 16, 2015 at 10:33 AM, Merlin Moncure  wrote:
>> ISTM the next step is to bisect the problem down over the weekend in
>> order to to narrow the search.  If that doesn't turn up anything
>> productive I'll look into taking other steps.
>
> That might be the quickest way to do it, provided you can isolate the
> bug fairly reliably. It might be a bit tricky to write a shell script
> that assumes a certain amount of time having passed without the bug
> tripping indicates that it doesn't exist, and have that work
> consistently. I'm slightly concerned that you'll hit other bugs that
> have since been fixed, given the large number of possible symptoms
> here.

Quick update:  not done yet, but I'm making consistent progress, with
several false starts.  (for example, I had a .conf problem with the
new dynamic shared memory setting and git merrily bisected down to the
introduction of the feature).
I have to triple check everything :(. The problem is generally
reproducible but I get false negatives that throws off the bisection.
I estimate that early next week I'll have it narrowed down
significantly if not to the exact offending revision.

So far, the 'nasty' damage seems to generally if not always follow a
checksum failure and the checksum failures are always numerically
adjacent.  For example:

[cds2 12707 2015-01-22 12:51:11.032 CST 2754]WARNING:  page
verification failed, calculated checksum 9465 but expected 9477 at
character 20
[cds2 21202 2015-01-22 13:10:18.172 CST 3196]WARNING:  page
verification failed, calculated checksum 61889 but expected 61903 at
character 20
[cds2 29153 2015-01-22 14:49:04.831 CST 4803]WARNING:  page
verification failed, calculated checksum 27311 but expected 27316

I'm not up on the intricacies of our checksum algorithm but this is
making me suspicious that we are looking at a improperly flipped
visibility bit via some obscure problem -- almost certainly with
vacuum playing a role.  This fits the profile of catastrophic damage
that masquerades as numerous other problems.  Or, perhaps, something
is flipping what it thinks is a visibility bit but on the wrong page.

I still haven't categorically ruled out pl/sh yet; that's something to
keep in mind.

In the 'plus' category, aside from flushing out this issue, I've had
zero runtime problems so far aside from the mains problem; bisection
(at least on the 'bad' side) has been reliably engaged by simply
counting the number of warnings/errors/etc in the log.  That's really
impressive.

merlin




Re: [HACKERS] Proposal: knowing detail of config files via SQL

2015-01-22 Thread David G Johnston
Tom Lane-2 wrote
> regression=# alter system reset timezone;
> ALTER SYSTEM
> regression=# select pg_reload_conf();

How does someone know that performing the above commands will result in the
TimeZone setting being changed from Asia/Shanghai to US/Eastern?

David J.




--
View this message in context: 
http://postgresql.nabble.com/Proposal-knowing-detail-of-config-files-via-SQL-tp5835075p5835142.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.




Re: [HACKERS] alter user/role CURRENT_USER

2015-01-22 Thread Alvaro Herrera
Looking at this a bit, I'm not sure completely replacing RoleId with a
node is the best idea; some of the users of that production in the
grammar are okay with accepting only normal strings as names, and don't
need all the CURRENT_USER etc stuff.  Maybe we need a new production,
say RoleSpec, and we modify the few productions that need the extra
flexibility?  For instance we could have ALTER USER RoleSpec instead of
ALTER USER RoleId.  But we would keep CREATE USER RoleId, because it
doesn't make any sense to accept CREATE USER CURRENT_USER.

I think that would lead to a less invasive patch also.

Am I making sense?
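One way to picture the suggestion (a sketch; every name here is invented for illustration, not taken from gram.y):

```
/* hypothetical gram.y sketch */
RoleSpec:
	  RoleId			{ $$ = makeRoleSpec(ROLESPEC_CSTRING, $1); }
	| CURRENT_USER		{ $$ = makeRoleSpec(ROLESPEC_CURRENT_USER, NULL); }
	| SESSION_USER		{ $$ = makeRoleSpec(ROLESPEC_SESSION_USER, NULL); }
	;

AlterUserStmt:
	ALTER USER RoleSpec AlterOptRoleList ...	/* gains the flexibility */

CreateUserStmt:
	CREATE USER RoleId OptRoleList ...			/* stays string-only */
```

Keeping CREATE USER on plain RoleId is what makes this less invasive: only the productions that benefit from CURRENT_USER/SESSION_USER switch over.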

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] basebackups during ALTER DATABASE ... SET TABLESPACE ... not safe?

2015-01-22 Thread Stephen Frost
Andres,

* Andres Freund (and...@2ndquadrant.com) wrote:
> 1) Make do_pg_start_backup() acquire a SHARE lock on
>pg_database. That'll prevent it from starting while a movedb() is
>still in progress. Then additionally add pg_backup_in_progress()
>function to xlog.c that checks (XLogCtl->Insert.exclusiveBackup ||
>XLogCtl->Insert.nonExclusiveBackups != 0). Use that in createdb() and
>movedb() to error out if a backup is in progress.
> 
>Not pretty, but sounds doable.
> 
>We've discussed trying to sleep instead of erroring out in movedb(),
>while a base backup is in progress, but that's nontrivial. I also
>don't think ALTER DATABASE is ever intentionally run at the time of a
>base backup.
> 
> 2) Make movedb() (and possibly created(), not sure yet) use proper WAL
>logging and log the whole copied data. I think this is the right long
>term fix and would end up being much more reliable. But it either
>requires some uglyness during redo (creating nonexistant database
>directories on the fly during redo) or new wal records.
> 
>Doable, but probably too large/invasive to backpatch.
> 
> Thanks for Alvaro and Petr for discussing the problem.
> 
> I lean towards doing 1) in all branches and then doing 2) in master.

I'm trying to figure out why you'd do '2' in master when in discussion
of '1' you say "I also don't think ALTER DATABASE is ever intentionally
run at the time of a base backup."  I agree with that sentiment and am
inclined to say that '1' is good enough throughout.

Another thought would be to offer both- perhaps an AD .. ST ..
CONCURRENTLY option which would WAL.  Or a way to say "my backup is more
important than some AD .. ST ..", which I could see some admins wanting.

The other question is- what about AT .. ST?  That is, ALTER TABLE .. SET
TABLESPACE.  Do we do things differently there or is there a similar
issue?

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] PATCH: Reducing lock strength of trigger and foreign key DDL

2015-01-22 Thread Andreas Karlsson

On 01/20/2015 10:08 AM, Noah Misch wrote:

Fair enough.  It did reinforce pg_get_constraintdef() as a subroutine of
pg_dump rather than an independent, rigorous interface.  It perhaps made the
function worse for non-pg_dump callers.  In that vein, each one of these hacks
has a cost.  One could make a reasonable argument that ALTER TRIGGER RENAME
locking is not important enough to justify spreading the hack from
pg_get_constraintdef() to pg_get_triggerdef().  Lowering the CREATE TRIGGER
lock level does not require any ruleutils.c change for the benefit of pg_dump,
because pg_dump won't see the pg_trigger row of a too-recent trigger.


I agree with this view, and am not sure myself that it is worth lowering 
the lock level of ALTER TRIGGER RENAME. I have attached a patch without 
the changes to ALTER TRIGGER and ruleutils.c and also fixes the comment 
issues noted by Andres.


--
Andreas Karlsson
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index a0d6867..fc86224 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -908,9 +908,9 @@ ERROR:  could not serialize access due to read/write dependencies among transact
 
 
 
- This lock mode is not automatically acquired by any
- PostgreSQL command.
-
+ Acquired by CREATE TRIGGER and many forms of
+ ALTER TABLE (see ).
+>

   
 
@@ -957,9 +957,9 @@ ERROR:  could not serialize access due to read/write dependencies among transact
  TRUNCATE, REINDEX,
  CLUSTER, and VACUUM FULL
  commands. Many forms of ALTER TABLE also acquire
- a lock at this level (see ).
- This is also the default lock mode for LOCK TABLE
- statements that do not specify a mode explicitly.
+ a lock at this level. This is also the default lock mode for
+ LOCK TABLE statements that do not specify
+ a mode explicitly.
 

   
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index b3a4970..f5bbfcd 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -406,6 +406,9 @@ ALTER TABLE ALL IN TABLESPACE name
   mode, and triggers configured as ENABLE ALWAYS will
   fire regardless of the current replication mode.
  
+ 
+  This command acquires a SHARE ROW EXCLUSIVE lock.
+ 
 

 
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 66d5083..08aa71b 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -2858,13 +2858,8 @@ AlterTableGetLockLevel(List *cmds)
 break;
 
 /*
- * These subcommands affect write operations only. XXX
- * Theoretically, these could be ShareRowExclusiveLock.
+ * These subcommands affect write operations only.
  */
-			case AT_ColumnDefault:
-			case AT_ProcessedConstraint:		/* becomes AT_AddConstraint */
-			case AT_AddConstraintRecurse:		/* becomes AT_AddConstraint */
-			case AT_ReAddConstraint:	/* becomes AT_AddConstraint */
 			case AT_EnableTrig:
 			case AT_EnableAlwaysTrig:
 			case AT_EnableReplicaTrig:
@@ -2873,6 +2868,17 @@ AlterTableGetLockLevel(List *cmds)
 			case AT_DisableTrig:
 			case AT_DisableTrigAll:
 			case AT_DisableTrigUser:
+cmd_lockmode = ShareRowExclusiveLock;
+break;
+
+/*
+ * These subcommands affect write operations only. XXX
+ * Theoretically, these could be ShareRowExclusiveLock.
+ */
+			case AT_ColumnDefault:
+			case AT_ProcessedConstraint:		/* becomes AT_AddConstraint */
+			case AT_AddConstraintRecurse:		/* becomes AT_AddConstraint */
+			case AT_ReAddConstraint:	/* becomes AT_AddConstraint */
 			case AT_AlterConstraint:
 			case AT_AddIndex:	/* from ADD CONSTRAINT */
 			case AT_AddIndexConstraint:
@@ -2909,11 +2915,9 @@ AlterTableGetLockLevel(List *cmds)
 			/*
 			 * We add triggers to both tables when we add a
 			 * Foreign Key, so the lock level must be at least
-			 * as strong as CREATE TRIGGER. XXX Might be set
-			 * down to ShareRowExclusiveLock though trigger
-			 * info is accessed by pg_get_triggerdef
+			 * as strong as CREATE TRIGGER.
 			 */
-			cmd_lockmode = AccessExclusiveLock;
+			cmd_lockmode = ShareRowExclusiveLock;
 			break;
 
 		default:
@@ -6030,16 +6034,13 @@ ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation rel,
 	ListCell   *old_pfeqop_item = list_head(fkconstraint->old_conpfeqop);
 
 	/*
-	 * Grab an exclusive lock on the pk table, so that someone doesn't delete
-	 * rows out from under us. (Although a lesser lock would do for that
-	 * purpose, we'll need exclusive lock anyway to add triggers to the pk
-	 * table; trying to start with a lesser lock will just create a risk of
-	 * deadlock.)
+	 * Grab ShareRowExclusiveLock on the pk table, so that someone doesn't
+	 * delete rows out from under us.
 	 */
 	if (OidIsValid(fkconstraint->old_pktable_oid))
-		p

Re: [HACKERS] Proposal: knowing detail of config files via SQL

2015-01-22 Thread Tom Lane
Sawada Masahiko  writes:
> As per discussion
> ,
> I would like to proposal new view like pg_file_settings to know detail
> of config file via SQL.

> - Background
> In 9.4, postgresql.auto.conf was added to support the ALTER SYSTEM
> command, and that config file is loaded after postgresql.conf whenever
> the latter is loaded.
> That is, postgresql.auto.conf takes such high priority that a value
> the DBA sets manually in postgresql.conf may have no effect at all.
> The ALTER SYSTEM RESET command can remove an unnecessary value from
> postgresql.auto.conf, but there is no way to know where the value
> came from.
> (The existing views only give information about the setting in the
> last file where it is present.)

> - Solution
> The patch is not implemented yet; I am just proposing this now.
> I imagine a new pg_file_settings view with the following columns,
> storing each value currently assigned in a config file:
>  - guc value name
>  - guc value
>  - config file path (e.g. /opt/data/postgresql.sql,
> /opt/data/postgresql.auto.conf, /opt/hoge.conf)
> This view could help a DBA decide whether a postgresql.auto.conf
> entry is useful, and if it is not, the DBA could use the
> ALTER SYSTEM .. RESET command to remove it from
> postgresql.auto.conf.

> Another idea is to add additional columns to the existing view
> (pg_settings), per the previous discussion.

> Please give me comments.

I still don't understand what problem you think needs to be solved.
It's already perfectly clear from the pg_settings view when a value
came from postgresql.auto.conf.  For instance:


regression=# select name,setting,source,sourcefile,sourceline from pg_settings 
where name = 'TimeZone';
   name   |  setting   |       source       |                   sourcefile                    | sourceline 
----------+------------+--------------------+-------------------------------------------------+------------
 TimeZone | US/Eastern | configuration file | /home/postgres/testversion/data/postgresql.conf |        531
(1 row)

regression=# alter system set timezone = 'Asia/Shanghai';
ALTER SYSTEM
regression=# select pg_reload_conf();
 pg_reload_conf 
----------------
 t
(1 row)

regression=# select name,setting,source,sourcefile,sourceline from pg_settings 
where name = 'TimeZone';
   name   |    setting    |       source       |                      sourcefile                      | sourceline 
----------+---------------+--------------------+------------------------------------------------------+------------
 TimeZone | Asia/Shanghai | configuration file | /home/postgres/testversion/data/postgresql.auto.conf |          3
(1 row)

regression=# alter system reset timezone;
ALTER SYSTEM
regression=# select pg_reload_conf();
 pg_reload_conf 
----------------
 t
(1 row)

regression=# select name,setting,source,sourcefile,sourceline from pg_settings 
where name = 'TimeZone';
   name   |  setting   |       source       |                   sourcefile                    | sourceline 
----------+------------+--------------------+-------------------------------------------------+------------
 TimeZone | US/Eastern | configuration file | /home/postgres/testversion/data/postgresql.conf |        531
(1 row)


What else is needed?

regards, tom lane




Re: [HACKERS] basebackups during ALTER DATABASE ... SET TABLESPACE ... not safe?

2015-01-22 Thread Andres Freund
On 2015-01-22 14:42:18 -0600, Jim Nasby wrote:
> On 1/22/15 1:43 PM, Alvaro Herrera wrote:
> >Andres Freund wrote:
> >
> >>2) Make movedb() (and possibly createdb(), not sure yet) use proper WAL
> >>logging and log the whole copied data. I think this is the right long
> >>term fix and would end up being much more reliable. But it either
> >>requires some uglyness during redo (creating nonexistant database
> >>directories on the fly during redo) or new wal records.
> >>
> >>Doable, but probably too large/invasive to backpatch.
> >
> >Not to mention the extra WAL traffic ...
> 
> Yeah, I don't know that we actually want #2. It's bad enough to copy
> an entire database locally

The local copy is pretty much fundamental. Given that tablespaces
usually will be on different filesystems there's not much else we can
do.

If we want an optimization for moving databases across tablespaces when
both are on the same filesystem and wal_level < archive, we can do
that. But that's pretty independent. And I doubt it's worth the
developer time.

> , but to then put its entire contents into WAL? Blech.

Besides actually having a chance of being correct, doing so will save
having to do two checkpoints inside movedb(). I think it's pretty likely
that that actually saves overall IO, even including the WAL
writes, especially if there are other databases in the cluster at the same
time.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_upgrade and rsync

2015-01-22 Thread Heikki Linnakangas

On 01/22/2015 10:34 PM, Jim Nasby wrote:

On 1/22/15 2:19 PM, Heikki Linnakangas wrote:

On 01/22/2015 09:20 PM, Bruce Momjian wrote:

One question I have is whether hint bits are set by read-only
transactions on standby servers.


No. See comments in MarkBufferDirtyHint:


 /*
  * If we need to protect hint bit updates from torn writes, WAL-log a
  * full page image of the page. This full page image is only necessary
  * if the hint bit update is the first change to the page since the
  * last checkpoint.
  *
  * We don't check full_page_writes here because that logic is included
  * when we call XLogInsert() since the value changes dynamically.
  */
 if (XLogHintBitIsNeeded() && (bufHdr->flags & BM_PERMANENT))
 {
 /*
  * If we're in recovery we cannot dirty a page because of a hint.
  * We can set the hint, just not dirty the page as a result so the
  * hint is lost when we evict the page or shutdown.
  *
  * See src/backend/storage/page/README for longer discussion.
  */
 if (RecoveryInProgress())
 return;


What if XLogHintBitIsNeeded is false? That would be the case if we're not WAL
logging hints *on the standby*.


Then the page will be updated without writing a WAL record. Just like in 
the master, if wal_log_hints is off. wal_log_hints works the same on the 
master or the standby.
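The decision matrix described in the quoted comments can be sketched as a small standalone function. This is purely illustrative: the enum, function, and parameter names are invented here and are not PostgreSQL's actual API.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * A sketch of the decision MarkBufferDirtyHint() makes, following the
 * comments quoted above; names are invented for illustration.
 */
typedef enum
{
    HINT_SET_AND_DIRTY,         /* hint set, page dirtied (WAL'd if needed) */
    HINT_SET_NOT_DIRTY          /* hint set, page left clean; hint may be lost */
} HintOutcome;

static HintOutcome
mark_hint_outcome(bool wal_log_hints_needed, bool permanent, bool in_recovery)
{
    if (wal_log_hints_needed && permanent)
    {
        /* On a standby we cannot dirty (or WAL-log) a page for a hint. */
        if (in_recovery)
            return HINT_SET_NOT_DIRTY;
        /* On the primary, a full-page image would be WAL-logged here. */
    }
    return HINT_SET_AND_DIRTY;
}
```

In particular, when `wal_log_hints_needed` is false the page is dirtied without any WAL record, on primary and standby alike, which is Heikki's point above.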


- Heikki




Re: [HACKERS] basebackups during ALTER DATABASE ... SET TABLESPACE ... not safe?

2015-01-22 Thread Jim Nasby

On 1/22/15 1:43 PM, Alvaro Herrera wrote:

Andres Freund wrote:


2) Make movedb() (and possibly createdb(), not sure yet) use proper WAL
logging and log the whole copied data. I think this is the right long
term fix and would end up being much more reliable. But it either
requires some uglyness during redo (creating nonexistant database
directories on the fly during redo) or new wal records.

Doable, but probably too large/invasive to backpatch.


Not to mention the extra WAL traffic ...


Yeah, I don't know that we actually want #2. It's bad enough to copy an entire 
database locally, but to then put its entire contents into WAL? Blech.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] pg_upgrade and rsync

2015-01-22 Thread Jim Nasby

On 1/22/15 2:19 PM, Heikki Linnakangas wrote:

On 01/22/2015 09:20 PM, Bruce Momjian wrote:

One question I have is whether hint bits are set by read-only
transactions on standby servers.


No. See comments in MarkBufferDirtyHint:


/*
 * If we need to protect hint bit updates from torn writes, WAL-log a
 * full page image of the page. This full page image is only necessary
 * if the hint bit update is the first change to the page since the
 * last checkpoint.
 *
 * We don't check full_page_writes here because that logic is included
 * when we call XLogInsert() since the value changes dynamically.
 */
if (XLogHintBitIsNeeded() && (bufHdr->flags & BM_PERMANENT))
{
/*
 * If we're in recovery we cannot dirty a page because of a hint.
 * We can set the hint, just not dirty the page as a result so the
 * hint is lost when we evict the page or shutdown.
 *
 * See src/backend/storage/page/README for longer discussion.
 */
if (RecoveryInProgress())
return;


What if XLogHintBitIsNeeded is false? That would be the case if we're not WAL
logging hints *on the standby*.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Windows buildfarm animals are still not happy with abbreviated keys patch

2015-01-22 Thread Robert Haas
On Thu, Jan 22, 2015 at 1:00 PM, Peter Geoghegan  wrote:
> On Thu, Jan 22, 2015 at 9:55 AM, Robert Haas  wrote:
>> Stay tuned for more exciting dispatches from the department of abbreviated 
>> keys!
>
> I certainly suspected that we'd have problems with Windows, but
> nothing this bad.

This isn't really Windows-specific.  The root of the problem is that
when LC_COLLATE=C you were trying to use strxfrm() for the abbreviated
key even though memcmp() is the authoritative comparator in that case.
Exactly which platforms happened to blow up as a result of that is
kind of beside the point.
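The invariant Robert describes can be sketched in standalone C: when memcmp()/strcmp() is the authoritative comparator (LC_COLLATE=C), any abbreviation scheme must never order two keys differently than the full comparison does. The following is an illustrative sketch, not PostgreSQL's implementation; the function names are invented.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Build an 8-byte abbreviated key from the leading bytes of a string,
 * packed big-endian so that unsigned integer comparison of two
 * abbreviations can never disagree with memcmp() on those bytes.
 */
static uint64_t
abbrev_key(const char *s)
{
    unsigned char buf[8] = {0};
    size_t        len = strlen(s);

    memcpy(buf, s, len < 8 ? len : 8);

    uint64_t key = 0;
    for (int i = 0; i < 8; i++)
        key = (key << 8) | buf[i];
    return key;
}

/*
 * The invariant: differing abbreviations must order the same way the
 * authoritative comparator does; equal abbreviations give no verdict
 * and must fall through to the authoritative comparison.
 */
static int
abbrev_consistent(const char *a, const char *b)
{
    uint64_t ka = abbrev_key(a);
    uint64_t kb = abbrev_key(b);

    if (ka == kb)
        return 1;               /* tie: authoritative comparator decides */
    return ((ka < kb) ? -1 : 1) == ((strcmp(a, b) < 0) ? -1 : 1);
}
```

Using a locale-dependent strxfrm() to build the abbreviation while memcmp() remains authoritative breaks exactly this invariant, which is the bug being fixed.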

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pg_upgrade and rsync

2015-01-22 Thread Heikki Linnakangas

On 01/22/2015 09:20 PM, Bruce Momjian wrote:

One question I have is whether hint bits are set by read-only
transactions on standby servers.


No. See comments in MarkBufferDirtyHint:


/*
 * If we need to protect hint bit updates from torn writes, WAL-log a
 * full page image of the page. This full page image is only necessary
 * if the hint bit update is the first change to the page since the
 * last checkpoint.
 *
 * We don't check full_page_writes here because that logic is included
 * when we call XLogInsert() since the value changes dynamically.
 */
if (XLogHintBitIsNeeded() && (bufHdr->flags & BM_PERMANENT))
{
    /*
     * If we're in recovery we cannot dirty a page because of a hint.
     * We can set the hint, just not dirty the page as a result so the
     * hint is lost when we evict the page or shutdown.
     *
     * See src/backend/storage/page/README for longer discussion.
     */
    if (RecoveryInProgress())
        return;



- Heikki




Re: [HACKERS] BRIN range operator class

2015-01-22 Thread Alvaro Herrera
Can you please break up this patch?  I think I see three patches,

1. add sql-callable functions such as inet_merge, network_overright, etc
etc.  These need documentation and a trivial regression test
somewhere.

2. necessary changes to header files (skey.h etc)

3. the inclusion opclass itself

Thanks

BTW the main idea behind having opcinfo return the type oid was to tell
the index what was stored in the index.  If that doesn't work right now,
maybe it needs some tweak to the brin framework code.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] proposal: plpgsql - Assert statement

2015-01-22 Thread Pavel Stehule
Hi

here is the updated patch

2015-01-21 23:28 GMT+01:00 Jim Nasby :

> On 1/21/15 3:10 PM, Pavel Stehule wrote:
>
>>
>> is there some agreement on this behavior of the ASSERT statement?
>>
>> I would like to assign the latest patch to the next commitfest.
>>
>> Possible changes:
>>
>> * I would like to simplify the behavior of evaluating the message
>> expression - probably by disallowing NULL there.
>>
>
> Well, the only thing I could see you doing there is throwing a different
> error if the hint is null. I don't see that as an improvement. I'd just
> leave it as-is.
>

I allowed a NULL there, but enforced a WARNING first.


>
>  * GUC enable_asserts will be supported
>>
>
> That would be good. Would that allow for enabling/disabling on a
> per-function basis too?
>

sure - the only question is whether we develop a #option
enable|disable_asserts. I have no strong opinion.


>
>  * an assert exception should not be handled by the PLpgSQL handler -- like
>> CANCEL exception
>>
>
> +1
> --
> Jim Nasby, Data Architect, Blue Treble Consulting
> Data in Trouble? Get it in Treble! http://BlueTreble.com
>
commit 60f85cb5f284bf812ee73d321850d25313d19d65
Author: Pavel Stehule 
Date:   Thu Jan 22 20:56:31 2015 +0100

initial

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 6bcb106..5663983 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6912,6 +6912,20 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
 
 
 
+ 
+  enable_user_asserts (boolean)
+  
+   enable_user_asserts configuration parameter
+  
+  
+  
+   
+If true, any user assertions are evaluated.  By default, this 
+is set to true.
+   
+  
+ 
+
  
   exit_on_error (boolean)
   
diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml
index 69a0885..26f7eba 100644
--- a/doc/src/sgml/plpgsql.sgml
+++ b/doc/src/sgml/plpgsql.sgml
@@ -3372,6 +3372,9 @@ END LOOP  label ;
   
Errors and Messages
 
+  
+RAISE statement
+

 RAISE

@@ -3564,7 +3567,33 @@ RAISE unique_violation USING MESSAGE = 'Duplicate user ID: ' || user_id;
  the whole category.
 

+  
+
+  
+ASSERT statement
 
+   
+ASSERT
+   
+
+   
+assertions
+in PL/pgSQL
+   
+
+   
+Use the ASSERT statement to ensure that an expected
+predicate is always true. If the predicate is false or null,
+an assertion exception is raised.
+
+
+ASSERT expression , message expression ;
+
+
+User assertions can be enabled or disabled by the 
+.
+   
+  
  
 
  
diff --git a/src/backend/utils/errcodes.txt b/src/backend/utils/errcodes.txt
index 28c8c40..da12428 100644
--- a/src/backend/utils/errcodes.txt
+++ b/src/backend/utils/errcodes.txt
 P0000    E    ERRCODE_PLPGSQL_ERROR               plpgsql_error
 P0001    E    ERRCODE_RAISE_EXCEPTION             raise_exception
 P0002    E    ERRCODE_NO_DATA_FOUND               no_data_found
 P0003    E    ERRCODE_TOO_MANY_ROWS               too_many_rows
+P0004    E    ERRCODE_ASSERT_EXCEPTION            assert_exception
 
 Section: Class XX - Internal Error
 
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index c35867b..cd55e94 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -99,6 +99,7 @@ bool		IsBinaryUpgrade = false;
 bool		IsBackgroundWorker = false;
 
 bool		ExitOnAnyError = false;
+bool		enable_user_asserts = true;
 
 int			DateStyle = USE_ISO_DATES;
 int			DateOrder = DATEORDER_MDY;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index f6df077..b3a2660 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -980,6 +980,15 @@ static struct config_bool ConfigureNamesBool[] =
 	},
 
 	{
+		{"enable_user_asserts", PGC_USERSET, ERROR_HANDLING_OPTIONS,
+			gettext_noop("Enable user assertion checks."),
+			NULL
+		},
+		&enable_user_asserts,
+		true,
+		NULL, NULL, NULL
+	},
+	{
 		{"exit_on_error", PGC_USERSET, ERROR_HANDLING_OPTIONS,
 			gettext_noop("Terminate session on any error."),
 			NULL
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 6e33a17..fab3e8a 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -137,6 +137,7 @@ extern bool IsBackgroundWorker;
 extern PGDLLIMPORT bool IsBinaryUpgrade;
 
 extern bool ExitOnAnyError;
+extern bool enable_user_asserts;
 
 extern PGDLLIMPORT char *DataDir;
 
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index ae5421f..237f453 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -136,6 +136,8 @@ static int exec_stmt_return_query(PLpgSQL_execstate *estate,
 	   PLpgSQL_stmt_return_query *stmt);
 static int exec_stmt_raise(PLpgSQL_execstate *estate,
 PLpgSQL_stmt_raise *stmt);
+static int exec_s
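The intended runtime behaviour of the new statement can be sketched outside plpgsql: with enable_user_asserts on, ASSERT raises an assert exception when the predicate evaluates to false or NULL; with the GUC off it is a no-op. The TriBool type and function name below are invented for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Three-valued SQL boolean, invented here for illustration. */
typedef enum { TV_FALSE, TV_TRUE, TV_NULL } TriBool;

/*
 * Sketch of the intended semantics: returns true when ASSERT would
 * raise ERRCODE_ASSERT_EXCEPTION, i.e. the predicate is not TRUE and
 * user asserts are enabled.
 */
static bool
assert_would_raise(TriBool predicate, bool enable_user_asserts)
{
    if (!enable_user_asserts)
        return false;           /* asserts disabled: no-op */
    return predicate != TV_TRUE;
}
```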

Re: [HACKERS] basebackups during ALTER DATABASE ... SET TABLESPACE ... not safe?

2015-01-22 Thread Alvaro Herrera
Andres Freund wrote:

> 2) Make movedb() (and possibly createdb(), not sure yet) use proper WAL
>logging and log the whole copied data. I think this is the right long
>term fix and would end up being much more reliable. But it either
>requires some uglyness during redo (creating nonexistant database
>directories on the fly during redo) or new wal records.
> 
>Doable, but probably too large/invasive to backpatch.

Not to mention the extra WAL traffic ...

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Proposal: knowing detail of config files via SQL

2015-01-22 Thread Jim Nasby

On 1/22/15 11:13 AM, Sawada Masahiko wrote:

Hi,

As per discussion
,
I would like to proposal new view like pg_file_settings to know detail
of config file via SQL.

- Background
In 9.4, postgresql.auto.conf was added to support the ALTER SYSTEM
command, and that config file is loaded after postgresql.conf whenever
the latter is loaded.
That is, postgresql.auto.conf takes such high priority that a value
the DBA sets manually in postgresql.conf may have no effect at all.
The ALTER SYSTEM RESET command can remove an unnecessary value from
postgresql.auto.conf, but there is no way to know where the
value came from.
(The existing views only give information about the setting in the
last file where it is present.)

- Solution
The patch is not implemented yet; I am just proposing this now.
I imagine a new pg_file_settings view with the following columns,
storing each value currently assigned in a config file:
  - guc value name
  - guc value
  - config file path (e.g. /opt/data/postgresql.sql,
/opt/data/postgresql.auto.conf, /opt/hoge.conf)
This view could help a DBA decide whether a postgresql.auto.conf
entry is useful, and if it is not, the DBA could use the
ALTER SYSTEM .. RESET command to remove it from
postgresql.auto.conf.


Would this view have a row for every option in a config file? IE: if you set 
something in both postgresql.conf and postgresql.auto.conf, would it show up 
twice? I think it should, and that there should be a way to see which setting 
is actually in effect.

It looks like you're attempting to handle #include, yes?


Another idea is to add additional columns to the existing view
(pg_settings), per the previous discussion.


I think it would be useful to have a separate view that shows all occurrences 
of a setting. I recall some comment about source_file and source_line not 
always being correct in pg_settings; if that's true we should fix that.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




[HACKERS] pg_upgrade and rsync

2015-01-22 Thread Bruce Momjian
It is possible to upgrade streaming standby servers with pg_upgrade by
making them master servers, running pg_upgrade on them, then shutting
down all servers and using rsync to make the standby servers match the
real master.

However, Josh Berkus reported that he found that hint bits set on the
master server but not on the standby servers made rsync too slow.

The attached pg_upgrade doc patch recommends using wal_log_hints to make
hint bits on the standby match the master.  One question I have is
whether hint bits are set by read-only transactions on standby servers.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/doc/src/sgml/pgupgrade.sgml b/doc/src/sgml/pgupgrade.sgml
new file mode 100644
index e1cd260..07cb1f9
*** a/doc/src/sgml/pgupgrade.sgml
--- b/doc/src/sgml/pgupgrade.sgml
*** psql --username postgres --file script.s
*** 553,559 
 is to upgrade the primary and use rsync to rebuild the
 standbys.  You can run rsync while the primary is down,
 or as part of a base backup ()
!which overwrites the old standby cluster.

  

--- 553,565 
 is to upgrade the primary and use rsync to rebuild the
 standbys.  You can run rsync while the primary is down,
 or as part of a base backup ()
!which overwrites the old standby cluster.  Hint bit differences
!between the primary and standby can cause rsync to copy
!additional data —  these differences can be reduced by using
!a fresh standby and by enabling .
!(While wal_log_hints transfers hint bits from the primary
!to standbys, additional hint bits are still set on the standbys by
!read-only queries.)

  




Re: [HACKERS] basebackups during ALTER DATABASE ... SET TABLESPACE ... not safe?

2015-01-22 Thread Andres Freund
Hi,

On 2015-01-20 16:28:19 +0100, Andres Freund wrote:
> I'm analyzing a problem in which a customer had a pg_basebackup (from
> standby) created 9.2 cluster that failed with "WAL contains references to
> invalid pages". The failed record was a "xlog redo visible"
> i.e. XLOG_HEAP2_VISIBLE.
>
> First I thought there might be another bug along the line of
> 17fa4c321cc. Looking at the code and the WAL that didn't seem to be the
> case (man, I miss pg_xlogdump). Other, slightly older, standbys, didn't
> seem to have any problems.
>
> Logs show that a ALTER DATABASE ... SET TABLESPACE ... was running when
> the basebackup was started and finished *before* pg_basebackup finished.
>
> movedb() basically works in these steps:
> 1) lock out users of the database
> 2) RequestCheckpoint(IMMEDIATE|WAIT)
> 3) DropDatabaseBuffers()
> 4) copydir()
> 5) XLogInsert(XLOG_DBASE_CREATE)
> 6) RequestCheckpoint(CHECKPOINT_IMMEDIATE)
> 7) rmtree(src_dbpath)
> 8) XLogInsert(XLOG_DBASE_DROP)
> 9) unlock database
>
> If a basebackup starts while 4) is in progress and continues until 7)
> happens I think a pretty wide race opens: The basebackup can end up with
> a partial copy of the database in the old tablespace because the
> rmtree(old_path) concurrently was in progress.  Normally such races are
> fixed during replay. But in this case, the replay of the
> XLOG_DBASE_CREATE record will just do rmtree(new); copydir(old, new);,
> fixing nothing.
>
> Besides making AD .. ST use sane WAL logging, which doesn't seem
> backpatchable, I don't see what could be done against this except
> somehow making basebackups fail if an AD .. ST is in progress. Which
> doesn't look entirely trivial either.

I basically have two ideas to fix this.

1) Make do_pg_start_backup() acquire a SHARE lock on
   pg_database. That'll prevent it from starting while a movedb() is
   still in progress. Then additionally add pg_backup_in_progress()
   function to xlog.c that checks (XLogCtl->Insert.exclusiveBackup ||
   XLogCtl->Insert.nonExclusiveBackups != 0). Use that in createdb() and
   movedb() to error out if a backup is in progress.

   Not pretty, but sounds doable.

   We've discussed trying to sleep instead of erroring out in movedb(),
   while a base backup is in progress, but that's nontrivial. I also
   don't think ALTER DATABASE is ever intentionally run at the time of a
   base backup.

2) Make movedb() (and possibly createdb(), not sure yet) use proper WAL
   logging and log the whole copied data. I think this is the right long
   term fix and would end up being much more reliable. But it either
   requires some uglyness during redo (creating nonexistant database
   directories on the fly during redo) or new wal records.

   Doable, but probably too large/invasive to backpatch.

Thanks for Alvaro and Petr for discussing the problem.

I lean towards doing 1) in all branches and then doing 2) in master.
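The helper proposed in option 1 boils down to a single check over the backup state. A minimal sketch, with the shared-memory state passed in explicitly rather than read from XLogCtl->Insert under a lock as the real code would:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch of the pg_backup_in_progress() helper proposed for xlog.c:
 * true if an exclusive backup or any number of non-exclusive backups
 * is running, so createdb()/movedb() could error out early.
 */
static bool
backup_in_progress(bool exclusive_backup, int non_exclusive_backups)
{
    return exclusive_backup || non_exclusive_backups != 0;
}
```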

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] PQputCopyEnd doesn't adhere to its API contract

2015-01-22 Thread Bruce Momjian
On Fri, Jan  9, 2015 at 03:04:19PM -0500, Bruce Momjian wrote:
> > > Uh, where are we on this?
> > 
> > I think someone needs to take Tom's proposed language and make it into
> > a patch.  And figure out which other functions in the documentation
> > need similar updates.
> 
> I have developed such a patch --- attached.  I didn't see any other
> mentions of blocking in the docs.

Patch applied.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] New CF app deployment

2015-01-22 Thread Alvaro Herrera

Hi,

Thanks for the excellent work on the new commitfest app.  It looks
awesome so far, though I'm betting the commitfest manager is the one who
reaps the most benefits.

Wanted to highlight this request:

Andres Freund wrote:
> What I'm also missing from the old app is that previously 'reviews'
> could explicitly be linked from the app. Now there's a list of emails in
> the thread, nice!, but in bigger threads that really doesn't help to
> find the actual review.

Note for instance the "BRIN inclusion operator class" patch here,
https://commitfest.postgresql.org/3/31/

The link points to the generic BRIN thread I started (you can see it in
the history).  If you list all attachments you can see that all the BRIN
patches are linked; Emre's patch is there, it seems, only because it's
the one most recently posted.

Not sure what to do here, but it's somewhat annoying.


Also, I was pretty used to offline operation with the old one: I could
load the page, for instance
https://commitfest-old.postgresql.org/action/commitfest_view?id=25
and the "patch" links alongside each patch had the message-ids with
which I could search my local mbox.  In the new one, the only way I can
get the message-id is by opening the patch page.  It would be pretty
useful to have a "copy message-id to clipboard" button, or something
similar, so that I could transfer operation from the web browser to my
mail client.  Having one message-id per patch in the commitfest summary
page is enough for my use case (probably pointing to the most recent
attachment in the linked threads.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Windows buildfarm animals are still not happy with abbreviated keys patch

2015-01-22 Thread Peter Geoghegan
On Thu, Jan 22, 2015 at 9:55 AM, Robert Haas  wrote:
> Stay tuned for more exciting dispatches from the department of abbreviated 
> keys!

I certainly suspected that we'd have problems with Windows, but
nothing this bad.

-- 
Peter Geoghegan




Re: [HACKERS] Windows buildfarm animals are still not happy with abbreviated keys patch

2015-01-22 Thread Robert Haas
On Thu, Jan 22, 2015 at 12:17 PM, Robert Haas  wrote:
> Long story short: this was broken.  It may be that when the dust
> settles we can try re-enabling this on Windows.   It might work now
> that this issue is (hopefully) fixed.

Uggh.  That still wasn't right; I've pushed commit
d060e07fa919e0eb681e2fa2cfbe63d6c40eb2cf to try to fix it.

Stay tuned for more exciting dispatches from the department of abbreviated keys!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2015-01-22 Thread Alvaro Herrera
Here's v23.

I reworked a number of things.  First, I changed trivial stuff like
grouping all the vacuuming options in a struct, to avoid passing an
excessive number of arguments to functions.  full, freeze, analyze_only,
and_analyze and verbose are all in a single struct now.  Also, the
stage_commands and stage_messages were duplicated by your patch; I moved
them to a file-level static struct.

I made prepare_command reset the string buffer and receive an optional
table name, so that it can append it to the generated command, and
append the semicolon as well.  Forcing the callers to reset the string
before calling, and having them add the table name and semicolon
afterwards was awkward and unnecessarily verbose.

You had a new in_abort() function in common.c which seems an unnecessary
layer; in its place I just exported the inAbort boolean flag it was
returning, and renamed it to CancelRequested.

I was then troubled by the fact that vacuum_one_database() was being
called in a loop by main() when multiple tables are vacuumed, but
vacuum_parallel() was doing the loop internally.  I found this
discrepancy confusing, so I renamed that new function to
vacuum_one_database_parallel and modified the original
vacuum_one_database to do the loop internally as well.  Now they are, in
essence, a mirror of each other, one doing the parallel stuff and one
doing it serially.  This seems to make more sense to me -- but see
below.

I also modified some underlying stuff like GetIdleSlot returning a
ParallelSlot pointer instead of an array index.  Since its caller always
has to dereference the array with the given index, it makes more sense
to return the right element pointer instead, so I made it do that.
Also, that way, instead of returning NO_SLOT in case of error it can
just return NULL; no need for extra cognitive burden.
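The shape of that change can be sketched as follows; the struct layout and fixture are invented for illustration, not vacuumdb's actual definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct ParallelSlot
{
    bool isFree;                /* illustrative; not the real field layout */
} ParallelSlot;

/*
 * Sketch of the reviewed API change: return a pointer to an idle slot,
 * or NULL when none is free, instead of an array index plus a NO_SLOT
 * sentinel that every caller must then re-dereference.
 */
static ParallelSlot *
GetIdleSlot(ParallelSlot *slots, int nslots)
{
    for (int i = 0; i < nslots; i++)
        if (slots[i].isFree)
            return &slots[i];
    return NULL;
}

/* Small fixture for demonstration: only the last slot is free. */
static ParallelSlot demo_slots[3] = {{false}, {false}, {true}};
```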

I also changed select_loop.  In your patch it had two implementations,
one WIN32 and another one for the rest.  It looks nicer to me to have
only one with small exceptions in the places that need it.  (I haven't
tested the WIN32 path.)  Also, instead of returning ERROR_IN_ABORT I
made it set a boolean flag in case of error, which seems cleaner.

I changed GetQueryResult as I described in a previous message.


There are two things that continue to bother me and I would like you,
dear patch author, to change them before committing this patch:

1. I don't like having vacuum_one_database() and a separate
vacuum_one_database_parallel().  I think we should merge them into one
function, which does either thing according to parameters.  There's
plenty in there that's duplicated.

2. in particular, the above means that run_parallel_vacuum can no longer
exist as it is.  Right now vacuum_one_database_parallel relies on
run_parallel_vacuum to do the actual job parallellization.  I would like
to have that looping in the improved vacuum_one_database() function
instead.


Looking forward to v24,

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml
index 3ecd999..211235a 100644
--- a/doc/src/sgml/ref/vacuumdb.sgml
+++ b/doc/src/sgml/ref/vacuumdb.sgml
@@ -204,6 +204,25 @@ PostgreSQL documentation
  
 
  
+  -j jobs
+  --jobs=njobs
+  
+   
+This option will enable the vacuum operation to run on concurrent
+connections. The maximum number of tables that can be vacuumed
+concurrently equals the number of jobs. If more jobs are given than
+there are tables, the number of jobs is set to the number of tables.
+   
+   
+vacuumdb will open
+ njobs connections to the
+database, so make sure your 
+setting is high enough to accommodate all connections.
+   
+  
+ 
+
+ 
   --analyze-in-stages
   

diff --git a/src/bin/pg_dump/parallel.c b/src/bin/pg_dump/parallel.c
index d942a75..1bf7611 100644
--- a/src/bin/pg_dump/parallel.c
+++ b/src/bin/pg_dump/parallel.c
@@ -1160,7 +1160,7 @@ select_loop(int maxFd, fd_set *workerset)
 		i = select(maxFd + 1, workerset, NULL, NULL, NULL);
 
 		/*
-		 * If we Ctrl-C the master process , it's likely that we interrupt
+		 * If we Ctrl-C the master process, it's likely that we interrupt
 		 * select() here. The signal handler will set wantAbort == true and
 		 * the shutdown journey starts from here. Note that we'll come back
 		 * here later when we tell all workers to terminate and read their
diff --git a/src/bin/scripts/common.c b/src/bin/scripts/common.c
index 6bfe2e6..da142aa 100644
--- a/src/bin/scripts/common.c
+++ b/src/bin/scripts/common.c
@@ -19,10 +19,9 @@
 
 #include "common.h"
 
-static void SetCancelConn(PGconn *conn);
-static void ResetCancelConn(void);
 
 static PGcancel *volatile cancelConn = NULL;
+bool CancelRequested = false;
 
 #ifdef WIN32
 static CRITICAL_SECTION cancelConnLock;
@@ -291,7 +29

Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf

2015-01-22 Thread Robert Haas
On Wed, Jan 21, 2015 at 11:13 AM, Sawada Masahiko  wrote:
> I think this new view is updated only when postmaster received SIGHUP
> or is started.
> And we can have new function like pg_update_file_setting() which
> updates this view.

I really don't think the postmaster should be in the business of
trying to publish views of the data.  Let the individual backend
reveal its own view, and keep the postmaster out of it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf

2015-01-22 Thread Robert Haas
On Wed, Jan 21, 2015 at 1:38 AM, Amit Kapila  wrote:
> Now as I have suggested upthread, that we can have a new view
> pg_file_settings which will display information about settings even
> when there exists multiple entries for the same in different files.
>
> I think adding such information to existing view pg_settings would
> be difficult as the same code is used for show commands which
> we don't want to change.

A new view might work, or we could add columns that contain arrays to
the existing view.  But a new view may be nicer.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf

2015-01-22 Thread Robert Haas
On Wed, Jan 21, 2015 at 1:32 AM, David Johnston
 wrote:
> to make the whole thing work.  Maybe it does all fit directly on pg_settings
> but tacking on some read-only columns to this updateable view/table doesn't
> come across as something that should be forbidden in general.

No, of course not.  But they should be things that are of general
utility (we agree that most people will want them) and they should be
orthogonal to what we already have (not just a duplication of
something that's present elsewhere).

> Maybe I am imagining a use-case that just isn't there but if there are two
> separate queries needed, and we call one "consolidated", then having that
> query indicate whether the other query has useful information, and it is
> quite likely that it will not, avoids the user expending the effort to run
> the wasteful secondary query.

Well, we shouldn't assume that everyone uses the same queries, or
issues them in the same order, so I think this is pretty speculative.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] collate test now failing

2015-01-22 Thread Robert Haas
On Thu, Jan 22, 2015 at 12:07 PM, Bruce Momjian  wrote:
>> Back to watching the buildfarm returns roll in...
>
> Does this explain the Windows failures too?

The Windows machines have mostly been failing since the original
abbreviated keys patch went in.  See the message I just posted on the
"Windows buildfarm animals are still not happy with abbreviated keys
patch" thread for a fuller analysis.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Windows buildfarm animals are still not happy with abbreviated keys patch

2015-01-22 Thread Robert Haas
On Thu, Jan 22, 2015 at 11:28 AM, Robert Haas  wrote:
> This seems to have broken more stuff.  My working hypothesis is that
> the culprit is here:
>
> /*
>  * There is no special handling of the C locale here, unlike with
>  * varstr_cmp().  strxfrm() is used indifferently.
>  */
>
> As far as I can see, that's just hoping that when the locale is C,
> strxfrm() is memcpy().  If it isn't, then you will potentially get
> different results when the abbreviated keys stuff is used vs. when it
> isn't, because when it isn't, we're going to memcmp(), and when it is,
> we're going to memcmp() the results of strxfrm().

As noted also on the thread Kevin started, I've now pushed a fix for
this and am eagerly awaiting new buildfarm returns.  I wonder if this
might account for the ordering failures we were seeing on Windows
before.  We thought it was a Windows-specific issue, but I bet the
real problem was here:

if (collid != DEFAULT_COLLATION_OID)
{
if (!OidIsValid(collid))
{
/*
 * This typically means that the parser could
not resolve a
 * conflict of implicit collations, so report
it that way.
 */
ereport(ERROR,

(errcode(ERRCODE_INDETERMINATE_COLLATION),
 errmsg("could not determine
which collation to use for string comparison"),
 errhint("Use the COLLATE
clause to set the collation explicitly.")));
}
#ifdef HAVE_LOCALE_T
tss->locale = pg_newlocale_from_collation(collid);
#endif
}

Before the abbreviated keys commit, this code only ran if we'd already
determined that lc_collate_is_c(collid) was false.  But that commit
made this also run when that value was true.  So if HAVE_LOCALE_T and
collid != DEFAULT_COLLATION_OID, then tss->locale was getting set.  If
it got set to something that made strxfrm() behave like memcmp(), then
all was well.  If not, then life was bad.  Now you'd hope that would
work out, but maybe not.

Also, suppose HAVE_LOCALE_T was defined but collid ==
DEFAULT_COLLATION_OID.  Then tss->locale didn't get initialized at all,
and bttext_abbrev_convert() just passed whatever stupid value it found
there to strxfrm_l().

Finally, suppose we *didn't* HAVE_LOCALE_T.   Then, the
non-abbreviated comparator knew it should use memcmp(), but the
abbreviated comparator was happy to use strxfrm() even though that
would use the current locale, not the C locale that the user
requested.

Long story short: this was broken.  It may be that when the dust
settles we can try re-enabling this on Windows.   It might work now
that this issue is (hopefully) fixed.
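The contract at stake here can be demonstrated with a small, hypothetical C sketch (not PostgreSQL code). POSIX guarantees that comparing strxfrm() results byte-wise agrees with strcoll() in the same locale; it does not guarantee that strxfrm() behaves like memcpy() in the C locale, which is the assumption the quoted comment was making. The fixed-size buffers below are an assumption for brevity; real code would check strxfrm()'s return value.

```c
#include <string.h>

/* Sign of x as -1, 0 or 1. */
static int
sgn(int x)
{
	return (x > 0) - (x < 0);
}

/*
 * Compare two strings the way an abbreviated key would: transform both
 * with strxfrm() for the current LC_COLLATE locale, then compare the
 * transformed bytes.  POSIX guarantees the sign of this result matches
 * strcoll(a, b) in the same locale.
 */
static int
cmp_via_strxfrm(const char *a, const char *b)
{
	char		xa[128];
	char		xb[128];

	strxfrm(xa, a, sizeof(xa));
	strxfrm(xb, b, sizeof(xb));
	return strcmp(xa, xb);
}

/*
 * Returns 1 when the strxfrm()-based comparison and the direct
 * strcoll() comparison order a and b the same way.  Mixing the two
 * orderings, as the sortsupport code does, is only safe because of
 * this guarantee -- and only when both use the same locale.
 */
static int
orderings_agree(const char *a, const char *b)
{
	return sgn(cmp_via_strxfrm(a, b)) == sgn(strcoll(a, b));
}
```

The bug described above is precisely the case where the two sides of that equality were evaluated under different locales, so the guarantee no longer applied.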

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf

2015-01-22 Thread Sawada Masahiko
On Thu, Jan 22, 2015 at 11:41 AM, Amit Kapila  wrote:
> On Wed, Jan 21, 2015 at 9:43 PM, Sawada Masahiko 
> wrote:
>>
>> On Wed, Jan 21, 2015 at 3:38 PM, Amit Kapila 
>> wrote:
>> > On Tue, Jan 20, 2015 at 9:38 PM, Robert Haas 
>> > wrote:
>> >
>> >
>> > The reason why "sourcefile" and "sourceline" are not sufficient is that
>> > they can only give the information about the setting in last file it is
>> > present.  Assume max_connections (or any other setting) is available
>> > in both postgresql.conf and postgresql.auto.conf, then it will display
>> > the information about the setting in postgresql.auto.conf, so now user
>> > might not be able to decide whether that is the setting he want to
>> > retain
>> > unless he knows the information about setting in postgresql.conf.
>> >
>> > Now as I have suggested upthread, that we can have a new view
>> > pg_file_settings which will display information about settings even
>> > when there exists multiple entries for the same in different files.
>> >
>> > I think adding such information to existing view pg_settings would
>> > be difficult as the same code is used for show commands which
>> > we don't want to change.
>> >
>>
>> I think this new view is updated only when postmaster received SIGHUP
>> or is started.
>> And we can have new function like pg_update_file_setting() which
>> updates this view.
>>
>
> If that is doable without much complication, then it might not be bad
> idea to just add additional columns to existing view (pg_settings).  I
> think you can once evaluate the details like what additional columns
> (other than what pg_settings has) are required and how you want to
> update them. After doing so further discussion could be more meaningful.
>
>

I have posted mail as another thread to discuss about this.


Regards,

---
Sawada Masahiko




[HACKERS] Proposal: knowing detail of config files via SQL

2015-01-22 Thread Sawada Masahiko
Hi,

As per discussion
,
I would like to proposal new view like pg_file_settings to know detail
of config file via SQL.

- Background
In 9.4, postgresql.auto.conf was added to support the ALTER SYSTEM
command, and that config file is loaded whenever postgresql.conf is
loaded, after it.
That is, postgresql.auto.conf has quite a high priority, so a value the
DBA sets manually in postgresql.conf may not take effect at all.
The ALTER SYSTEM RESET command can remove an unnecessary value from
postgresql.auto.conf, but there is no way to know where a value came
from.
(sourcefile and sourceline can only give the information about the
setting in the last file in which it is present.)

- Solution
The patch is not implemented yet; I am just proposing this now.
I'm imagining that the new pg_file_settings view could have the
following columns to store the values currently assigned in the config
files:
 - GUC name
 - GUC value
 - config file path (e.g. /opt/data/postgresql.conf,
/opt/data/postgresql.auto.conf, /opt/hoge.conf)
This view could make it convenient for the DBA to decide whether the
entry in postgresql.auto.conf is useful or not, and if it's not useful
the DBA could use the ALTER SYSTEM ... RESET command to remove it from
postgresql.auto.conf.

Another idea is to add additional columns to the existing view
(pg_settings), per the previous discussion.
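To make the difference between the two semantics concrete, here is a small hypothetical C sketch (not server code; the struct and both functions are illustrative stand-ins). A pg_settings-style lookup keeps only the last occurrence of a setting, while the proposed pg_file_settings-style scan reports every file that sets it:

```c
#include <stddef.h>
#include <string.h>

/* One "name = value" entry read from a config file, in load order
 * (postgresql.conf entries first, postgresql.auto.conf entries last). */
typedef struct FileSetting
{
	const char *name;
	const char *value;
	const char *sourcefile;
} FileSetting;

/*
 * pg_settings-style answer: only the last file to set the GUC wins,
 * so earlier occurrences are invisible to the DBA.
 */
static const FileSetting *
effective_setting(const FileSetting *entries, int n, const char *guc)
{
	const FileSetting *result = NULL;
	int			i;

	for (i = 0; i < n; i++)
	{
		if (strcmp(entries[i].name, guc) == 0)
			result = &entries[i];
	}
	return result;
}

/*
 * pg_file_settings-style answer: report how many files set the GUC.
 * A count above one tells the DBA that something (for example
 * postgresql.auto.conf) is overriding an earlier file, and that
 * ALTER SYSTEM ... RESET might be appropriate.
 */
static int
count_settings(const FileSetting *entries, int n, const char *guc)
{
	int			count = 0;
	int			i;

	for (i = 0; i < n; i++)
	{
		if (strcmp(entries[i].name, guc) == 0)
			count++;
	}
	return count;
}
```

With max_connections set in both postgresql.conf and postgresql.auto.conf, the first function can only report the auto.conf value, while the second exposes that two files are competing, which is exactly the information the proposed view would add.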

Please give me comments.

Regards,

---
Sawada Masahiko




Re: [HACKERS] Additional role attributes && superuser review

2015-01-22 Thread Stephen Frost
Adam, all,

* Adam Brightwell (adam.brightw...@crunchydatasolutions.com) wrote:
After re-reading through this thread, it seems like EXCLUSIVEBACKUP
> (proposed by Magnus) seemed to be a potentially acceptable alternative.

I just chatted a bit on IRC w/ Magnus about this and I'm on-board with
his proposal to use EXCLUSIVEBACKUP, but there are a couple of additional
things which need doing as part of that:

We need to actually define what an 'exclusive backup' is in the
documentation; it's not there today.

pg_start_backup/pg_stop_backup docs should be updated to refer to those
operations as relating to "on-line exclusive backup," same as
pg_is_in_backup() and pg_backup_start_time() do.

The docs for the EXCLUSIVEBACKUP attribute should, of course, refer to
where EXCLUSIVE BACKUP is defined.

Hopefully that wraps up the last of the debate around the naming of
these options and we can move forward, once the patch is updated to
reflect this and docs are written.  Would be great to hear from anyone
who feels otherwise.

Regarding the pg_dump/read-only/etc discussion: that wasn't intended to
be part of this and I continue to feel it'd be best addressed on a new
thread specifically for that.  Once we have this initial round of role
attribute additions settled, I'd be more than happy to support that
discussion and see what we can do to provide such a role.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] collate test now failing

2015-01-22 Thread Bruce Momjian
On Thu, Jan 22, 2015 at 12:04:14PM -0500, Robert Haas wrote:
> On Thu, Jan 22, 2015 at 11:53 AM, Robert Haas  wrote:
> > On Thu, Jan 22, 2015 at 11:50 AM, Kevin Grittner  wrote:
> >> Kevin Grittner  wrote:
> >>> I think this may have just started with:
> >>>
> >>> b529b65d1bf8537ca7fa024760a9782d7c8b66e5
> >>
> >> Confirmed that I get a clean check on the prior commit.
> >
> > Can you check whether this fixes it?
> 
> Kevin says (via IM) that it does, but with a compiler warning.  Fixed
> that and pushed.
> 
> Back to watching the buildfarm returns roll in...

Does this explain the Windows failures too?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] collate test now failing

2015-01-22 Thread Robert Haas
On Thu, Jan 22, 2015 at 11:53 AM, Robert Haas  wrote:
> On Thu, Jan 22, 2015 at 11:50 AM, Kevin Grittner  wrote:
>> Kevin Grittner  wrote:
>>> I think this may have just started with:
>>>
>>> b529b65d1bf8537ca7fa024760a9782d7c8b66e5
>>
>> Confirmed that I get a clean check on the prior commit.
>
> Can you check whether this fixes it?

Kevin says (via IM) that it does, but with a compiler warning.  Fixed
that and pushed.

Back to watching the buildfarm returns roll in...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] PL/pgSQL, RAISE and error context

2015-01-22 Thread Pavel Stehule
2015-01-22 12:37 GMT+01:00 Marko Tiikkaja :

> Hello,
>
> I just heard that there's going to be a fifth CF for 9.5 so I'm trying to
> gather all the patches I'd like to see in 9.5..
>
> On 8/23/13 10:36 AM, I wrote:
>
>> My opinion at this very moment is that we should leave the the DEFAULT
>> verbosity alone and add a new one (call it COMPACT or such) with the
>> suppressed context for non-ERRORs.
>>
>
> I wonder if a better option would be to add a GUC to control this from the
> server side.  plpgsql.suppress_simple_error_context or such, defaulting
> to false to maintain full backwards compatibility.  That could be set to
> true for development installations and for client programs which only care
> about having all information available, rather than readability or
> aesthetics.
>
> Or is that a stupid idea?  I just think hacking libpq for something like
> this is a huge overkill.
>

I don't think a plpgsql-only solution is a satisfactory idea. There are
applications that mix plpgsql / plperl ... and it isn't possible to
remove the error context from only one language.

Regards

Pavel


>
>
> .marko
>


Re: [HACKERS] collate test now failing

2015-01-22 Thread Robert Haas
On Thu, Jan 22, 2015 at 11:50 AM, Kevin Grittner  wrote:
> Kevin Grittner  wrote:
>> I think this may have just started with:
>>
>> b529b65d1bf8537ca7fa024760a9782d7c8b66e5
>
> Confirmed that I get a clean check on the prior commit.

Can you check whether this fixes it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


no-strxfrm-for-c.patch
Description: binary/octet-stream



Re: [HACKERS] collate test now failing

2015-01-22 Thread Kevin Grittner
Kevin Grittner  wrote:

> I think this may have just started with:
>
> b529b65d1bf8537ca7fa024760a9782d7c8b66e5


Confirmed that I get a clean check on the prior commit.

ubuntu 14.04 LTS


--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] collate test now failing

2015-01-22 Thread Kevin Grittner
I think this may have just started with:

b529b65d1bf8537ca7fa024760a9782d7c8b66e5

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

regression.diffs
Description: Binary data



Re: [HACKERS] Windows buildfarm animals are still not happy with abbreviated keys patch

2015-01-22 Thread Robert Haas
On Thu, Jan 22, 2015 at 10:57 AM, Robert Haas  wrote:
> On Wed, Jan 21, 2015 at 2:30 PM, Peter Geoghegan  wrote:
>> Even following Robert's disabling of abbreviated keys on Windows,
>> buildfarm animals hamerkop, brolga, currawong and bowerbird remain
>> unhappy, with failing regression tests for "collate" and sometimes
>> (but not always) "aggregates". Some of these only use the C locale.
>>
>> I think that "aggregates" does not fail for brolga because it's built
>> without "locale_t" support (not sure how to interpret MSVC "configure"
>> output, but I think that the other animals do have such support).  So
>> maybe this code being executed within btsortsupport_worker(), for the
>> C locale on Windows is at issue (note that abbreviation is still
>> disabled):
>>
>> tss->locale = pg_newlocale_from_collation(collid);
>
> Yes, you seem to have rather unwisely changed around the order of the
> checks in btsortsupport_worker().  I've just rewritten it heavily to
> try to more clearly separate the decision about whether to do
> fmgr-elision, and if so which comparator to use, from the decision
> about whether to use abbreviation.  Naturally I can't promise that
> this is completely right, but I hope that, if it's not, it will at
> least be easier to fix.  There's no sanity in removing the
> lc_collate_is_c() check from the top of the function and then trying
> to remember to exclude that case individually from the multiple places
> further down that count on the fact that they'll never be reached in
> that case.  This function may even have a third decision to make
> someday, and they can't all be intertwined.

This seems to have broken more stuff.  My working hypothesis is that
the culprit is here:

/*
 * There is no special handling of the C locale here, unlike with
 * varstr_cmp().  strxfrm() is used indifferently.
 */

As far as I can see, that's just hoping that when the locale is C,
strxfrm() is memcpy().  If it isn't, then you will potentially get
different results when the abbreviated keys stuff is used vs. when it
isn't, because when it isn't, we're going to memcmp(), and when it is,
we're going to memcmp() the results of strxfrm().

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg

2015-01-22 Thread Tomas Vondra
Hi,

On 21.1.2015 09:01, Jeff Davis wrote:
> On Tue, 2015-01-20 at 23:37 +0100, Tomas Vondra wrote:
>> Tom's message where he points that out is here:
>> http://www.postgresql.org/message-id/20707.1396372...@sss.pgh.pa.us
> 
> That message also says:
> 
> "I think a patch that stood a chance of getting committed would need
> to detect whether the aggregate was being called in simple or
> grouped contexts, and apply different behaviors in the two cases."
> 
> I take that as an objection to any patch which does not distinguish 
> between the grouped and ungrouped aggregate cases, which includes
> your patch.

I don't think 'simple context' in this case means 'aggregate without a
group by clause'.

The way I understood it back in April 2014 was that while the patch
worked fine with grouped cases (i.e. in nodeAgg.c or such), the
underlying function are used elsewhere in a simple context (e.g. in an
xpath() or so) - and in this case it was broken. And that was a correct
point, and was fixed by the recent patches.

But maybe I'm missing something?

> I don't agree with that objection (or perhaps I don't understand
> it); but given the strong words above, I need to get some kind of
> response before I can consider committing your patch.
> 
>> I generally agree that having two API 'facets' with different behavior
>> is slightly awkward and assymetric, but I wouldn't call that ugly.
> 
> Right, your words are more precise (and polite). My apologies.

U ... I wasn't suggesting that calling the resulting API 'ugly' is
impolite or anything. It was meant just as a comment that the aesthetics
of an API are quite a subjective matter. No apology needed.

>> I actually modified both APIs initially, but I think Ali is right 
>> that not breaking the existing API (and keeping the original
>> behavior in that case) is better. We can break it any time we want
>> in the future, but it's impossible to "unbreak it" ;-)
> 
> We can't break the old API, and I'm not suggesting that we do. I was
> hoping to find some alternative.

Why can't we? I'm not saying we should in this case, but there are
cases when breaking the API is the right thing to do (e.g. when the
behavior changes radically, and needs to be noticed by the users).


-- 
Tomas Vondra                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [POC] FETCH limited by bytes.

2015-01-22 Thread Tom Lane
Kyotaro HORIGUCHI  writes:
> Hello, as in the discussion on async fetching in postgres_fdw, FETCH
> with a data-size limitation would be useful for the memory usage
> stability of postgres_fdw.

> Is such a feature and syntax could be allowed to be added?

This seems like a lot of work, and frankly an incredibly ugly API,
for a benefit that is entirely hypothetical.  Have you got numbers
showing any actual performance win for postgres_fdw?

Even if we wanted to do something like this, I strongly object to
measuring size by heap_compute_data_size.  That's not a number that users
would normally have any direct knowledge of; nor does it have anything
at all to do with the claimed use-case, where what you'd really need to
measure is bytes transmitted down the wire.  (The difference is not small:
for instance, toasted values would likely still be toasted at the point
where you're measuring.)

regards, tom lane




Re: [HACKERS] Sequence Access Method WIP

2015-01-22 Thread Petr Jelinek

On 22/01/15 16:50, Heikki Linnakangas wrote:

On 01/12/2015 11:33 PM, Petr Jelinek wrote:

Second patch adds DDL support. I originally wanted to make it
CREATE/DROP SEQUENCE ACCESS METHOD... but that would mean making ACCESS
a reserved keyword, so I went for CREATE ACCESS METHOD FOR SEQUENCES
which does not need to change anything (besides adding METHOD to
unreserved keywords).
The DDL support uses the DefineStmt infra with some very small change as
the sequence ams are not schema qualified, but I think it's acceptable
and saves considerable amount of boilerplate.


Do we need DDL commands for this at all? I could go either way on that
question. We recently had a discussion on that wrt. index access methods
[1], and Tom opined that providing DDL for creating index access methods
is not worth it. The extension can just insert the rows into pg_seqam
with INSERT. Do we expect sequence access methods as extensions to be
more popular than index access methods? Maybe, because the WAL-logging
problem doesn't exist. But OTOH, if you're writing something like a
replication system that needs global sequences as part of it, there
aren't that many of those, and the installation scripts will need to
deal with more complicated stuff than inserting a row in pg_seqam.


I don't know about popularity, and I've seen the discussion about 
indexes. The motivation for DDL for me was handling dependencies 
correctly, that's all. If we say we don't care about that (and allow 
DROP EXTENSION even though the user has sequences that are using the AM) 
then we don't need DDL.




[1] http://www.postgresql.org/message-id/26822.1414516...@sss.pgh.pa.us


And third patch is gapless sequence implementation updated to work with
the new DDL support with some tests added for checking if dependencies
work correctly. It also acts as example on how to make custom AMs.


I'll take a look at that to see how the API works, but we're not going
to include it in the source tree, because it doesn't actually guarantee
gaplessness. That makes it a pretty dangerous example. I'm sure we can
come up with a better example that might even be useful. How about a
Lamport's clock sequence, which advances once per second, in addition to
when anyone calls nextval() ? Or a remote sequence that uses an FDW to
call nextval() in a foreign server?



I have an updated patch ready; I just didn't submit it because I am
otherwise busy this week. I hope to get to it this evening or tomorrow
morning, so I'd wait until then before looking at the patch.


The new version (the one that is not submitted yet) of the gapless
sequence is considerably uglier and probably not the best example
either, but it does guarantee gaplessness (it stores the last value in
its own value table). So I am not sure if it should be included either...


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Sequence Access Method WIP

2015-01-22 Thread Robert Haas
On Thu, Jan 22, 2015 at 10:50 AM, Heikki Linnakangas
 wrote:
> On 01/12/2015 11:33 PM, Petr Jelinek wrote:
>> Second patch adds DDL support. I originally wanted to make it
>> CREATE/DROP SEQUENCE ACCESS METHOD... but that would mean making ACCESS
>> a reserved keyword, so I went for CREATE ACCESS METHOD FOR SEQUENCES
>> which does not need to change anything (besides adding METHOD to
>> unreserved keywords).
>> The DDL support uses the DefineStmt infra with some very small change as
>> the sequence ams are not schema qualified, but I think it's acceptable
>> and saves considerable amount of boilerplate.
>
> Do we need DDL commands for this at all? I could go either way on that
> question. We recently had a discussion on that wrt. index access methods
> [1], and Tom opined that providing DDL for creating index access methods is
> not worth it. The extension can just insert the rows into pg_seqam with
> INSERT. Do we expect sequence access methods as extensions to be more
> popular than index access methods? Maybe, because the WAL-logging problem
> doesn't exist. But OTOH, if you're writing something like a replication
> system that needs global sequences as part of it, there aren't that many of
> those, and the installation scripts will need to deal with more complicated
> stuff than inserting a row in pg_seqam.

I think the main reason we don't need DDL for pg_am is because there's
no real hope of anybody actually inserting anything in there whether
we have a command for it or not.  If we actually expect this to be
used, I think it should probably have some kind of DDL, because
otherwise what will pg_dump emit?  "Nothing" is bad because then dumps
won't restore, and "direct inserts to pg_seqam" doesn't seem great
either.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Windows buildfarm animals are still not happy with abbreviated keys patch

2015-01-22 Thread Robert Haas
On Wed, Jan 21, 2015 at 2:30 PM, Peter Geoghegan  wrote:
> Even following Robert's disabling of abbreviated keys on Windows,
> buildfarm animals hamerkop, brolga, currawong and bowerbird remain
> unhappy, with failing regression tests for "collate" and sometimes
> (but not always) "aggregates". Some of these only use the C locale.
>
> I think that "aggregates" does not fail for brolga because it's built
> without "locale_t" support (not sure how to interpret MSVC "configure"
> output, but I think that the other animals do have such support).  So
> maybe this code being executed within btsortsupport_worker(), for the
> C locale on Windows is at issue (note that abbreviation is still
> disabled):
>
> tss->locale = pg_newlocale_from_collation(collid);

Yes, you seem to have rather unwisely changed around the order of the
checks in btsortsupport_worker().  I've just rewritten it heavily to
try to more clearly separate the decision about whether to do
fmgr-elision, and if so which comparator to use, from the decision
about whether to use abbreviation.  Naturally I can't promise that
this is completely right, but I hope that, if it's not, it will at
least be easier to fix.  There's no sanity in removing the
lc_collate_is_c() check from the top of the function and then trying
to remember to exclude that case individually from the multiple places
further down that count on the fact that they'll never be reached in
that case.  This function may even have a third decision to make
someday, and they can't all be intertwined.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Sequence Access Method WIP

2015-01-22 Thread Heikki Linnakangas

On 01/12/2015 11:33 PM, Petr Jelinek wrote:

Second patch adds DDL support. I originally wanted to make it
CREATE/DROP SEQUENCE ACCESS METHOD... but that would mean making ACCESS
a reserved keyword, so I went for CREATE ACCESS METHOD FOR SEQUENCES
which does not need to change anything (besides adding METHOD to
unreserved keywords).
The DDL support uses the DefineStmt infrastructure with one very small
change, as the sequence AMs are not schema-qualified, but I think that's
acceptable and saves a considerable amount of boilerplate.


Do we need DDL commands for this at all? I could go either way on that 
question. We recently had a discussion on that wrt. index access methods 
[1], and Tom opined that providing DDL for creating index access methods 
is not worth it. The extension can just insert the rows into pg_seqam 
with INSERT. Do we expect sequence access methods as extensions to be 
more popular than index access methods? Maybe, because the WAL-logging 
problem doesn't exist. But OTOH, if you're writing something like a 
replication system that needs global sequences as part of it, there 
aren't that many of those, and the installation scripts will need to 
deal with more complicated stuff than inserting a row in pg_seqam.


[1] http://www.postgresql.org/message-id/26822.1414516...@sss.pgh.pa.us


And third patch is gapless sequence implementation updated to work with
the new DDL support with some tests added for checking if dependencies
work correctly. It also acts as example on how to make custom AMs.


I'll take a look at that to see how the API works, but we're not going 
to include it in the source tree, because it doesn't actually guarantee 
gaplessness. That makes it a pretty dangerous example. I'm sure we can 
come up with a better example that might even be useful. How about a 
Lamport's clock sequence, which advances once per second, in addition to 
when anyone calls nextval() ? Or a remote sequence that uses an FDW to 
call nextval() in a foreign server?


- Heikki





Re: [HACKERS] fix typo in reinit.h

2015-01-22 Thread Alvaro Herrera
Sawada Masahiko wrote:
> Hi,
> 
> Attached patch fixes a typo in reinit.h.

Thanks, pushed.


-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] New CF app: changing email sender

2015-01-22 Thread Magnus Hagander
On Tue, Jan 20, 2015 at 10:58 AM, Andrew Gierth  wrote:

> > "Kyotaro" == Kyotaro HORIGUCHI 
> writes:
>
>  Kyotaro> Hmm. The mail address indeed *was* mine but is now obsolete,
>  Kyotaro> so that the email might bounce. But I haven't found how to
>  Kyotaro> change it within the app itself, or on the PostgreSQL community
>  Kyotaro> account page.
>
> Just being able to change the email address on the community account
> isn't enough; I for one am subscribed to the lists with a different
> email address than the one associated with my community account (for
> spam management reasons). There needs to be a way to have multiple
> addresses or to specify which is to be used for the post.
>

This should also be fixed now. You can click Edit Profile, and add one or
more extra email addresses there (after email verification of course), and
then pick one of those to use for sending email.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


[HACKERS] fix typo in reinit.h

2015-01-22 Thread Sawada Masahiko
Hi,

Attached patch fixes a typo in reinit.h.

Regards,

---
Sawada Masahiko


fix_typo_reinit_h.patch
Description: Binary data



Re: [HACKERS] delta relations in AFTER triggers

2015-01-22 Thread Kevin Grittner
Robert Haas  wrote:

> Another idea is to change what actually gets passed to the parser
> callback.  Right now we just pass the PLpgSQL_expr.  If we created a
> new structure that contained that plus the PLpgSQL_execstate, we'd be
> in fine shape.  But this sort of gets at the root of the problem here:
> with variables, the parser callback doesn't return the actual *value*,
> it returns a Param node that will later, at execution time, be looked
> up to find the actual value.  With relations, we're sort of doing the
> same thing - the tuplestore RTE doesn't need to contain the actual
> data, just the tuple descriptor.  But the data structures from which
> we can get that information also contain the actual execution-time
> information, so passing them into parse-time callbacks seems like it
> might be, if nothing else, a recipe for future bugs.

That was, of course, why this patch evolved to using this structure
during parsing:

typedef struct TsrmdData
{
    char       *name;       /* name used to identify the tuplestore */
    TupleDesc   tupdesc;    /* description of result rows */
} TsrmdData;

typedef TsrmdData *Tsrmd;

... and this during execution:

typedef struct TsrData
{
    TsrmdData        md;
    Tuplestorestate *tstate;    /* data (or tids) */
} TsrData;

typedef TsrData *Tsr;
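One detail worth noting in the definitions above: TsrmdData is the first
member of TsrData, so execution-time code can hand the embedded md to
anything that only needs the parse-time metadata. A self-contained
illustration, with TupleDesc and Tuplestorestate stubbed as opaque types
(only the layout idea comes from the quoted code):

```c
#include <assert.h>
#include <string.h>

typedef void *TupleDesc;        /* stub for the real TupleDesc */
typedef void *Tuplestorestate;  /* stub for the real tuplestore */

typedef struct TsrmdData
{
    char       *name;           /* name used to identify the tuplestore */
    TupleDesc   tupdesc;        /* description of result rows */
} TsrmdData;
typedef TsrmdData *Tsrmd;

typedef struct TsrData
{
    TsrmdData        md;        /* metadata; deliberately the first member */
    Tuplestorestate *tstate;    /* data (or tids) */
} TsrData;
typedef TsrData *Tsr;

/* Parse-time style lookup: needs only the metadata view. */
static int
tsrmd_named(Tsrmd md, const char *name)
{
    return strcmp(md->name, name) == 0;
}
```

At execution time, tsrmd_named(&some_tsr->md, "newtab") works unchanged,
which is what lets one lookup path serve both phases.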

The big problem, as I see it, is how to deliver these to where they
are needed.  I didn't think it was that hard to do with the parser
hook; it's what to do to get the execution time structure to where
it's needed that I can't figure out.  Passing it with the
parameters is tricky because we're often passing a NULL for the
reference to the parameter list when we need these.  Trying to coax
the executor to pass in a parameter list when there are no
parameters, just these ephemeral relations, seems very tricky and
all solutions I have tried (other than the one Heikki and others
have objected to) very fragile.

In short, the only solution which I've been able to come up with
that works (and seems to me solid enough to commit) is the one that
Heikki, Tom, and Robert seem to think should be made more like
parameter handling; and every attempt at handling these relations
like parameters seems to me too fragile for me to feel it is worthy
of commit.

We're really down to the wire on getting this feature into 9.5; and
we're way past what was initially my goal, which was to build on
this to get some incremental maintenance of (some types of simple)
materialized views into 9.5.  IMO we need to be able to build up
tuplestores and easily reference them from within complex queries
to create any sane and efficient implementation of incremental
maintenance of materialized views.  This patch was mainly intended
to make progress on the MV front, with an incidental benefit of
providing a standard feature that I have seen requested a few times.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Seq Scan

2015-01-22 Thread Robert Haas
On Thu, Jan 22, 2015 at 9:02 AM, Amit Kapila  wrote:
>> I'm confused.  Your actual test numbers seem to show that the
>> performance with the block-by-block approach was slightly higher with
>> parallelism than without, where as the performance with the
>> chunk-by-chunk approach was lower with parallelism than without, but
>> the text quoted above, summarizing those numbers, says the opposite.
>
> Sorry for causing confusion, I should have been more explicit about
> explaining the numbers.  Let me try again,
> The values in the columns are times in milliseconds to complete the execution,
> so higher means it took more time.  If you see in block-by-block, the
> time taken to complete the execution with 2 workers is more than
> no workers which means parallelism has degraded the performance.

*facepalm*

Oh, yeah, right.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Seq Scan

2015-01-22 Thread Amit Kapila
On Thu, Jan 22, 2015 at 7:23 PM, Robert Haas  wrote:
>
> On Thu, Jan 22, 2015 at 5:57 AM, Amit Kapila 
wrote:
> > 1. Scanning block-by-block has negative impact on performance and
> > I think it will degrade more if we increase parallel count as that can
lead
> > to more randomness.
> >
> > 2. Scanning in fixed chunks improves the performance. Increasing
> > parallel count to a very large number might impact the performance,
> > but I think we can have a lower bound below which we will not allow
> > multiple processes to scan the relation.
>
> I'm confused.  Your actual test numbers seem to show that the
> performance with the block-by-block approach was slightly higher with
> parallelism than without, whereas the performance with the
> chunk-by-chunk approach was lower with parallelism than without, but
> the text quoted above, summarizing those numbers, says the opposite.
>

Sorry for causing confusion, I should have been more explicit about
explaining the numbers.  Let me try again,
The values in the columns are times in milliseconds to complete the execution,
so higher means it took more time.  If you see in block-by-block, the
time taken to complete the execution with 2 workers is more than
no workers which means parallelism has degraded the performance.

> Also, I think testing with 2 workers is probably not enough.  I think
> we should test with 8 or even 16.
>

Sure, will do this and post the numbers.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Parallel Seq Scan

2015-01-22 Thread Robert Haas
On Thu, Jan 22, 2015 at 5:57 AM, Amit Kapila  wrote:
> 1. Scanning block-by-block has negative impact on performance and
> I think it will degrade more if we increase parallel count as that can lead
> to more randomness.
>
> 2. Scanning in fixed chunks improves the performance. Increasing
> parallel count to a very large number might impact the performance,
> but I think we can have a lower bound below which we will not allow
> multiple processes to scan the relation.

I'm confused.  Your actual test numbers seem to show that the
performance with the block-by-block approach was slightly higher with
parallelism than without, whereas the performance with the
chunk-by-chunk approach was lower with parallelism than without, but
the text quoted above, summarizing those numbers, says the opposite.

Also, I think testing with 2 workers is probably not enough.  I think
we should test with 8 or even 16.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] delta relations in AFTER triggers

2015-01-22 Thread Robert Haas
On Thu, Oct 23, 2014 at 11:19 AM, Robert Haas  wrote:
> So what I'm imagining now is:
>
> 1. During parse analysis, p_tableref_hook gets control and calls
> addRangeTableEntryForTuplestore(), creating an RTE of type
> RTE_TUPLESTORE.  The RTE stores an integer parameter-index.
>
> 2. Path generation doesn't need to do anything very exciting; it just
> generates a Path node of type T_TuplestoreScan.  The RTE is still
> available, so the path itself doesn't need to know which tuplestore
> we're referencing, because that information is present in the RTE.
>
> 3. At plan generation time, we look up the RTE for the path and
> extract the parameter index, which is what gets stored in the
> TuplestoreScan node.
>
> 4. At executor initialization time, we use the parameter index in the
> TuplestoreScan to index into the EState's es_param_list_info and
> retrieve the tuplestore.

I spent some time poking at this yesterday, based on commit
5060b9352b0d0301ffb002355f0572e93f8b05fe from
https://github.com/kgrittn/postgres.git

Here's where I got stuck:

The plpgsql_parser_setup() callback sets pstate->p_ref_hook_state =
(void *) expr, so if we add p_tableref_hook as an additional callback,
that's what it has to work with to find the information needed to
generate a RangeTblEntry.  That is a PLpgSQL_expr, and it contains a
pointer to the PLpgSQL_function, which is created when the function is
compiled, which seems good, but the information we need is not there.
Specifically, we need to know the names the user picked for the old
and new tuplestores (tgoldtable and tgnewtable) and we need to know
what the tuple descriptor should be, and the PLpgSQL_function hasn't
got it.

It does not seem impossible to fix that, but I'm not sure it's safe.
do_compile() has the FunctionCallInfo, so from there it can get at the
TriggerData and the Trigger.  The trigger has got tgoldtable and
tgnewtable, and the TriggerData has got tg_relation, so everything we
need is there.  We could add pointers to the relevant stuff to the
PLpgSQL_function, and then the parser callbacks could get at it.
However, I'm not sure that's really OK -- presumably, tg_relation is
going to be valid only during the initial compile.  If somebody came
back and looked at that PLpgSQL_function again later, and tried to
follow that pointer, bad things would happen.  In practice maybe it
would be OK because the only likely reason to come back and look at
the PLpgSQL_function again is because we're recompiling, and at that
point we'd have a new relation pointer to copy in there, and so it
would probably be OK.  But that feels mighty ugly.

Another idea is to change what actually gets passed to the parser
callback.  Right now we just pass the PLpgSQL_expr.  If we created a
new structure that contained that plus the PLpgSQL_execstate, we'd be
in fine shape.  But this sort of gets at the root of the problem here:
with variables, the parser callback doesn't return the actual *value*,
it returns a Param node that will later, at execution time, be looked
up to find the actual value.  With relations, we're sort of doing the
same thing - the tuplestore RTE doesn't need to contain the actual
data, just the tuple descriptor.  But the data structures from which
we can get that information also contain the actual execution-time
information, so passing them into parse-time callbacks seems like it
might be, if nothing else, a recipe for future bugs.
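The indirection described above — the RTE carries only a parameter index at
parse time, and the executor resolves it to actual data later — can be
pictured with a toy sketch. None of these structs or names are the real
PostgreSQL ones; they only show the shape of the lookup:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TSTORES 8

/* Parse time: the RTE records where to look, never the data itself. */
typedef struct { int paramno; } TuplestoreRTEStub;

/* Execution time: the EState would hold the actual tuplestores. */
typedef struct { void *tstores[MAX_TSTORES]; } EStateStub;

/* Executor-init-time resolution of the parse-time index. */
static void *
resolve_tuplestore(const EStateStub *estate, const TuplestoreRTEStub *rte)
{
    if (rte->paramno < 0 || rte->paramno >= MAX_TSTORES)
        return NULL;
    return estate->tstores[rte->paramno];
}
```

This mirrors how a Param node works for variables: the parse-time artifact
names a slot, and only execution fills the slot with a value.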

Any suggestions?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] PL/pgSQL, RAISE and error context

2015-01-22 Thread Marko Tiikkaja

Hello,

I just heard that there's going to be a fifth CF for 9.5 so I'm trying 
to gather all the patches I'd like to see in 9.5..


On 8/23/13 10:36 AM, I wrote:

My opinion at this very moment is that we should leave the DEFAULT
verbosity alone and add a new one (call it COMPACT or such) with the
suppressed context for non-ERRORs.


I wonder if a better option would be to add a GUC to control this from 
the server side.  plpgsql.suppress_simple_error_context or such, 
defaulting to false to maintain full backwards compatibility.  That 
could be set to true for development installations and for client 
programs which only care about having all information available, rather 
than readability or aesthetics.


Or is that a stupid idea?  I just think hacking libpq for something like 
this is a huge overkill.



.marko




Re: [HACKERS] New CF app: changing email sender

2015-01-22 Thread Kyotaro HORIGUCHI
Lovely!

> I've just pushed a change to the main website which now lets you change the
> email address there. That means you can go to
> https://www.postgresql.org/account/profile/ and change the email. After you
> have confirmed the change (an email will be sent to your new address when
> you change it), the new one should be valid on the cf app (if it
> doesn't show up, log out of the cf app and back in, and it should show up).
> 
> I will look into Andrew's issue of being able to have multiple email
> addresses soon as well - but one feature per evening :)

It works perfectly. Thanks.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] Parallel Seq Scan

2015-01-22 Thread Amit Kapila
On Mon, Jan 19, 2015 at 6:50 PM, Robert Haas  wrote:
>
> On Mon, Jan 19, 2015 at 2:24 AM, Amit Kapila 
wrote:
>
> > Another thing is that I think prefetching is not supported on all
platforms
> > (Windows) and for such systems as per above algorithm we need to
> > rely on block-by-block method.
>
> Well, I think we should try to set up a test to see if this is hurting
> us.  First, do a sequential scan of a relation at least twice
> as large as RAM.  Then, do a parallel sequential scan of the same
> relation with 2 workers.  Repeat these in alternation several times.
> If the operating system is accomplishing meaningful readahead, and the
> parallel sequential scan is breaking it, then since the test is
> I/O-bound I would expect to see the parallel scan actually being
> slower than the normal way.
>

I have taken some performance data as per the above discussion.  Basically,
I have used the parallel_count module, which is part of the parallel-mode
patch, as it seems better suited to verifying the I/O pattern (it doesn't
have any tuple communication overhead).

Script used to test is attached (parallel_count.sh)

Performance Data

Configuration and Db Details

IBM POWER-7 16 cores, 64 hardware threads
RAM = 64GB

Table Size - 120GB

Used below statements to create table -
create table tbl_perf(c1 int, c2 char(1000));
insert into tbl_perf values(generate_series(1,1000),'a');
insert into tbl_perf values(generate_series(1001,3000),'a');
insert into tbl_perf values(generate_series(3001,11000),'a');

  Block-By-Block

  No. of workers / Time (ms)        0         2
  Run-1                        267798    295051
  Run-2                        276646    296665
  Run-3                        281364    314952
  Run-4                        290231    326243
  Run-5                        288890    295684


Then I have modified the parallel_count module such that it can scan in
fixed chunks, rather than block-by-block, the patch for same is attached
(parallel_count_fixed_chunk_v1.patch, this is a patch based on parallel
count module in parallel-mode patch [1]).

  Fixed-Chunks

  No. of workers / Time (ms)        0         2
  Run-1                        286346    234037
  Run-2                        250051    215111
  Run-3                        255915    254934
  Run-4                        263754    242228
  Run-5                        251399    202581

Observations

1. Scanning block-by-block has a negative impact on performance, and
I think it will degrade more if we increase the parallel count, as that
can lead to more randomness.
2. Scanning in fixed chunks improves the performance. Increasing
parallel count to a very large number might impact the performance,
but I think we can have a lower bound below which we will not allow
multiple processes to scan the relation.
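For concreteness, the fixed-chunk scheme of observation 2 amounts to
workers claiming consecutive fixed-size ranges of blocks. A minimal
single-process sketch — in the real patch the claim would be an atomic
fetch-and-add on a counter in shared memory, and all names here are
invented:

```c
#include <assert.h>
#include <stdbool.h>

typedef struct ChunkStateStub
{
    unsigned next_chunk;    /* a shared counter in the real thing */
    unsigned nblocks;       /* total blocks in the relation */
    unsigned chunk_size;    /* blocks handed out per claim */
} ChunkStateStub;

/* Claim the next chunk; returns false once the relation is exhausted. */
static bool
claim_chunk(ChunkStateStub *cs, unsigned *start, unsigned *end)
{
    unsigned c = cs->next_chunk++;      /* would be atomic in practice */

    *start = c * cs->chunk_size;
    if (*start >= cs->nblocks)
        return false;
    *end = *start + cs->chunk_size;
    if (*end > cs->nblocks)
        *end = cs->nblocks;             /* last chunk may be short */
    return true;
}
```

Each worker gets a long sequential run of blocks, which is what preserves
the OS readahead that block-by-block interleaving breaks.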


Now I can go ahead and try the prefetching approach as suggested by
you, but I have a feeling that overall it might not be beneficial,
mainly because it is not supported on all platforms; we can say that
we don't care about such platforms, but there is still no mitigation
strategy for them, for the reasons mentioned up-thread.

Thoughts?

[1]
http://www.postgresql.org/message-id/ca+tgmozduk4k3xhbxc9vm-82khourezdvqwtfglhwsd2r2a...@mail.gmail.com

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


parallel_count.sh
Description: Bourne shell script


parallel_count_fixed_chunk_v1.patch
Description: Binary data



[HACKERS] [POC] FETCH limited by bytes.

2015-01-22 Thread Kyotaro HORIGUCHI
Hello, as in the discussion on async fetching in postgres_fdw, a FETCH
with a data-size limitation would be useful for keeping the memory
usage of postgres_fdw stable.

Is such a feature and syntax could be allowed to be added?

== 

Postgres_fdw fetches tuples from remote servers using a cursor. The
transfer gets faster as the number of fetches decreases. On the
other hand, the buffer size for the fetched tuples varies widely
according to their average length. 100 tuples per fetch is quite
small for short tuples, but a larger fetch size can easily cause
memory exhaustion, and there is no way to know the safe size in advance.

One way to resolve the contradiction would be a FETCH that limits the
result by size rather than by the number of tuples, so I'd like to
propose that here.
 

This patch is a POC for the feature. For example,

  FETCH 1 LIMIT 100 FROM c1;

This FETCH retrieves up to 1 tuples but cuts out just after
the total tuple length exceeds 1MB. (It is not literally a
"LIMIT" in that sense.)

The syntax added by this patch is described as following.

 FETCH [FORWARD|BACKWARD]  LIMIT Iconst [FROM|IN] curname

The "data size" compared against the LIMIT size is the sum, over the
sent tuples, of the result of the following expression. Whether this
is the right measure is arguable.

[if tupleslot has tts_tuple]
    HEAPTUPLESIZE + slot->tts_tuple->t_len
[else]
    HEAPTUPLESIZE +
        heap_compute_data_size(slot->tts_tupleDescriptor,
                               slot->tts_values,
                               slot->tts_isnull);
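That accounting — add HEAPTUPLESIZE plus the tuple's data length for each
tuple sent, stopping once the running total passes the LIMIT in bytes —
can be sketched like so. These are stub types and an invented header-size
constant, not the patch's actual code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HEAPTUPLESIZE_STUB 32   /* stand-in for the real header overhead */

typedef struct { size_t t_len; } TupleStub;

/*
 * Account for one sent tuple; returns true when fetching should stop.
 * The tuple that crosses the limit is still counted as sent, matching
 * the "cut out just after the total exceeds the limit" behavior.
 */
static bool
sent_tuple_hits_limit(size_t *total, const TupleStub *tup, size_t limit)
{
    *total += HEAPTUPLESIZE_STUB + tup->t_len;
    return *total > limit;
}
```

With a 250-byte limit and 100-byte tuples, the third tuple is the one that
trips the check, so three tuples go out and the fetch then stops.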



This patch makes the following changes:

- This patch adds the parameter "size" to following functions
  (standard_)ExecutorRun / ExecutePlan / RunFromStore
  PortalRun / PortalRunSelect / PortalRunFetch / DoPortalRunFetch

- The core is in standard_ExecutorRun and RunFromStore: simply
  sum up the sent tuple lengths and compare against the given
  limit.

- struct FetchStmt and EState has new member.

- The modifications in gram.y affect the ecpg parser. I think I
  could fix them, but with no confidence :(

- Modified the corresponding parts of the changes above in
  auto_explain and pg_stat_statements, in the parameter lists only.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 6f1dd6998ba312c3552f137365e3a3118b7935be Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Wed, 21 Jan 2015 17:18:09 +0900
Subject: [PATCH] Size limitation feature of FETCH v0

---
 contrib/auto_explain/auto_explain.c |  8 +--
 contrib/pg_stat_statements/pg_stat_statements.c |  8 +--
 src/backend/commands/copy.c |  2 +-
 src/backend/commands/createas.c |  2 +-
 src/backend/commands/explain.c  |  2 +-
 src/backend/commands/extension.c|  2 +-
 src/backend/commands/matview.c  |  2 +-
 src/backend/commands/portalcmds.c   |  3 +-
 src/backend/commands/prepare.c  |  2 +-
 src/backend/executor/execMain.c | 35 +++--
 src/backend/executor/execUtils.c|  1 +
 src/backend/executor/functions.c|  2 +-
 src/backend/executor/spi.c  |  4 +-
 src/backend/parser/gram.y   | 59 +++
 src/backend/tcop/postgres.c |  2 +
 src/backend/tcop/pquery.c   | 95 +
 src/include/executor/executor.h |  8 +--
 src/include/nodes/execnodes.h   |  1 +
 src/include/nodes/parsenodes.h  |  1 +
 src/include/tcop/pquery.h   |  3 +-
 src/interfaces/ecpg/preproc/Makefile|  2 +-
 src/interfaces/ecpg/preproc/ecpg.addons | 63 
 22 files changed, 248 insertions(+), 59 deletions(-)

diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index 2a184ed..f121a33 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -57,7 +57,7 @@ void		_PG_fini(void);
 static void explain_ExecutorStart(QueryDesc *queryDesc, int eflags);
 static void explain_ExecutorRun(QueryDesc *queryDesc,
 	ScanDirection direction,
-	long count);
+	long count, long size);
 static void explain_ExecutorFinish(QueryDesc *queryDesc);
 static void explain_ExecutorEnd(QueryDesc *queryDesc);
 
@@ -232,15 +232,15 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
  * ExecutorRun hook: all we need do is track nesting depth
  */
 static void
-explain_ExecutorRun(QueryDesc *queryDesc, ScanDirection direction, long count)
+explain_ExecutorRun(QueryDesc *queryDesc, ScanDirection direction, long count, long size)
 {
 	nesting_level++;
 	PG_TRY();
 	{
 		if (prev_ExecutorRun)
-			prev_ExecutorRun(queryDesc, direction, count);
+			prev_ExecutorRun(queryDesc, direction, count, size);
 		else
-			standard_ExecutorRun(queryDesc, direction, count);
+			standard_ExecutorRun(queryDesc, direction, count, size);
 		nesting_level-