Re: [HACKERS] Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

2015-01-05 Thread Ashutosh Bapat
On Mon, Jan 5, 2015 at 8:42 PM, Atri Sharma atri.j...@gmail.com wrote:


 Hi All,

 Please forgive if this is a repost.

 Please find attached patch for supporting ORDER BY clause in CREATE
 FUNCTION for SRFs. Specifically:

 CREATE OR REPLACE FUNCTION func1(OUT e int, OUT f int) returns setof
 record as ' SELECT a,b FROM table1 ORDER BY a; ' language 'sql' ORDER BY e;

 This allows capturing information about an existing preorder that may be
 inherently present in the SRF's input or algorithm (see the above example,
 or think of generate_series).

 This allows eliminating sorts based on the known existing preorder. For
 example:

 SELECT * FROM correct_order_singlecol() ORDER BY e; # Does not need to
 sort by e since the existing preorder is known.

 Eliminating such sorts can be a huge gain, especially if the expected
 input to the needed Sort node is large.

 The obvious question is what happens if the specified ORDER BY clause does
 not actually hold. To check the order, a new node is added as the top node
 of the plan, responsible for projecting result rows. It tracks the previous
 row seen and, given a sort order, ensures that the current tuple to be
 projected is in the required sort order.
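
 To illustrate the idea, here is a minimal standalone C sketch of the
 "store only the previous row and compare" check -- not the actual executor
 code, and all names below are made up:

 #include <stdbool.h>
 #include <stddef.h>
 #include <stdio.h>

 /*
  * Check that keys[] is in ascending order while remembering only the
  * previously seen value, the same O(1)-memory approach the proposed
  * OrderCheck node applies to result tuples.
  */
 static bool
 keys_are_ordered(const int *keys, size_t nkeys)
 {
     size_t i;

     for (i = 1; i < nkeys; i++)
     {
         /* compare the current "row" against the single remembered one */
         if (keys[i] < keys[i - 1])
             return false;   /* the real node would raise an error here */
     }
     return true;
 }

 int
 main(void)
 {
     int sorted[] = {1, 2, 2, 5};
     int broken[] = {1, 3, 2};

     printf("%d %d\n", keys_are_ordered(sorted, 4), keys_are_ordered(broken, 3));
     return 0;
 }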

 So, for the above example

 EXPLAIN (COSTS OFF) SELECT * FROM correct_order_multicol() ORDER BY e;
   QUERY PLAN
 ---
  OrderCheck
   ->  Function Scan on correct_order_multicol
 (2 rows)


 If the order of the result rows is not the same as required, an error is raised:

 SELECT * FROM incorrect_order_nulls() ORDER BY e NULLS LAST;
 ERROR:  Order not same as specified



 Preorder columns are first transformed into SortGroupClauses and then
 stored directly in pg_proc.


 This functionality addresses a use case seen in practice, and is especially
 useful when SRF inputs are large and/or might be pipelined from another
 function (SRFs are often used in pipelines in analytical systems, with
 large data).

 The overhead of this patch is small. A new path is added for the preorder
 keys, and OrderCheck node's additional cost is pretty low, given that it
 only compares two rows and stores only a single row (previous row seen),
 hence the memory footprint is minuscule.


We can eliminate the new node and put the onus of having the right order on
the user, like we do with the volatile setting of the function.


 In the inner joins thread, Tom mentioned having a new node which holds
 multiple plans, where the executor can decide which plan to execute given
 runtime conditions. I played around with the idea, and am open to
 experimenting with a new node that has a Sort-based plan and is executed in
 case the OrderCheck node sees that the inherent order of result tuples is
 not correct. Feedback here would be very welcome.


 I will add the patch to current commitfest.

 Thoughts?

 Regards,

 Atri


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers




-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

2015-01-05 Thread Atri Sharma



 The overhead of this patch is small. A new path is added for the preorder
 keys, and OrderCheck node's additional cost is pretty low, given that it
 only compares two rows and stores only a single row (previous row seen),
 hence the memory footprint is minuscule.


 We can eliminate the new node and put the onus of having the right order on
 the user, like we do with the volatile setting of the function.



That is exactly what the new node does, since we are not re-sorting right
now in case the order is wrong. Please see my explanation upthread: the
OrderCheck node's primary purpose is to check for a user error in the
result row order. The onus to give the correct order is currently on the user.

Regards,

Atri
-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

2015-01-05 Thread Ashutosh Bapat
On Tue, Jan 6, 2015 at 12:23 PM, Atri Sharma atri.j...@gmail.com wrote:



 The overhead of this patch is small. A new path is added for the
 preorder keys, and OrderCheck node's additional cost is pretty low, given
 that it only compares two rows and stores only a single row (previous row
 seen), hence the memory footprint is minuscule.


 We can eliminate the new node and put the onus of having the right order on
 the user, like we do with the volatile setting of the function.



 That is exactly what the new node does, since we are not re-sorting right
 now in case the order is wrong. Please see my explanation upthread: the
 OrderCheck node's primary purpose is to check for a user error in the
 result row order. The onus to give the correct order is currently on the user.


Even checking whether the output of the function is in the right order
has its cost. I am suggesting that we can eliminate this cost as well.
For example, PostgreSQL does not check whether a function is really
immutable or not.


 Regards,

 Atri
 --
 Regards,

 Atri
 *l'apprenant*




-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

2015-01-05 Thread Atri Sharma
On Tue, Jan 6, 2015 at 12:29 PM, Amit Langote langote_amit...@lab.ntt.co.jp
 wrote:

 On 06-01-2015 PM 04:00, Ashutosh Bapat wrote:
  On Tue, Jan 6, 2015 at 12:23 PM, Atri Sharma atri.j...@gmail.com
 wrote:
  We can eliminate the new node and put the onus of having the right order on
  the user, like we do with the volatile setting of the function.
 
 
 
  That is exactly what the new node does, since we are not re-sorting right
  now in case the order is wrong. Please see my explanation upthread: the
  OrderCheck node's primary purpose is to check for a user error in the
  result row order. The onus to give the correct order is currently on the user.
 
 
  Even checking whether the output of the function is in the right order or
  not, has its cost. I am suggesting that we can eliminate this cost as
 well.
  For example, PostgreSQL does not check whether a function is really
  immutable or not.
 

 Sounds like something along the lines of "ORDERED BY x" implying "output is
 known ordered by x" is perhaps enough of a hint for the planner to make the
 necessary plan choices, though I may be wrong.



I may be missing something, but isn't what you mentioned the exact
functionality this patch adds?


Re: [HACKERS] pg_rewind in contrib

2015-01-05 Thread Michael Paquier
On Tue, Jan 6, 2015 at 2:38 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Here's an updated version of pg_rewind. The code itself is the same as
 before, and corresponds to the sources in the current pg_rewind github
 repository, as of commit a65e3754faf9ca9718e6b350abc736de586433b7. Based
 mostly on Michael's comments, I have:

 * replaced VMware copyright notices with PGDG ones.
 * removed unnecessary cruft from .gitignore
 * changed the --version line and report bugs notice in --help to match
 other binaries in the PostgreSQL distribution
 * moved documentation from README to the user manual.
 * minor fixes to how the regression tests are launched so that they work
 again

 Some more work remains to be done on the regression tests. The way they're
 launched now is quite weird. It was written that way to make it work outside
 the PostgreSQL source tree, and also on Windows. Now that it lives in
 contrib, it should be redesigned.

 The documentation could also use some work; for now I just converted the
 existing text from README to sgml format.

Some more comments:
- The binary pg_rewind needs to be ignored in contrib/pg_rewind/
- Be careful that ./launcher's permissions should be set to 755 when
doing a git add on it... or regression tests will fail.
- It would be good to get make check working so that it includes both
check-local and check-remote
- installcheck should be an ignored target.
- Nitpicking: the header formats of filemap.c, datapagemap.c,
datapagemap.h and util.h are incorrect (I pushed a fix about that in
pg_rewind itself, feel free to pick it up).
- parsexlog.c has a copyright mention to Nippon Telegraph and
Telephone Corporation. Cannot we drop it safely?
- MSVC build is not supported yet. You need to do something similar to
pg_xlogdump, aka some magic with for example xlogreader.c.
- Error codes need to be generated before building pg_rewind. If I do
for example a simple configure followed by make I get a failure:
$ ./configure
$ cd contrib/pg_rewind && make
In file included from parsexlog.c:16:
In file included from ../../src/include/postgres.h:48:
../../src/include/utils/elog.h:69:10: fatal error: 'utils/errcodes.h'
file not found
#include "utils/errcodes.h"
- Build fails with MinGW as there is visibly some unportable code:
copy_fetch.c: In function 'recurse_dir':
copy_fetch.c:112:3: warning: implicit declaration of function 'S_ISLNK' [-Wimpli
cit-function-declaration]
   else if (S_ISLNK(fst.st_mode))
   ^
copy_fetch.c: In function 'check_samefile':
copy_fetch.c:298:2: warning: passing argument 2 of '_fstat64i32' from incompatib
le pointer type [enabled by default]
  if (fstat(fd1, &statbuf1) < 0)
  ^
In file included from ../../src/include/port.h:283:0,
 from ../../src/include/c.h:1050,
 from ../../src/include/postgres_fe.h:25,
 from copy_fetch.c:10:
c:\mingw\include\sys\stat.h:200:32: note: expected 'struct _stat64i32 *' but arg
ument is of type 'struct stat *'
 __CRT_MAYBE_INLINE int __cdecl _fstat64i32(int desc, struct _stat64i32 *_stat)
{
^
copy_fetch.c:304:2: warning: passing argument 2 of '_fstat64i32' from incompatib
le pointer type [enabled by default]
  if (fstat(fd2, &statbuf2) < 0)
  ^
In file included from ../../src/include/port.h:283:0,
 from ../../src/include/c.h:1050,
 from ../../src/include/postgres_fe.h:25,
 from copy_fetch.c:10:
c:\mingw\include\sys\stat.h:200:32: note: expected 'struct _stat64i32 *' but arg
ument is of type 'struct stat *'
 __CRT_MAYBE_INLINE int __cdecl _fstat64i32(int desc, struct _stat64i32 *_stat)
{
^
copy_fetch.c: In function 'truncate_target_file':
copy_fetch.c:436:2: warning: implicit declaration of function 'truncate' [-Wimpl
icit-function-declaration]
  if (truncate(dstpath, newsize) != 0)
  ^
copy_fetch.c: In function 'slurpFile':
copy_fetch.c:561:2: warning: passing argument 2 of '_fstat64i32' from incompatib
le pointer type [enabled by default]
  if (fstat(fd, &statbuf) < 0)
  ^
In file included from ../../src/include/port.h:283:0,
 from ../../src/include/c.h:1050,
 from ../../src/include/postgres_fe.h:25,
 from copy_fetch.c:10:
c:\mingw\include\sys\stat.h:200:32: note: expected 'struct _stat64i32 *' but arg
ument is of type 'struct stat *'
 __CRT_MAYBE_INLINE int __cdecl _fstat64i32(int desc, struct _stat64i32 *_stat)
{
^
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -We
ndif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -f
wrapv -fexcess-precision=standard -g -DG -fno-omit-frame-pointer -I../../src/int
erfaces/libpq -I. -I. -I../../src/include -DFRONTEND  -I../../src/include/port/
win32  -c -o libpq_fetch.o libpq_fetch.c -MMD -MP -MF .deps/libpq_fetch.Po
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -We
ndif-labels 

Re: [HACKERS] segmentation fault in execTuples.c#ExecStoreVirtualTuple

2015-01-05 Thread Michael Paquier
On Tue, Jan 6, 2015 at 12:39 AM, Manuel Kniep man...@adjust.com wrote:
 Hi,

 we are running postgres 9.3.5 on Gentoo Linux, kernel 3.16.5, compiled with gcc
 4.8.3
 Any ideas ?

 #17 0x0062bb9d in SPI_execute_with_args (
src=0x22b880bb0 \nCREATE TEMPORARY TABLE
 [...]
 #33 0x7f363555ab97 in plpgsql_exec_function (func=0xd888c8, 
 fcinfo=0x7aa89a60) at pl_exec.c:321
 #34 0x7f3632be in plpgsql_call_handler (fcinfo=0x7aa89a60) at 
 pl_handler.c:129
 [...]
 #46 0x0072e4eb in exec_simple_query (
 query_string=0xd633b0 SELECT 'event' as item, '2014-12-30' as date, 
 'Backends::Backend9' as backend, '33' as bucket,  * FROM 
 materialize_events('2014-11-20', '2014-12-30')) at postgres.c:1048
From the backtrace you are showing, you are creating a temporary table
with CREATE TABLE AS within a plpgsql function. Could you provide a
self-contained test case?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE patch

2015-01-05 Thread Michael Paquier
On Tue, Dec 23, 2014 at 5:21 AM, Petr Jelinek p...@2ndquadrant.com wrote:
 Attached is v3 which besides the fixes mentioned above also includes changes
 discussed with Tomas (except the CREATE/DROP TABLESAMPLE METHOD), fixes for
 crash with FETCH FIRST and is rebased against current master.
This patch needs a rebase; there is a small conflict in parallel_schedule.

Structurally speaking, I think that the tsm methods should be added in
src/backend/utils and not src/backend/access, which is more high-level,
as tsm_bernoulli.c and tsm_system.c contain only a set of new
procedure functions. Having a single header file tsm.h would also be a
good thing.

Regarding the naming, is tsm (table sample method) really appealing?
Wouldn't it be better to use simply tablesample_* for the file names
and the method names?

This is a large patch... Wouldn't sampling.[c|h] extracted from
ANALYZE live better as a refactoring patch? This would somewhat limit the
chance of bugs in the main patch.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tracking commit timestamps

2015-01-05 Thread Michael Paquier
On Fri, Dec 19, 2014 at 3:53 PM, Noah Misch n...@leadboat.com wrote:
 localhost template1=# select clock_timestamp(), pg_sleep(.1 * (n % 2)) from 
 generate_series(0,7) t(n);
         clock_timestamp        | pg_sleep
 -------------------------------+----------
  2014-12-18 08:34:34.522126+00 |
  2014-12-18 08:34:34.522126+00 |
  2014-12-18 08:34:34.631508+00 |
  2014-12-18 08:34:34.631508+00 |
  2014-12-18 08:34:34.74089+00  |
  2014-12-18 08:34:34.74089+00  |
  2014-12-18 08:34:34.850272+00 |
  2014-12-18 08:34:34.850272+00 |
 (8 rows)
So, we would need additional information other than the node ID *and*
the timestamp to ensure proper transaction commit ordering on Windows.
That's not cool and makes this feature very limited on this platform.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On partitioning

2015-01-05 Thread Amit Langote
On 18-12-2014 AM 04:52, Robert Haas wrote:
 On Wed, Dec 17, 2014 at 1:53 PM, Josh Berkus j...@agliodbs.com wrote:

 Sure.  But there's a big difference between "we're going to take these
 steps and that problem will be fixable eventually" and "we're going to
 retain features of the current partitioning system which make that
 problem impossible to fix".  The drift of discussion on this thread
 *sounded* like the latter, and I've been calling attention to the issue
 in an effort to make sure that it's not.

 Last week, I wrote a longish email listing out the common problems users
 have with our current partitioning as a way of benchmarking the plan for
 new partitioning.  While some people responded to that post, absolutely
 nobody discussed the list of issues I gave.  Is that because there's
 universal agreement that I got the major issues right?  Seems doubtful.
 
 I agreed with many of the things you listed but not all of them.
 However, I don't think it's realistic to burden whatever patch Amit
 writes with the duty of, for example, making global indexes work.
 That's a huge problem all of its own.  Now, conceivably, we could try
 to solve that as part of the next patch by insisting that the
 partitions have to really be block number ranges within a single
 relfilenode rather than separate relfilenodes as they are today ...
 but I think that's a bad design which we would likely regret bitterly.
 I also think that it would likely make what's being talked about here
 so complicated that it will never go anywhere.  I think it's better
 that we focus on solving one problem really well - storing metadata
 for partition boundaries in the catalog so that we can do efficient
 tuple routing and partition pruning - and leave the other problems for
 later.
 

Yes, I think partitioning as a whole is a BIG enough project that we
need to tackle it as a series of steps each of which is a discussion of
its own. The first step might as well be discussing how we represent a
partitioned table. We have a number of design decisions to make during
this step itself and we would definitely want to reach a consensus on
these points.

Things like where we indicate if a table is partitioned (pg_class), what
the partition key looks like, where it is stored, what the partition
definition looks like, where it is stored, how we represent arbitrary
number of levels in the partitioning hierarchy, how we ensure that only
leaf-level relations in a hierarchy have storage, what the implications
of all these choices are, etc. Some of these points are being discussed.

I agree that while we are discussing these points, we could also be
discussing how we solve problems of existing partitioning implementation
using whatever the above things end up being. Proposed approaches to
solve those problems might be useful to drive the first step as well or
perhaps that's how it should be done anyway.

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

2015-01-05 Thread Amit Langote
On 06-01-2015 PM 04:00, Ashutosh Bapat wrote:
 On Tue, Jan 6, 2015 at 12:23 PM, Atri Sharma atri.j...@gmail.com wrote:
 We can eliminate the new node and put the onus of having the right order on
 the user, like we do with the volatile setting of the function.



 That is exactly what the new node does, since we are not re-sorting right
 now in case the order is wrong. Please see my explanation upthread: the
 OrderCheck node's primary purpose is to check for a user error in the
 result row order. The onus to give the correct order is currently on the user.


 Even checking whether the output of the function is in the right order or
 not, has its cost. I am suggesting that we can eliminate this cost as well.
 For example, PostgreSQL does not check whether a function is really
 immutable or not.
 

Sounds like something along the lines of "ORDERED BY x" implying "output is
known ordered by x" is perhaps enough of a hint for the planner to make the
necessary plan choices, though I may be wrong.

Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

2015-01-05 Thread Atri Sharma
On Tue, Jan 6, 2015 at 12:30 PM, Ashutosh Bapat 
ashutosh.ba...@enterprisedb.com wrote:



 On Tue, Jan 6, 2015 at 12:23 PM, Atri Sharma atri.j...@gmail.com wrote:




 Even checking whether the output of the function is in the right order
 or not, has its cost. I am suggesting that we can eliminate this cost as
 well. For example, PostgreSQL does not check whether a function is really
 immutable or not.



That implies possibly returning an unordered result set even when the user
explicitly specified an ORDER BY clause. If we are depending on an
optimization and it did not work out (even if it is a user error), I think
we should error out indicating that the order was incorrect rather than
returning unordered rows, which could be disastrous IMO.


[HACKERS] Re: Problems with approach #2 to value locking (INSERT ... ON CONFLICT UPDATE/IGNORE patch)

2015-01-05 Thread Peter Geoghegan
On Mon, Jan 5, 2015 at 5:53 PM, Peter Geoghegan p...@heroku.com wrote:
 It's not all that easy to recreate, even with my custom
 instrumentation. With fsync=off, it occurs somewhat infrequently with
 a custom instrumentation Postgres build:

 pg@hamster:~/jjanes_upsert$ perl count_upsert.pl 8 10
 [Mon Jan  5 17:31:57 2015] init done at count_upsert.pl line 101.
 [Mon Jan  5 17:32:17 2015] child abnormal exit [Mon Jan  5 17:32:17
 2015] DBD::Pg::db selectall_arrayref failed: ERROR:  mismatch in xmin
 for (322,264). Initially 4324145, then 4324429 at count_upsert.pl line
 210.\n  at count_upsert.pl line 227.
 [Mon Jan  5 17:32:49 2015] sum is -800
 [Mon Jan  5 17:32:49 2015] count is 9515
 [Mon Jan  5 17:32:49 2015] normal exit at 1420507969 after 733912
 items processed at count_upsert.pl line 182.

 I think that super-deleting broken promise tuples undermines how
 vacuum interlocking is supposed to work [1].

Hmm. On second thought, I think that the instrumentation I added here
can be confused by redirecting line pointers, and that may account for
this apparent problem. Still, I would like for us to figure out how it
was previously possible for the implementation to update a
super-deleted tuple (resulting in unusual all-NULLs tuples) since I
thought that in order for that to happen, other xacts would have to see
what was only ever a promise tuple as a fully-fledged tuple -- in
order for the logic that pre-checks for conflicts to find a conflict,
it would have to find the tuple already committed (so it certainly
couldn't be an unresolved promise tuple). My superficial fix [1] was
written without fully understanding what the problem was.

In any case, I doubt the changes that I made to tqual.c in that fix
will be accepted as-is.

[1] 
http://www.postgresql.org/message-id/cam3swztftt_fehet3tu3ykcpcypynnauquz3q+naasnh-60...@mail.gmail.com
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Redesigning checkpoint_segments

2015-01-05 Thread Heikki Linnakangas

On 01/04/2015 11:44 PM, Josh Berkus wrote:

On 01/03/2015 12:56 AM, Heikki Linnakangas wrote:

On 01/03/2015 12:28 AM, Josh Berkus wrote:

On 01/02/2015 01:57 AM, Heikki Linnakangas wrote:

wal_keep_segments does not affect the calculation of CheckPointSegments.
If you set wal_keep_segments high enough, checkpoint_wal_size will be
exceeded. The other alternative would be to force a checkpoint earlier,
i.e. lower CheckPointSegments, so that checkpoint_wal_size would be
honored. However, if you set wal_keep_segments high enough, higher than
checkpoint_wal_size, it's impossible to honor checkpoint_wal_size no
matter how frequently you checkpoint.


So you're saying that wal_keep_segments is part of the max_wal_size
total, NOT in addition to it?


Not sure what you mean. wal_keep_segments is an extra control that can
prevent WAL segments from being recycled. It has the same effect as
archive_command failing for N most recent segments, if that helps.


I mean, if I have these settings:

max_wal_size* = 256MB
wal_keep_segments = 8

... then my max wal size is *still* 256MB, NOT 384MB?


Right.


If that's the case (and I think it's a good plan), then as a follow-on,
we should prevent users from setting wal_keep_segments to more than 50%
of max_wal_size, no?


Not sure if the 50% figure is correct, but I see what you mean: don't 
allow setting wal_keep_segments so high that we would exceed 
max_wal_size because of it.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] add modulo (%) operator to pgbench

2015-01-05 Thread Fabien COELHO



I'm also just looking at your ERROR() macro, it seems that core code is
quite careful not to use __VA_ARGS__ on compilers where HAVE__VA_ARGS is
not defined. I'd say this needs to be fixed too. Have a look at the trick
used in elog.h for when HAVE__VA_ARGS is not defined.


Here is a v5 with the vararg macro expanded.

--
Fabien.diff --git a/contrib/pgbench/.gitignore b/contrib/pgbench/.gitignore
index 489a2d6..aae819e 100644
--- a/contrib/pgbench/.gitignore
+++ b/contrib/pgbench/.gitignore
@@ -1 +1,3 @@
+/exprparse.c
+/exprscan.c
 /pgbench
diff --git a/contrib/pgbench/Makefile b/contrib/pgbench/Makefile
index b8e2fc8..2d4033b 100644
--- a/contrib/pgbench/Makefile
+++ b/contrib/pgbench/Makefile
@@ -4,7 +4,9 @@ PGFILEDESC = pgbench - a simple program for running benchmark tests
 PGAPPICON = win32
 
 PROGRAM = pgbench
-OBJS	= pgbench.o $(WIN32RES)
+OBJS	= pgbench.o exprparse.o $(WIN32RES)
+
+EXTRA_CLEAN	= exprparse.c exprscan.c
 
 PG_CPPFLAGS = -I$(libpq_srcdir)
 PG_LIBS = $(libpq_pgport) $(PTHREAD_LIBS)
@@ -18,8 +20,22 @@ subdir = contrib/pgbench
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
+
+distprep: exprparse.c exprscan.c
 endif
 
 ifneq ($(PORTNAME), win32)
 override CFLAGS += $(PTHREAD_CFLAGS)
 endif
+
+# There is no correct way to write a rule that generates two files.
+# Rules with two targets don't have that meaning, they are merely
+# shorthand for two otherwise separate rules.  To be safe for parallel
+# make, we must chain the dependencies like this.  The semicolon is
+# important, otherwise make will choose the built-in rule for
+# gram.y=gram.c.
+
+exprparse.h: exprparse.c ;
+
+# exprscan is compiled as part of exprparse
+exprparse.o: exprscan.c
diff --git a/contrib/pgbench/exprparse.y b/contrib/pgbench/exprparse.y
new file mode 100644
index 000..243c6b9
--- /dev/null
+++ b/contrib/pgbench/exprparse.y
@@ -0,0 +1,96 @@
+%{
+/*-
+ *
+ * exprparse.y
+ *	  bison grammar for a simple expression syntax
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *-
+ */
+
+#include "postgres_fe.h"
+
+#include "pgbench.h"
+
+PgBenchExpr *expr_parse_result;
+
+static PgBenchExpr *make_integer_constant(int64 ival);
+static PgBenchExpr *make_variable(char *varname);
+static PgBenchExpr *make_op(char operator, PgBenchExpr *lexpr,
+		PgBenchExpr *rexpr);
+
+%}
+
+%expect 0
+%name-prefix=expr_yy
+
+%union
+{
+	int64		ival;
+	char	   *str;
+	PgBenchExpr *expr;
+}
+
+%type <expr> expr
+%type <ival> INTEGER
+%type <str> VARIABLE
+%token INTEGER VARIABLE
+%token CHAR_ERROR /* never used, will raise a syntax error */
+
+%left	'+' '-'
+%left	'*' '/' '%'
+%right	UMINUS
+
+%%
+
+result: expr{ expr_parse_result = $1; }
+
+expr: '(' expr ')'			{ $$ = $2; }
+	| '+' expr %prec UMINUS	{ $$ = $2; }
+	| '-' expr %prec UMINUS	{ $$ = make_op('-', make_integer_constant(0), $2); }
+	| expr '+' expr			{ $$ = make_op('+', $1, $3); }
+	| expr '-' expr			{ $$ = make_op('-', $1, $3); }
+	| expr '*' expr			{ $$ = make_op('*', $1, $3); }
+	| expr '/' expr			{ $$ = make_op('/', $1, $3); }
+	| expr '%' expr			{ $$ = make_op('%', $1, $3); }
+	| INTEGER{ $$ = make_integer_constant($1); }
+	| VARIABLE { $$ = make_variable($1); }
+	;
+
+%%
+
+static PgBenchExpr *
+make_integer_constant(int64 ival)
+{
+	PgBenchExpr *expr = pg_malloc(sizeof(PgBenchExpr));
+
+	expr->etype = ENODE_INTEGER_CONSTANT;
+	expr->u.integer_constant.ival = ival;
+	return expr;
+}
+
+static PgBenchExpr *
+make_variable(char *varname)
+{
+	PgBenchExpr *expr = pg_malloc(sizeof(PgBenchExpr));
+
+	expr->etype = ENODE_VARIABLE;
+	expr->u.variable.varname = varname;
+	return expr;
+}
+
+static PgBenchExpr *
+make_op(char operator, PgBenchExpr *lexpr, PgBenchExpr *rexpr)
+{
+	PgBenchExpr *expr = pg_malloc(sizeof(PgBenchExpr));
+
+	expr->etype = ENODE_OPERATOR;
+	expr->u.operator.operator = operator;
+	expr->u.operator.lexpr = lexpr;
+	expr->u.operator.rexpr = rexpr;
+	return expr;
+}
+
+#include "exprscan.c"
diff --git a/contrib/pgbench/exprscan.l b/contrib/pgbench/exprscan.l
new file mode 100644
index 000..4c9229c
--- /dev/null
+++ b/contrib/pgbench/exprscan.l
@@ -0,0 +1,105 @@
+%{
+/*-
+ *
+ * exprscan.l
+ *	  a lexical scanner for a simple expression syntax
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *-
+ */
+
+/* line and column number for error reporting */
+static int	yyline = 0, yycol = 0;
+
+/* Handles to the buffer that the lexer uses internally */
+static YY_BUFFER_STATE scanbufhandle;
+static char 

Re: [HACKERS] pg_basebackup -x/X doesn't play well with archive_mode wal_keep_segments

2015-01-05 Thread Fujii Masao
On Mon, Jan 5, 2015 at 6:22 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2015-01-05 16:22:56 +0900, Fujii Masao wrote:
 On Sun, Jan 4, 2015 at 5:47 AM, Andres Freund and...@2ndquadrant.com wrote:
  On 2015-01-03 16:03:36 +0100, Andres Freund wrote:
  pg_basebackup really
  changed heavily since its introduction. And desperately needs some
  restructuring.

 The patch seems to break pg_receivexlog. I got the following error message
 while running pg_receivexlog.

 pg_receivexlog: could not create archive status file
 mmm/archive_status/00010003.done: No such file or
 directory

 Dang. Stupid typo. And my tests didn't catch it, because I had
 archive_directory in the target directory :(

 At least it's only broken in master :/

 Thanks for the catch. Do you have some additional testsuite or did you
 catch it manually?

Manually... I just tested the tools and options which the patch may affect...

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Redesigning checkpoint_segments

2015-01-05 Thread Andres Freund
On 2015-01-05 11:34:54 +0200, Heikki Linnakangas wrote:
 On 01/04/2015 11:44 PM, Josh Berkus wrote:
 On 01/03/2015 12:56 AM, Heikki Linnakangas wrote:
 On 01/03/2015 12:28 AM, Josh Berkus wrote:
 On 01/02/2015 01:57 AM, Heikki Linnakangas wrote:
 wal_keep_segments does not affect the calculation of CheckPointSegments.
 If you set wal_keep_segments high enough, checkpoint_wal_size will be
 exceeded. The other alternative would be to force a checkpoint earlier,
 i.e. lower CheckPointSegments, so that checkpoint_wal_size would be
 honored. However, if you set wal_keep_segments high enough, higher than
 checkpoint_wal_size, it's impossible to honor checkpoint_wal_size no
 matter how frequently you checkpoint.
 
 So you're saying that wal_keep_segments is part of the max_wal_size
 total, NOT in addition to it?
 
 Not sure what you mean. wal_keep_segments is an extra control that can
 prevent WAL segments from being recycled. It has the same effect as
 archive_command failing for N most recent segments, if that helps.
 
 I mean, if I have these settings:
 
 max_wal_size* = 256MB
 wal_keep_segments = 8
 
 ... then my max wal size is *still* 256MB, NOT 384MB?
 
 Right.

Do you mean that wal_keep_segments has *no* influence over
checkpoint pacing, or the contrary? Upthread you imply that it
doesn't, but later comments may mean the contrary.

I think that influencing the pacing would be pretty insane - the user
certainly doesn't expect drastic performance changes when changing
wal_keep_segments. It's confusing enough that it can cause slight
performance variations due to recycling, but we shouldn't make it have a
larger influence.

 If that's the case (and I think it's a good plan), then as a follow-on,
 we should prevent users from setting wal_keep_segments to more than 50%
 of max_wal_size, no?
 
 Not sure if the 50% figure is correct, but I see what you mean: don't allow
 setting wal_keep_segments so high that we would exceed max_wal_size because
 of it.

That seems an unrealistic goal. I've seen setups that have set
checkpoint_segments intentionally, and with good reasoning, north of
50k.

Neither wal_keep_segments, nor failing archive_commands, nor replication
slots should have an influence on checkpoint pacing.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Incremental backup v3: incremental PoC

2015-01-05 Thread Marco Nenciarini
I've noticed that I missed adding this to the commitfest.

I've just added it.

It is not meant to end up in a committable state, but at this point I'm
looking for some code review and more discussion.

I'm also about to send an additional patch to implement an LSN map as an
additional fork for heap files.

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it



signature.asc
Description: OpenPGP digital signature



Re: [HACKERS] Something is broken in logical decoding with CLOBBER_CACHE_ALWAYS

2015-01-05 Thread Andres Freund
On 2015-01-04 21:54:40 +0100, Andres Freund wrote:
 On January 4, 2015 9:51:43 PM CET, Andrew Dunstan and...@dunslane.net wrote:
 
 On 12/15/2014 12:04 PM, Andres Freund wrote:
 
  I think the safest fix would be to defer catchup interrupt
 processing
  while you're in this mode.  You don't really want to be processing
 any
  remote sinval messages at all, I'd think.
  Well, we need to do relmap, smgr and similar things. So I think
 that'd
  be more complicated than we want.
 
 
 
 
  Where are we on this? Traffic seems to have gone quiet but we still
  have a bunch of buildfarm animals red.
 
  I've a simple fix (similar to what I originally outlined) which I
  plan to post soonish. I've tried a bunch of things roughly in the vein
  of Tom's suggestions, but they all are more invasive and still
  incomplete.

Attached.

Note that part 1) referenced in the commit message is actually
problematic in all branches. I think we actually should backpatch that
part all the way, because if we ever hit that case the consequences of
the current coding will be rather hard to analyze. If others think so as
well, I'm going to split the commit into two parts, so commit messages
for < 9.4 won't reference logical decoding. Since there hasn't been a
report of that error for a long while (~8.1 era), I can also live with
not backpatching further than 9.4.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From e5a6c2e9211561bc5c9985a8bf107d9911aad9f6 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Mon, 5 Jan 2015 11:52:05 +0100
Subject: [PATCH] Improve handling of relcache invalidations to currently
 invisible relations.

The corner case where a relcache invalidation tried to rebuild the
entry for a referenced relation but couldn't find it in the catalog
wasn't correct.

1) The code tried to RelationCacheDelete/RelationDestroyRelation the
   entry. That didn't work when assertions are enabled because the
   latter contains an assertion ensuring the refcount is zero. It's
   also a bad idea, because by virtue of being referenced somebody
   might actually look at the entry, which is possible if the error is
   trapped and handled via a subtransaction abort.  Instead just error
   out, without deleting the entry. As the entry is marked invalid,
   the worst that can happen is that we leak some memory.

2) When using a historic snapshot for logical decoding it can validly
   happen that a relation that's in the relcache isn't visible to that
   historic snapshot. E.g. if the relation is referenced in the query
   that uses the SQL interface for logical decoding.  While erroring
   cleanly, instead of hitting an assertion due to 1) is already an
   improvement, it's not good enough. So additionally allow that case
   without an error if a historic snapshot is set up - that won't
   allow an invalid entry to stay in the cache because it's a) already
   marked invalid and will thus be rebuilt during the next access b)
   the syscaches will be reset at the end of decoding.

   There might be prettier solutions to handle this case, but all that
   we could think of so far end up being much more complex than this
   simple fix.

This fixes the assertion failures reported by the buildfarm (markhor,
tick, leech) after the introduction of new regression tests in
89fd41b390a4. The failures there weren't actually directly caused by
CLOBBER_CACHE_ALWAYS but via the long runtimes due to it.

Backpatch the whole patch to 9.4, and 1) from above all the way back.
---
 src/backend/utils/cache/relcache.c | 21 ++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index c2e574d..2cce2c1 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -2192,9 +2192,24 @@ RelationClearRelation(Relation relation, bool rebuild)
 		newrel = RelationBuildDesc(save_relid, false);
 		if (newrel == NULL)
 		{
-			/* Should only get here if relation was deleted */
-			RelationCacheDelete(relation);
-			RelationDestroyRelation(relation, false);
+			/*
+			 * We can validly get here, if we're using a historic snapshot in
+			 * which a relation, accessed from outside logical decoding, is
+			 * still invisible. In that case it's fine to just mark the
+			 * relation as invalid and return - it'll fully get reloaded after
+			 * by the cache reset at the end of logical decoding.
+			 */
+			if (HistoricSnapshotActive())
+return;
+
+			/*
+			 * This situation shouldn't happen as dropping a relation
+			 * shouldn't be possible if still referenced
+			 * (c.f. CheckTableNotInUse()). But if we get here anyway, we can't
+			 * just delete the relcache entry as it's possibly still
+			 * referenced and the error might get caught and handled via a
+			 * subtransaction rollback.
+			 */
 			elog(ERROR, relation %u 

Re: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code

2015-01-05 Thread Michael Paquier
On Wed, Dec 31, 2014 at 12:22 AM, Alexey Vasiliev leopard...@inbox.ru wrote:
 Tue, 30 Dec 2014 21:39:33 +0900 from Michael Paquier 
 michael.paqu...@gmail.com:
 As I understand, now = (pg_time_t) time(NULL); returns the time in seconds,
 which is why I multiply the value by 1000 to compare it with
 restore_command_retry_interval in milliseconds.
This approach is incorrect: you would get a wait time of 1s even
for retries lower than 1s. In order to get the feature working
correctly and with sufficient comparison granularity, you need to
use TimestampTz for tracking the current and last failure times
and to use the TimestampTz APIs for difference calculations.
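
For reference, the kind of TimestampTz-based check meant here could look
roughly like this (a sketch against the existing backend timestamp APIs, not
the patch itself; the function and variable names are made up):

#include "postgres.h"
#include "utils/timestamp.h"

/*
 * Return true once at least retry_interval_ms milliseconds have elapsed
 * since the last restore_command failure.  last_fail_time would be
 * remembered across attempts.
 */
static bool
retry_interval_elapsed(TimestampTz last_fail_time, int retry_interval_ms)
{
	TimestampTz	now = GetCurrentTimestamp();

	return TimestampDifferenceExceeds(last_fail_time, now, retry_interval_ms);
}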

 I am not sure about small retry intervals; in my cases I need an interval
 bigger than 5 seconds (20-40 seconds). Right now I am limiting this value to
 be bigger than 100 milliseconds.
Other people may disagree here, but I don't see any reason to put
restrictions on this interval of time.

Attached is a patch fixing the feature to use TimestampTz, updating as
well the docs and recovery.conf.sample which was incorrect, on top of
other things I noticed here and there.

Alexey, does this new patch look fine for you?
-- 
Michael
From 9bb71e39b218885466a53c005a87361d4ae889fa Mon Sep 17 00:00:00 2001
From: Michael Paquier mich...@otacoo.com
Date: Mon, 5 Jan 2015 17:10:13 +0900
Subject: [PATCH] Add restore_command_retry_interval to control retries of
 restore_command

restore_command_retry_interval can be specified in recovery.conf to control
the interval of time between retries of restore_command when failures occur.
---
 doc/src/sgml/recovery-config.sgml   | 21 +
 src/backend/access/transam/recovery.conf.sample |  9 ++
 src/backend/access/transam/xlog.c   | 42 -
 3 files changed, 64 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index 0c64ff2..0488ae3 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -145,6 +145,27 @@ restore_command = 'copy C:\\server\\archivedir\\%f %p'  # Windows
   </listitem>
  </varlistentry>
 
+ <varlistentry id="restore-command-retry-interval" xreflabel="restore_command_retry_interval">
+  <term><varname>restore_command_retry_interval</varname> (<type>integer</type>)
+  <indexterm>
+    <primary><varname>restore_command_retry_interval</> recovery parameter</primary>
+  </indexterm>
+  </term>
+  <listitem>
+   <para>
+    If <varname>restore_command</> returns nonzero exit status code, retry
+    command after the interval of time specified by this parameter,
+    measured in milliseconds if no unit is specified. Default value is
+    <literal>5s</>.
+   </para>
+   <para>
+    This command is particularly useful for warm standby configurations
+    to leverage the amount of retries done to restore a WAL segment file
+    depending on the activity of a master node.
+   </para>
+  </listitem>
+ </varlistentry>
+
 </variablelist>
 
   </sect1>
diff --git a/src/backend/access/transam/recovery.conf.sample b/src/backend/access/transam/recovery.conf.sample
index b777400..8699a33 100644
--- a/src/backend/access/transam/recovery.conf.sample
+++ b/src/backend/access/transam/recovery.conf.sample
@@ -58,6 +58,15 @@
 #
 #recovery_end_command = ''
 #
+#
+# restore_command_retry_interval
+#
+# specifies an optional retry interval for restore_command command, if
+# previous command returned nonzero exit status code. This can be useful
+# to increase or decrease the number of calls of restore_command.
+#
+#restore_command_retry_interval = '5s'
+#
 #---
 # RECOVERY TARGET PARAMETERS
 #---
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 5cc7e47..1944e6f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -235,6 +235,7 @@ static TimestampTz recoveryTargetTime;
 static char *recoveryTargetName;
 static int	recovery_min_apply_delay = 0;
 static TimestampTz recoveryDelayUntilTime;
+static int	restore_command_retry_interval = 5000;
 
 /* options taken from recovery.conf for XLOG streaming */
 static bool StandbyModeRequested = false;
@@ -4881,6 +4882,26 @@ readRecoveryCommandFile(void)
  	(errmsg_internal("trigger_file = '%s'",
 	 TriggerFile)));
 		}
+		else if (strcmp(item->name, "restore_command_retry_interval") == 0)
+		{
+			const char *hintmsg;
+
+			if (!parse_int(item->value, &restore_command_retry_interval, GUC_UNIT_MS,
+						   &hintmsg))
+ereport(ERROR,
+		(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("parameter \"%s\" requires a temporal value",
+								"restore_command_retry_interval"),
+						 hintmsg ? errhint("%s", _(hintmsg)) : 0));
+			ereport(DEBUG2,
+	(errmsg_internal(restore_command_retry_interval 

Re: [HACKERS] add modulo (%) operator to pgbench

2015-01-05 Thread David Rowley
On 1 January 2015 at 21:23, Fabien COELHO coe...@cri.ensmp.fr wrote:


  I was about to go and look at this, but I had a problem when attempting to
 compile with MSVC.


 Thanks! Here is a v4 which includes your fix.


I'm also just looking at your ERROR() macro, it seems that core code is
quite careful not to use __VA_ARGS__ on compilers where HAVE__VA_ARGS is
not defined. I'd say this needs to be fixed too. Have a look at the trick
used in elog.h for when HAVE__VA_ARGS is not defined.
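
For what it's worth, the trick is roughly of this shape -- a self-contained
sketch of the same pattern, not elog.h itself; ERROR_AT and the helper names
are invented here:

#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Remember the call site via one function, then let a real variadic
 * function consume the format and arguments.
 */
static const char *err_file;
static int	err_line;

static void
error_location(const char *file, int line)
{
	err_file = file;
	err_line = line;
}

static void
error_finish(const char *fmt, ...)
{
	va_list		ap;

	fprintf(stderr, "%s:%d: ", err_file, err_line);
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	exit(1);
}

#ifdef HAVE__VA_ARGS
#define ERROR_AT(...) \
	do { \
		error_location(__FILE__, __LINE__); \
		error_finish(__VA_ARGS__); \
	} while (0)
#else
/* without variadic macros, expand to a comma expression so that
 * ERROR_AT("bad value %d\n", v) still parses as a call to error_finish */
#define ERROR_AT \
	error_location(__FILE__, __LINE__), \
	error_finish
#endif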

Regards

David Rowley


Re: [HACKERS] add modulo (%) operator to pgbench

2015-01-05 Thread Fabien COELHO



I'm also just looking at your ERROR() macro, it seems that core code is
quite careful not to use __VA_ARGS__ on compilers where HAVE__VA_ARGS is
not defined. I'd say this needs to be fixed too. Have a look at the trick
used in elog.h for when HAVE__VA_ARGS is not defined.


Indeed.

The simplest solution seems to be to expand this macro so that there is no 
macro :-) I'll do that.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Compression of full-page-writes

2015-01-05 Thread Fujii Masao
On Sat, Jan 3, 2015 at 1:52 AM, Bruce Momjian br...@momjian.us wrote:
 On Fri, Jan  2, 2015 at 10:15:57AM -0600, k...@rice.edu wrote:
 On Fri, Jan 02, 2015 at 01:01:06PM +0100, Andres Freund wrote:
  On 2014-12-31 16:09:31 -0500, Bruce Momjian wrote:
   I still don't understand the value of adding WAL compression, given the
   high CPU usage and minimal performance improvement.  The only big
   advantage is WAL storage, but again, why not just compress the WAL file
   when archiving.
 
  before: pg_xlog is 800GB
  after: pg_xlog is 600GB.
 
  I'm damned sure that many people would be happy with that, even if the
  *per backend* overhead is a bit higher. And no, compression of archives
  when archiving helps *zap* with that (streaming, wal_keep_segments,
  checkpoint_timeout). As discussed before.
 
  Greetings,
 
  Andres Freund
 

 +1

 On an I/O constrained system assuming 50:50 table:WAL I/O, in the case
 above you can process 100GB of transaction data at the cost of a bit
 more CPU.

 OK, so given your stats, the feature gives a 12.5% reduction in I/O.  If
 that is significant, shouldn't we see a performance improvement?  If we
 don't see a performance improvement, is I/O reduction worthwhile?  Is it
 valuable in that it gives non-database applications more I/O to use?  Is
 that all?

 I suggest we at least document that this feature is mostly useful for
 I/O reduction, and maybe say CPU usage and performance might be
 negatively impacted.

 OK, here is the email I remember from Fujii Masao in this same thread that
 showed a performance improvement for WAL compression:

 
 http://www.postgresql.org/message-id/CAHGQGwGqG8e9YN0fNCUZqTTT=hnr7ly516kft5ffqf4pp1q...@mail.gmail.com

 Why are we not seeing the 33% compression and 15% performance
 improvement he saw?

Because the benchmarks Michael and I used are very different.
I just used pgbench, but he used his simple test SQLs (see
http://www.postgresql.org/message-id/cab7npqsc97o-ue5paxfmukwcxe_jioyxo1m4a0pmnmyqane...@mail.gmail.com).

Furthermore, the data type of the pgbench_accounts.filler column is character(84)
and its content is empty, so pgbench_accounts is very compressible. This is
one of the reasons I could see a good performance improvement and a high
compression ratio.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Compression of full-page-writes

2015-01-05 Thread Fujii Masao
On Sat, Jan 3, 2015 at 2:24 AM, Bruce Momjian br...@momjian.us wrote:
 On Fri, Jan  2, 2015 at 02:18:12PM -0300, Claudio Freire wrote:
 On Fri, Jan 2, 2015 at 2:11 PM, Andres Freund and...@2ndquadrant.com wrote:
  , I now see the compression patch as something that has negatives, so
  has to be set by the user, and only wins in certain cases.  I am
  disappointed, and am trying to figure out how this became such a
  marginal win for 9.5.  :-(
 
  I find the notion that a multi digit space reduction is a marginal win
  pretty ridiculous and way too narrow focused. Our WAL volume is a
  *significant* problem in the field. And it mostly consists out of FPWs
  spacewise.

 One thing I'd like to point out, is that in cases where WAL I/O is an
 issue (ie: WAL archiving), usually people already compress the
 segments during archiving. I know I do, and I know it's recommended on
 the web, and by some consultants.

 So, I wouldn't want this FPW compression, which is desirable in
 replication scenarios if you can spare the CPU cycles (because of
 streaming), adversely affecting WAL compression during archiving.

 To be specific, desirable in streaming replication scenarios that don't
 use SSL compression.  (What percentage is that?)  Is it something we
 should mention in the docs for this feature?

Even if SSL is used in replication, FPW compression is useful. It can reduce
the amount of I/O on the standby side. I've sometimes seen the walreceiver's I/O
become a performance bottleneck, especially in synchronous replication cases.
FPW compression can be useful for those cases, for example.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup -x/X doesn't play well with archive_mode wal_keep_segments

2015-01-05 Thread Andres Freund
On 2015-01-05 16:22:56 +0900, Fujii Masao wrote:
 On Sun, Jan 4, 2015 at 5:47 AM, Andres Freund and...@2ndquadrant.com wrote:
  On 2015-01-03 16:03:36 +0100, Andres Freund wrote:
   pg_basebackup really
   changed heavily since its introduction. And desperately needs some
   restructuring.
 
 The patch seems to break pg_receivexlog. I got the following error message
 while running pg_receivexlog.

 pg_receivexlog: could not create archive status file
 mmm/archive_status/00010003.done: No such file or
 directory

Dang. Stupid typo. And my tests didn't catch it, because I had
archive_directory in the target directory :(

At least it's only broken in master :/

Thanks for the catch. Do you have some additional testsuite or did you
catch it manually?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Incremental backup v3: incremental PoC

2015-01-05 Thread Michael Paquier
On Mon, Jan 5, 2015 at 7:56 PM, Marco Nenciarini
marco.nenciar...@2ndquadrant.it wrote:
 I've noticed that I missed to add this to the commitfest.

 I've just added it.

 It is not meant to end up in a committable state, but at this point I'm
 searching for some code review and more discusison.

 I'm also about to send an additional patch to implement an LSN map as an
 additional fork for heap files.
Moved to CF 2015-02.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup -x/X doesn't play well with archive_mode wal_keep_segments

2015-01-05 Thread Andres Freund
On 2015-01-05 18:49:06 +0900, Fujii Masao wrote:
 On Mon, Jan 5, 2015 at 6:22 PM, Andres Freund and...@2ndquadrant.com wrote:
  On 2015-01-05 16:22:56 +0900, Fujii Masao wrote:
  On Sun, Jan 4, 2015 at 5:47 AM, Andres Freund and...@2ndquadrant.com 
  wrote:
   On 2015-01-03 16:03:36 +0100, Andres Freund wrote:
    pg_basebackup really
    changed heavily since its introduction. And desperately needs some
    restructuring.
 
  The patch seems to break pg_receivexlog. I got the following error message
  while running pg_receivexlog.
 
  pg_receivexlog: could not create archive status file
  mmm/archive_status/00010003.done: No such file or
  directory
 
  Dang. Stupid typo. And my tests didn't catch it, because I had
  archive_directory in the target directory :(
 
  At least it's only broken in master :/

I've pushed the trivial fix, and verified using my adapted testscript
that it works on all branches.

  Thanks for the catch. Do you have some additional testsuite or did you
  catch it manually?
 
 Manually... I just tested the tools and options which the patch may affect...

Thanks!

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-01-05 Thread Fujii Masao
On Sun, Dec 28, 2014 at 10:57 PM, Michael Paquier
michael.paqu...@gmail.com wrote:


 On Fri, Dec 26, 2014 at 4:16 PM, Michael Paquier michael.paqu...@gmail.com
 wrote:
 On Fri, Dec 26, 2014 at 3:24 PM, Fujii Masao masao.fu...@gmail.com
 wrote:
 pglz_compress() and pglz_decompress() still use PGLZ_Header, so the
 frontend
 which uses those functions needs to handle PGLZ_Header. But it basically
 should
 be handled via the varlena macros. That is, the frontend still seems to
 need to
 understand the varlena datatype. I think we should avoid that. Thought?
 Hm, yes it may be wiser to remove it and make the data passed to pglz
 for varlena 8 bytes shorter..

 OK, here is the result of this work, made of 3 patches.

Thanks for updating the patches!

 The first two patches move pglz stuff to src/common and make it a frontend
 utility entirely independent of varlena and its related metadata.
 - Patch 1 is a simple move of pglz to src/common, with PGLZ_Header still
 present. There is nothing amazing here, and that's the broken version that
 has been reverted in 966115c.

Patch 1 cannot be applied to master successfully because of a
recent change.

 - The real stuff comes with patch 2, which implements the removal of
 PGLZ_Header, changing the pglz compression and decompression APIs so they
 no longer carry any toast metadata; this metadata is now localized in
 tuptoaster.c. Note that this patch protects the on-disk format (tested with
 pg_upgrade from 9.4 to a patched HEAD server). Here is how the APIs of
 compression and decompression look like with this patch, simply performing
 operations from a source to a destination:
 extern int32 pglz_compress(const char *source, int32 slen, char *dest,
   const PGLZ_Strategy *strategy);
 extern int32 pglz_decompress(const char *source, char *dest,
   int32 compressed_size, int32 raw_size);
 The return value of those functions is the number of bytes written in the
 destination buffer, and 0 if operation failed.
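
As a quick illustration of how a caller might use the API quoted above -- a
sketch only, relying on the semantics described (0 meaning failure); the
header paths and the helper name are assumptions, not part of the patch:

#include "postgres_fe.h"
#include "common/pg_lzcompress.h"	/* assumed location after the proposed move */

/*
 * Compress src into scratch and decompress it back into out, returning
 * false if either step fails (pglz_compress() returning 0 also covers the
 * "input not compressible enough" case).
 */
static bool
pglz_roundtrip(const char *src, int32 len, char *scratch, char *out)
{
	int32		clen;

	clen = pglz_compress(src, len, scratch, PGLZ_strategy_default);
	if (clen == 0)
		return false;

	if (pglz_decompress(scratch, out, clen, len) != len)
		return false;

	return true;
}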

So it's guaranteed that 0 is never returned in the success case? I'm not sure
if that case can really happen, though.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re[2]: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code

2015-01-05 Thread Alexey Vasiliev
Mon, 5 Jan 2015 17:16:43 +0900 from Michael Paquier michael.paqu...@gmail.com:
 On Wed, Dec 31, 2014 at 12:22 AM, Alexey Vasiliev leopard...@inbox.ru wrote:
  Tue, 30 Dec 2014 21:39:33 +0900 from Michael Paquier 
  michael.paqu...@gmail.com:
  As I understand now = (pg_time_t) time(NULL); return time in seconds, what 
  is why I multiply value to 1000 to compare with 
  restore_command_retry_interval in milliseconds.
 This way of doing is incorrect, you would get a wait time of 1s even
 for retries lower than 1s. In order to get the feature working
 correctly and to get a comparison granularity sufficient, you need to
  use TimestampTz for the tracking of the current and last failure time
 and to use the APIs from TimestampTz for difference calculations.
 
  I am not sure about small retry interval of time, in my cases I need 
  interval bigger 5 seconds (20-40 seconds). Right now I limiting this value 
  be bigger 100 milliseconds.
 Other people may disagree here, but I don't see any reason to put
 restrictions to this interval of time.
 
 Attached is a patch fixing the feature to use TimestampTz, updating as
 well the docs and recovery.conf.sample which was incorrect, on top of
 other things I noticed here and there.
 
 Alexey, does this new patch look fine for you?
 -- 
 Michael
 
 

Hello. Thanks for the help. Yes, the new patch looks fine!

-- 
Alexey Vasiliev
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] add modulo (%) operator to pgbench

2015-01-05 Thread Alvaro Herrera
David Rowley wrote:

 I'm also just looking at your ERROR() macro, it seems that core code is
 quite careful not to use __VA_ARGS__ on compilers where HAVE__VA_ARGS is
 not defined. I'd say this needs to be fixed too. Have a look at the trick
 used in elog.h for when HAVE__VA_ARGS is not defined.

Hm, I just looked at the previous version which used ERROR rather than
LOCATE and LOCATION, and I liked that approach better.  Can we get that
back?  I understand that for the purposes of this patch it's easier to
not change existing fprintf()/exit() calls, though.
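
For context, the elog.h-style trick being referred to avoids variadic macros
by splitting the macro into a location-recording call and a real varargs
function. A simplified sketch, not the actual PostgreSQL or pgbench code:

	/* requires <stdarg.h>, <stdio.h>, <stdlib.h> */
	static const char *err_file;
	static int	err_line;

	static void
	error_location(const char *file, int line)
	{
		err_file = file;
		err_line = line;
	}

	static void
	error_report(const char *fmt,...)
	{
		va_list		ap;

		fprintf(stderr, "%s:%d: ", err_file, err_line);
		va_start(ap, fmt);
		vfprintf(stderr, fmt, ap);
		va_end(ap);
		fputc('\n', stderr);
		exit(1);
	}

	/* ERROR("bad token %s", tok) then works without __VA_ARGS__ support */
	#define ERROR	error_location(__FILE__, __LINE__), error_report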

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] add modulo (%) operator to pgbench

2015-01-05 Thread Alvaro Herrera
Fabien COELHO wrote:
 
  I'm also just looking at your ERROR() macro; it seems that core code is
  quite careful not to use __VA_ARGS__ on compilers where HAVE__VA_ARGS is
  not defined. I'd say this needs to be fixed too. Have a look at the trick
  used in elog.h for when HAVE__VA_ARGS is not defined.
 
 Here is a v5 with the vararg macro expanded.

Thanks.

On top of evaluateExpr() we need a comment (generally I think pgbench
could do with more comments; not saying your patch should add them, just
expressing an opinion.)  Also, intuitively I would say that the return
values of that function should be reversed: return true if things are
good. Can we cause a stack overflow in this function with a complex
enough expression?

I wonder about LOCATE and LOCATION.  Can we do away with the latter, and
keep only LOCATE perhaps with a better name such as PRINT_ERROR_AT or
similar?  I would just expand an ad-hoc fprintf in the single place
where the other macro is used.

Are we okay with only integer operands?  Is this something we would
expand in the future?  Is the gaussian/exp random stuff going to work
with integer operands, if we want to change it to use function syntax,
as expressed elsewhere?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Seq Scan

2015-01-05 Thread Robert Haas
On Fri, Jan 2, 2015 at 5:36 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Thu, Jan 1, 2015 at 11:29 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Jan 1, 2015 at 12:00 PM, Fabrízio de Royes Mello
 fabriziome...@gmail.com wrote:
  Can we check the number of free bgworkers slots to set the max workers?

 The real solution here is that this patch can't throw an error if it's
 unable to obtain the desired number of background workers.  It needs
 to be able to smoothly degrade to a smaller number of background
 workers, or none at all.

 I think handling this way can have one side effect which is that if
 we degrade to smaller number, then the cost of plan (which was
 decided by optimizer based on number of parallel workers) could
 be more than non-parallel scan.
 Ideally before finalizing the parallel plan we should reserve the
 bgworkers required to execute that plan, but I think as of now
 we can workout a solution without it.

I don't think this is very practical.  When cached plans are in use,
we can have a bunch of plans sitting around that may or may not get
reused at some point in the future, possibly far in the future.  The
current situation, which I think we want to maintain, is that such
plans hold no execution-time resources (e.g. locks) and, generally,
don't interfere with other things people might want to execute on the
system.  Nailing down a bunch of background workers just in case we
might want to use them in the future would be pretty unfriendly.

I think it's right to view this in the same way we view work_mem.  We
plan on the assumption that an amount of memory equal to work_mem will
be available at execution time, without actually reserving it.  If the
plan happens to need that amount of memory and if it actually isn't
available when needed, then performance will suck; conceivably, the
OOM killer might trigger.  But it's the user's job to avoid this by
not setting work_mem too high in the first place.  Whether this system
is for the best is arguable: one can certainly imagine a system where,
if there's not enough memory at execution time, we consider
alternatives like (a) replanning with a lower memory target, (b)
waiting until more memory is available, or (c) failing outright in
lieu of driving the machine into swap.  But devising such a system is
complicated -- for example, replanning with a lower memory target
might latch onto a far more expensive plan, such that we would have
been better off waiting for more memory to be available; yet trying to
wait until more memory is available might result in waiting
forever.  And that's why we don't have such a system.

We don't need to do any better here.  The GUC should tell us how many
parallel workers we should anticipate being able to obtain.  If other
settings on the system, or the overall system load, preclude us from
obtaining that number of parallel workers, then the query will take
longer to execute; and the plan might be sub-optimal.  If that happens
frequently, the user should lower the planner GUC to a level that
reflects the resources actually likely to be available at execution
time.

By the way, another area where this kind of effect crops up is with
the presence of particular disk blocks in shared_buffers or the system
buffer cache.  Right now, the planner makes no attempt to cost a scan
of a frequently-used, fully-cached relation different than a
rarely-used, probably-not-cached relation; and that sometimes leads to
bad plans.  But if it did try to do that, then we'd have the same kind
of problem discussed here -- things might change between planning and
execution, or even after the beginning of execution.  Also, we might
get nasty feedback effects: since the relation isn't cached, we view a
plan that would involve reading it in as very expensive, and avoid
such a plan.  However, we might be better off picking the slow plan
anyway, because it might be that once we've read the data once it will
stay cached and run much more quickly than some plan that seems better
starting from a cold cache.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallel mode and parallel contexts

2015-01-05 Thread Robert Haas
On Sat, Jan 3, 2015 at 7:31 PM, Andres Freund and...@2ndquadrant.com wrote:
 A couple remarks:
 * Shouldn't this provide more infrastructure to deal with the fact that
   we might get less parallel workers than we had planned for?

Maybe, but I'm not really sure what that should look like.  My working
theory is that individual parallel applications (executor nodes, or
functions that use parallelism internally, or whatever) should be
coded in such a way that they work correctly even if the number of
workers that starts is smaller than what they requested, or even zero.
It may turn out that this is impractical in some cases, but I don't
know what those cases are yet so I don't know what the common threads
will be.

I think parallel seq scan should work by establishing N tuple queues
(where N is the number of workers we have).  When asked to return a
tuple, the master should poll all of those queues one after another
until it finds one that contains a tuple.  If none of them contain a
tuple then it should claim a block to scan and return a tuple from
that block.  That way, if there are enough running workers to keep up
with the master's demand for tuples, the master won't do any of the
actual scan itself.  But if the workers can't keep up -- e.g. suppose
90% of the CPU consumed by the query is in the filter qual for the
scan -- then the master can pitch in along with everyone else.  As a
non-trivial fringe benefit, if the workers don't all start, or take a
while to initialize, the user backend isn't stalled meanwhile.
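
A rough sketch of that polling loop, just to make the idea concrete (the state
structure and helper functions here are made up for illustration; they are not
a proposed API):

	/* master's get-next-tuple step, per the scheme described above */
	for (;;)
	{
		int			i;
		HeapTuple	tup;
		BlockNumber blkno;

		/* first, hand back any tuple a worker has already queued */
		for (i = 0; i < pstate->nworkers; i++)
		{
			tup = try_read_tuple_queue(pstate->queue[i]);	/* hypothetical */
			if (tup != NULL)
				return tup;
		}

		/* nothing ready: pitch in by claiming and scanning a block ourselves */
		blkno = claim_next_block(pstate);					/* hypothetical */
		if (blkno == InvalidBlockNumber)
			return NULL;		/* relation fully scanned */
		tup = scan_block_for_qualifying_tuple(pstate, blkno);	/* hypothetical */
		if (tup != NULL)
			return tup;
		/* the block had no qualifying tuple; go back to polling the queues */
	}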

 * I wonder if parallel contexts shouldn't be tracked via resowners

That is a good question.  I confess that I'm somewhat fuzzy about
which things should be tracked via the resowner mechanism vs. which
things should have their own bespoke bookkeeping.  However, the
AtEOXact_Parallel() stuff happens well before ResourceOwnerRelease(),
which makes merging them seem not particularly clean.

 * combocid.c should error out in parallel mode, as insurance

Eh, which function?  HeapTupleHeaderGetCmin(),
HeapTupleHeaderGetCmax(), and AtEOXact_ComboCid() are intended to work
in parallel mode. HeapTupleHeaderAdjustCmax() could
Assert(!IsInParallelMode()) but the only calls are in heap_update()
and heap_delete() which already have error checks, so putting yet
another elog() there seems like overkill.

 * I doubt it's a good idea to allow heap_insert, heap_inplace_update,
   index_insert. I'm not convinced that those will be handled correct and
   relaxing restrictions is easier than adding them.

I'm fine with adding checks to heap_insert() and
heap_inplace_update().  I'm not sure we really need to add anything to
index_insert(); how are we going to get there without hitting some
other prohibited operation first?

 * I'd much rather separate HandleParallelMessageInterrupt() in one
   variant that just tells the machinery there's a interrupt (called from
   signal handlers) and one that actually handles them. We shouldn't even
   consider adding any new code that does allocations, errors and such in
   signal handlers. That's just a *bad* idea.

That's a nice idea, but I didn't invent the way this crap works today.
ImmediateInterruptOK gets set to true while performing a lock wait,
and we need to be able to respond to errors while in that state.  I
think there's got to be a better way to handle that than force every
asynchronous operation to have to cope with the fact that
ImmediateInterruptOK may be set or not set, but as of today that's
what you have to do.

 * I'm not a fan of the shm_toc stuff... Verbose and hard to read. I'd
   much rather place a struct there and be careful about not using
   pointers. That also would obliviate the need to reserve some ids.

I don't see how that's going to work with variable-size data
structures, and a bunch of the things that we need to serialize are
variable-size.  Moreover, I'm not really interested in rehashing this
design again.  I know it's not your favorite; you've said that before.
But it makes it possible to write code to do useful things in
parallel, a capability that we do not otherwise have.  And I like it
fine.
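
For anyone not familiar with the mechanism being debated, the shm_toc pattern
is roughly the following (a hedged sketch: the key constant and the struct are
placeholders standing in for things defined in the patch, and the function
signatures are quoted from memory, so treat the details as assumptions):

	/* leader: carve chunks out of the DSM segment and register them by key */
	toc = shm_toc_create(PARALLEL_MAGIC, dsm_segment_address(seg), segsize);
	fps = shm_toc_allocate(toc, sizeof(FixedParallelState));
	/* ... fill in *fps, including any variable-size tail ... */
	shm_toc_insert(toc, PARALLEL_KEY_FIXED, fps);

	/* worker: attach to the segment and look the chunk back up by key */
	toc = shm_toc_attach(PARALLEL_MAGIC, dsm_segment_address(seg));
	fps = shm_toc_lookup(toc, PARALLEL_KEY_FIXED);

The point being made above is that variable-size entries only need a larger
shm_toc_allocate(); a single fixed struct placed at the start of the segment
could not absorb them as easily.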

 * Doesn't the restriction in GetSerializableTransactionSnapshotInt()
   apply for repeatable read just the same?

Yeah.  I'm not sure whether we really need that check at all, because
there is a check in GetTransactionSnapshot() that probably checks the
same thing.

 * I'm not a fan of relying on GetComboCommandId() to restore things in
   the same order as before.

I thought that was a little wonky, but it's not likely to break, and
there is an elog(ERROR) there to catch it if it does, so I'd rather
not make it more complicated.

 * I'd say go ahead and commit the BgworkerByOid thing
   earlier/independently. I've seen need for that a couple times.

Woohoo!  I was hoping someone would say that.

 * s/entrypints/entrypoints/

Thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via 

Re: [HACKERS] Publish autovacuum informations

2015-01-05 Thread Robert Haas
On Wed, Dec 31, 2014 at 12:46 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I'd be all right with putting the data structure declarations in a file
 named something like autovacuum_private.h, especially if it carried an
 annotation that if you depend on this, don't be surprised if we break
 your code in future.

Works for me.  I am not in general surprised when we do things that
break my code, or anyway, the code that I'm responsible for
maintaining.  But I think it makes sense to segregate this into a
separate header file so that we are clear that it is only exposed for
the benefit of extension authors, not so that other things in the core
system can touch it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [COMMITTERS] pgsql: Change how first WAL segment on new timeline after promotion is

2015-01-05 Thread Heikki Linnakangas

On 01/03/2015 08:59 PM, Andres Freund wrote:

Hi Heikki,

While writing a test script for
http://archives.postgresql.org/message-id/20141205002854.GE21964%40awork2.anarazel.de
I noticed that this commit broke starting a pg_basebackup -X * without a
recovery.conf present. Which might not be the best idea, but imo is a
perfectly valid thing to do.

To me the changes to StartupXLOG() in that commit look a bit bogus. The
new startLogSegNo is initialized to XLByteToSeg(EndOfLog)? Which points
to the end of the record +1? Which thus isn't guaranteed to exist as a
segment (e.g. never if the last record was a XLOG_SWITCH).


Ah, good point.


Did you perhaps intend to use XLogFileInit(use_existing = true)
instead of XLogFileOpen()? That works for me.


Hmm, that doesn't sound right either. XLogFileInit is used when you 
switch to a new segment, not to open an old segment for writing. It 
happens to work, because with use_existing = true it will in fact always 
open the old segment, instead of creating a new one, but I don't think 
that's in the spirit of how that function's intended to be used.


A very simple fix is to not try opening the segment at all. There isn't 
actually any requirement to have the segment open at that point, 
XLogWrite() will open it the first time the WAL is flushed. The WAL is 
flushed after writing the initial checkpoint or end-of-recovery record, 
which happens pretty soon anyway. Any objections to the attached?



I've attached my preliminary testscript (note it's really not that
interesting at this point) that reliably reproduces the problem for me.


Thanks!

- Heikki
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 5cc7e47..e623463 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5646,7 +5646,6 @@ StartupXLOG(void)
 	XLogRecPtr	RecPtr,
 checkPointLoc,
 EndOfLog;
-	XLogSegNo	startLogSegNo;
 	TimeLineID	PrevTimeLineID;
 	XLogRecord *record;
 	TransactionId oldestActiveXID;
@@ -6633,7 +6632,6 @@ StartupXLOG(void)
 	 */
 	record = ReadRecord(xlogreader, LastRec, PANIC, false);
 	EndOfLog = EndRecPtr;
-	XLByteToSeg(EndOfLog, startLogSegNo);
 
 	/*
 	 * Complain if we did not roll forward far enough to render the backup
@@ -6741,9 +6739,6 @@ StartupXLOG(void)
 	 * buffer cache using the block containing the last record from the
 	 * previous incarnation.
 	 */
-	openLogSegNo = startLogSegNo;
-	openLogFile = XLogFileOpen(openLogSegNo);
-	openLogOff = 0;
 	Insert = XLogCtl->Insert;
 	Insert->PrevBytePos = XLogRecPtrToBytePos(LastRec);
 	Insert->CurrBytePos = XLogRecPtrToBytePos(EndOfLog);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Publish autovacuum informations

2015-01-05 Thread Guillaume Lelarge
2015-01-05 17:40 GMT+01:00 Robert Haas robertmh...@gmail.com:

 On Wed, Dec 31, 2014 at 12:46 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  I'd be all right with putting the data structure declarations in a file
  named something like autovacuum_private.h, especially if it carried an
  annotation that if you depend on this, don't be surprised if we break
  your code in future.

 Works for me.  I am not in general surprised when we do things that
 break my code, or anyway, the code that I'm responsible for
 maintaining.  But I think it makes sense to segregate this into a
 separate header file so that we are clear that it is only exposed for
 the benefit of extension authors, not so that other things in the core
 system can touch it.


I'm fine with that too. I'll try to find some time to work on that.

Thanks.


-- 
Guillaume.
  http://blog.guillaume.lelarge.info
  http://www.dalibo.com


Re: [HACKERS] Additional role attributes superuser review

2015-01-05 Thread Robert Haas
On Wed, Dec 24, 2014 at 12:48 PM, Adam Brightwell
adam.brightw...@crunchydatasolutions.com wrote:
 * BACKUP - allows role to perform backup operations
 * LOGROTATE - allows role to rotate log files
 * MONITOR - allows role to view pg_stat_* details
 * PROCSIGNAL - allows role to signal backend processes

How about just SIGNAL instead of PROCSIGNAL?

Generally, I think we'll be happier if these capabilities have names
that are actual words - or combinations of words - rather than partial
words, so I'd suggest avoiding things like PROC for PROCESS and AUTH
for AUTHORIZATION.

In this particular case, it seems like the name of the capability is
based off the name of an internal system data structure.  That's the
sort of thing that we do not want to expose to users.  As far as we
can, we should try to describe what the capability allows, not the
details of how that is (currently) implemented.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tracking commit timestamps

2015-01-05 Thread Petr Jelinek

On 05/01/15 16:17, Petr Jelinek wrote:

On 05/01/15 07:28, Fujii Masao wrote:

On Thu, Dec 4, 2014 at 12:08 PM, Fujii Masao masao.fu...@gmail.com
wrote:

On Wed, Dec 3, 2014 at 11:54 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:

Pushed with some extra cosmetic tweaks.


I got the following assertion failure when I executed
pg_xact_commit_timestamp()
in the standby server.

=# select pg_xact_commit_timestamp('1000'::xid);
TRAP: FailedAssertion(!(((oldestCommitTs) != ((TransactionId) 0)) ==
((newestCommitTs) != ((TransactionId) 0))), File: commit_ts.c,
Line: 315)
server closed the connection unexpectedly
 This probably means the server terminated abnormally
 before or while processing the request.
The connection to the server was lost. Attempting reset: 2014-12-04
12:01:08 JST sby1 LOG:  server process (PID 15545) was terminated by
signal 6: Aborted
2014-12-04 12:01:08 JST sby1 DETAIL:  Failed process was running:
select pg_xact_commit_timestamp('1000'::xid);

The way to reproduce this problem is

#1. set up and start the master and standby servers with
track_commit_timestamp disabled
#2. enable track_commit_timestamp in the master and restart the master
#3. run some write transactions
#4. enable track_commit_timestamp in the standby and restart the standby
#5. execute select pg_xact_commit_timestamp('1000'::xid) in the
standby

BTW, at the step #4, I got the following log messages. This might be
a hint for
this problem.

LOG:  file pg_commit_ts/ doesn't exist, reading as zeroes
CONTEXT:  xlog redo Transaction/COMMIT: 2014-12-04 12:00:16.428702+09;
inval msgs: catcache 59 catcache 58 catcache 59 catcache 58 catcache
45 catcache 44 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 snapshot 2608
relcache 16384


This problem still happens in the master.

Regards,



The attached patch fixes it; I am not sure how happy I am with the way I did
it, though.



Added more comments and made the error message a bit clearer.

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index ca074da..59d19a0 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -557,6 +557,12 @@ StartupCommitTs(void)
 	TransactionId xid = ShmemVariableCache->nextXid;
 	int			pageno = TransactionIdToCTsPage(xid);
 
+	if (track_commit_timestamp)
+	{
+		ActivateCommitTs();
+		return;
+	}
+
 	LWLockAcquire(CommitTsControlLock, LW_EXCLUSIVE);
 
 	/*
@@ -571,14 +577,30 @@ StartupCommitTs(void)
  * This must be called ONCE during postmaster or standalone-backend startup,
  * when commit timestamp is enabled.  Must be called after recovery has
  * finished.
+ */
+void
+CompleteCommitTsInitialization(void)
+{
+	if (!track_commit_timestamp)
+		DeactivateCommitTs(true);
+}
+
+/*
+ * This must be called when track_commit_timestamp is turned on.
+ * Note that this only happens during postmaster or standalone-backend startup
+ * or during WAL replay.
+ *
+ * The reason why this SLRU needs separate activation/deactivation functions is
+ * that it can be enabled/disabled during start and the activation/deactivation
+ * on master is propagated to slave via replay. Other SLRUs don't have this
+ * property and they can be just initialized during normal startup.
  *
  * This is in charge of creating the currently active segment, if it's not
  * already there.  The reason for this is that the server might have been
  * running with this module disabled for a while and thus might have skipped
  * the normal creation point.
  */
-void
-CompleteCommitTsInitialization(void)
+void ActivateCommitTs(void)
 {
 	TransactionId xid = ShmemVariableCache->nextXid;
 	int			pageno = TransactionIdToCTsPage(xid);
@@ -591,22 +613,6 @@ CompleteCommitTsInitialization(void)
 	LWLockRelease(CommitTsControlLock);
 
 	/*
-	 * If this module is not currently enabled, make sure we don't hand back
-	 * possibly-invalid data; also remove segments of old data.
-	 */
-	if (!track_commit_timestamp)
-	{
-		LWLockAcquire(CommitTsLock, LW_EXCLUSIVE);
-		ShmemVariableCache->oldestCommitTs = InvalidTransactionId;
-		ShmemVariableCache->newestCommitTs = InvalidTransactionId;
-		LWLockRelease(CommitTsLock);
-
-		TruncateCommitTs(ReadNewTransactionId());
-
-		return;
-	}
-
-	/*
 	 * If CommitTs is enabled, but it wasn't in the previous server run, we
 	 * need to set the oldest and newest values to the next Xid; that way, we
 	 * will not try to read data that might not have been set.
@@ -641,6 +647,35 @@ CompleteCommitTsInitialization(void)
 }
 
 /*
+ * This must be called when track_commit_timestamp is turned off.
+ * Note that this only happens during postmaster or standalone-backend startup
+ * or during WAL replay.
+ *
+ * Resets CommitTs into invalid state to 

Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD

2015-01-05 Thread Robert Haas
On Thu, Jan 1, 2015 at 3:04 PM, Andres Freund and...@2ndquadrant.com wrote:
That's true, but if you don't align the beginnings of the allocations,
then it's a lot more complicated for the code to properly align stuff
within the allocation.  It's got to insert a variable amount of
padding based on the alignment it happens to get.

 Hm? Allocate +PG_CACHELINE_SIZE and do var = CACHELINEALIGN(var).

Meh.  I guess that will work, but I see little advantage in it.
Cache-line aligning the allocations is simple and, not of no value, of
long precedent.
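
To spell out the two options being compared (a sketch; ShmemAlloc,
PG_CACHE_LINE_SIZE and CACHELINEALIGN are existing facilities, while
request_size and the variable names are just illustrative):

	/* (a) over-allocate by one cache line, then align within the allocation */
	void	   *raw = ShmemAlloc(request_size + PG_CACHE_LINE_SIZE);
	char	   *aligned = (char *) CACHELINEALIGN(raw);

	/* (b) have the allocator hand back cache-line-aligned memory directly,
	 * so callers need no per-allocation padding arithmetic */

Either way 'aligned' ends up pointing to a cache-line-aligned region of
request_size bytes; the disagreement is only about where the padding logic
lives.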

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallel mode and parallel contexts

2015-01-05 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sat, Jan 3, 2015 at 7:31 PM, Andres Freund and...@2ndquadrant.com wrote:
 * I wonder if parallel contexts shouldn't be tracked via resowners

 That is a good question.  I confess that I'm somewhat fuzzy about
 which things should be tracked via the resowner mechanism vs. which
 things should have their own bespoke bookkeeping.  However, the
 AtEOXact_Parallel() stuff happens well before ResourceOwnerRelease(),
 which makes merging them seem not particularly clean.

FWIW, the resowner mechanism was never meant as a substitute for bespoke
bookkeeping.  What it is is a helper mechanism to reduce the need for
PG_TRY blocks that guarantee that a resource-releasing function will be
called even in error paths.  I'm not sure whether that analogy applies
well in parallel-execution cases.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Redesigning checkpoint_segments

2015-01-05 Thread Heikki Linnakangas

On 01/05/2015 12:06 PM, Andres Freund wrote:

On 2015-01-05 11:34:54 +0200, Heikki Linnakangas wrote:

On 01/04/2015 11:44 PM, Josh Berkus wrote:

On 01/03/2015 12:56 AM, Heikki Linnakangas wrote:

On 01/03/2015 12:28 AM, Josh Berkus wrote:

On 01/02/2015 01:57 AM, Heikki Linnakangas wrote:

wal_keep_segments does not affect the calculation of CheckPointSegments.
If you set wal_keep_segments high enough, checkpoint_wal_size will be
exceeded. The other alternative would be to force a checkpoint earlier,
i.e. lower CheckPointSegments, so that checkpoint_wal_size would be
honored. However, if you set wal_keep_segments high enough, higher than
checkpoint_wal_size, it's impossible to honor checkpoint_wal_size no
matter how frequently you checkpoint.


So you're saying that wal_keep_segments is part of the max_wal_size
total, NOT in addition to it?


Not sure what you mean. wal_keep_segments is an extra control that can
prevent WAL segments from being recycled. It has the same effect as
archive_command failing for N most recent segments, if that helps.


I mean, if I have these settings:

max_wal_size* = 256MB
wal_keep_segments = 8

... then my max wal size is *still* 256MB, NOT 384MB?


Right.


Do you mean that wal_keep_segments has *no* influence over
checkpoint pacing, or the contrary? Because upthread you imply that it
doesn't, but later comments may mean the contrary.


wal_keep_segments does not influence checkpoint pacing.


If that's the case (and I think it's a good plan), then as a follow-on,
we should prevent users from setting wal_keep_segments to more than 50%
of max_wal_size, no?


Not sure if the 50% figure is correct, but I see what you mean: don't allow
setting wal_keep_segments so high that we would exceed max_wal_size because
of it.


I wasn't clear on my opinion here. I think I understood what Josh meant, 
but I don't think we should do it. Seems like unnecessary nannying of 
the DBA. Let's just mention in the manual that if you set 
wal_keep_segments higher than [insert formula here], you will routinely 
have more WAL in pg_xlog than what checkpoint_wal_size is set to.



That seems an unrealistic goal. I've seen setups that have set
checkpoint_segments intentionally, and with good reasoning, north of
50k.


So? I don't see how that's relevant.


Neither wal_keep_segments, nor failing archive_commands nor replication
slot should have an influence on checkpoint pacing.


Agreed.

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] PGCon 2015 call for papers

2015-01-05 Thread Dan Langille
PGCon 2015 will be on 18-19 June 2015 at University of Ottawa.

* 16-17 (Tue-Wed) tutorials
* 18-19 (Thu-Fri) talks - the main part of the conference
* 20 (Sat) The Unconference (very popular)

PLEASE NOTE: PGCon 2015 is in June.

See http://www.pgcon.org/2015/

We are now accepting proposals for the main part of the conference (18-19 June).
Proposals can be quite simple. We do not require academic-style papers.

You have about two weeks left before submissions close.

If you are doing something interesting with PostgreSQL, please submit
a proposal.  You might be one of the backend hackers or work on a
PostgreSQL related project and want to share your know-how with
others. You might be developing an interesting system using PostgreSQL
as the foundation. Perhaps you migrated from another database to
PostgreSQL and would like to share details.  These, and other stories
are welcome. Both users and developers are encouraged to share their
experiences.

Here are a some ideas to jump start your proposal process:

- novel ways in which PostgreSQL is used
- migration of production systems from another database
- data warehousing
- tuning PostgreSQL for different work loads
- replication and clustering
- hacking the PostgreSQL code
- PostgreSQL derivatives and forks
- applications built around PostgreSQL
- benchmarking and performance engineering
- case studies
- location-aware and mapping software with PostGIS
- The latest PostgreSQL features and features in development
- research and teaching with PostgreSQL
- things the PostgreSQL project could do better
- integrating PostgreSQL with 3rd-party software

Both users and developers are encouraged to share their experiences.

The schedule is:

1 Dec 2014 Proposal acceptance begins
19 Jan 2015 Proposal acceptance ends
19 Feb 2015 Confirmation of accepted proposals

NOTE: the call for lightning talks will go out very close to the conference.
Do not submit lightning talks proposals until then.

See also http://www.pgcon.org/2015/papers.php

Instructions for submitting a proposal to PGCon 2015 are available
from: http://www.pgcon.org/2015/submissions.php

—
Dan Langille
http://langille.org/



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Additional role attributes superuser review

2015-01-05 Thread Adam Brightwell
On Mon, Jan 5, 2015 at 11:49 AM, Robert Haas robertmh...@gmail.com wrote:

 On Wed, Dec 24, 2014 at 12:48 PM, Adam Brightwell
 adam.brightw...@crunchydatasolutions.com wrote:
  * BACKUP - allows role to perform backup operations
  * LOGROTATE - allows role to rotate log files
  * MONITOR - allows role to view pg_stat_* details
  * PROCSIGNAL - allows role to signal backend processes

 How about just SIGNAL instead of PROCSIGNAL?


Sure.


 Generally, I think we'll be happier if these capabilities have names
 that are actual words - or combinations of words - rather than partial
 words, so I'd suggest avoiding things like PROC for PROCESS and AUTH
 for AUTHORIZATION.


Agreed.


 In this particular case, it seems like the name of the capability is
 based off the name of an internal system data structure.  That's the
 sort of thing that we do not want to expose to users.  As far as we
 can, we should try to describe what the capability allows, not the
 details of how that is (currently) implemented.


Agreed.

If others are also in agreement on this point then I'll update the patch
accordingly.

Thanks,
Adam

-- 
Adam Brightwell - adam.brightw...@crunchydatasolutions.com
Database Engineer - www.crunchydatasolutions.com


Re: [HACKERS] pg_rewind in contrib

2015-01-05 Thread Heikki Linnakangas
Here's an updated version of pg_rewind. The code itself is the same as 
before, and corresponds to the sources in the current pg_rewind github 
repository, as of commit a65e3754faf9ca9718e6b350abc736de586433b7. Based 
mostly on Michael's comments, I have:


* replaced VMware copyright notices with PGDG ones.
* removed unnecessary cruft from .gitignore
* changed the --version line and report bugs notice in --help to match 
other binaries in the PostgreSQL distribution

* moved documentation from README to the user manual.
* minor fixes to how the regression tests are launched so that they work 
again


Some more work remains to be done on the regression tests. The way 
they're launched now is quite weird. It was written that way to make it 
work outside the PostgreSQL source tree, and also on Windows. Now that 
it lives in contrib, it should be redesigned.


The documentation could also use some work; for now I just converted the 
existing text from README to sgml format.


Anything else?

- Heikki

diff --git a/contrib/Makefile b/contrib/Makefile
index 195d447..2fe861f 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -32,6 +32,7 @@ SUBDIRS = \
 		pg_buffercache	\
 		pg_freespacemap \
 		pg_prewarm	\
+		pg_rewind	\
 		pg_standby	\
 		pg_stat_statements \
 		pg_test_fsync	\
diff --git a/contrib/pg_rewind/.gitignore b/contrib/pg_rewind/.gitignore
new file mode 100644
index 000..816a91d
--- /dev/null
+++ b/contrib/pg_rewind/.gitignore
@@ -0,0 +1,9 @@
+# Files generated during build
+/xlogreader.c
+
+# Generated by test suite
+/tmp_check/
+/regression.diffs
+/regression.out
+/results/
+/regress_log/
diff --git a/contrib/pg_rewind/Makefile b/contrib/pg_rewind/Makefile
new file mode 100644
index 000..241c775
--- /dev/null
+++ b/contrib/pg_rewind/Makefile
@@ -0,0 +1,51 @@
+# Makefile for pg_rewind
+#
+# Copyright (c) 2013-2014, PostgreSQL Global Development Group
+#
+
+PGFILEDESC = pg_rewind - repurpose an old master server as standby
+PGAPPICON = win32
+
+PROGRAM = pg_rewind
+OBJS	= pg_rewind.o parsexlog.o xlogreader.o util.o datapagemap.o timeline.o \
+	fetch.o copy_fetch.o libpq_fetch.o filemap.o
+
+REGRESS = basictest extrafiles databases
+REGRESS_OPTS=--use-existing --launcher=./launcher
+
+PG_CPPFLAGS = -I$(libpq_srcdir)
+PG_LIBS = $(libpq_pgport)
+
+override CPPFLAGS := -DFRONTEND $(CPPFLAGS)
+
+EXTRA_CLEAN = $(RMGRDESCSOURCES) xlogreader.c
+
+all: pg_rewind
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_rewind
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
+
+xlogreader.c: % : $(top_srcdir)/src/backend/access/transam/%
	rm -f $@ && $(LN_S) $< .
+
+# The regression tests can be run separately, using either the libpq or the
+# local method to copy the files. For local mode, the makefile target is
+# check-local, and for libpq mode, check-remote. The target check-both
+# runs the tests in both modes.
+check-local:
+	echo Running tests against local data directory, in copy-mode
+	bindir=$(bindir) TEST_SUITE=local $(MAKE) installcheck
+
+check-remote:
+	echo Running tests against a running standby, via libpq
+	bindir=$(bindir) TEST_SUITE=remote $(MAKE) installcheck
+
+check-both: check-local check-remote
diff --git a/contrib/pg_rewind/copy_fetch.c b/contrib/pg_rewind/copy_fetch.c
new file mode 100644
index 000..5c40ec7
--- /dev/null
+++ b/contrib/pg_rewind/copy_fetch.c
@@ -0,0 +1,586 @@
+/*-
+ *
+ * copy_fetch.c
+ *	  Functions for copying a PostgreSQL data directory
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ *
+ *-
+ */
+#include "postgres_fe.h"
+
+#include "catalog/catalog.h"
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <dirent.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <string.h>
+
+#include "pg_rewind.h"
+#include "fetch.h"
+#include "filemap.h"
+#include "datapagemap.h"
+#include "util.h"
+
+static void recurse_dir(const char *datadir, const char *path,
+			process_file_callback_t callback);
+
+static void execute_pagemap(datapagemap_t *pagemap, const char *path);
+
+static void remove_target_file(const char *path);
+static void create_target_dir(const char *path);
+static void remove_target_dir(const char *path);
+static void create_target_symlink(const char *path, const char *link);
+static void remove_target_symlink(const char *path);
+
+/*
+ * Traverse through all files in a data directory, calling 'callback'
+ * for each file.
+ */
+void
+traverse_datadir(const char *datadir, process_file_callback_t callback)
+{
+	/* should this copy config files or not? */
+	recurse_dir(datadir, NULL, callback);
+}
+
+/*
+ * recursive part of traverse_datadir
+ */
+static void
+recurse_dir(const char *datadir, const char *parentpath,
+			process_file_callback_t callback)
+{
+	

Re: [HACKERS] replicating DROP commands across servers

2015-01-05 Thread Jeff Janes
On Fri, Jan 2, 2015 at 9:59 PM, Jeff Janes jeff.ja...@gmail.com wrote:

 On Mon, Dec 29, 2014 at 2:15 PM, Alvaro Herrera alvhe...@2ndquadrant.com
 wrote:

 Here's a patch that tweaks the grammar to use TypeName in COMMENT,
 SECURITY LABEL, and DROP for the type and domain cases.  The required
 changes in the code are pretty minimal, thankfully.  Note the slight
 changes to the new object_address test also.

 I think I'm pretty much done with this now, so I intend to push this
 first thing tomorrow and then the rebased getObjectIdentityParts patch
 sometime later.



 This commit 3f88672a4e4d8e648d24ccc65 seems to have broken pg_upgrade for
 pg_trgm.

 On 9.2.9 freshly initdb and started with default config:

 $ createdb jjanes

 in psql:

 create extension pg_trgm ;
 create table foo (x text);
 create index on foo using gin (upper(x) gin_trgm_ops);

 Then run 9.5's pg_upgrade and it fails at the stage of restoring the
 database schema.


The problem also occurs doing a self-upgrade from 9.5 to 9.5.

The contents of the dump have not changed meaningfully between 9.4 and 9.5.
I've isolated the problem to the backend applying the pg_restore of the
dump, regardless of which version created the dump.

After compiling  3c9e4cdbf2ec876dbb7 with CFLAGS=-fno-omit-frame-pointer,
I get this backtrace for the core-dump of postmaster during the pg_restore:



Core was generated by `postgres: jjanes jjanes [local] ALTER EXTENSION
  '.
Program terminated with signal 11, Segmentation fault.
#0  0x005135ff in NameListToString (names=0x257fcf8) at
namespace.c:2935
2935if (IsA(name, String))
(gdb) bt
#0  0x005135ff in NameListToString (names=0x257fcf8) at
namespace.c:2935
#1  0x00512f33 in DeconstructQualifiedName (names=0x257fcf8,
nspname_p=0x7fff419bc960, objname_p=0x7fff419bc958) at namespace.c:2648
#2  0x0058a746 in LookupTypeName (pstate=0x0, typeName=0x257fd10,
typmod_p=0x0, missing_ok=0 '\000') at parse_type.c:153
#3  0x005220b4 in get_object_address_type (objtype=OBJECT_TYPE,
objname=0x257fd50, missing_ok=0 '\000') at objectaddress.c:1306
#4  0x00520cf5 in get_object_address (objtype=OBJECT_TYPE,
objname=0x257fd50, objargs=0x0, relp=0x7fff419bcb58, lockmode=4,
missing_ok=0 '\000')
at objectaddress.c:678
#5  0x005c0f36 in ExecAlterExtensionContentsStmt (stmt=0x257fd80)
at extension.c:2906
#6  0x0077508c in ProcessUtilitySlow (parsetree=0x257fd80,
queryString=0x254f990 \n-- For binary upgrade, must preserve pg_type
oid\nSELECT
binary_upgrade.set_next_pg_type_oid('16394'::pg_catalog.oid);\n\n\n-- For
binary upgrade, must preserve pg_type array oid\nSELECT binary_upgrade.se...,
context=PROCESS_UTILITY_QUERY, params=0x0, dest=0x2581b60,
completionTag=0x7fff419bd100 ) at utility.c:1167
#7  0x0077490e in standard_ProcessUtility (parsetree=0x257fd80,
queryString=0x254f990 \n-- For binary upgrade, must preserve pg_type
oid\nSELECT
binary_upgrade.set_next_pg_type_oid('16394'::pg_catalog.oid);\n\n\n-- For
binary upgrade, must preserve pg_type array oid\nSELECT binary_upgrade.se...,
context=PROCESS_UTILITY_QUERY, params=0x0, dest=0x2581b60,
completionTag=0x7fff419bd100 ) at utility.c:840
#8  0x00773bcc in ProcessUtility (parsetree=0x257fd80,
queryString=0x254f990 \n-- For binary upgrade, must preserve pg_type
oid\nSELECT
binary_upgrade.set_next_pg_type_oid('16394'::pg_catalog.oid);\n\n\n-- For
binary upgrade, must preserve pg_type array oid\nSELECT binary_upgrade.se...,
context=PROCESS_UTILITY_QUERY, params=0x0, dest=0x2581b60,
completionTag=0x7fff419bd100 ) at utility.c:313
#9  0x00772dd6 in PortalRunUtility (portal=0x2505b90,
utilityStmt=0x257fd80, isTopLevel=0 '\000', dest=0x2581b60,
completionTag=0x7fff419bd100 )
at pquery.c:1187
#10 0x00772f8c in PortalRunMulti (portal=0x2505b90, isTopLevel=0
'\000', dest=0x2581b60, altdest=0x2581b60, completionTag=0x7fff419bd100 )
at pquery.c:1318
#11 0x00772560 in PortalRun (portal=0x2505b90,
count=9223372036854775807, isTopLevel=0 '\000', dest=0x2581b60,
altdest=0x2581b60,
completionTag=0x7fff419bd100 ) at pquery.c:816
#12 0x0076c868 in exec_simple_query (
query_string=0x254f990 \n-- For binary upgrade, must preserve pg_type
oid\nSELECT
binary_upgrade.set_next_pg_type_oid('16394'::pg_catalog.oid);\n\n\n-- For
binary upgrade, must preserve pg_type array oid\nSELECT binary_upgrade.se...)
at postgres.c:1045
#13 0x007708a2 in PostgresMain (argc=1, argv=0x24ed5e0,
dbname=0x24ed4b8 jjanes, username=0x24ed4a0 jjanes) at postgres.c:4012
#14 0x00701940 in BackendRun (port=0x250c1b0) at postmaster.c:4130
#15 0x00701083 in BackendStartup (port=0x250c1b0) at
postmaster.c:3805
#16 0x006fd8c5 in ServerLoop () at postmaster.c:1573
#17 0x006fd013 in PostmasterMain (argc=18, argv=0x24ec480) at
postmaster.c:1220
#18 0x0066476b in main (argc=18, argv=0x24ec480) at 

Re: [HACKERS] parallel mode and parallel contexts

2015-01-05 Thread Simon Riggs
On 22 December 2014 at 19:14, Robert Haas robertmh...@gmail.com wrote:

 Here is another new version, with lots of bugs fixed.

An initial blind review, independent of other comments already made on thread.

OK src/backend/access/heap/heapam.c
heapam.c prohibitions on update and delete look fine

OK src/backend/access/transam/Makefile

OK src/backend/access/transam/README.parallel
README.parallel and all concepts look good

PARTIAL src/backend/access/transam/parallel.c
wait_patiently mentioned in comment but doesn’t exist
Why not make nworkers into a uint?
Trampoline? Really? I think we should define what we mean by that,
somewhere, rather than just use the term as if it was self-evident.
Entrypints?
..

OK src/backend/access/transam/varsup.c

QUESTIONS  src/backend/access/transam/xact.c
These comments don’t have any explanation or justification

+ * This stores XID of the toplevel transaction to which we belong.
+ * Normally, it's the same as TopTransactionState.transactionId, but in
+ * a parallel worker it may not be.

+ * If we're a parallel worker, there may be additional XIDs that we need
+ * to regard as part of our transaction even though they don't appear
+ * in the transaction state stack.

This comment is copied-and-pasted too many times for safety and elegance
+   /*
+    * Workers synchronize transaction state at the beginning of each parallel
+    * operation, so we can't account for transaction state change after that
+    * point.  (Note that this check will certainly error out if s->blockState
+    * is TBLOCK_PARALLEL_INPROGRESS, so we can treat that as an invalid case
+    * below.)
+    */
We need a single Check() that contains more detail and comments within

Major comments

* Parallel stuff at start sounded OK
* Transaction stuff strikes me as overcomplicated and error prone.
Given there is no explanation or justification for most of it, I’d be
inclined to question why it's there
* If we have a trampoline for DSM, why don’t we use a trampoline for
snapshots, then you wouldn’t need to do all this serialize stuff

This is very impressive, but it looks like we’re trying to lift too
much weight on the first lift. If we want to go anywhere near this, we
need to have very clear documentation about how and why its like that.
I’m actually very sorry to say that because the review started very
well, much much better than most.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] POLA violation with \c service=

2015-01-05 Thread David Fetter
On Tue, Dec 30, 2014 at 04:48:11PM -0800, David Fetter wrote:
 On Wed, Dec 17, 2014 at 08:14:04AM -0500, Andrew Dunstan wrote:
  
  Yeah, that's the correct solution. It should not be terribly difficult to
  create a test for a conninfo string in the dbname parameter. That's what
  libpq does after all. We certainly don't want psql to have to try to
  interpret the service file. psql just needs to let libpq do its work in this
  situation.
 
 This took a little longer to get time to polish than I'd hoped, but
 please find attached a patch which:
 
 - Correctly connects to service= and postgres(ql)?:// with \c
 - Disallows tab completion in the above cases
 
 I'd like to see about having tab completion actually work correctly in
 at least the service= case, but that's a matter for a follow-on patch.
 
 Thanks to Andrew Dunstan for the original patch, and to Andrew Gierth
 for his help getting it into shape.
 
 Cheers,
 David.

I should mention that the patch also corrects a problem where the
password was being saved/discarded at inappropriate times.  Please
push this patch to the back branches :)
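
For reference, the conninfo-string test mentioned earlier in the thread
(deciding whether the \c argument is a connection string rather than a bare
database name) is roughly the following; this is an illustrative sketch, not
the actual psql or libpq code:

	#include <string.h>

	/* true if 'dbname' should be handed to libpq as a connection string */
	static bool
	looks_like_conninfo(const char *dbname)
	{
		if (dbname == NULL)
			return false;
		if (strncmp(dbname, "postgresql://", 13) == 0 ||
			strncmp(dbname, "postgres://", 11) == 0)
			return true;					/* URI syntax */
		return strchr(dbname, '=') != NULL;	/* keyword=value, e.g. service=foo */
	}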

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] event trigger pg_dump fix

2015-01-05 Thread Petr Jelinek
Hi,

Attached is fix for
http://www.postgresql.org/message-id/1420492505.23613.15.ca...@bloodnok.com

It was dumping the comment with a NULL owner while the function expects an
empty string for objects without an owner (there is a pg_strdup down the
line).

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 6658fda..6541463 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -15106,7 +15106,7 @@ dumpEventTrigger(Archive *fout, DumpOptions *dopt, EventTriggerInfo *evtinfo)
 				 query->data, "", NULL, NULL, 0, NULL, NULL);
 
 	dumpComment(fout, dopt, labelq->data,
-				NULL, NULL,
+				NULL, "",
 				evtinfo->dobj.catId, 0, evtinfo->dobj.dumpId);
 
 	destroyPQExpBuffer(query);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] event trigger test exception message

2015-01-05 Thread Robert Haas
On Sun, Jan 4, 2015 at 8:09 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 Andrew Dunstan wrote:
 I don't wish to seem humorless, but I think this should probably be changed:

 root/HEAD/pgsql/src/test/regress/sql/event_trigger.sql:248:  RAISE EXCEPTION
 'I''m sorry Sir, No Rewrite Allowed.';

 Quite apart from any other reason, the Sir does seem a bit sexist - we
 have no idea of the gender of the reader. Probably just 'sorry, no rewrite
 allowed' would suffice.

 This seems pointless tinkering to me.  Should I start introducing female
 pronouns in test error messages, to measure how much this will annoy my
 male counterparts?  This is not a user-visible message in any case.

I'm with Andrew.  Aside from any question of sexism, which I do not
discount, the capitalization is wrong.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Turning recovery.conf into GUCs

2015-01-05 Thread Peter Eisentraut
I think this is a valuable effort, but I wonder how we are going to
arrive at a consensus.  I don't see any committer supporting these
changes. Clearly, there are some advantages to having recovery
parameters be GUCs, but the proposed changes also come with some
disadvantages, and the tradeoffs aren't so clear.

For example, putting recovery target parameters into postgresql.conf
might not make sense to some people.  Once you have reached the recovery
end point, the information is obsolete.  Keeping it set could be
considered confusing.

Moreover, when I'm actually doing point-in-time-recovery attempts, I
don't think I want to be messing with my pristine postgresql.conf file.
 Having those settings separate isn't a bad idea in that case.

Some people might like the recovery.done file as a historical record.
Having standby.enabled just be deleted would destroy that information
once recovery has ended.

In the past, when some people have complained that recovery.conf cannot
be moved to a configuration directory, I (and others?) have argued that
recovery.conf is really more of a command script file than a
configuration file (that was before replication was commonplace).  The
premise of this patch is that some options really are more configuration
than command most of the time, but that doesn't mean the old view was
invalid.

The current system makes it easy to share postgresql.conf between
primary and standby and just maintain the information related to the
standby locally in recovery.conf.  pg_basebackup -R makes that even
easier.  It's still possible to do this in the new system, but it's
decidedly more work.

These are all soft reasons, but they add up.

The wins on the other hand are obscure: You can now use SHOW to inspect
recovery settings.  You can design your own configuration file include
structures to set them.  These are not bad, but is that all?

I note that all the new settings are PGC_POSTMASTER, so you can't set
them on the fly.  Changing primary_conninfo without a restart would be a
big win, for example.  Is that planned?

It might be acceptable to break all the old workflows if the new system
was obviously great, but it's not.  It's still confusing as heck.  For
example, we would have a file standby.enabled and a setting
standby_mode, which would mean totally different things.  I think
there is also some conceptual overlap between standby_mode and
recovery_target_action (standby_mode is really something like
recovery_target_action = keep_going).  I also find the various uses of
"trigger file" or "to trigger" confusing.  The old trigger file to
trigger promotion makes sense.  But calling standby.enabled a "trigger
file" as well confuses two entirely different concepts.

I have previously argued for a different approach: Read recovery.conf
as a configuration file after postgresql.conf, but keep it as a file to
start recovery.  That doesn't break any old workflows, it gets you all
the advantages of having recovery parameters in the GUC system, and it
keeps all the options open for improving things further in the future.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] event trigger pg_dump fix

2015-01-05 Thread Tom Lane
Petr Jelinek p...@2ndquadrant.com writes:
 Attached is fix for
 http://www.postgresql.org/message-id/1420492505.23613.15.ca...@bloodnok.com

 It was dumping comment with NULL owner while the function expects
 empty string for objects without owner (there is pg_strdup down the
 line).

This isn't right either: event triggers do have owners.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] event trigger pg_dump fix

2015-01-05 Thread Petr Jelinek

On 06/01/15 01:02, Tom Lane wrote:

Petr Jelinek p...@2ndquadrant.com writes:

Attached is fix for
http://www.postgresql.org/message-id/1420492505.23613.15.ca...@bloodnok.com



It was dumping comment with NULL owner while the function expects
empty string for objects without owner (there is pg_strdup down the
line).


This isn't right either: event triggers do have owners.



Bah, of course, stupid me.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 6658fda..00b87e7 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -15106,7 +15106,7 @@ dumpEventTrigger(Archive *fout, DumpOptions *dopt, EventTriggerInfo *evtinfo)
 				 query->data, "", NULL, NULL, 0, NULL, NULL);
 
 	dumpComment(fout, dopt, labelq->data,
-				NULL, NULL,
+				NULL, evtinfo->evtowner,
 				evtinfo->dobj.catId, 0, evtinfo->dobj.dumpId);
 
 	destroyPQExpBuffer(query);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: Problems with approach #2 to value locking (INSERT ... ON CONFLICT UPDATE/IGNORE patch)

2015-01-05 Thread Peter Geoghegan
On Thu, Jan 1, 2015 at 11:17 PM, Peter Geoghegan p...@heroku.com wrote:
 The attached patch fixes Jeff's test case, as far as it goes. But, as
 I mentioned, it's not intended as a real fix. For one thing, I have
 made multiple changes to HeapTupleSatisfies*() routines, but haven't
 fully considered the consequences for, say, ri_triggers.c.
 HeapTupleSatisfiesSelf() and HeapTupleSatisfiesHistoricMVCC() were not
 modified. I've modified HeapTupleSatisfiesVacuum() to see
 InvalidTransactionID (not invalid xid infomask bits) as visible for
 garbage collection (alongside HeapTupleSatisfiesDirty() and
 HeapTupleSatisfiesMVCC(), which I changed first), and I wouldn't be
 surprised if that created new problems in other areas. In short, this
 patch is just for illustrative purposes.

I think that I see another race condition, requiring custom
instrumentation to the server to demonstrate.

Basically, even with the fix above, there are still similar races. I'm
not trying to call ExecLockUpdateTuple() against super deleted NULL
tuples, because those are no longer visible to any relevant snapshot
type, the fix described above. However, I can still see a change in
tuple header xmin between a dirty index scan and attempting to lock
that row to update. If I'm not mistaken, the race condition is small
enough that something like Jeff's tool won't catch it directly in a
reasonable amount of time, because it happens to be the same index key
value.

It's not all that easy to recreate, even with my custom
instrumentation. With fsync=off, it occurs somewhat infrequently with
a custom instrumentation Postgres build:

pg@hamster:~/jjanes_upsert$ perl count_upsert.pl 8 10
[Mon Jan  5 17:31:57 2015] init done at count_upsert.pl line 101.
[Mon Jan  5 17:32:17 2015] child abnormal exit [Mon Jan  5 17:32:17
2015] DBD::Pg::db selectall_arrayref failed: ERROR:  mismatch in xmin
for (322,264). Initially 4324145, then 4324429 at count_upsert.pl line
210.\n  at count_upsert.pl line 227.
[Mon Jan  5 17:32:49 2015] sum is -800
[Mon Jan  5 17:32:49 2015] count is 9515
[Mon Jan  5 17:32:49 2015] normal exit at 1420507969 after 733912
items processed at count_upsert.pl line 182.

I think that super deleting broken promise tuples undermines how
vacuum interlocking is supposed to work [1]. We end up at the wrong
tuple, which happens to occupy the same slot as the old one (and
happens to have the same index key value, because the race is so small
and there are relatively few distinct values in play - it's hard to
recreate). The attached instrumentation patch was used with Jeff Janes's
tool.
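
The cross-check that the instrumentation adds is essentially this (a sketch;
priorXmin stands for whatever xmin the preceding dirty index scan observed):

	/* after re-fetching the tuple we are about to lock */
	if (!TransactionIdEquals(HeapTupleHeaderGetXmin(tuple->t_data), priorXmin))
		elog(ERROR, "mismatch in xmin for (%u,%u). Initially %u, then %u",
			 ItemPointerGetBlockNumber(&tuple->t_self),
			 ItemPointerGetOffsetNumber(&tuple->t_self),
			 priorXmin,
			 HeapTupleHeaderGetXmin(tuple->t_data));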

If we weren't super deleting in the patch, then I think that our
backend's xmin horizon would be enough of an interlock (haven't tested
that theory by testing value locking approach #1 implementation yet,
but I worried about the scenario quite a lot before and things seemed
okay).  But we are super deleting with implementation #2, and we're
also not holding a pin on the heap buffer or B-Tree leaf page buffer
throughout, so this happens (if I'm not mistaken).

[1] 
https://github.com/postgres/postgres/blob/REL9_3_STABLE/src/backend/access/nbtree/README#L142
-- 
Peter Geoghegan
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index f8a4017..4ce45e3 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -1189,7 +1189,7 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
  */
 bool
 ExecCheckIndexConstraints(TupleTableSlot *slot,
-		  EState *estate, ItemPointer conflictTid,
+		  EState *estate, HeapTuple conflictTid,
 		  Oid arbiterIdx)
 {
 	ResultRelInfo *resultRelInfo;
@@ -1339,7 +1339,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 	 Datum *values, bool *isnull,
 	 EState *estate, bool newIndex,
 	 bool violationOK, bool wait,
-	 ItemPointer conflictTid)
+	 HeapTuple conflictTid)
 {
 	Oid		   *constr_procs;
 	uint16	   *constr_strats;
@@ -1469,7 +1469,10 @@ retry:
 		{
 			conflict = true;
 			if (conflictTid)
-				*conflictTid = tup->t_self;
+			{
+				conflictTid->t_self = tup->t_self;
+				conflictTid->t_data = tup->t_data;
+			}
 			break;
 		}
 
@@ -1506,7 +1509,10 @@ retry:
 		{
 			conflict = true;
 			if (conflictTid)
-				*conflictTid = tup->t_self;
+			{
+				conflictTid->t_self = tup->t_self;
+				conflictTid->t_data = tup->t_data;
+			}
 			break;
 		}
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 69af2d1..a289ef4 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -56,7 +56,7 @@
 
 static bool ExecLockUpdateTuple(EState *estate,
 ResultRelInfo *relinfo,
-ItemPointer conflictTid,
+HeapTuple 	conflictTid,
 TupleTableSlot *planSlot,
 TupleTableSlot *insertSlot,
 bool			canSetTag,
@@ -292,7 +292,7 @@ ExecInsert(TupleTableSlot *slot,
 	else
 	{
 		boolconflict;
-		

Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-01-05 Thread Michael Paquier
On Mon, Jan 5, 2015 at 10:29 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Sun, Dec 28, 2014 at 10:57 PM, Michael Paquier wrote:
 The patch 1 cannot be applied to the master successfully because of
 recent change.
Yes, that's caused by ccb161b. Attached are rebased versions.

 - The real stuff comes with patch 2, that implements the removal of
 PGLZ_Header, changing the APIs of compression and decompression to pglz to
 not have anymore toast metadata, this metadata being now localized in
 tuptoaster.c. Note that this patch protects the on-disk format (tested with
 pg_upgrade from 9.4 to a patched HEAD server). Here is how the APIs of
 compression and decompression look like with this patch, simply performing
 operations from a source to a destination:
 extern int32 pglz_compress(const char *source, int32 slen, char *dest,
   const PGLZ_Strategy *strategy);
 extern int32 pglz_decompress(const char *source, char *dest,
   int32 compressed_size, int32 raw_size);
 The return value of those functions is the number of bytes written in the
 destination buffer, and 0 if operation failed.

 So it's guaranteed that 0 is never returned in success case? I'm not sure
 if that case can really happen, though.
This is inspired by the lz4 APIs. Wouldn't it be buggy for a
compression algorithm to return a size of 0 bytes as compressed or
decompressed length btw? We could as well make it return a negative
value when a failure occurs if you feel more comfortable with it.
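
As an aside on the return-value convention, here is a minimal, self-contained
sketch of how a negative-on-failure API reads at the call site (toy_compress()
is a made-up stand-in, not the pglz code):

#include <stdio.h>
#include <string.h>

/*
 * Hypothetical compressor following the convention under discussion:
 * returns the number of bytes written to dest, or -1 on failure
 * (for example when the output would not fit).
 */
static int
toy_compress(const char *source, int slen, char *dest, int destlen)
{
    if (slen > destlen)
        return -1;              /* report failure as a negative value, not 0 */
    memcpy(dest, source, slen); /* "compression" is a plain copy here */
    return slen;
}

int
main(void)
{
    char        buf[16];
    const char *msg = "hello";
    int         written = toy_compress(msg, 5, buf, (int) sizeof(buf));

    if (written < 0)
        fprintf(stderr, "compression failed\n");
    else
        printf("wrote %d bytes\n", written);
    return 0;
}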
-- 
Michael


20150105_fpw_compression_v13.tar.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD

2015-01-05 Thread Bruce Momjian
On Fri, Jan  2, 2015 at 06:25:52PM +0100, Andres Freund wrote:
 I can't run tests right now...
 
 What exactly do you want to see with these tests? that's essentially
 what I've already benchmarked + some fprintfs?

I want to test two patches that both allocate an extra 64 bytes but
shift the alignment of just the buffer descriptor memory allocation.
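
For reference, the allocate-extra-and-shift trick being benchmarked can be
sketched as follows; this is a generic, self-contained illustration, not the
actual shared-memory code, and ALIGN64() is a made-up name:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Round a pointer up to the next 64-byte (cache line) boundary. */
#define ALIGN64(p) ((void *) (((uintptr_t) (p) + 63) & ~(uintptr_t) 63))

int
main(void)
{
    size_t  nbytes = 1024;

    /* Allocate 64 extra bytes so the shifted start still holds nbytes. */
    void   *raw = malloc(nbytes + 64);
    void   *descriptors = ALIGN64(raw);

    printf("raw=%p aligned=%p\n", raw, descriptors);
    /* ... use "descriptors" as the cache-line-aligned array ... */
    free(raw);
    return 0;
}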

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code

2015-01-05 Thread Michael Paquier
On Mon, Jan 5, 2015 at 10:39 PM, Alexey Vasiliev leopard...@inbox.ru wrote:
 Hello. Thanks for help. Yes, new patch look fine!
OK cool. Let's get an opinion from a committer then.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallel mode and parallel contexts

2015-01-05 Thread Amit Kapila
On Mon, Jan 5, 2015 at 9:57 PM, Robert Haas robertmh...@gmail.com wrote:

 On Sat, Jan 3, 2015 at 7:31 PM, Andres Freund and...@2ndquadrant.com
wrote:

  * Doesn't the restriction in GetSerializableTransactionSnapshotInt()
apply for repeatable read just the same?

 Yeah.  I'm not sure whether we really need that check at all, because
 there is a check in GetTransactionSnapshot() that probably checks the
 same thing.


The check in GetSerializableTransactionSnapshotInt() will also prohibit
any user/caller of SetSerializableTransactionSnapshot() in parallel mode
as that can't be prohibited by GetTransactionSnapshot().


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


[HACKERS] Possible typo in create_policy.sgml

2015-01-05 Thread Amit Langote
Hi,

Following is perhaps a typo:

-   qualifications of queries which are run against the table the policy
is on,
+   qualifications of queries which are run against the table if the
policy is on,

Attached fixes it if so.

Thanks,
Amit
diff --git a/doc/src/sgml/ref/create_policy.sgml b/doc/src/sgml/ref/create_policy.sgml
index 4c8c001..c28ba2c 100644
--- a/doc/src/sgml/ref/create_policy.sgml
+++ b/doc/src/sgml/ref/create_policy.sgml
@@ -40,7 +40,7 @@ CREATE POLICY replaceable class=parametername/replaceable ON replaceable
 
   para
A policy is an expression which is added to the security-barrier
-   qualifications of queries which are run against the table the policy is on,
+   qualifications of queries which are run against the table if the policy is on,
or an expression which is added to the with-check options for a table and
which is applied to rows which would be added to the table.
The security-barrier qualifications will always be evaluated prior to any

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Turning recovery.conf into GUCs

2015-01-05 Thread Josh Berkus
On 01/05/2015 05:43 PM, Peter Eisentraut wrote:
 The wins on the other hand are obscure: You can now use SHOW to inspect
 recovery settings.  You can design your own configuration file include
 structures to set them.  These are not bad, but is that all?

That's not the only potential win, and it's not small either. I'll be
able to tell what master a replica is replicating from via a port
5432 connection (currently there is absolutely no way to do this).

 I note that all the new settings are PGC_POSTMASTER, so you can't set
 them on the fly.  Changing primary_conninfo without a restart would be a
 big win, for example.  Is that planned?

... but there you have the flaw in the ointment.  Unless we make some of
the settings superuserset, no, it doesn't win us a lot, except ...

 I have previously argued for a different approach: Read recovery.conf
 as a configuration file after postgresql.conf, but keep it as a file to
 start recovery.  That doesn't break any old workflows, it gets you all
 the advantages of having recovery parameters in the GUC system, and it
 keeps all the options open for improving things further in the future.

... and there you hit on one of the other issues with recovery.conf,
which is that it's a configuration file with configuration parameters
which gets automatically renamed when a standby is promoted.  This plays
merry hell with configuration management systems.  The amount of
conditional logic I've had to write for Salt to handle recovery.conf
truly doesn't bear thinking about.  There may be some other way to make
recovery.conf configuration-management friendly, but I haven't thought
of it.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] INSERT ... ON CONFLICT UPDATE and RLS

2015-01-05 Thread Peter Geoghegan
The patch that implements INSERT ... ON CONFLICT UPDATE has support
and tests for per-column privileges (which are not relevant to the
IGNORE variant, AFAICT). However, RLS support is another thing
entirely. It has not been properly thought out, and unlike per-column
privileges requires careful consideration, as the correct behavior
isn't obvious.

I've documented the current problems with RLS here:

https://wiki.postgresql.org/wiki/UPSERT#RLS

It's not clear whether or not the auxiliary UPDATE within an INSERT...
ON CONFLICT UPDATE statement should have security quals appended.
Stephen seemed to think that that might not be the best solution [1].
I am not sure. I'd like to learn what other people think.

What is the best way of integrating RLS with ON CONFLICT UPDATE? What
behavior is most consistent with the guarantees of RLS? In particular,
should the implementation append security quals to the auxiliary
UPDATE, or fail sooner?

[1] 
http://www.postgresql.org/message-id/20141121205926.gk28...@tamriel.snowman.net
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Redesigning checkpoint_segments

2015-01-05 Thread Josh Berkus
On 01/05/2015 09:06 AM, Heikki Linnakangas wrote:
 I wasn't clear on my opinion here. I think I understood what Josh meant,
 but I don't think we should do it. Seems like unnecessary nannying of
 the DBA. Let's just mention in the manual that if you set
 wal_keep_segments higher than [insert formula here], you will routinely
 have more WAL in pg_xlog than what checkpoint_wal_size is set to.
 
 That seems an unrealistic goal. I've seen setups that have set
 checkpoint_segments intentionally, and with good reasoning, north of
 50k.
 
 So? I don't see how that's relevant.
 
 Neither wal_keep_segments, nor failing archive_commands nor replication
 slot should have an influence on checkpoint pacing.
 
 Agreed.

Oh, right, slots can also cause the log to increase in size.  And we've
already had the discussion about hard limits, which is maybe a future
feature and not part of this patch.

Can we figure out a reasonable formula?  My thinking is 50% for
wal_keep_segments, because we need at least 50% of the wals to do a
reasonable spread checkpoint.   If max_wal_size is 1GB, and
wal_keep_segments is 1.5GB, what would happen?  What if
wal_keep_segments is 0.9GB?

I need to create a fake benchmark for this ...

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] add modulo (%) operator to pgbench

2015-01-05 Thread Fabien COELHO


Hello Alvaro,

Here is a v6 with most of your suggestions applied.


On top of evaluateExpr() we need a comment (generally I think pgbench
could do with more comments; not saying your patch should add them, just
expressing an opinion.)  Also, intuitively I would say that the return
values of that function should be reversed: return true if things are
good.


Comment & inverted return value done.


I wonder about LOCATE and LOCATION.  Can we do away with the latter, and
keep only LOCATE perhaps with a better name such as PRINT_ERROR_AT or
similar?  I would just expand an ad-hoc fprintf in the single place
where the other macro is used.


I've used just one PRINT_ERROR_AT() macro consistently.


Are we okay with only integer operands?  Is this something we would
expand in the future?  Is the gaussian/exp random stuff going to work
with integer operands, if we want to change it to use function syntax,
as expressed elsewhere?


Nothing for now, I feel it is for a later round.


[other mail] bring ERROR() macro back


I also prefer the code with it, but the cost-benefit of a pre-C99
compatible implementation seems quite low, and it implies fewer (style)
changes from the previous situation as it is.


--
Fabien.diff --git a/contrib/pgbench/.gitignore b/contrib/pgbench/.gitignore
index 489a2d6..aae819e 100644
--- a/contrib/pgbench/.gitignore
+++ b/contrib/pgbench/.gitignore
@@ -1 +1,3 @@
+/exprparse.c
+/exprscan.c
 /pgbench
diff --git a/contrib/pgbench/Makefile b/contrib/pgbench/Makefile
index b8e2fc8..2d4033b 100644
--- a/contrib/pgbench/Makefile
+++ b/contrib/pgbench/Makefile
 PGFILEDESC = "pgbench - a simple program for running benchmark tests"
 PGAPPICON = win32
 
 PROGRAM = pgbench
-OBJS	= pgbench.o $(WIN32RES)
+OBJS	= pgbench.o exprparse.o $(WIN32RES)
+
+EXTRA_CLEAN	= exprparse.c exprscan.c
 
 PG_CPPFLAGS = -I$(libpq_srcdir)
 PG_LIBS = $(libpq_pgport) $(PTHREAD_LIBS)
@@ -18,8 +20,22 @@ subdir = contrib/pgbench
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
+
+distprep: exprparse.c exprscan.c
 endif
 
 ifneq ($(PORTNAME), win32)
 override CFLAGS += $(PTHREAD_CFLAGS)
 endif
+
+# There is no correct way to write a rule that generates two files.
+# Rules with two targets don't have that meaning, they are merely
+# shorthand for two otherwise separate rules.  To be safe for parallel
+# make, we must chain the dependencies like this.  The semicolon is
+# important, otherwise make will choose the built-in rule for
+# gram.y=gram.c.
+
+exprparse.h: exprparse.c ;
+
+# exprscan is compiled as part of exprparse
+exprparse.o: exprscan.c
diff --git a/contrib/pgbench/exprparse.y b/contrib/pgbench/exprparse.y
new file mode 100644
index 000..243c6b9
--- /dev/null
+++ b/contrib/pgbench/exprparse.y
@@ -0,0 +1,96 @@
+%{
+/*-
+ *
+ * exprparse.y
+ *	  bison grammar for a simple expression syntax
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *-
+ */
+
+#include "postgres_fe.h"
+
+#include "pgbench.h"
+
+PgBenchExpr *expr_parse_result;
+
+static PgBenchExpr *make_integer_constant(int64 ival);
+static PgBenchExpr *make_variable(char *varname);
+static PgBenchExpr *make_op(char operator, PgBenchExpr *lexpr,
+		PgBenchExpr *rexpr);
+
+%}
+
+%expect 0
+%name-prefix="expr_yy"
+
+%union
+{
+	int64		ival;
+	char	   *str;
+	PgBenchExpr *expr;
+}
+
+%type <expr> expr
+%type <ival> INTEGER
+%type <str> VARIABLE
+%token INTEGER VARIABLE
+%token CHAR_ERROR /* never used, will raise a syntax error */
+
+%left	'+' '-'
+%left	'*' '/' '%'
+%right	UMINUS
+
+%%
+
+result: expr{ expr_parse_result = $1; }
+
+expr: '(' expr ')'			{ $$ = $2; }
+	| '+' expr %prec UMINUS	{ $$ = $2; }
+	| '-' expr %prec UMINUS	{ $$ = make_op('-', make_integer_constant(0), $2); }
+	| expr '+' expr			{ $$ = make_op('+', $1, $3); }
+	| expr '-' expr			{ $$ = make_op('-', $1, $3); }
+	| expr '*' expr			{ $$ = make_op('*', $1, $3); }
+	| expr '/' expr			{ $$ = make_op('/', $1, $3); }
+	| expr '%' expr			{ $$ = make_op('%', $1, $3); }
+	| INTEGER{ $$ = make_integer_constant($1); }
+	| VARIABLE { $$ = make_variable($1); }
+	;
+
+%%
+
+static PgBenchExpr *
+make_integer_constant(int64 ival)
+{
+	PgBenchExpr *expr = pg_malloc(sizeof(PgBenchExpr));
+
+	expr->etype = ENODE_INTEGER_CONSTANT;
+	expr->u.integer_constant.ival = ival;
+	return expr;
+}
+
+static PgBenchExpr *
+make_variable(char *varname)
+{
+	PgBenchExpr *expr = pg_malloc(sizeof(PgBenchExpr));
+
+	expr->etype = ENODE_VARIABLE;
+	expr->u.variable.varname = varname;
+	return expr;
+}
+
+static PgBenchExpr *
+make_op(char operator, PgBenchExpr *lexpr, PgBenchExpr *rexpr)
+{
+	PgBenchExpr *expr = pg_malloc(sizeof(PgBenchExpr));
+
+	expr->etype = ENODE_OPERATOR;
+	

Re: [HACKERS] Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

2015-01-05 Thread Tom Lane
David G Johnston david.g.johns...@gmail.com writes:
 Atri Sharma wrote
 If order of result rows is not the same as required, an error is raised:
 
 SELECT * FROM incorrect_order_nulls() ORDER BY e NULLS LAST;
 ERROR:  Order not same as specified

 First reaction for the error was unfavorable but (see below) it likely is
 the best option and does adequately cover the reason for failure -
 programmer error.

TBH, my first reaction to this entire patch is unfavorable: it's a
solution in search of a problem.  It adds substantial complication not
only for users but for PG developers in order to solve a rather narrow
performance issue.

What would make sense to me is to teach the planner about inlining
SQL functions that include ORDER BY clauses, so that the performance
issue of a double sort could be avoided entirely transparently to
the user.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transactions involving multiple postgres foreign servers

2015-01-05 Thread Robert Haas
On Mon, Jan 5, 2015 at 2:47 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 That's a laudable goal, but I would bet that nothing built on the FDW
 infrastructure will ever get there.

Why?

It would be surprising to me if, given that we have gone to some pains
to create a system that allows cross-system queries, and hopefully
eventually pushdown of quals, joins, and aggregates, we then made
sharding work in some completely different way that reuses none of
that infrastructure.  But maybe I am looking at this the wrong way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Turning recovery.conf into GUCs

2015-01-05 Thread Petr Jelinek

On 24/12/14 20:11, Alex Shulgin wrote:

Alex Shulgin a...@commandprompt.com writes:


- the StandbyModeRequested and StandbyMode should be lowercased like
the rest of the GUCs


Yes, except that standby_mode is linked to StandbyModeRequested, not the
other one.  I can try see if there's a sane way out of this.


As I see it, renaming these will only add noise to this patch, and there
is also ArchiveRecoveryRequested / InArchiveRecovery.  This is going to
be tricky and I'm not sure it's really worth the effort.



I don't buy that to be honest; most (if not all) places that would be
affected are already in the diff as part of the context around other renames
anyway, and it just does not seem right to rename some variables and not
the others.


I still think there should be some thought given to merging the 
hot_standby and standby_mode, but I think we'd need opinion of more 
people (especially some committers) on this one.



- The whole CopyErrorData and memory context logic is not needed in
check_recovery_target_time() as the FlushErrorState() is not called
there


You must be right.  I recall having some trouble with strings being
free'd prematurely, but if it's ERROR, then there should be no need for
that.  I'll check again.


Hrm, after going through this again I'm pretty sure that was correct:
the only way to obtain the current error message is to use
CopyErrorData(), but that requires you to switch back to non-error
memory context (via Assert).


Right, my bad.



The FlushErrorState() call is not there because it's handled by the hook
caller (or it can exit via ereport() with elevel==ERROR).



Yes I've seen that, shouldn't you FreeErrorData(edata) then though? I 
guess it does not matter too much considering that you are going to 
throw error and die eventually anyway once you are in that code path, 
maybe just for the clarity...


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transactions involving multiple postgres foreign servers

2015-01-05 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Jan 5, 2015 at 2:47 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 That's a laudable goal, but I would bet that nothing built on the FDW
 infrastructure will ever get there.

 Why?

 It would be surprising to me if, given that we have gone to some pains
 to create a system that allows cross-system queries, and hopefully
 eventually pushdown of quals, joins, and aggregates, we then made
 sharding work in some completely different way that reuses none of
 that infrastructure.  But maybe I am looking at this the wrong way.

Well, we intentionally didn't couple the FDW stuff closely into
transaction commit, because of the thought that the far end would not
necessarily have Postgres-like transactional behavior, and even if it did
there would be about zero chance of having atomic commit with a
non-Postgres remote server.  postgres_fdw is a seriously bad starting
point as far as that goes, because it encourages one to make assumptions
that can't possibly work for any other wrapper.

I think the idea I sketched upthread of supporting an external transaction
manager might be worth pursuing, in that it would potentially lead to
having at least an approximation of atomic commit across heterogeneous
servers.

Independently of that, I think what you are talking about would be better
addressed outside the constraints of the FDW mechanism.  That's not to say
that we couldn't possibly make postgres_fdw use some additional non-FDW
infrastructure to manage commits; just that solving this in terms of the
FDW infrastructure seems wrongheaded to me.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] recovery_min_apply_delay with a negative value

2015-01-05 Thread Petr Jelinek

On 05/01/15 20:44, Robert Haas wrote:

On Sat, Jan 3, 2015 at 12:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:

Of course, if recovery_min_apply_delay were a proper GUC, we'd just
configure it with a minimum value of zero and be done :-(


Amen.  We should *really* convert all of the recovery.conf parameters
to be GUCs.



Well, there is an ongoing effort on that, and I think the patch is very
close to the state where a committer should take a look IMHO. I have only a
couple of gripes with it now, and one of them needs the opinions of others
anyway.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

2015-01-05 Thread Atri Sharma
On Mon, Jan 5, 2015 at 11:24 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 David G Johnston david.g.johns...@gmail.com writes:
  Atri Sharma wrote
  If order of result rows is not the same as required, an error is raised:
 
  SELECT * FROM incorrect_order_nulls() ORDER BY e NULLS LAST;
  ERROR:  Order not same as specified

  First reaction for the error was unfavorable but (see below) it likely is
  the best option and does adequately cover the reason for failure -
  programmer error.

 TBH, my first reaction to this entire patch is unfavorable: it's a
 solution in search of a problem.  It adds substantial complication not
 only for users but for PG developers in order to solve a rather narrow
 performance issue.


I could agree about the scope of the performance issue, but am not sure
about the added complication. It is essentially similar to, say, a
combination of how Unique is implemented with a flavour of the ORDINALITY
implementation. A new path that is added in a certain number of cases plus
a low overhead node does not seem too bad to me. This is in line with a
lot of real world cases I have seen, where the data is *bubbled* up to
SRFs, which does give a possibility of an existing order. Couple it with
the power to specify ORDER BY in your SRF function and you could save a lot.

I am not sure how it complicates for hackers.  Could you please elaborate a
bit?
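
To make the "low overhead" point concrete: the check performed by the
proposed node amounts to one comparison per row against the previously seen
row. A self-contained sketch with plain integers and a single ascending sort
key (not the patch's code) looks like this:

#include <stdio.h>

static int
check_ascending(const int *rows, int nrows)
{
    int prev = 0;

    for (int i = 0; i < nrows; i++)
    {
        if (i > 0 && rows[i] < prev)
        {
            fprintf(stderr, "ERROR:  Order not same as specified (row %d)\n", i);
            return 0;
        }
        prev = rows[i];         /* the only state kept between rows */
    }
    return 1;
}

int
main(void)
{
    int rows[] = {1, 2, 5, 4};  /* out of order at the last element */

    check_ascending(rows, 4);
    return 0;
}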


 What would make sense to me is to teach the planner about inlining
 SQL functions that include ORDER BY clauses, so that the performance
 issue of a double sort could be avoided entirely transparently to
 the user.


It sounds good, but inlining in the current way would restrict the scope of
optimization (which is not applicable to the current design). For example,
you cannot inline RECORD-returning SRFs...


-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Transactions involving multiple postgres foreign servers

2015-01-05 Thread Robert Haas
On Fri, Jan 2, 2015 at 3:45 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 In short, you can't force 2PC technology on people who aren't using it
 already; while for those who are using it already, this isn't nearly
 good enough as-is.

I was involved in some internal discussions related to this patch, so
I have some opinions on it.  The long-term, high-level goal here is to
facilitate sharding.  If we've got a bunch of PostgreSQL servers
interlinked via postgres_fdw, it should be possible to perform
transactions on the cluster in such a way that transactions are just
as atomic, consistent, isolated, and durable as they would be with
just one server.  As far as I know, there is no way to achieve this
goal through the use of an external transaction manager, because even
if that external transaction manager guarantees, for every
transaction, that the transaction either commits on all nodes or rolls
back on all nodes, there's no way for it to guarantee that other
transactions won't see some intermediate state where the commit has
been completed on some nodes but not others.  To get that, you need
some sort of integration that reaches down to the way snapshots are taken.

I think, though, that it might be worthwhile to first solve the
simpler problem of figuring out how to ensure that a transaction
commits everywhere or rolls back everywhere, even if intermediate
states might still be transiently visible.   I don't think this patch,
as currently designed, is equal to that challenge, because
XACT_EVENT_PRE_COMMIT fires before the transaction is certain to
commit - PreCommit_CheckForSerializationFailure or PreCommit_Notify
could still error out.  We could have a hook that fires after that,
but that doesn't solve the problem if a user of that hook can itself
throw an error.  Even if part of the API contract is that it's not
allowed to do so, the actual attempt to commit the change on the
remote side can fail due to - e.g. - a network interruption, and
that's got to be dealt with somehow.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Additional role attributes superuser review

2015-01-05 Thread Stephen Frost
Adam, all,

* Adam Brightwell (adam.brightw...@crunchydatasolutions.com) wrote:
 If others are also in agreement on this point then I'll update the patch
 accordingly.

Works for me.

Thanks!

Stephen


signature.asc
Description: Digital signature


[HACKERS] NODE

2015-01-05 Thread Ravi Kiran
hi,

I am going through the hashjoin algorithm in postgres. I find a function
ExecHashJoin, which is called each time a new tuple is required by the
hash join *Node.*

Could someone explain what exactly a node means in postgres?

Thanks


Re: [HACKERS] NODE

2015-01-05 Thread Heikki Linnakangas

On 01/05/2015 09:28 PM, Ravi Kiran wrote:

hi,

I am going through the hashjoin algorithm in postgres. I find a function
ExecHashJoin, which is called each time a new tuple is required by the
hash join *Node.*

Could someone explain what exactly a node means in postgres?


See src/backend/nodes/. It's a mechanism that imitates class inheritance 
in object-oriented languages. Node is the superclass that everything 
else inherits from. Every Node type supports some basic operations like 
copy, equals and serialization to/from text.
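
A minimal, self-contained sketch of the pattern (simplified types, not the
actual definitions under src/include/nodes/): every struct starts with a tag,
so a pointer to any node can be dispatched on generically.

#include <stdio.h>

typedef enum NodeTagDemo { T_SeqScanDemo, T_HashJoinDemo } NodeTagDemo;

/* "Superclass": the tag is always the first field. */
typedef struct NodeDemo     { NodeTagDemo type; } NodeDemo;

/* "Subclasses" embed the base struct as their first member. */
typedef struct SeqScanDemo  { NodeDemo node; const char *relname; } SeqScanDemo;
typedef struct HashJoinDemo { NodeDemo node; int nbatches; } HashJoinDemo;

#define nodeTagDemo(ptr) (((const NodeDemo *) (ptr))->type)

static void
describe(const void *node)
{
    switch (nodeTagDemo(node))
    {
        case T_SeqScanDemo:
            printf("seq scan on %s\n", ((const SeqScanDemo *) node)->relname);
            break;
        case T_HashJoinDemo:
            printf("hash join, %d batches\n", ((const HashJoinDemo *) node)->nbatches);
            break;
    }
}

int
main(void)
{
    SeqScanDemo  s = {{T_SeqScanDemo}, "pgbench_accounts"};
    HashJoinDemo h = {{T_HashJoinDemo}, 4};

    describe(&s);
    describe(&h);
    return 0;
}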


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] recovery_min_apply_delay with a negative value

2015-01-05 Thread Robert Haas
On Sat, Jan 3, 2015 at 12:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Of course, if recovery_min_apply_delay were a proper GUC, we'd just
 configure it with a minimum value of zero and be done :-(

Amen.  We should *really* convert all of the recovery.conf parameters
to be GUCs.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transactions involving multiple postgres foreign servers

2015-01-05 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I was involved in some internal discussions related to this patch, so
 I have some opinions on it.  The long-term, high-level goal here is to
 facilitate sharding.  If we've got a bunch of PostgreSQL servers
 interlinked via postgres_fdw, it should be possible to perform
 transactions on the cluster in such a way that transactions are just
 as atomic, consistent, isolated, and durable as they would be with
 just one server.  As far as I know, there is no way to achieve this
 goal through the use of an external transaction manager, because even
 if that external transaction manager guarantees, for every
 transaction, that the transaction either commits on all nodes or rolls
 back on all nodes, there's no way for it to guarantee that other
 transactions won't see some intermediate state where the commit has
 been completed on some nodes but not others.  To get that, you need
 some sort of integration that reaches down to the way snapshots are taken.

That's a laudable goal, but I would bet that nothing built on the FDW
infrastructure will ever get there.  Certainly the proposed patch doesn't
look like it moves us very far towards that set of goalposts.

 I think, though, that it might be worthwhile to first solve the
 simpler problem of figuring out how to ensure that a transaction
 commits everywhere or rolls back everywhere, even if intermediate
 states might still be transiently visible.

Perhaps.  I suspect that it might still be a dead end if the ultimate
goal is cross-system atomic commit ... but likely it would teach us
some useful things anyway.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

2015-01-05 Thread Atri Sharma
Hi All,

Please forgive if this is a repost.

Please find attached patch for supporting ORDER BY clause in CREATE
FUNCTION for SRFs. Specifically:

CREATE OR REPLACE FUNCTION func1(OUT e int, OUT f int) returns setof record
as ' SELECT a,b FROM table1 ORDER BY a; ' language 'sql' ORDER BY e;

This shall allow for capturing information about existing preorder that
might be present inherently in the SRF's input or algorithm (the above
example and think generate_series).

This allows for eliminating sorts that can be based off the known existing
preorder. For eg:

SELECT * FROM correct_order_singlecol() ORDER BY e; # Does not need to sort
by e since  existing preorder is known.

Eliminating such sorts can be a huge gain, especially if the expected input
to needed Sort node is large.

The obvious question that comes is what happens if specified ORDER BY
clause is false. For checking the order, a new node is added which is top
node of the plan and is responsible for projecting result rows. It tracks
the previous row seen and given a sort order, ensures that the current
tuple to be projected is in the required sort order.

So, for above example

EXPLAIN (COSTS OFF) SELECT * FROM correct_order_multicol() ORDER BY e;
  QUERY PLAN
---
 OrderCheck
   -  Function Scan on correct_order_multicol
(2 rows)


If order of result rows is not the same as required, an error is raised:

SELECT * FROM incorrect_order_nulls() ORDER BY e NULLS LAST;
ERROR:  Order not same as specified



Preorder columns are first transformed into SortGroupClauses first and then
stored directly in pg_proc.


This functionality is a user case seen functionality, and is especially
useful when SRF inputs are large and/or might be pipelined from another
function (SRFs are used in pipelines in analytical systems many times, with
large data).

The overhead of this patch is small. A new path is added for the preorder
keys, and OrderCheck node's additional cost is pretty low, given that it
only compares two rows and stores only a single row (previous row seen),
hence the memory footprint is minuscule.

In the inner joins thread, Tom mentioned having a new node which has
multiple plans and executor can decide which plan to execute given runtime
conditions. I played around with the idea, and am open to experiment having
a new node which has a Sort based plan and is executed in case OrderCheck
node sees that the inherent order of result tuples is not correct. Feedback
here would be very welcome.


I will add the patch to current commitfest.

Thoughts?

Regards,

Atri


orderbycreatefuncver1.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallel mode and parallel contexts

2015-01-05 Thread Robert Haas
On Fri, Jan 2, 2015 at 9:04 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 While working on parallel seq-scan patch to adapt this framework, I
 noticed few things and have questions regrading the same.

 1.
 Currently parallel worker just attaches to error queue, for tuple queue
 do you expect it to be done in the same place or in the caller supplied
 function, if later then we need segment address as input to that function
 to attach queue to the segment(shm_mq_attach()).
 Another question, I have in this regard is that if we have redirected
 messages to error queue by using pq_redirect_to_shm_mq, then how can
 we set tuple queue for the same purpose.  Similarly I think more handling
 is needed for tuple queue in master backend and the answer to above will
 dictate what is the best way to do it.

I've come to the conclusion that it's a bad idea for tuples to be sent
through the same queue as errors.  We want errors (or notices, but
especially errors) to be processed promptly, but there may be a
considerable delay in processing tuples.  For example, imagine a plan
that looks like this:

Nested Loop
-> Parallel Seq Scan on p
-> Index Scan on q
Index Scan: q.x = p.x

The parallel workers should fill up the tuple queues used by the
parallel seq scan so that the master doesn't have to do any of that
work itself.  Therefore, the normal situation will be that those tuple
queues are all full.  If an error occurs in a worker at that point, it
can't add it to the tuple queue, because the tuple queue is full.  But
even if it could do that, the master then won't notice the error until
it reads all of the queued-up tuple messages that are in the queue in
front of the error, and maybe some messages from the other queues as
well, since it probably round-robins between the queues or something
like that.  Basically, it could do a lot of extra work before noticing
that error in there.

Now we could avoid that by having the master read messages from the
queue immediately and just save them off to local storage if they
aren't error messages.  But that's not very desirable either, because
now we have no flow-control.  The workers will just keep spamming
tuples that the master isn't ready for into the queues, and the master
will keep reading them and saving them to local storage, and
eventually it will run out of memory and die.

We could engineer some solution to this problem, of course, but it
seems quite a bit simpler to just have two queues.  The error queues
don't need to be very big (I made them 16kB, which is trivial on any
system on which you care about having working parallelism) and the
tuple queues can be sized as needed to avoid pipeline stalls.

 2.
 Currently there is no interface for wait_for_workers_to_become_ready()
 in your patch, don't you think it is important that before start of fetching
 tuples, we should make sure all workers are started, what if some worker
 fails to start?

I think that, in general, getting the most benefit out of parallelism
means *avoiding* situations where backends have to wait for each
other.  If the relation being scanned is not too large, the user
backend might be able to finish the whole scan - or a significant
fraction of it - before the workers initialize.  Of course in that
case it might have been a bad idea to parallelize in the first place,
but we should still try to make the best of the situation.  If some
worker fails to start, then instead of having the full degree N
parallelism we were hoping for, we have some degree K < N, so things
will take a little longer, but everything should still work.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tracking commit timestamps

2015-01-05 Thread Petr Jelinek

On 05/01/15 07:28, Fujii Masao wrote:

On Thu, Dec 4, 2014 at 12:08 PM, Fujii Masao masao.fu...@gmail.com wrote:

On Wed, Dec 3, 2014 at 11:54 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:

Pushed with some extra cosmetic tweaks.


I got the following assertion failure when I executed pg_xact_commit_timestamp()
in the standby server.

=# select pg_xact_commit_timestamp('1000'::xid);
TRAP: FailedAssertion(!(((oldestCommitTs) != ((TransactionId) 0)) ==
((newestCommitTs) != ((TransactionId) 0))), File: commit_ts.c,
Line: 315)
server closed the connection unexpectedly
 This probably means the server terminated abnormally
 before or while processing the request.
The connection to the server was lost. Attempting reset: 2014-12-04
12:01:08 JST sby1 LOG:  server process (PID 15545) was terminated by
signal 6: Aborted
2014-12-04 12:01:08 JST sby1 DETAIL:  Failed process was running:
select pg_xact_commit_timestamp('1000'::xid);

The way to reproduce this problem is

#1. set up and start the master and standby servers with
track_commit_timestamp disabled
#2. enable track_commit_timestamp in the master and restart the master
#3. run some write transactions
#4. enable track_commit_timestamp in the standby and restart the standby
#5. execute select pg_xact_commit_timestamp('1000'::xid) in the standby

BTW, at the step #4, I got the following log messages. This might be a hint for
this problem.

LOG:  file pg_commit_ts/ doesn't exist, reading as zeroes
CONTEXT:  xlog redo Transaction/COMMIT: 2014-12-04 12:00:16.428702+09;
inval msgs: catcache 59 catcache 58 catcache 59 catcache 58 catcache
45 catcache 44 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 snapshot 2608
relcache 16384


This problem still happens in the master.

Regards,



Attached patch fixes it, I am not sure how happy I am with the way I did 
it though.


And while at it I noticed that redo code for XLOG_PARAMETER_CHANGE sets
ControlFile->wal_log_hints = wal_log_hints;
shouldn't it be
ControlFile->wal_log_hints = xlrec.wal_log_hints;
instead?

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index ca074da..fcfccf8 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -557,6 +557,12 @@ StartupCommitTs(void)
 	TransactionId xid = ShmemVariableCache->nextXid;
 	int			pageno = TransactionIdToCTsPage(xid);
 
+	if (track_commit_timestamp)
+	{
+		ActivateCommitTs();
+		return;
+	}
+
 	LWLockAcquire(CommitTsControlLock, LW_EXCLUSIVE);
 
 	/*
@@ -571,14 +577,25 @@ StartupCommitTs(void)
  * This must be called ONCE during postmaster or standalone-backend startup,
  * when commit timestamp is enabled.  Must be called after recovery has
  * finished.
+ */
+void
+CompleteCommitTsInitialization(void)
+{
+	if (!track_commit_timestamp)
+		DeactivateCommitTs(true);
+}
+
+/*
+ * This must be called when track_commit_timestamp is turned on.
+ * Note that this only happens during postmaster or standalone-backend startup
+ * or during WAL replay.
  *
  * This is in charge of creating the currently active segment, if it's not
  * already there.  The reason for this is that the server might have been
  * running with this module disabled for a while and thus might have skipped
  * the normal creation point.
  */
-void
-CompleteCommitTsInitialization(void)
+void ActivateCommitTs(void)
 {
 	TransactionId xid = ShmemVariableCache-nextXid;
 	int			pageno = TransactionIdToCTsPage(xid);
@@ -591,22 +608,6 @@ CompleteCommitTsInitialization(void)
 	LWLockRelease(CommitTsControlLock);
 
 	/*
-	 * If this module is not currently enabled, make sure we don't hand back
-	 * possibly-invalid data; also remove segments of old data.
-	 */
-	if (!track_commit_timestamp)
-	{
-		LWLockAcquire(CommitTsLock, LW_EXCLUSIVE);
-		ShmemVariableCache->oldestCommitTs = InvalidTransactionId;
-		ShmemVariableCache->newestCommitTs = InvalidTransactionId;
-		LWLockRelease(CommitTsLock);
-
-		TruncateCommitTs(ReadNewTransactionId());
-
-		return;
-	}
-
-	/*
 	 * If CommitTs is enabled, but it wasn't in the previous server run, we
 	 * need to set the oldest and newest values to the next Xid; that way, we
 	 * will not try to read data that might not have been set.
@@ -641,6 +642,35 @@ CompleteCommitTsInitialization(void)
 }
 
 /*
+ * This must be called when track_commit_timestamp is turned off.
+ * Note that this only happens during postmaster or standalone-backend startup
+ * or during WAL replay.
+ *
+ * Resets CommitTs into invalid state to make sure we don't hand back
+ * possibly-invalid data; also removes segments of old data.
+ */
+void
+DeactivateCommitTs(bool do_wal)
+{
+	TransactionId xid = ShmemVariableCache->nextXid;
+	

Re: [HACKERS] Parallel Seq Scan

2015-01-05 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 I think it's right to view this in the same way we view work_mem.  We
 plan on the assumption that an amount of memory equal to work_mem will
 be available at execution time, without actually reserving it.  

Agreed- this seems like a good approach for how to address this.  We
should still be able to end up with plans which use less than the max
possible parallel workers though, as I pointed out somewhere up-thread.
This is also similar to work_mem- we certainly have plans which don't
expect to use all of work_mem and others that expect to use all of it
(per node, of course).

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] add modulo (%) operator to pgbench

2015-01-05 Thread Fabien COELHO


Hello Alvaro,


On top of evaluateExpr() we need a comment (generally I think pgbench
could do with more comments; not saying your patch should add them, just
expressing an opinion.)


Having spent some time in pgbench, I agree that more comments are a good 
thing.


Also, intuitively I would say that the return values of that function 
should be reversed: return true if things are good.


Ok.


Can we cause a stack overflow in this function with a complex
enough expression?


Not for any practical purpose, IMO.


I wonder about LOCATE and LOCATION.  Can we do away with the latter, and
keep only LOCATE perhaps with a better name such as PRINT_ERROR_AT or
similar?  I would just expand an ad-hoc fprintf in the single place
where the other macro is used.


I think that all location information should always be the same, so 
having it defined only once helps maintenance. If someone fixes the macro 
and there is one expanded version it is likely that it would not be 
changed. Maybe we could do with only one macro, though.


Are we okay with only integer operands?  Is this something we would 
expand in the future?


Probably


Is the gaussian/exp random stuff going to work with integer operands,


No, it will need a floating point parameter, but I was thinking of only
adding constant floats as function arguments in a first approach, and not
allowing an expression syntax on these, something like:


  \set n exprand(1, :size+3, 2.0) + 1

But not

  \set n exprand(1, :size+3, :size/3.14159) + 1

That is debatable. Otherwise we have to take care of typing issues, which
would complicate the code significantly with two dynamic types (int &
float) to handle, propagate and so on in the expression evaluation. It is
possible though, but it seems to me that it is a lot of bother for a small
added value.
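
To illustrate the complication being weighed: supporting both integer and
float operands means every value carries a runtime type and every operator
has to coerce. A self-contained sketch (the names are invented, not the
pgbench code):

#include <stdio.h>
#include <stdint.h>

typedef enum { VAL_INT, VAL_DOUBLE } ValKind;

typedef struct
{
    ValKind kind;
    union
    {
        int64_t ival;
        double  dval;
    } u;
} ExprValue;

static double
as_double(ExprValue v)
{
    return (v.kind == VAL_INT) ? (double) v.u.ival : v.u.dval;
}

/* Even '+' must inspect both operand types and pick a result type. */
static ExprValue
eval_add(ExprValue a, ExprValue b)
{
    ExprValue r;

    if (a.kind == VAL_INT && b.kind == VAL_INT)
    {
        r.kind = VAL_INT;
        r.u.ival = a.u.ival + b.u.ival;
    }
    else
    {
        r.kind = VAL_DOUBLE;
        r.u.dval = as_double(a) + as_double(b);
    }
    return r;
}

int
main(void)
{
    ExprValue i = {VAL_INT, {.ival = 3}};
    ExprValue f = {VAL_DOUBLE, {.dval = 0.14159}};
    ExprValue s = eval_add(i, f);

    printf("%g\n", s.kind == VAL_DOUBLE ? s.u.dval : (double) s.u.ival);
    return 0;
}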


Anyway, I suggest keeping that for another round and keeping Robert's
isofunctional patch as it is before extending.


if we want to change it to use function syntax, as expressed elsewhere?


I think I'll add a function syntax, and add a new node type to handle 
these, but the current syntax should/might be preserved for upward 
compatibility.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] segmentation fault in execTuples.c#ExecStoreVirtualTuple

2015-01-05 Thread Manuel Kniep
Hi, 

we are running postgres 9.3.5 on gentoo linux kernel 3.16.5, compiled with gcc
4.8.3

getting a segfault from time to time with the below core dump.

The error happens only on our production system and is not reproducible; a
second run after database recovery always succeeds without any error.

From the coredump I can’t imagine where this error might come from.

Any ideas ?

Thanks

Manuel


#0  0x00608f74 in ExecStoreVirtualTuple (slot=0x24a40bd00) at 
execTuples.c:508
#1  0x005f9d6c in ExecFilterJunk (junkfilter=0x24a40cf50, 
slot=0x3ecbac18) at execJunk.c:318
#2  0x0061eba4 in ExecModifyTable (node=0x59ac150) at 
nodeModifyTable.c:988
#3  0x005fda57 in ExecProcNode (node=0x59ac150) at execProcnode.c:377
#4  0x00622a62 in CteScanNext (node=0x29500050) at nodeCtescan.c:103
#5  0x006084e7 in ExecScanFetch (node=0x29500050, accessMtd=0x622938 
CteScanNext, recheckMtd=0x622ac8 CteScanRecheck) at execScan.c:82
#6  0x00608556 in ExecScan (node=0x29500050, accessMtd=0x622938 
CteScanNext, recheckMtd=0x622ac8 CteScanRecheck) at execScan.c:132
#7  0x00622afd in ExecCteScan (node=0x29500050) at nodeCtescan.c:155
#8  0x005fdb53 in ExecProcNode (node=0x29500050) at execProcnode.c:434
#9  0x005fbcaf in ExecutePlan (estate=0x9280ef0, planstate=0x29500050, 
operation=CMD_SELECT, sendTuples=1 '\001', numberTuples=0, 
direction=ForwardScanDirection, dest=0x13b4530)
    at execMain.c:1473
#10 0x005fa12b in standard_ExecutorRun (queryDesc=0x228e58a78, 
direction=ForwardScanDirection, count=0) at execMain.c:307
#11 0x005fa02d in ExecutorRun (queryDesc=0x228e58a78, 
direction=ForwardScanDirection, count=0) at execMain.c:255
#12 0x0059449a in ExecCreateTableAs (stmt=0x927e3e0, 
    queryString=0x22b880bb0 \n    CREATE TEMPORARY TABLE unpro_inc_23472 ON 
COMMIT DROP AS\n    WITH affected AS (\n      UPDATE events t SET processed = 
true\n      WHERE tracker_id BETWEEN $1 AND $2\n      AND ajdate(t.created_at) 
..., params=0x563d0520, completionTag=0x7aa88cc0 ) at createas.c:170
#13 0x007372c0 in ProcessUtilitySlow (parsetree=0x927e3e0, 
    queryString=0x22b880bb0 \n    CREATE TEMPORARY TABLE unpro_inc_23472 ON 
COMMIT DROP AS\n    WITH affected AS (\n      UPDATE events t SET processed = 
true\n      WHERE tracker_id BETWEEN $1 AND $2\n      AND ajdate(t.created_at) 
..., context=PROCESS_UTILITY_QUERY, params=0x563d0520, dest=0xc56fc0 
spi_printtupDR, completionTag=0x7aa88cc0 ) at utility.c:1231
#14 0x0073688f in standard_ProcessUtility (parsetree=0x927e3e0, 
    queryString=0x22b880bb0 \n    CREATE TEMPORARY TABLE unpro_inc_23472 ON 
COMMIT DROP AS\n    WITH affected AS (\n      UPDATE events t SET processed = 
true\n      WHERE tracker_id BETWEEN $1 AND $2\n      AND ajdate(t.created_at) 
..., context=PROCESS_UTILITY_QUERY, params=0x563d0520, dest=0xc56fc0 
spi_printtupDR, completionTag=0x7aa88cc0 ) at utility.c:829
#15 0x00735ba3 in ProcessUtility (parsetree=0x927e3e0, 
    queryString=0x22b880bb0 \n    CREATE TEMPORARY TABLE unpro_inc_23472 ON 
COMMIT DROP AS\n    WITH affected AS (\n      UPDATE events t SET processed = 
true\n      WHERE tracker_id BETWEEN $1 AND $2\n      AND ajdate(t.created_at) 
..., context=PROCESS_UTILITY_QUERY, params=0x563d0520, dest=0xc56fc0 
spi_printtupDR, completionTag=0x7aa88cc0 ) at utility.c:309
#16 0x0062e897 in _SPI_execute_plan (plan=0x7aa88d70, 
paramLI=0x563d0520, snapshot=0x0, crosscheck_snapshot=0x0, read_only=0 '\000', 
fire_triggers=1 '\001', tcount=0) at spi.c:2160
#17 0x0062bb9d in SPI_execute_with_args (
    src=0x22b880bb0 \n    CREATE TEMPORARY TABLE unpro_inc_23472 ON COMMIT 
DROP AS\n    WITH affected AS (\n      UPDATE events t SET processed = true\n   
   WHERE tracker_id BETWEEN $1 AND $2\n      AND ajdate(t.created_at) ..., 
nargs=4, argtypes=0x22b880ab0, Values=0x22b880af0, Nulls=0x22b881010     
\002, read_only=0 '\000', tcount=0) at spi.c:537
#18 0x7f3635560ed0 in exec_stmt_dynexecute (estate=0x7aa890d0, 
stmt=0xe2f140) at pl_exec.c:3462
#19 0x7f363555cfe9 in exec_stmt (estate=0x7aa890d0, stmt=0xe2f140) at 
pl_exec.c:1450
#20 0x7f363555cd05 in exec_stmts (estate=0x7aa890d0, stmts=0xe2ef58) at 
pl_exec.c:1345
#21 0x7f363555cbb0 in exec_stmt_block (estate=0x7aa890d0, 
block=0xe2fa80) at pl_exec.c:1283
#22 0x7f363555ab97 in plpgsql_exec_function (func=0xe1f5b0, 
fcinfo=0x11daac0) at pl_exec.c:321
#23 0x7f3632be in plpgsql_call_handler (fcinfo=0x11daac0) at 
pl_handler.c:129
#24 0x0060114d in ExecMakeFunctionResultNoSets (fcache=0x11daa50, 
econtext=0x346e2840, isNull=0x7aa89473 , isDone=0x0) at execQual.c:1992
#25 0x00601b4b in ExecEvalFunc (fcache=0x11daa50, econtext=0x346e2840, 
isNull=0x7aa89473 , isDone=0x0) at execQual.c:2383
#26 0x7f3635563f3b in exec_eval_simple_expr (estate=0x7aa89740, 
expr=0xdc4ab0, 

Re: [HACKERS] add modulo (%) operator to pgbench

2015-01-05 Thread Fabien COELHO



I'm also just looking at your ERROR() macro, it seems that core code is
quite careful not to use __VA_ARGS__ on compilers where HAVE__VA_ARGS is
not defined. I'd say this needs to be fixed too. Have a look at the trick
used in elog.h for when HAVE__VA_ARGS is not defined.


Hm, I just looked at the previous version which used ERROR rather than
LOCATE and LOCATION, and I liked that approach better.  Can we get that
back?


It seems that postgresql must be able to compile with a preprocessor which 
does not handle varargs macros, so I thought it simpler to just expand the 
macro. If we must have it without a vararg macro, it means creating an
ad-hoc vararg function to handle the vfprintf and exit. Obviously it can be
done, if vfprintf is available. The prior style was to repeat fprintf/exit 
everywhere, I wanted to improve a little, but not to bother too much about 
portability constraints with pre C99 compilers.



I understand that for the purposes of this patch it's easier to
not change existing fprintf()/exit() calls, though.


The issue is really the portability constraints. C99 is only 16 years old, 
so it is a minor language:-)


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

2015-01-05 Thread David G Johnston
Atri Sharma wrote
 If order of result rows is not the same as required, an error is raised:
 
 SELECT * FROM incorrect_order_nulls() ORDER BY e NULLS LAST;
 ERROR:  Order not same as specified

First reaction for the error was unfavorable but (see below) it likely is
the best option and does adequately cover the reason for failure -
programmer error.  It is not data specific (other than by accident) so any
runtime attempt to correct an error is going to be wasted effort and
overhead.


 In the inner joins thread, Tom mentioned having a new node which has
 multiple plans and executor can decide which plan to execute given runtime
 conditions. I played around with the idea, and am open to experiment
 having
 a new node which has a Sort based plan and is executed in case OrderCheck
 node sees that the inherent order of result tuples is not correct.
 Feedback
 here would be very welcome.

Could SQL functions be explored such that if the planner sees an order by it
omits the post-function sort node otherwise it adds an explicit one?

How expensive is sorting an already sorted result?

Runtime conditional sorting seems like the worst case, depending on implementation,
since a large result with an error in the last bit will end up nearly double
processing.  I'd rather deem unsorted output programmer error and raise the
error so it is likely to be discovered during testing and not have any
meaningful runtime overhead.

David J.




--
View this message in context: 
http://postgresql.nabble.com/Patch-to-add-functionality-to-specify-ORDER-BY-in-CREATE-FUNCTION-for-SRFs-tp5832876p5832885.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSL information view

2015-01-05 Thread Peter Eisentraut
On 11/19/14 7:36 AM, Magnus Hagander wrote:
 +  <row>
 +   <entry><structname>pg_stat_ssl</><indexterm><primary>pg_stat_ssl</primary></indexterm></entry>
 +   <entry>One row per connection (regular and replication), showing statistics about
 +    SSL used on this connection.
 +    See <xref linkend="pg-stat-ssl-view"> for details.
 +   </entry>
 +  </row>
 + 

It doesn't really show statistics.  It shows information or data.

We should make contrib/sslinfo a wrapper around this view as much as
possible.

Is it useful to include rows for sessions not using SSL?

Should we perpetuate the ssl-naming?  Is there a more general term?

Will this work for non-OpenSSL implementations?



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

2015-01-05 Thread Jim Nasby

On 1/5/15, 12:20 PM, Atri Sharma wrote:

What would make sense to me is to teach the planner about inlining
SQL functions that include ORDER BY clauses, so that the performance
issue of a double sort could be avoided entirely transparently to
the user.


It sounds good, but inlining in current way shall restrict the scope of 
optimization (which is not applicable for current design). For eg, you cannot 
inline RECORD returning SRFs...


Related... I'd like to see a way to inline a function that does something like:

CREATE FUNCTION foo(text) RETURNS int LANGUAGE sql AS $$
SELECT a FROM b WHERE lower(b.c) = lower($1)
$$

and have the performance be comparable to

SELECT ..., (SELECT a FROM b WHERE lower(b.c) = lower(something)) AS foo

I realize that there's a whole question about the function not being an SRF, 
but the thing is this works great when manually inlined and is fast. The SQL 
function is significantly slower.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

2015-01-05 Thread Tom Lane
Jim Nasby jim.na...@bluetreble.com writes:
 Related... I'd like to see a way to inline a function that does something 
 like:

 CREATE FUNCTION foo(text) RETURNS int LANGUAGE sql AS $$
 SELECT a FROM b WHERE lower(b.c) = lower($1)
 $$

The reason that's not inlined ATM is that the semantics wouldn't be the
same (ie, what happens if the SELECT returns more than one row).  It's
possible they would be the same if we attached a LIMIT 1 to the function's
query, but I'm not 100% sure about that offhand.  I'm also not real sure
that you'd still get good performance if there were an inserted LIMIT;
that would disable at least some optimizations.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers