Re: [HACKERS] A little RLS oversight?

2015-07-26 Thread Dean Rasheed
On 25 July 2015 at 19:12, Joe Conway joe.con...@crunchydata.com wrote:
 On 07/22/2015 02:17 PM, Dean Rasheed wrote:
 Hmm, I think it probably ought to do more, based on whether or not RLS
 is being bypassed or in force-mode -- see the first few checks in
 get_row_security_policies(). Perhaps a new SQL-callable function
 exposing those checks and calling check_enable_rls(). It's probably
 still worth including the c.relrowsecurity = false check in SQL to
 save calling the function for the majority of tables that don't have
 RLS.

 Please see the attached patch and let me know what you think. I believe
 the only thing lacking is documentation for the two new user-visible
 functions. Comments?


I'm not convinced about exporting convert_table_name() from acl.c,
particularly with such a non-descriptive name. It's only a couple of
lines of code, so I think they may as well just be included directly
in the new function, as seems to be common practice elsewhere.

As it stands, passing checkAsUser = GetUserId() to check_enable_rls()
will cause it to skip the check for row_security = OFF, and so it may
falsely report that RLS is active when the user has bypassed it. To
avoid that row_security_active() needs to pass checkAsUser =
InvalidOid to check_enable_rls().
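
For illustration, a minimal sketch of such a function, assuming
check_enable_rls() keeps its current (relid, checkAsUser, noError)
signature; this sketches the idea, not the patch's actual code:

Datum
row_security_active(PG_FUNCTION_ARGS)
{
	Oid			relid = PG_GETARG_OID(0);
	int			rls_status;

	/* Pass InvalidOid, not GetUserId(), so row_security = OFF is honored */
	rls_status = check_enable_rls(relid, InvalidOid, true);

	PG_RETURN_BOOL(rls_status == RLS_ENABLED);
}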

[ Actually there seem to be a few other callers of check_enable_rls()
that suffer the same problem. I don't really understand the reasoning
behind that check in check_enable_rls() - if the current user has the
BYPASSRLS attribute and row_security = OFF, shouldn't RLS be disabled
on every table no matter how it is accessed? ]

I think it would be better if the security context check were part of
this new function too. Right now that can't make any difference, since
only the RI triggers set SECURITY_ROW_LEVEL_DISABLED and this function
cannot be called in that code path, but it's possible that in the
future other code might set SECURITY_ROW_LEVEL_DISABLED, so moving the
check down guarantees that check_enable_rls()/row_security_active()
always accurately return the RLS status for the table.

While I was looking at it, I spotted a couple of other things to tidy
up in existing related code:

* The comment for GetUserIdAndSecContext() needed updating for the new RLS bit.

* Calling GetUserIdAndSecContext() and then throwing away the user_id
returned seems ugly. There is already a code style precedent for
checking the other bits of SecurityRestrictionContext, so I've added a
similar bit-testing function for SECURITY_ROW_LEVEL_DISABLED which
makes this code a bit neater.
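
For example, a minimal sketch in the style of the existing
InSecurityRestrictedOperation() (the function name here is illustrative,
not necessarily what the patch uses):

bool
InRowLevelSecurityDisabled(void)
{
	return (SecurityRestrictionContext & SECURITY_ROW_LEVEL_DISABLED) != 0;
}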

Attached is an updated patch (still needs some docs for the functions).

Regards,
Dean


rls-pg-stats.v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] checkpointer continuous flushing

2015-07-26 Thread Fabien COELHO


Hello,

Attached is a very minor v5 update which does a rebase & completes the
cleanup of doing a full sort instead of a chunked sort.


Attached is an updated version of the patch which turns the sort option
into a boolean, and also includes the sort time in the checkpoint log.


There is still an open question about whether the sorting buffer allocation
is lost on some signals and should be reallocated in such an event.


In that case, the allocation should probably be managed from
CheckpointerMain, and the lazy allocation could remain for other callers (I
guess just initdb).



More open questions:

- best name for the flush option (checkpoint_flush_to_disk,
checkpoint_flush_on_write, checkpoint_flush, ...)

- best name for the sort option (checkpoint_sort,
checkpoint_sort_buffers, checkpoint_sort_ios, ...)


Other nice-to-have inputs:

- tests on a non-linux system with posix_fadvise
  (FreeBSD? others?)

- tests on a large dedicated box


Attached are some scripts to help with testing, if someone feels like it:

- cp_test.sh: run some tests, to adapt to one's setup...

- cp_test_count.pl: show percent of late transactions

- avg.py: show summary statistics of its input, e.g.:

  sh> grep 'progress: ' OUTPUT_FILE | cut -d' ' -f4 | avg.py

  *BEWARE* that if pgbench got stuck, some 0-tps data points are missing;
  look for the actual tps in the output file and at the line
  count to check whether that is the case... a currently submitted
  patch on pgbench helps, see https://commitfest.postgresql.org/5/199/


As this pgbench patch is now in master, pgbench is less likely to get
stuck, but check nevertheless that the number of progress lines matches the
expected number.


--
Fabien.

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index bbe1eb0..0257e34 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2483,6 +2483,28 @@ include_dir 'conf.d'
   /listitem
  /varlistentry
 
+ <varlistentry id="guc-checkpoint-sort" xreflabel="checkpoint_sort">
+  <term><varname>checkpoint_sort</varname> (<type>bool</type>)
+  <indexterm>
+   <primary><varname>checkpoint_sort</> configuration parameter</primary>
+  </indexterm>
+  </term>
+  <listitem>
+   <para>
+    Whether to sort buffers before writing them out to disk on checkpoint.
+    For HDD storage, this setting groups together writes to
+    neighboring pages on disk, thus improving performance by
+    reducing random write activity.
+    This sorting should have limited performance effects on SSD backends
+    as such storage has good random write performance, but it may
+    help with wear-leveling and so be worth keeping anyway.
+    The default is <literal>on</>.
+    This parameter can only be set in the <filename>postgresql.conf</>
+    file or on the server command line.
+   </para>
+  </listitem>
+ </varlistentry>
+
  varlistentry id=guc-checkpoint-warning xreflabel=checkpoint_warning
   termvarnamecheckpoint_warning/varname (typeinteger/type)
   indexterm
@@ -2504,6 +2526,24 @@ include_dir 'conf.d'
   /listitem
  /varlistentry
 
+ <varlistentry id="guc-checkpoint-flush-to-disk" xreflabel="checkpoint_flush_to_disk">
+  <term><varname>checkpoint_flush_to_disk</varname> (<type>bool</type>)
+  <indexterm>
+   <primary><varname>checkpoint_flush_to_disk</> configuration parameter</primary>
+  </indexterm>
+  </term>
+  <listitem>
+   <para>
+    When writing data for a checkpoint, hint the underlying OS that the
+    data must be sent to disk as soon as possible.  This may help smooth
+    disk I/O writes and avoid a stall when fsync is issued at the end of
+    the checkpoint, but it may also reduce average performance.
+    This setting may have no effect on some platforms.
+    The default is <literal>off</>.
+   </para>
+  </listitem>
+ </varlistentry>
+
  varlistentry id=guc-min-wal-size xreflabel=min_wal_size
   termvarnamemin_wal_size/varname (typeinteger/type)
   indexterm
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index e3941c9..eea6668 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -546,6 +546,29 @@
   </para>
 
   <para>
+   When hard-disk drives (HDD) are used for terminal data storage,
+   <xref linkend="guc-checkpoint-sort"> allows pages to be sorted
+   so that neighboring pages on disk will be flushed together by
+   checkpoints, reducing the random write load and improving performance.
+   If solid-state drives (SSD) are used, sorting pages brings no benefit
+   as their random write I/O performance is good: this feature could then
+   be disabled by setting <varname>checkpoint_sort</> to <literal>off</>.
+   It is possible that sorting may help with SSD wear leveling, so it may
+   be kept on that account.
+  </para>
+
+  <para>
+   On Linux and POSIX platforms, <xref linkend="guc-checkpoint-flush-to-disk">
+   allows hinting the OS that pages written on checkpoints must be flushed
+   to disk quickly.

Re: [HACKERS] Failing assertions in indxpath.c, placeholder.c and brin_minmax.c

2015-07-26 Thread Tom Lane
Andreas Seltenreich seltenre...@gmx.de writes:
 when running my random query generator contraption[1] against the
 regression database of 9.5 or master, it occasionally triggers one of
 the following three assertions.

Very very cool tool!  Please keep doing that testing.

The first two seem to be planner problems, so I'll take responsibility for
digging into those.  But the third appears to be plain old brain fade in
the BRIN code.  It can be reproduced by

regression=# explain select * from brintest where int4col = NULL::integer::information_schema.cardinal_number;
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Bitmap Heap Scan on brintest  (cost=52.01..56.02 rows=1 width=339)
   Recheck Cond: (int4col = ((NULL::integer)::information_schema.cardinal_number)::integer)
   ->  Bitmap Index Scan on brinidx  (cost=0.00..52.01 rows=1 width=0)
         Index Cond: (int4col = ((NULL::integer)::information_schema.cardinal_number)::integer)
(4 rows)

regression=# select * from brintest where int4col = NULL::integer::information_schema.cardinal_number;
server closed the connection unexpectedly

or you can do it like this:

regression=# select * from brintest where int4col = (select NULL::integer);
server closed the connection unexpectedly

or you could do it with a join to a table containing some null values.

You need some complication because if you just write a plain null literal:

regression=# explain select * from brintest where int4col = NULL::integer;
                          QUERY PLAN
--------------------------------------------------------------
 Result  (cost=0.00..106.30 rows=1 width=339)
   One-Time Filter: NULL::boolean
   ->  Seq Scan on brintest  (cost=0.00..106.30 rows=1 width=339)
(3 rows)

the planner knows that int4eq is strict so it reduces the WHERE clause
to constant NULL and doesn't bother with an indexscan.

Bottom line is that somebody failed to consider the possibility of a
null comparison value reaching the BRIN index lookup machinery.
The code stanza that's failing supposes that only IS NULL or IS NOT NULL
tests could have SK_ISNULL set, but that's just wrong.
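
For reference, the kind of handling that stanza needs might look like this
sketch (assuming the usual BrinValues fields; this illustrates the logic of
a fix, not the committed code):

static bool
minmax_consistent_nulls(ScanKey key, BrinValues *column)
{
	/* IS NULL matches iff the range contains nulls */
	if (key->sk_flags & SK_SEARCHNULL)
		return column->bv_allnulls || column->bv_hasnulls;

	/* IS NOT NULL matches iff the range contains non-null values */
	if (key->sk_flags & SK_SEARCHNOTNULL)
		return !column->bv_allnulls;

	/* an ordinary comparison against a NULL datum can never match */
	return false;
}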

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improving replay of XLOG_BTREE_VACUUM records

2015-07-26 Thread Andres Freund
On 2015-07-24 09:53:49 +0300, Heikki Linnakangas wrote:
 On 05/02/2015 02:10 AM, Jim Nasby wrote:
 This looks like a good way to address this until the more significant
 work can be done.
 
 I'm not a fan of RBM_ZERO_NO_BM_VALID; how about RBM_ZERO_BM_INVALID?
 or BM_NOT_VALID? Or maybe I'm just trying to impose too much English on
 the code; I see the logic to NO_BM_VALID...
 
  + * RBM_ZERO_NO_BM_VALID is the same as RBM_ZERO_AND_LOCK, but does not set
  + * the BM_VALID bit before returning the buffer, so that no one can pin it.
 
 It would be better to explain why we want that mode. How about:
 
 RBM_ZERO_NO_BM_VALID is the same as RBM_ZERO_AND_LOCK but does not set
 BM_VALID before returning the buffer. This is done to ensure that no one
 can pin the buffer without actually reading the buffer contents in. This
 is necessary while replaying XLOG_BTREE_VACUUM records in hot standby.

 That's a rather strange mode. The BM_VALID flag is internal to the buffer
 manager - if you don't know how the buffer manager works, you cannot
 understand that description. I couldn't quite understand what that means
 myself. What can or can't you do with a buffer that's not marked as
 BM_VALID? I'd suggest a mode like this instead:

 In RBM_IF_IN_CACHE mode, the page is returned if it is in the buffer cache
 already. If it's not, it is not read from disk, and InvalidBuffer is
 returned instead.

To me it sounds like this shouldn't go through the full ReadBuffer()
rigamarole. That code is already complex enough, and here it's really
not needed. I think it'll be much easier to review - and actually faster
in many cases - to simply have something like

bool
BufferInCache(Relation rel, ForkNumber forkNum, BlockNumber blockNum)
{
	BufferTag	newTag;			/* identity of requested block */
	uint32		newHash;		/* hash value for newTag */
	LWLock	   *newPartitionLock;	/* buffer partition lock for it */
	int			buf_id;

	/* create a tag so we can look up the buffer mapping entry */
	INIT_BUFFERTAG(newTag, rel->rd_node, forkNum, blockNum);
	newHash = BufTableHashCode(&newTag);
	newPartitionLock = BufMappingPartitionLock(newHash);

	LWLockAcquire(newPartitionLock, LW_SHARED);
	buf_id = BufTableLookup(&newTag, newHash);
	LWLockRelease(newPartitionLock);
	return buf_id != -1;
}

and then fall back to the normal ReadBuffer() when it's in cache.


Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Restore-reliability mode

2015-07-26 Thread Noah Misch
On Thu, Jul 23, 2015 at 04:53:49PM -0300, Alvaro Herrera wrote:
 Peter Geoghegan wrote:
  On Sat, Jun 6, 2015 at 12:58 PM, Noah Misch n...@leadboat.com wrote:
 - Call VALGRIND_MAKE_MEM_NOACCESS() on a shared buffer when its local pin
   count falls to zero.  Under CLOBBER_FREED_MEMORY, wipe a shared buffer
   when its global pin count falls to zero.
  
  Did a patch for this ever materialize?
 
 I think the first part would be something like the attached.

Neat.  Does it produce any new complaints during make installcheck?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSets: Getting collation related error when GSets has text column

2015-07-26 Thread Andres Freund
Hi,

On 2015-07-17 18:55:52 +0530, Jeevan Chalke wrote:
 Attached is a patch which attempts to fix this issue. However, I am not
 sure whether it is correct, but it does not add any new issues as such.
 Added a few tests in the patch as well.

Pushed the fix. Thanks for the report and fix.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Buildfarm failure from overly noisy warning message

2015-07-26 Thread Tom Lane
chipmunk failed last night
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chipmunk&dt=2015-07-26%2007%3A36%3A32

like so:

================== pgsql.build/src/test/regress/regression.diffs ==================
*** /home/pgbfarm/buildroot/REL9_3_STABLE/pgsql.build/src/test/regress/expected/create_index.out	Sun Jul 26 10:37:41 2015
--- /home/pgbfarm/buildroot/REL9_3_STABLE/pgsql.build/src/test/regress/results/create_index.out	Sun Jul 26 10:51:48 2015
***
*** 14,19 
--- 14,20 
  CREATE INDEX tenk1_hundred ON tenk1 USING btree(hundred int4_ops);
  CREATE INDEX tenk1_thous_tenthous ON tenk1 (thousand, tenthous);
  CREATE INDEX tenk2_unique1 ON tenk2 USING btree(unique1 int4_ops);
+ WARNING:  could not send signal to process 30123: No such process
  CREATE INDEX tenk2_unique2 ON tenk2 USING btree(unique2 int4_ops);
  CREATE INDEX tenk2_hundred ON tenk2 USING btree(hundred int4_ops);
  CREATE INDEX rix ON road USING btree (name text_ops);

==

What's evidently happened here is that our session tried to boot an
autovacuum process off a table lock, only that process was gone by the
time we issued the kill() call.  No problem really ... but aside from
causing buildfarm noise, I could see somebody getting really panicky
if this message appeared on a production server.

I'm inclined to reduce the WARNING to LOG, and/or skip it altogether
if the error is ESRCH.  The relevant code in ProcSleep() is:

    ereport(LOG,
            (errmsg("sending cancel to blocking autovacuum PID %d",
                    pid),
             errdetail_log("%s", logbuf.data)));

    if (kill(pid, SIGINT) < 0)
    {
        /* Just a warning to allow multiple callers */
        ereport(WARNING,
                (errmsg("could not send signal to process %d: %m",
                        pid)));
    }

so logging failures at LOG level seems pretty reasonable.  One could
also argue that both of these ereports ought to be downgraded to DEBUG1
or less, since this mechanism is pretty well shaken out by now.

Thoughts?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Grouping Sets: Fix unrecognized node type bug

2015-07-26 Thread Andres Freund
On 2015-07-17 19:57:22 +0100, Andrew Gierth wrote:
 Attached is the current version of my fix (with Jeevan's regression
 tests plus one of mine).

Pushed, thanks for the report and fix!


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] CustomScan and readfuncs.c

2015-07-26 Thread Kouhei Kaigai
Hello,

In the course of investigating ParallelAppend, I noticed there are a few
problems in CustomScan that prevent reproducing an equivalent
plan node on a background worker from a serialized string.

1. CustomScanMethods->TextOutCustomScan callback

This callback allows custom information to be output by nodeToString.
Originally, we intended this callback for debugging only, because
CustomScan must be copyObject() safe; thus, all the private data
must also be stored in custom_exprs or custom_private.
However, it leads to another problem when we try to reproduce a
CustomScan node from the string form generated by outfuncs.c.
If TextOutCustomScan prints something, the upcoming _readCustomScan
has to deal with an unexpected number of tokens in an unexpected format.
I'd like to propose omitting this callback prior to the v9.5 release,
to minimize compatibility issues.

2. Reproduce method table on background worker
--
The methods field of CustomPath/Scan/ScanState is expected to be
a reference to a static structure. Thus, copyObject() does not
copy the entire table, only the pointer.
However, we have no way to guarantee the callback functions have the
same entrypoint addresses on background workers. So, we may need
an infrastructure to reproduce the same CustomScan node with the same
callback function table, probably identified by name.
We may have a few ways to solve the problem.

* Add a system catalog; a function returns the pointer
The simplest way, like FDW. A system catalog maps a name to a function
that returns the callback pointers. It also needs SQL statement support,
which is a small downside.

* Register by name, during shared_preload_libraries only
Like an early version of the CustomScan interface, it requires custom-
scan providers to register a pair of name and callbacks, but only
while shared_preload_libraries is processed, to guarantee the callbacks
are also registered in the background workers; see the sketch below.
(Is this assumption right on Windows?)
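
A rough sketch of what that second option's API could look like (all names
here are illustrative only, not an agreed design):

typedef struct CustomScanMethodsEntry
{
	const char *name;					/* unique provider name */
	const CustomScanMethods *methods;	/* static callback table */
} CustomScanMethodsEntry;

/* callable only while shared_preload_libraries is being processed */
extern void RegisterCustomScanMethods(const char *name,
									  const CustomScanMethods *methods);

/* lookup by name, e.g. from a hypothetical _readCustomScan() */
extern const CustomScanMethods *GetCustomScanMethods(const char *name);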

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A little RLS oversight?

2015-07-26 Thread Joe Conway
On 07/26/2015 07:19 AM, Dean Rasheed wrote:
 I'm not convinced about exporting convert_table_name() from acl.c,
 particularly with such a non-descriptive name. It's only a couple of
 lines of code, so I think they may as well just be included directly
 in the new function, as seems to be common practice elsewhere.

fair enough

 As it stands, passing checkAsUser = GetUserId() to check_enable_rls()
 will cause it to skip the check for row_security = OFF, and so it may
 falsely report that RLS is active when the user has bypassed it. To
 avoid that row_security_active() needs to pass checkAsUser =
 InvalidOid to check_enable_rls().

 [ Actually there seem to be a few other callers of check_enable_rls()
 that suffer the same problem. I don't really understand the reasoning
 behind that check in check_enable_rls() - if the current user has the
 BYPASSRLS attribute and row_security = OFF, shouldn't RLS be disabled
 on every table no matter how it is accessed? ]


Hmmm, I can see that now but find it surprising. Seems like an odd
interface, and probably deserves some explanation in the function
comments. Maybe Stephen or someone else can weigh in on this...


 I think it would be better if the security context check were part of
 this new function too. Right now that can't make any difference, since
 only the RI triggers set SECURITY_ROW_LEVEL_DISABLED and this function
 cannot be called in that code path, but it's possible that in the
 future other code might set SECURITY_ROW_LEVEL_DISABLED, so moving the
 check down guarantees that check_enable_rls()/row_security_active()
 always accurately return the RLS status for the table.
 
 While I was looking at it, I spotted a couple of other things to tidy
 up in existing related code:
 
 * The comment for GetUserIdAndSecContext() needed updating for the new RLS 
 bit.
 
 * Calling GetUserIdAndSecContext() and then throwing away the user_id
 returned seems ugly. There is already a code style precedent for
 checking the other bits of SecurityRestrictionContext, so I've added a
 similar bit-testing function for SECURITY_ROW_LEVEL_DISABLED which
 makes this code a bit neater.
 
 Attached is an updated patch (still needs some docs for the functions).

Thanks for that. I'll add the docs.

Joe





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Buildfarm failure from overly noisy warning message

2015-07-26 Thread Andres Freund
Hi,

On 2015-07-26 10:56:05 -0400, Tom Lane wrote:
   CREATE INDEX tenk2_unique1 ON tenk2 USING btree(unique1 int4_ops);
 + WARNING:  could not send signal to process 30123: No such process

 What's evidently happened here is that our session tried to boot an
 autovacuum process off a table lock, only that process was gone by the
 time we issued the kill() call.  No problem really ... but aside from
 causing buildfarm noise, I could see somebody getting really panicky
 if this message appeared on a production server.

Oops.

 I'm inclined to reduce the WARNING to LOG

Hm, that doesn't seem like a very nice solution, given that LOG is even
more likely to end up in the server log.

 and/or skip it altogether if the error is ESRCH.

Combined with a comment why we're ignoring that case that'd work for me.
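
For instance, something along these lines (a sketch only; the comment
wording is illustrative):

    if (kill(pid, SIGINT) < 0)
    {
        /*
         * The autovacuum worker may have exited on its own between our
         * lock-table check and the kill(); that is expected, so don't
         * complain about ESRCH.
         */
        if (errno != ESRCH)
            ereport(WARNING,
                    (errmsg("could not send signal to process %d: %m",
                            pid)));
    }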


- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Grouping Sets: Fix unrecognized node type bug

2015-07-26 Thread Andres Freund
On 2015-07-17 11:37:26 +0530, Jeevan Chalke wrote:
 However, I wonder why we are supporting GROUPING SETS inside GROUPING SETS.
 On Oracle, it throws an error.
 We are not trying to be Oracle-compatible, just curious to know.

The SQL specification seems to be pretty unambiguous about supporting
nested grouping set specifications. Check 7.9 <group by clause>: a
<grouping set> (nested inside a <grouping sets specification>)
can contain a <grouping sets specification>.

Regards,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extend pgbench expressions with functions

2015-07-26 Thread Fabien COELHO


Hello Heikki,

As soon as we add more functions, the way they are documented needs to be 
reworked too; we'll need to add a table in the manual to list them.


Here is a v8 with abs, min, max, random, gaussian and
exponential.

[...]

There is no real doc, WIP...


Here is a v9 with a doc. The implicit typing of expressions is improved.

I also added two debug functions which allow showing an intermediate integer
(idebug) or double (ddebug) value.


  \set i idebug(random(1, 10))

will print the random value and assign it to i.


I updated the default scripts, which seems to speed up meta-command
evaluations. The initial version does less than 2 million evals per
second:


  sh> cat before.sql
  \set naccounts 10 * :scale
  \setrandom aid 1 :naccounts

  sh> ./pgbench -T 3 -P 1 -f before.sql
  [...]
  tps = 1899004.793098 (excluding connections establishing)


The updated version does more than 3 million evals per second:

  sh> cat after.sql
  \set aid random(1, 10 * :scale)

  sh> ./pgbench -T 3 -P 1 -f after.sql
  [...]
  tps = 3143168.813690 (excluding connections establishing)


Suggestion:

A possibility would be to remove the \setrandom stuff altogether, as the
functionality is available through \set, maybe with an error message to
advise about using \set with one of the random functions. That would
remove about 200 ugly locs...


Another option would be to translate the \setrandom stuff into a \set
expression (see the sketch below); that would maybe save 100 locs for the
eval, but keep and expand a little the manual parser part.
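
A sketch of that translation idea, reusing the gaussrand/exporand names
from the v9 documentation (the function itself and its buffer handling are
illustrative, not pgbench's actual internals):

static void
translate_setrandom(char *buf, size_t sz, const char *var,
					const char *min, const char *max,
					const char *dist, const char *param)
{
	if (dist == NULL)			/* plain uniform */
		snprintf(buf, sz, "\\set %s random(%s, %s)", var, min, max);
	else if (strcmp(dist, "gaussian") == 0)
		snprintf(buf, sz, "\\set %s gaussrand(%s, %s, %s)",
				 var, min, max, param);
	else						/* "exponential" */
		snprintf(buf, sz, "\\set %s exporand(%s, %s, %s)",
				 var, min, max, param);
}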


--
Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 2517a3a..73b7caf 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -758,17 +758,20 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
   Sets variable <replaceable>varname</> to an integer value calculated
   from <replaceable>expression</>.
   The expression may contain integer constants such as <literal>5432</>,
-  references to variables <literal>:</><replaceable>variablename</>,
+  double constants such as <literal>3.14156</>,
+  references to integer variables <literal>:</><replaceable>variablename</>,
   and expressions composed of unary (<literal>-</>) or binary operators
   (<literal>+</>, <literal>-</>, <literal>*</>, <literal>/</>, <literal>%</>)
-  with their usual associativity, and parentheses.
+  with their usual associativity, integer function calls and parentheses.
+  <xref linkend="functions-pgbench-func-table"> shows the available
+  functions.
  </para>
 
  <para>
   Examples:
 <programlisting>
 \set ntellers 10 * :scale
-\set aid (1021 * :aid) % (10 * :scale) + 1
+\set aid (1021 * random(1, 10 * :scale)) % (10 * :scale) + 1
 </programlisting></para>
 </listitem>
   </varlistentry>
@@ -918,18 +921,110 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
 </varlistentry>
   </variablelist>
 
+   <!-- list pgbench functions in alphabetical order -->
+   <table id="functions-pgbench-func-table">
+    <title>PgBench Functions</title>
+    <tgroup cols="5">
+     <thead>
+      <row>
+       <entry>Function</entry>
+       <entry>Return Type</entry>
+       <entry>Description</entry>
+       <entry>Example</entry>
+       <entry>Result</entry>
+      </row>
+     </thead>
+     <tbody>
+      <row>
+       <entry><literal><function>abs(<replaceable>a</>)</></></>
+       <entry>same as <replaceable>a</></>
+       <entry>integer or double absolute value</>
+       <entry><literal>abs(-17)</></>
+       <entry><literal>17</></>
+      </row>
+      <row>
+       <entry><literal><function>ddebug(<replaceable>x</>)</></></>
+       <entry>double</>
+       <entry>stderr print for debug and return argument</>
+       <entry><literal>ddebug(5432.1)</></>
+       <entry><literal>5432.1</></>
+      </row>
+      <row>
+       <entry><literal><function>double(<replaceable>i</>)</></></>
+       <entry>double</>
+       <entry>evaluate as int and cast to double</>
+       <entry><literal>double(5432)</></>
+       <entry><literal>5432.0</></>
+      </row>
+      <row>
+       <entry><literal><function>exporand(<replaceable>i</>, <replaceable>j</>, <replaceable>t</>)</></></>
+       <entry>integer</>
+       <entry>exponentially distributed random integer in the bounds, see below</>
+       <entry><literal>exporand(1, 10, 3.0)</></>
+       <entry>int between <literal>1</> and <literal>10</></>
+      </row>
+      <row>
+       <entry><literal><function>idebug(<replaceable>i</>)</></></>
+       <entry>integer</>
+       <entry>stderr print for debug and return argument</>
+       <entry><literal>idebug(5432)</></>
+       <entry><literal>5432</></>
+      </row>
+      <row>
+       <entry><literal><function>int(<replaceable>x</>)</></></>
+       <entry>integer</>
+       <entry>evaluate as double and cast to int</>
+       <entry><literal>int(5.4 + 3.8)</></>
+       <entry><literal>9</></>
+      </row>
+      <row>
+       <entry><literal><function>gaussrand(<replaceable>i</>, <replaceable>j</>, <replaceable>t</>)</></></>
+       <entry>integer</>
+       <entry>gaussian distributed random integer in the bounds, see below</>
+       <entry><literal>gaussrand(1, 10, 2.5)</></>
+       <entry>int between <literal>1</> and <literal>10</></>
+      </row>
+      <row>
+       

Re: [HACKERS] Gsets: ROW expression semantic broken between 9.4 and 9.5

2015-07-26 Thread Andrew Gierth
 Jeevan == Jeevan Chalke jeevan.cha...@enterprisedb.com writes:

 Jeevan Hi
 Jeevan It looks like we have broken the ROW expression without
 Jeevan explicit ROW keyword in GROUP BY.

Andres has given the short version, but here's the long version:

In the spec, GROUP BY ROW(a,b) is an error, while GROUP BY (a,b) is
exactly equivalent to GROUP BY a,b.

Previously, pg treated GROUP BY (a,b) as if it were GROUP BY ROW(a,b)
since it was parsing it as an expression, and (a,b) in an expression is
shorthand for ROW(a,b).  However, the parens are significant in many
contexts in the grouping set syntax, e.g. ROLLUP(a,(b,c)) is equivalent
to GROUPING SETS ((a,b,c), (a), ()), and we have to be able to parse
both GROUPING SETS (a,b) (which is two grouping sets) and GROUPING SETS
((a,b),(c,d)), which means that we can't actually use the grammar to
distinguish expressions from parenthesized sublists.  What the code
therefore does is to explicitly distinguish (a,b) and ROW(a,b), and
treat the first as a list and the second as a single expression.

This is documented in the following NOTE in queries.sgml:

  <note>
   <para>
    The construct <literal>(a,b)</> is normally recognized in expressions as
    a <link linkend="sql-syntax-row-constructors">row constructor</link>.
    Within the <literal>GROUP BY</> clause, this does not apply at the top
    levels of expressions, and <literal>(a,b)</> is parsed as a list of
    expressions as described above.  If for some reason you <emphasis>need</>
    a row constructor in a grouping expression, use <literal>ROW(a,b)</>.
   </para>
  </note>

http://www.postgresql.org/docs/9.5/static/queries-table-expressions.html#QUERIES-GROUPING-SETS

Andres has suggested that this should be mentioned in an incompatibility
note in the release notes. I'm not sure that's needed, since I don't
believe there are any cases where previously valid queries change in
behavior; a query such as

select (a,b) from (values (1,2)) v(a,b) group by (a,b);

previously evaluated the row constructor before grouping, while now it
groups by a and b separately and evaluates the row constructor
afterwards.  If there's a way to make this change affect the result,
I've not found it yet, even when using volatile functions.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sharing aggregate states between different aggregate functions

2015-07-26 Thread Heikki Linnakangas

On 07/09/2015 12:44 PM, David Rowley wrote:

On 15 June 2015 at 12:05, David Rowley david.row...@2ndquadrant.com wrote:



This basically allows an aggregate's state to be shared between other
aggregate functions when both aggregates' transition functions (and a few
other things) match.
There are quite a number of aggregates in our standard set which will
benefit from this optimisation.


After compiling the original patch with another compiler, I noticed a
couple of warnings.

The attached fixes these.


I spent some time reviewing this. I refactored the ExecInitAgg code 
rather heavily to make it more readable (IMHO); see attached. What do 
you think? Did I break anything?


Some comments:

* In aggref_has_compatible_states(), you give up if aggtype or aggcollid 
differ. But those properties apply to the final function, so you were 
leaving some money on the table by disallowing state-sharing if they differ.


* The filter and input expressions are initialized for every Aggref,
before the deduplication logic kicks in. The AggrefExprState.aggfilter,
aggdirectargs and args fields really belong to the AggStatePerAggState
struct instead. This is not a new issue, but now that we have a
convenient per-aggstate struct to put them in, let's do so.
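
To make that concrete, here is a sketch of where those fields would live
(the field list follows this email, not necessarily the final patch):

typedef struct AggStatePerAggStateData
{
	Oid			transfn_oid;	/* OID of the state transition function */
	FmgrInfo	transfn;		/* its fmgr lookup data */
	List	   *aggdirectargs;	/* states of direct-argument expressions */
	List	   *args;			/* states of aggregated-argument expressions */
	ExprState  *aggfilter;		/* state of FILTER expression, if any */
} AggStatePerAggStateData;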


* There was a reference-after-free bug in aggref_has_compatible_states;
you cannot ReleaseSysCache and then continue pointing to the struct.


* The code was a bit fuzzy on which parts of the per-aggstate are filled
in at what time. Some of the fields were overwritten every time a match
was found, with the same values, so no harm done, but I found it
confusing. I refactored ExecInitAgg in the attached patch to clear that up.


* The API of build_aggregate_fnexprs() was a bit strange now that some
callers use it to only fill in the final function, while others call it to
fill in both the transition functions and the final function. I split it
into two separate functions.


* I wonder if we should do this duplicate elimination at plan time. It's 
very fast, so I'm not worried about that right now, but you had grand 
plans to expand this so that an aggregate could optionally use one of 
many different kinds of state values. At that point, it certainly seems 
like a planning decision to decide which aggregates share state. I think 
we can leave it as it is for now, but it's something to perhaps consider 
later.



BTW, the name of the AggStatePerAggStateData struct is pretty horrible.
The repeated AggState feels awkward. Now that I've stared at the patch
for some time, it doesn't bother me anymore, but it took me quite a
while to get over that. I'm sure it will for others too. And it's not just
that struct: the comments talk about "aggregate state", which could be
confused to mean AggState, but actually means
AggStatePerAggStateData. I don't have any great suggestions, but can you
come up with a better naming scheme?


- Heikki

diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index 0f911f2..fd922bd 100644
--- a/src/backend/executor/execQual.c
+++ b/src/backend/executor/execQual.c
@@ -4485,35 +4485,15 @@ ExecInitExpr(Expr *node, PlanState *parent)
 			break;
 		case T_Aggref:
 			{
-				Aggref	   *aggref = (Aggref *) node;
 				AggrefExprState *astate = makeNode(AggrefExprState);
 
 				astate->xprstate.evalfunc = (ExprStateEvalFunc) ExecEvalAggref;
 				if (parent && IsA(parent, AggState))
 				{
 					AggState   *aggstate = (AggState *) parent;
-					int			naggs;
 
 					aggstate->aggs = lcons(astate, aggstate->aggs);
-					naggs = ++aggstate->numaggs;
-
-					astate->aggdirectargs = (List *) ExecInitExpr((Expr *) aggref->aggdirectargs,
-												  parent);
-					astate->args = (List *) ExecInitExpr((Expr *) aggref->args,
-										 parent);
-					astate->aggfilter = ExecInitExpr(aggref->aggfilter,
-									 parent);
-
-					/*
-					 * Complain if the aggregate's arguments contain any
-					 * aggregates; nested agg functions are semantically
-					 * nonsensical.  (This should have been caught earlier,
-					 * but we defend against it here anyway.)
-					 */
-					if (naggs != aggstate->numaggs)
-						ereport(ERROR,
-								(errcode(ERRCODE_GROUPING_ERROR),
-						errmsg("aggregate function calls cannot be nested")));
+					aggstate->numaggs++;
 				}
 else
 {
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 2bf48c5..fcc3859 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -152,17 +152,28 @@
 
 
 /*
- * AggStatePerAggData - per-aggregate working state for the Agg scan
+ * AggStatePerAggStateData - per aggregate state data for the Agg scan
+ *
+ * Working state for calculating the aggregate state, using the state
+ * transition function. This struct does not store the information needed
+ * to produce the final aggregate result from the state value; that's stored
+ * in AggStatePerAggData instead. This separation allows multiple aggregate
+ * results to be 

Re: [HACKERS] Combining Aggregates

2015-07-26 Thread Heikki Linnakangas

On 04/01/2015 06:28 PM, Robert Haas wrote:

On Mon, Mar 30, 2015 at 1:28 AM, Michael Paquier
michael.paqu...@gmail.com wrote:

I've been thinking of bumping this patch to the June commitfest as the
patch only exists to provide the basic infrastructure for things like
parallel aggregation, aggregate before join, and perhaps auto updating
materialised views.

It seems unlikely that any of those things will happen for 9.5.


Yeah, I guess so...


Does anybody object to me moving this to June's commitfest?


Not from my side FWIW. I think it actually makes sense.


+1.  I'd like to devote some time to looking at this, but I don't have
the time right now.  The chances that we can do something useful with
it in 9.6 seem good.


And the June commitfest is now in progress.

This patch seems sane to me, as far as it goes. However, there's no
planner or executor code to use the aggregate combining for anything.
I'm not a big fan of dead code; I'd really like to see something that
uses this.


The main use case people have been talking about is parallel query, but 
is there some other case this would be useful right now, without the 
parallel query feature? You and Simon talked about this case:



2. Queries such as:

SELECT p.name, SUM(s.qty) FROM sales s INNER JOIN product p ON s.product_id
= p.product_id GROUP BY p.name;

Such a query could be transformed into:

SELECT p.name,SUM(qty) FROM (SELECT product_id,SUM(qty) AS qty FROM sales
GROUP BY product_id) s
INNER JOIN product p ON p.product_id = s.product_id GROUP BY p.name;

Of course the outer query's SUM and GROUP BY would not be required if there
happened to be a UNIQUE index on product(name), but assuming there's not
then the above should produce the results faster. This of course works ok
for SUM(), but for something like AVG() or STDDEV() the combine/merge
aggregate functions would be required to process those intermediate
aggregate results that were produced by the sub-query.


Any chance you could implement that in the planner?

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Failing assertions in indxpath.c, placeholder.c and brin_minmax.c

2015-07-26 Thread Andreas Seltenreich
Hi,

when running my random query generator contraption[1] against the
regression database of 9.5 or master, it occasionally triggers one of
the following three assertions.  Someone more knowledgeable might want
to take a look at them...

-- FailedAssertion("!(outer_rel->rows > 0)", File: "indxpath.c", Line: 1911)
-- sample query:
select
  rel1925354.loid as c0,
  rel1925353.version as c1
from
  (select
rel1925352.aa as c0,
rel1925352.aa as c1
  from
public.b as rel1925352
  where (rel1925352.bb is NULL)
and (rel1925352.bb  rel1925352.bb)) as subq_303136
  inner join pg_catalog.pg_stat_ssl as rel1925353
  on (subq_303136.c0 = rel1925353.pid )
right join pg_catalog.pg_largeobject as rel1925354
on (subq_303136.c0 = rel1925354.pageno )
where (rel1925353.clientdn !~ rel1925353.clientdn)
  and (rel1925353.cipher = rel1925353.clientdn);

,----[ git bisect ]
|   first bad commit: [3f8c23c4d31d4a0e801041733deb2c7cfa577b32] Improve
|   predtest.c's ability to reason about operator expressions.
`----

-- FailedAssertion("!(!bms_is_empty(phinfo->ph_eval_at))", File: "placeholder.c", Line: 109)
-- sample query:
select
  rel1600276.viewowner as c0,
  rel1600274.maxwritten_clean as c1,
  rel1600275.n_tup_hot_upd as c2
from
  pg_catalog.pg_stat_bgwriter as rel1600274
  inner join pg_catalog.pg_stat_xact_all_tables as rel1600275
  on (rel1600274.maxwritten_clean = rel1600275.seq_scan )
right join pg_catalog.pg_views as rel1600276
  right join pg_catalog.pg_operator as rel1600277
  on (rel1600276.viewname = rel1600277.oprname )
on (rel1600275.relname = rel1600277.oprname )
where 3 is not NULL;

,----[ git bisect ]
|   first bad commit: [f4abd0241de20d5d6a79b84992b9e88603d44134] Support
|   flattening of empty-FROM subqueries and one-row VALUES tables.
`----

-- FailedAssertion("!(key->sk_flags & 0x0080)", File: "brin_minmax.c", Line: 177)
-- sample query:
select
  rel167978.namecol as c0
from
  information_schema.parameters as rel167972
left join public.student as rel167977
  inner join public.brintest as rel167978
  on (rel167977.age = rel167978.int4col )
on (rel167972.interval_precision = rel167977.age )
where rel167977.name  rel167977.name;

regards,
andreas

Footnotes: 
[1]  https://github.com/anse1/sqlsmith


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSets: Fix bug involving GROUPING and HAVING together

2015-07-26 Thread Andres Freund
On 2015-07-24 11:34:22 +0100, Andrew Gierth wrote:
  Andrew == Andrew Gierth and...@tao11.riddles.org.uk writes:
 
  Andrew The other is that in subquery_planner, the optimization of
  Andrew converting HAVING clauses to WHERE clauses is suppressed if
  Andrew parse-groupingSets isn't empty. (It is empty if there's either
  Andrew no group by clause at all, or if there's exactly one grouping
  Andrew set, which must not be empty, however derived.) This is costing
  Andrew us some optimizations, especially in the case of an explicit
  Andrew GROUP BY () clause; I'll look into this.
 
 I'm inclined to go with the simplest approach here, which copies the
 quals if there are grouping sets. The only way we could safely move a
 qual without copying is if we can show that none of the grouping sets is
 empty, that is (), and at this point the grouping set list has not been
 expanded so it's not trivial to determine that.

I pushed this to both master and 9.5. While this isn't strictly a
bugfix, it seemed better to keep the branches the same at this point.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CustomScan and readfuncs.c

2015-07-26 Thread Tom Lane
Kouhei Kaigai kai...@ak.jp.nec.com writes:
 In the course of investigating ParallelAppend, I noticed there are a few
 problems in CustomScan that prevent reproducing an equivalent
 plan node on a background worker from a serialized string.

 1. CustomScanMethods->TextOutCustomScan callback
 
 This callback allows custom information to be output by nodeToString.
 Originally, we intended this callback for debugging only, because
 CustomScan must be copyObject() safe; thus, all the private data
 must also be stored in custom_exprs or custom_private.
 However, it leads to another problem when we try to reproduce a
 CustomScan node from the string form generated by outfuncs.c.
 If TextOutCustomScan prints something, the upcoming _readCustomScan
 has to deal with an unexpected number of tokens in an unexpected format.

Um ... wait a second.  There is no support in readfuncs for any
plan node type, and never has been, and I seriously doubt that there
ever should be.  I do not think it makes sense to ship plans around
in the way you seem to have in mind.  (Also, I don't think the
problems you mention are exactly unique to CustomScan.  There's no
reason to assume that FDW plans could survive this treatment either,
since we do not know what's in the fdw_private stuff; certainly no
one has ever suggested that it should not contain pointers to static
data.)

 I'd like to propose to omit this callback prior to v9.5 release,
 for least compatibility issues.

I regard our commitment to cross-version compatibility for the
custom scan APIs as being essentially zero, for reasons previously
discussed.  So if this goes away in 9.6 it will not matter, but we
might as well leave it in for now for debug support.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSets: Fix bug involving GROUPING and HAVING together

2015-07-26 Thread Andres Freund
On 2015-07-14 14:51:09 +0530, Jeevan Chalke wrote:
 Fix this by adding the GroupingFunc node in this walker.  We do it
 correctly in contain_aggs_of_level_walker(), which has handling for
 GroupingFunc.
 
 Attached patch to fix this.

Pushed, thanks for the fix!


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Buildfarm failure from overly noisy warning message

2015-07-26 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 07/26/2015 11:00 AM, Andres Freund wrote:
 On 2015-07-26 10:56:05 -0400, Tom Lane wrote:
 I'm inclined to reduce the WARNING to LOG

 Hm, that doesn't seem like a very nice solution, given that LOG is even
 more likely to end up in the server log.

 and/or skip it altogether if the error is ESRCH.

 Combined with a comment why we're ignoring that case that'd work for me.

 +1 for this. If the process is gone we shouldn't really have a problem,
 should we?

The only real reason for printing anything here is the possibility that
the shared lock table is sufficiently corrupted that we think there's
an autovacuum process blocking us when there isn't.  However, I don't
see any particular reason to think that this is more probable than any
other bad consequence of corrupted shmem, so I don't feel a need to be
in the user's face with a WARNING.  The evidence so far is that the
race condition is real, but I don't recall anyone ever reporting this
in a context where we'd suspect actual shmem corruption.

I do, however, think it might be reasonable to LOG the failure given
that we LOG'd the action.  Is nobody else on board with downgrading
both ereports to DEBUG?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] anole: assorted stability problems

2015-07-26 Thread Andres Freund
On 2015-07-07 13:25:24 +0200, Andres Freund wrote:
 So, it's starting to look good. Not exactly allowing for a lot of
 confidence yet, but still:
 http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=anole&br=HEAD

Since there have not been any relevant failures since, I'm going to
remove this issue from the open items list.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] security labels on databases are bad for dump restore

2015-07-26 Thread Noah Misch
On Thu, Jul 23, 2015 at 12:14:14PM -0400, Robert Haas wrote:
 On Wed, Jul 22, 2015 at 3:42 PM, Adam Brightwell 
 adam.brightw...@crunchydatasolutions.com wrote:
  I like Noah's proposal of having pg_dump --create reproduce all
  database-level state.
 
  Should it be enabled by default?  If so, then wouldn't it make more
  sense to call it --no-create and do the opposite?  So, --no-create
  would exclude rather than include database-level information?  Would
  enabling it by default cause issues with the current expected use of
  the tool by end users?
 
 This seems a bit hairy to me.  If we want to transfer responsibility
 for dumping this stuff from pg_dumpall to pg_dump, I have no problem
 with that at all.  But doing it only when --create is specified seems
 odd.  Then, does pg_dumpall -g dump it or not?

The principle I had in mind was to dump ACLs, pg_db_role_setting entries,
comments and security labels if and only if we emit a CREATE statement for the
object they modify.  That is already the rule for objects located inside
databases.  Since pg_dumpall -g does not emit CREATE DATABASE statements[1],
it would not dump these attributes of databases.

 If it does, then we're
 sorta dumping it in two places when --create is used.  If it doesn't,
 then when --create is not used we're doing it nowhere.

Yep.  Plain pg_dump dumps the contents of a database without dumping the
database itself.  I don't like that as a default, but we're stuck with it.


[1] pg_dumpall -g --binary-upgrade _does_ emit CREATE DATABASE statements,
so _it_ would dump these attributes.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] False comment about speculative insertion

2015-07-26 Thread Peter Geoghegan
Attached patch removes a reference to an executor README section about
speculative insertion. In fact, the high-level overview of speculative
insertion ended up at the top of execIndexing.c. The executor README
was not touched by the ON CONFLICT patch at all.

I don't think it's necessary to refer to execIndexing.c within
ExecInsert instead. All the routines being called from ExecInsert()
that relate to speculative insertion are in execIndexing.c anyway.

-- 
Peter Geoghegan
From 5ea69e5f98a4eeb4c9f6ffc8f161e3e16f0cda86 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan peter.geoghega...@gmail.com
Date: Sun, 26 Jul 2015 12:28:39 -0700
Subject: [PATCH] Remove false comment about speculative insertion

There never was an executor README section that discussed speculative
insertion in the original ON CONFLICT commit.
---
 src/backend/executor/nodeModifyTable.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 874ca6a..1ef76d0 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -351,8 +351,7 @@ ExecInsert(ModifyTableState *mtstate,
 			 *
 			 * We loop back here if we find a conflict below, either during
 			 * the pre-check, or when we re-check after inserting the tuple
-			 * speculatively.  See the executor README for a full discussion
-			 * of speculative insertion.
+			 * speculatively.
 			 */
 	vlock:
 			specConflict = false;
-- 
1.9.1


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] security labels on databases are bad for dump restore

2015-07-26 Thread Craig Ringer
On 20 July 2015 at 01:18, Noah Misch n...@leadboat.com wrote:
 On Wed, Jul 15, 2015 at 11:08:53AM +0200, Andres Freund wrote:
 On 2015-07-15 12:04:40 +0300, Alvaro Herrera wrote:
  Andres Freund wrote:
   One thing worth mentioning is that arguably the problem is caused by the
   fact that we're emitting database-level information in pg_dump here,
   normally only done by pg_dumpall.

 Consistency with existing practice would indeed have pg_dump ignore
 pg_shseclabel and have pg_dumpall reproduce its entries.

Existing practice is pretty broken though, and not necessarily a good guide.

COMMENT ON DATABASE and SECURITY LABEL FOR DATABASE are dumped by
pg_dump, but always refer to the database's name at the time it was
dumped, so restoring it can break.

GRANTs on databases are ignored and not dumped by pg_dump or by
pg_dumpall --globals-only. The only way to dump them seems to be to
use pg_dumpall, which nobody uses in the real world.

I'd be strongly in favour of teaching GRANT, SECURITY LABEL, COMMENT
ON DATABASE, etc. to recognise CURRENT_DATABASE as a keyword, then
dumping them in pg_dump --create and in pg_dump -Fc.

In practice I see zero real use of pg_dumpall without --globals-only,
and almost everyone does pg_dump -Fc. I'd like to see that common
case actually preserve the whole state of the system and do the right
thing sensibly.

A pg_restore option to skip database-level settings could be useful,
but I think by default they should be restored.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-07-26 Thread Amit Kapila
On Sat, Jul 25, 2015 at 10:30 PM, Ildus Kurbangaliev 
i.kurbangal...@postgrespro.ru wrote:


  On Jul 24, 2015, at 10:02 PM, Robert Haas robertmh...@gmail.com wrote:
 
  Also, the patch should not invent a new array similar but not quite
  identical to LockTagTypeNames[].
 
  This is goofy:
 
  +   if (tranche_id > 0)
  +   result->tranche = tranche_id;
  +   else
  +   result->tranche = LW_USERDEFINED;
 
  If we're going to make everybody pass a tranche ID when they call
  LWLockAssign(), then they should have to do that always, not sometimes
  pass a tranche ID and otherwise pass 0, which is actually a valid
  tranche ID but to which you've given a weird special meaning here.
  I'd also name the constants differently, like LWLOCK_TRANCHE_XXX or
  something.  LW_ is a somewhat ambiguous prefix.
 
  The idea of tranches is really that each tranche is an array of items
  each of which contains 1 or more lwlocks.  Here you are intermingling
  different tranches.  I guess that won't really break anything but it
  seems ugly.
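
For reference, a tranche in this sense is described by the LWLockTranche
struct in lwlock.h, which in the 9.5 sources looks like this (field
comments added here for illustration):

typedef struct LWLockTranche
{
	const char *name;
	void	   *array_base;		/* start of the array of locks */
	Size		array_stride;	/* spacing between lwlocks in the array */
} LWLockTranche;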

 Maybe it would be better to split LWLockAssign into two functions then,
 keeping the name LWLockAssign for user-defined locks and adding another
 function that takes a tranche_id.

  On Jul 25, 2015, at 1:54 PM, Amit Kapila amit.kapil...@gmail.com
 wrote:
 
  That he has to do anyway: whether you go by defining individual arrays or
  having a unified WaitEventType enum for individual arrays, he has to
  find out that array.  Another thing is that with that you can just
  encapsulate this information in one byte in the PgBackendStatus
  structure, rather than using more bytes (4 bytes), and I think the
  function for reporting the wait event will be much more simplified.

 In my approach the names have no special meaning. The arrays with names
 are located in lwlock.c and lock.c, and can be used for other things (for
 example, tracing). One byte sounds good only for this case.


Do you mean to say that you need more than 256 events? I am not sure
we can add that many events without adding a performance penalty
in some paths.

The original idea was proposed for one byte and the patch was written
considering the same; now you are planning to extend it (which is okay),
but modifying it without any prior consent is slightly a matter of concern.
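
For illustration, one hypothetical way to pack a class and an event into a
single byte (the bit split and all names here are made up, not from the
patch):

#define WAIT_EVENT_CLASS_BITS	2		/* up to 4 event classes */
#define WAIT_EVENT_CODE_BITS	(8 - WAIT_EVENT_CLASS_BITS)
#define WAIT_EVENT_CODE_MASK	((1 << WAIT_EVENT_CODE_BITS) - 1)

static inline uint8
make_wait_event(uint8 eclass, uint8 event)
{
	/* high bits carry the event class, low bits the event within it */
	return (uint8) ((eclass << WAIT_EVENT_CODE_BITS) |
					(event & WAIT_EVENT_CODE_MASK));
}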

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Combining Aggregates

2015-07-26 Thread David Rowley
On 27 July 2015 at 04:58, Heikki Linnakangas hlinn...@iki.fi wrote:

 On 04/01/2015 06:28 PM, Robert Haas wrote:

 On Mon, Mar 30, 2015 at 1:28 AM, Michael Paquier
 michael.paqu...@gmail.com wrote:

 I've been thinking of bumping this patch to the June commitfest as the
 patch only exists to provide the basic infrastructure for things like
 parallel aggregation, aggregate before join, and perhaps auto updating
 materialised views.

 It seems unlikely that any of those things will happen for 9.5.


 Yeah, I guess so...

  Does anybody object to me moving this to June's commitfest?


 Not from my side FWIW. I think it actually makes sense.


 +1.  I'd like to devote some time to looking at this, but I don't have
 the time right now.  The chances that we can do something useful with
 it in 9.6 seem good.


 And the June commitfest is now in progress.

 This patch seems sane to me, as far as it goes. However, there's no
 planner or executor code to use the aggregate combining for anything. I'm
 not a big fan of dead code, I'd really like to see something to use this.


Thanks for looking. I partially agree with that; it is a little weird to
put in code that's yet to be used. I'd certainly agree 95% if this were the
final commitfest of 9.5, but we're in the first commitfest of 9.6 and I
think there's a very high probability of this getting used in 9.6, and
likely that probability would be even higher if the code were already in.
Perhaps it's a little bit in the same situation as Robert's parallel
worker stuff?




 The main use case people have been talking about is parallel query, but is
 there some other case this would be useful right now, without the parallel
 query feature? You and Simon talked about this case:

  2. Queries such as:

 SELECT p.name, SUM(s.qty) FROM sales s INNER JOIN product p ON
 s.product_id
 = p.product_id GROUP BY p.name;

 Such a query could be transformed into:

 SELECT p.name,SUM(qty) FROM (SELECT product_id,SUM(qty) AS qty FROM sales
 GROUP BY product_id) s
 INNER JOIN product p ON p.product_id = s.product_id GROUP BY p.name;

 Of course the outer query's SUM and GROUP BY would not be required if
 there
 happened to be a UNIQUE index on product(name), but assuming there's not
 then the above should produce the results faster. This of course works ok
 for SUM(), but for something like AVG() or STDDEV() the combine/merge
 aggregate functions would be required to process those intermediate
 aggregate results that were produced by the sub-query.


 Any chance you could implement that in the planner?


Yes! I'm actually working on it now and so far have it partially working.
Quite likely I'll be able to submit it for CF2. There are still some
costing tweaks to do. So far it just works for GROUP BY with no
aggregates; I plan to fix that later using this patch.

I don't want to talk too much about it on this thread, but in a test query
which is the one in my example, minus the SUM(qty), with 1 million sales
records, and 100 products, performance goes from 350ms to 200ms on my
machine, so looking good so far.

Regards

David Rowley
--
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] pgbench - allow backslash-continuations in custom scripts

2015-07-26 Thread Kyotaro HORIGUCHI
Hello,

  Attatched is the revised version of this patch.
 
  The first patch is not changed from before.
 
  The second is fixed a kind of bug.
 
  The third is the new one to allow backslash continuation for
  backslash commands.
 
 Ah, thanks:-)
 
 Would you consider adding the patch to the next commitfest? I may put
 myself as a reviewer...

No problem.

  [...] I'm not satisfied by the design but I don't see another way..
 
 I'll try to have a look.

Thanks.
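
For reference, the kind of input these patches are meant to accept looks
like the following; the exact syntax is assumed from the patch descriptions
quoted above, both for SQL commands and for backslash meta-commands:

    SELECT abalance FROM pgbench_accounts \
           WHERE aid = :aid;

    \setrandom aid 1 \
           :naccounts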

  I don't have idea how to deal with the copy of psqlscan.[lh] from
  psql. Currently they are simply the dead copies of those of psql.
 
  I think that there should be no copies, but it should use relative
  symbolic links so that the files are kept synchronized.
 
  Yeah, I think so, but symlinks could cause problems with git and on Windows.
  Another way would be to make copies from the psql directory. They live
  next door to each other.
 
 Indeed there are plenty of links already which are generated by
 makefiles (see src/bin/pg_xlogdump/*), and probably a copy is made on
 Windows. There should be no file duplication within the source tree.

Thank you for pointing that out. The Makefile of pg_xlogdump
re-creates symlinks to those files, and they are replaced with cp
on platforms where symlinks are not usable. But the files
are explicitly added to the .sln file on MSVC. Anyway, I'll
address it.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Failing assertions in indxpath.c, placeholder.c and brin_minmax.c

2015-07-26 Thread Tom Lane
Andreas Seltenreich seltenre...@gmx.de writes:
 when running my random query generator contraption[1] against the
 regression database of 9.5 or master, it occasionally triggers one of
 the following three assertions.

I've fixed the first two of these --- thanks for the report!

 ,[ git bisect ]
 |   first bad commit: [3f8c23c4d31d4a0e801041733deb2c7cfa577b32] Improve
 |   predtest.c's ability to reason about operator expressions.
 `

I'm a bit confused about this aspect of your report though, because in
my hands that example fails clear back to 9.2.  It doesn't seem to require
the predtest.c improvement to expose the fault.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Failing assertions in indxpath.c, placeholder.c and brin_minmax.c

2015-07-26 Thread Peter Geoghegan
On Sun, Jul 26, 2015 at 7:07 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Andreas Seltenreich seltenre...@gmx.de writes:
 when running my random query generator contraption[1] against the
 regression database of 9.5 or master, it occasionally triggers one of
 the following three assertions.

 Very very cool tool!  Please keep doing that testing.

The SQLite people have been using a tool like this for some time.
They've also had luck finding bugs with a generic fuzz-testing tool
called american fuzzy lop (yes, seriously, that's what it's called),
which apparently is the state of the art.

I myself ran that tool against Postgres. I didn't spend enough time to
tweak it in a way that might have been effective. I also didn't figure
out a way to make iterations fast enough for the tool to be effective,
because I was invoking Postgres in single-user mode. I might pick it
up again in the future, but probably for a more targeted case.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Combining Aggregates

2015-07-26 Thread Kouhei Kaigai
 On 04/01/2015 06:28 PM, Robert Haas wrote:
  On Mon, Mar 30, 2015 at 1:28 AM, Michael Paquier
  michael.paqu...@gmail.com wrote:
  I've been thinking of bumping this patch to the June commitfest as the
  patch only exists to provide the basic infrastructure for things like
  parallel aggregation, aggregate before join, and perhaps auto updating
  materialised views.
 
  It seems unlikely that any of those things will happen for 9.5.
 
  Yeah, I guess so...
 
  Does anybody object to me moving this to June's commitfest?
 
  Not from my side FWIW. I think it actually makes sense.
 
  +1.  I'd like to devote some time to looking at this, but I don't have
  the time right now.  The chances that we can do something useful with
  it in 9.6 seem good.
 
 And the June commitfest is now in progress.
 
 This patch seems sane to me, as far as it goes. However, there's no
 planner or executor code to use the aggregate combining for anything.
 I'm not a big fan of dead code, I'd really like to see something to use
 this.

+1, this patch itself looks good to me, but...

 The main use case people have been talking about is parallel query, but
 is there some other case this would be useful right now, without the
 parallel query feature? You and Simon talked about this case:

  2. Queries such as:
 
  SELECT p.name, SUM(s.qty) FROM sales s INNER JOIN product p ON s.product_id
  = p.product_id GROUP BY p.name;
 
  Such a query could be transformed into:
 
  SELECT p.name,SUM(qty) FROM (SELECT product_id,SUM(qty) AS qty FROM sales
  GROUP BY product_id) s
  INNER JOIN product p ON p.product_id = s.product_id GROUP BY p.name;
 
  Of course the outer query's SUM and GROUP BY would not be required if there
  happened to be a UNIQUE index on product(name), but assuming there's not
  then the above should produce the results faster. This of course works ok
  for SUM(), but for something like AVG() or STDDEV() the combine/merge
  aggregate functions would be required to process those intermediate
  aggregate results that were produced by the sub-query.
 
 Any chance you could implement that in the planner?

It likely needs planner enhancement prior to other applications...
http://www.postgresql.org/message-id/ca+tgmobgwkhfzc09b+s2lxjtword5ht-avovdvaq4+rpwro...@mail.gmail.com

Once the planner is allowed to have both the normal path and partial
aggregation paths to compare according to cost, it is the straightforward
way to do this.

There is various academic research on this; for example, below is a good
starting point for clarifying which aggregate queries we can run with a
2-phase approach.
http://www.researchgate.net/publication/2715288_Performing_Group-By_before_Join

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New functions

2015-07-26 Thread Michael Paquier
On Wed, Jul 8, 2015 at 10:18 PM, Michael Paquier
michael.paqu...@gmail.com wrote:

 On Wed, Jul 8, 2015 at 9:15 PM, Дмитрий Воронин
 carriingfat...@yandex.ru wrote:
 
 
  07.07.2015, 18:34, Michael Paquier michael.paqu...@gmail.com:
 
   Speaking of which, I have rewritten the patch as attached. This looks
   way cleaner than the previous version submitted. Dmitry, does that
   look fine for you?
   I am switching this patch as Waiting on Author.
   Regards,
   --
   Michael
 
  Michael, hello. I'm looking at your variant of the patch. You created the
  function ssl_extensions_info(), which gives information about SSL
  extensions in a more informative view. So, it's cool.

 OK cool. I think a committer could look at it, hence switching it to
 Ready for Committer.

Note for committers: attached is a small script that will generate a
client certificate with extensions enabled. This is helpful when
testing this patch. Once created, then simply connect with something
like this connection string:
host=127.0.0.1 sslmode=verify-full sslcert=client.crt
sslkey=client.key sslrootcert=server.crt
By querying the new function implemented by this patch, the information
about the client certificate extensions will show up.
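
For example (assuming the function name from the patch under discussion):

    SELECT * FROM ssl_extensions_info();
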
-- 
Michael


ssl_key_generate
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] spgist recovery assertion failure

2015-07-26 Thread Michael Paquier
On Mon, Jul 27, 2015 at 2:00 PM, Noah Misch n...@leadboat.com wrote:
 When I caused a crash during the create_index regression test, recovery hit an
 assertion failure.  Minimal test case:

 psql -X <<EOSQL
 CREATE TABLE t (c text);
 INSERT INTO t SELECT 'P0123456789abcdef' FROM generate_series(1,1000);
 INSERT INTO t VALUES ('P0123456789abcdefF');
 CREATE INDEX ON t USING spgist (c);
 EOSQL
 pg_ctl -m immediate -w restart

On which platform are you seeing the failure? I am afraid I could not
reproduce the failure on Linux and OSX after testing it on HEAD.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Combining Aggregates

2015-07-26 Thread David Rowley
On 27 July 2015 at 12:14, Kouhei Kaigai kai...@ak.jp.nec.com wrote:

  The main use case people have been talking about is parallel query, but
  is there some other case this would be useful right now, without the
  parallel query feature? You and Simon talked about this case:
 
   2. Queries such as:
  
   SELECT p.name, SUM(s.qty) FROM sales s INNER JOIN product p ON
 s.product_id
   = p.product_id GROUP BY p.name;
  
   Such a query could be transformed into:
  
   SELECT p.name,SUM(qty) FROM (SELECT product_id,SUM(qty) AS qty FROM
 sales
   GROUP BY product_id) s
   INNER JOIN product p ON p.product_id = s.product_id GROUP BY p.name;
  
   Of course the outer query's SUM and GROUP BY would not be required if
 there
   happened to be a UNIQUE index on product(name), but assuming there's
 not
   then the above should produce the results faster. This of course works
 ok
   for SUM(), but for something like AVG() or STDDEV() the combine/merge
   aggregate functions would be required to process those intermediate
   aggregate results that were produced by the sub-query.
 
  Any chance you could implement that in the planner?
 

It likely needs planner enhancement prior to other applications...

 http://www.postgresql.org/message-id/ca+tgmobgwkhfzc09b+s2lxjtword5ht-avovdvaq4+rpwro...@mail.gmail.com


I had thought this too, but I'm not sure that's 100% correct. As I just
said in my previous email on this thread, I am now working on group by
before join. I had started it with the intention of path-ifying the
grouping planner, but I soon realised that grouping_planner() does
quite a bit more at that final stage than I propose to do to allow group
by before join. This is mostly around the handling of DISTINCT, HAVING and
LIMIT. I don't think those need to be handled in my patch, perhaps with the
exception of DISTINCT without GROUP BY, but not when both are present.

Instead I've started by inventing GroupingPath, and I'm now building these
new path types once enough tables have been joined to provide all the Vars
of the GROUP BY and aggregate parameters.

I believe this optimisation needs to be costed as pushing the GROUP BY down
to happen as early as possible is *not* always a win. Perhaps the JOIN is
very selective and eliminates many groups. Hence I've added costing and
allowed the planner to choose what it thinks is cheapest.


 Once the planner is allowed to have both the normal path and partial
 aggregation paths to compare according to cost, it is the straightforward
 way to do this.


I've ended up with 2 boolean members to struct GroupingPath, combine_states
and finalize_aggs. I plan to modify nodeAgg.c to use these, and EXPLAIN to
give a better description of what its doing.



 There is various academic research on this; for example, below is a good
 starting point for clarifying which aggregate queries we can run with a
 2-phase approach.

 http://www.researchgate.net/publication/2715288_Performing_Group-By_before_Join


Thanks, I've been using that very paper.

Regards

David Rowley

--
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


[HACKERS] Documentation tweak for row-valued expressions and null

2015-07-26 Thread Thomas Munro
Hi

I wonder if it might be worth adding a tiny note to the manual to
point out that the special logic for row-valued-expression IS [ NOT
] NULL doesn't apply anywhere else that we handle nulls or talk about
[non]-null values in the manual.  See attached.
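
For example, with a partially-null row value both tests return false, which
is exactly the special logic in question:

    SELECT ROW(1, NULL) IS NULL     AS is_null,      -- false: not every field is null
           ROW(1, NULL) IS NOT NULL AS is_not_null;  -- false: not every field is non-null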

-- 
Thomas Munro
http://www.enterprisedb.com


note-about-row-is-null.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] spgist recovery assertion failure

2015-07-26 Thread Noah Misch
When I caused a crash during the create_index regression test, recovery hit an
assertion failure.  Minimal test case:

psql -X <<EOSQL
CREATE TABLE t (c text);
INSERT INTO t SELECT 'P0123456789abcdef' FROM generate_series(1,1000);
INSERT INTO t VALUES ('P0123456789abcdefF');
CREATE INDEX ON t USING spgist (c);
EOSQL
pg_ctl -m immediate -w restart

The log ends with:

29294 2015-07-27 04:41:43.330 GMT LOG:  database system was not properly shut 
down; automatic recovery in progress
29294 2015-07-27 04:41:43.330 GMT LOG:  redo starts at 0/17A4070
TRAP: FailedAssertion("!(xldata->parentBlk == -1)", File: "spgxlog.c", Line: 
338)
29292 2015-07-27 04:41:43.338 GMT LOG:  startup process (PID 29294) was 
terminated by signal 6: Aborted
29292 2015-07-27 04:41:43.338 GMT LOG:  aborting startup due to startup process 
failure

REL9_4_STABLE is unaffected.  I suspect, but have not checked, that a standby
replaying WAL from a cluster running make installcheck would observe the
same problem.

Thanks,
nm


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CustomScan and readfuncs.c

2015-07-26 Thread Kouhei Kaigai
 Kouhei Kaigai kai...@ak.jp.nec.com writes:
  While investigating ParallelAppend, I noticed a few problems in
  CustomScan that prevent reproducing an equivalent plan node on the
  background worker from its serialized string.
 
  1. CustomScanMethods->TextOutCustomScan callback
 
  This callback allows custom information to be output in nodeToString.
  Originally, we intended this callback for debugging only, because
  CustomScan must be copyObject() safe; thus, all the private data
  must also be stored in custom_exprs or custom_private.
  However, it will lead to another problem when we try to reproduce
  a CustomScan node from the string form generated by outfuncs.c.
  If TextOutCustomScan prints something, the upcoming _readCustomScan
  has to deal with an unexpected number of tokens in an unexpected format.
 
 Um ... wait a second.  There is no support in readfuncs for any
 plan node type, and never has been, and I seriously doubt that there
 ever should be.  I do not think it makes sense to ship plans around
 in the way you seem to have in mind.  (Also, I don't think the
 problems you mention are exactly unique to CustomScan.  There's no
 reason to assume that FDW plans could survive this treatment either,
 since we do not know what's in the fdw_private stuff; certainly no
 one has ever suggested that it should not contain pointers to static
 data.)

Yep, no Plan node types are supported at this moment; however, they will
appear soon with the Funnel + PartialSeqScan nodes.
That code serializes a partial plan subtree using nodeToString(), then
gives the flattened PlannedStmt to background workers.
I'm now investigating applying the same structure to the Append node to
kick off child nodes in parallel.
Once various plan node types appear in readfuncs.c, we will have to care
about this problem, won't we? I'm working on the patch submission
of ParallelAppend for the next commitfest, so I'd like to reach a consensus
on how to treat this matter.

  I'd like to propose omitting this callback prior to the v9.5 release,
  to minimize compatibility issues.
 
 I regard our commitment to cross-version compatibility for the
 custom scan APIs as being essentially zero, for reasons previously
 discussed.  So if this goes away in 9.6 it will not matter, but we
 might as well leave it in for now for debug support.

I won't argue this point strongly. If TextOutCustomScan is to be obsoleted
in v9.6, it is simply a kindness to developers to advise them not to use
this callback.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Failing assertions in indxpath.c, placeholder.c and brin_minmax.c

2015-07-26 Thread Michael Paquier
On Sun, Jul 26, 2015 at 10:32 PM, Andreas Seltenreich seltenre...@gmx.de
wrote:

 Michael Paquier writes:

  Footnotes:
  [1]  https://github.com/anse1/sqlsmith
 
  This is really interesting stuff. I think that it would be possible to
  extract self-contained test cases from your tool and those queries to
  reproduce the failures. It is written that this tool connects to a
  database to retrieve the schema; what is it exactly in the case of
  those failures?

 I used the database regression that pg_regress leaves behind when you
 remove the --temp-install from its default invocation through make
 check.  Sorry about not being explicit about that.

 So, dropping one of the queries into src/test/regress/sql/smith.sql and
 invoking

 make check EXTRA_TESTS=smith

 was all that was needed to integrate them.  I was then able to perform
 git bisect run on this command.  Er, plus consing the expected output
 file.

 I'm using the regression db a lot when hacking on sqlsmith, as it
 contains many more nasty things than your average database.


Ah, OK. Thanks. The code is licensed as GPL, has a dependency on libpqxx
and is written in C++, so it cannot be integrated into core as a test
module in this state, but I think that it would definitely be worth having
something like that in the code tree that runs on the buildfarm. We could
have caught those problems earlier.  Now I imagine that this is a costly
run, so we had better have a switch to control whether it is run, like a
configure option or a flag. Thoughts?
-- 
Michael


Re: [HACKERS] raw output from copy

2015-07-26 Thread Craig Ringer
On 7 July 2015 at 14:32, Pavel Stehule pavel.steh...@gmail.com wrote:
 Hi

 previous patch was broken, and buggy

 Here is new version with fixed upload and more tests

I routinely see people trying to use COPY ... FORMAT binary to export
a single binary field (like an image, for example) and getting
confused by the header PostgreSQL adds. Or using text-format COPY and
struggling with the hex escaping. It's clearly something people have
trouble with.

It doesn't help that while lo_import and lo_export can read paths
outside the datadir (and refuse to read from within it),
pg_read_binary_file is superuser only and disallows absolute paths.
There's no corresponding pg_write_binary_file. So users who want to
import and export a single binary field tend to try to use COPY. We
have functionality for large objects that has no equivalent for
'bytea'.
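
For example, the workaround that does exist today goes through the large
object machinery; a sketch, assuming a table images(id int, img bytea):

    DO $$
    DECLARE
        loid oid;
    BEGIN
        -- copy the bytea value into a transient large object
        SELECT lo_from_bytea(0, img) INTO loid FROM images WHERE id = 1;
        -- write it out as a raw server-side file
        PERFORM lo_export(loid, '/tmp/img.bin');
        -- and clean the large object up again
        PERFORM lo_unlink(loid);
    END
    $$;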

I don't love the use of COPY for this, but it gets us support for
arbitrary clients pretty easily. Otherwise it'd be server-side only
via local filesystem access, or require special psql-specific
functionality like we have for lo_import etc.

The main point is that this is a real-world thing. People want to do
it, try to do it, and have problems doing it. So it's a solution to a
real issue.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Seq Scan

2015-07-26 Thread Haribabu Kommi
On Thu, Jul 23, 2015 at 9:42 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Wed, Jul 22, 2015 at 9:14 PM, Robert Haas robertmh...@gmail.com wrote:

 One thing I noticed that is a bit dismaying is that we don't get a lot
 of benefit from having more workers.  Look at the 0.1 data.  At 2
 workers, if we scaled perfectly, we would be 3x faster (since the
 master can do work too), but we are actually 2.4x faster.  Each
 process is on the average 80% efficient.  That's respectable.  At 4
 workers, we would be 5x faster with perfect scaling; here we are 3.5x
 faster.   So the third and fourth worker were about 50% efficient.
 Hmm, not as good.  But then going up to 8 workers bought us basically
 nothing.


 I think the improvement also depends on how costly the qualification is.
 If it is costly, then even for the same selectivity the gains will show up
 to a higher number of clients; for simple qualifications, we will see the
 cost of having more workers (processing data over multiple tuple queues)
 start dominating the benefit we can achieve from them.

Yes, that's correct. When the qualification cost is increased, performance
also increases with the number of workers.

Instead of using all the configured workers per query, how about deciding
the number of workers based on the cost of the qualification? I am not sure
whether we have any information available to find out the qualification
cost. This way the workers would be distributed across all backends
properly.
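
For what it's worth, the planner does get some cost information from the
declared cost of the functions appearing in the qual, so an expensive
qualification can at least be modelled today; a sketch with invented names:

    CREATE FUNCTION costly_check(int) RETURNS boolean
        LANGUAGE sql IMMUTABLE
        COST 10000   -- per-call cost estimate, in units of cpu_operator_cost
        AS 'SELECT $1 % 10 = 0';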

Regards,
Hari Babu
Fujitsu Australia


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] spgist recovery assertion failure

2015-07-26 Thread Noah Misch
On Mon, Jul 27, 2015 at 02:19:09PM +0900, Michael Paquier wrote:
 On Mon, Jul 27, 2015 at 2:00 PM, Noah Misch n...@leadboat.com wrote:
  When I caused a crash during the create_index regression test, recovery hit 
  an
  assertion failure.  Minimal test case:
 
  psql -X <<EOSQL
  CREATE TABLE t (c text);
  INSERT INTO t SELECT 'P0123456789abcdef' FROM generate_series(1,1000);
  INSERT INTO t VALUES ('P0123456789abcdefF');
  CREATE INDEX ON t USING spgist (c);
  EOSQL
  pg_ctl -m immediate -w restart
 
 On which platform are you seeing the failure? I am afraid I could not
 reproduce the failure on Linux and OSX after testing it on HEAD.

I saw it on ppc64 Fedora and on {ppc32 PostgreSQL, ppc64 kernel} AIX.  Like
you, I don't see it on x86_64 Ubuntu.  Perhaps it is specific to big-endian.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-26 Thread Amit Langote
On 2015-07-16 PM 04:03, Jeff Janes wrote:
 On Wed, Jul 15, 2015 at 8:44 AM, Heikki Linnakangas hlinn...@iki.fi wrote:
 

 Both. Here's the patch.

 Previously, LWLockAcquireWithVar set the variable associated with the lock
 atomically with acquiring it. Before the lwlock-scalability changes, that
 was straightforward because you held the spinlock anyway, but it's a lot
 harder/expensive now. So I changed the way acquiring a lock with a variable
 works. There is now a separate flag, LW_FLAG_VAR_SET, which indicates that
 the current lock holder has updated the variable. The LWLockAcquireWithVar
 function is gone - you now just use LWLockAcquire(), which always clears
 the LW_FLAG_VAR_SET flag, and you can call LWLockUpdateVar() after that if
 you want to set the variable immediately. LWLockWaitForVar() always waits
 if the flag is not set, i.e. it will not return regardless of the
 variable's value, if the current lock-holder has not updated it yet.


 I ran this for a while without casserts and it seems to work.  But with
 casserts, I get failures in the autovac process on the GIN index.
 
 I don't see how this is related to the LWLock issue, but I didn't see it
 without your patch.  Perhaps the system just didn't survive long enough to
 uncover it without the patch (although it shows up pretty quickly).  It
 could just be an overzealous Assert, since the casserts off didn't show
 problems.
 
 bt and bt full are shown below.
 

I got a similar assert failure but with a btree index
(pg_attribute_relid_attnum_index). The backtrace looks like Jeff's:

(gdb) bt
#0  0x003969632625 in raise () from /lib64/libc.so.6
#1  0x003969633e05 in abort () from /lib64/libc.so.6
#2  0x0092eb9e in ExceptionalCondition (conditionName=0x9c2220
"!(((PageHeader) (page))->pd_special >= (__builtin_offsetof
(PageHeaderData, pd_linp)))",
errorType=0x9c0c41 "FailedAssertion", fileName=0x9c0c10 "nbtree.c",
lineNumber=903) at assert.c:54
#3  0x004e02d8 in btvacuumpage (vstate=0x7fff2c7655f0, blkno=9,
orig_blkno=9) at nbtree.c:903
#4  0x004e0067 in btvacuumscan (info=0x7fff2c765cd0,
stats=0x279f7d0, callback=0x668f6d lazy_tid_reaped,
callback_state=0x279e338, cycleid=49190)
at nbtree.c:821
#5  0x004dfdde in btbulkdelete (fcinfo=0x7fff2c7657d0) at nbtree.c:676
#6  0x00939769 in FunctionCall4Coll (flinfo=0x7fff2c765bb0,
collation=0, arg1=140733939342544, arg2=0, arg3=6721389, arg4=41542456) at
fmgr.c:1375
#7  0x004d2a01 in index_bulk_delete (info=0x7fff2c765cd0,
stats=0x0, callback=0x668f6d lazy_tid_reaped, callback_state=0x279e338)
at indexam.c:690
#8  0x0066861d in lazy_vacuum_index (indrel=0x7fd40ab846f0,
stats=0x279e770, vacrelstats=0x279e338) at vacuumlazy.c:1367
#9  0x006678a8 in lazy_scan_heap (onerel=0x274ec90,
vacrelstats=0x279e338, Irel=0x279e790, nindexes=2, scan_all=0 '\000') at
vacuumlazy.c:1098
#10 0x006660f7 in lazy_vacuum_rel (onerel=0x274ec90, options=99,
params=0x27bdc88, bstrategy=0x27bdd18) at vacuumlazy.c:244
#11 0x00665c1a in vacuum_rel (relid=1249, relation=0x7fff2c7662a0,
options=99, params=0x27bdc88) at vacuum.c:1388
#12 0x006643ce in vacuum (options=99, relation=0x7fff2c7662a0,
relid=1249, params=0x27bdc88, va_cols=0x0, bstrategy=0x27bdd18,
isTopLevel=1 '\001')
at vacuum.c:293
#13 0x0075d23c in autovacuum_do_vac_analyze (tab=0x27bdc80,
bstrategy=0x27bdd18) at autovacuum.c:2807
#14 0x0075c632 in do_autovacuum () at autovacuum.c:2328
#15 0x0075b457 in AutoVacWorkerMain (argc=0, argv=0x0) at
autovacuum.c:1647
#16 0x0075b0a5 in StartAutoVacWorker () at autovacuum.c:1454
#17 0x0076f9cc in StartAutovacuumWorker () at postmaster.c:5261
#18 0x0076f28a in sigusr1_handler (postgres_signal_arg=10) at
postmaster.c:4918
#19 signal handler called
#20 0x0039696e1353 in __select_nocancel () from /lib64/libc.so.6
#21 0x0076ace7 in ServerLoop () at postmaster.c:1582
#22 0x0076a538 in PostmasterMain (argc=3, argv=0x26f9330) at
postmaster.c:1263
#23 0x006c1c2e in main (argc=3, argv=0x26f9330) at main.c:223

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sharing aggregate states between different aggregate functions

2015-07-26 Thread David Rowley
On 27 July 2015 at 03:24, Heikki Linnakangas hlinn...@iki.fi wrote:

 On 07/09/2015 12:44 PM, David Rowley wrote:

 On 15 June 2015 at 12:05, David Rowley david.row...@2ndquadrant.com
 wrote:


 This basically allows an aggregate's state to be shared between other
 aggregate functions when both aggregates' transition functions (and a few
 other things) match
 There's quite a number of aggregates in our standard set which will
 benefit from this optimisation.

  After compiling the original patch with another compiler, I noticed a
 couple of warnings.

 The attached fixes these.


 I spent some time reviewing this. I refactored the ExecInitAgg code rather
 heavily to make it more readable (IMHO); see attached. What do you think?
 Did I break anything?


Thanks for taking the time to look at this and makes these fixes.

I'm just looking over your changes:

- ereport(ERROR,
- (errcode(ERRCODE_INVALID_FUNCTION_DEFINITION),
- errmsg("aggregate %u needs to have compatible input type and transition
type",
- aggref->aggfnoid)));
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_FUNCTION_DEFINITION),
+ errmsg("aggregate needs to have compatible input type and transition
type")));

I can't quite see the reason to remove the agg OID from the error message
here. It seems to still be valid to use, as build_peraggstate_for_aggref()
is only called when nothing is shared.


- * agg_input_types, agg_state_type, agg_result_type identify the input,
- * transition, and result types of the aggregate.  These should all be
- * resolved to actual types (ie, none should ever be ANYELEMENT etc).
+ * agg_input_types identifies the input types of the aggregate.  These
should
+ * be resolved to actual types (ie, none should ever be ANYELEMENT etc).

I'm not sure I understand why agg_state_type and agg_result_type have
changed here.


+ peraggstate->sortstates = (Tuplesortstate **)
+ palloc0(sizeof(Tuplesortstate *) * numGroupingSets);
+ for (currentsortno = 0; currentsortno < numGroupingSets; currentsortno++)
+ peraggstate->sortstates[currentsortno] = NULL;

This was not you, but this NULL setting looks unneeded due to the palloc0().


 Some comments:

 * In aggref_has_compatible_states(), you give up if aggtype or aggcollid
 differ. But those properties apply to the final function, so you were
 leaving some money on the table by disallowing state-sharing if they differ.


Good catch, and accurate analysis. Thanks for fixing.



 * The filter and input expressions are initialized for every AggRef,
 before the deduplication logic kicks in. The AggrefExprState.aggfilter,
 aggdirectargs and args fields really belong to the AggStatePerAggState
 struct instead. This is not a new issue, but now that we have a convenient
 per-aggstate struct to put them in, let's do so.


Good idea. I failed to notice that code over there in execQual.c so I agree
that where you've moved it to is much better.



 * There was a reference-after free bug in aggref_has_compatible_states;
 you cannot ReleaseSysCache and then continue pointing to the struct.


Thanks for fixing.

In this function I also wasn't quite sure whether it was worth comparing
both non-NULL INITCONDs here. I believe my code comments may slightly
contradict what the code actually does, as the comments talk about them
having to match, but the code just bails if either is non-NULL. The reason
I didn't compare them is that it seems inevitable that some duplicate work
needs to be done when setting up the INITCOND. Perhaps it's worth it?


select aggfnoid || '(' || typname || ')',aggtransfn,agginitval
from pg_aggregate
inner join pg_type on aggtranstype = oid
where aggtransfn in (select aggtransfn
from pg_aggregate
group by aggtransfn
having count(*) > 1)
order by aggtransfn;

This indicates that everything using float4_accum as a transfn could
benefit from that. I just wasn't sure how far to go.
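
For instance, over a real (float4) column, the following three aggregates
all use float4_accum as their transition function, so with this patch they
should be able to share a single transition state instead of keeping three:

    SELECT avg(x), variance(x), stddev_samp(x)
    FROM t;   -- assuming x is of type float4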



 * The code was a bit fuzzy on which parts of the per-aggstate are filled
 in at what time. Some of the fields were overwritten every time a match was
 found. With the same values, so no harm done, but I found it confusing. I
 refactored ExecInitAgg in the attached patch to clear that up.


 * The API of build_aggregate_fnexprs() was a bit strange now that some
 callers use it to only fill in the final function, some call it to fill
 both the transition functions and the final function. I split it to two
 separate functions.


That's much better.


 * I wonder if we should do this duplicate elimination at plan time. It's
 very fast, so I'm not worried about that right now, but you had grand plans
 to expand this so that an aggregate could optionally use one of many
 different kinds of state values. At that point, it certainly seems like a
 planning decision to decide which aggregates share state. I think we can
 leave it as it is for now, but it's something to perhaps consider later.


I don't think I'm going to get the time to work on the supporting
aggregate stuff you're talking about, but I think it's a good enough 

Re: [HACKERS] Buildfarm failure from overly noisy warning message

2015-07-26 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:

 + WARNING:  could not send signal to process 30123: No such process

 What's evidently happened here is that our session tried to boot an
 autovacuum process off a table lock, only that process was gone by the
 time we issued the kill() call.

 I'm inclined to reduce the WARNING to LOG, and/or skip it altogether
 if the error is ESRCH.

 One could also argue that both of these ereports ought to be downgraded to 
 DEBUG1
 or less, since this mechanism is pretty well shaken out by now.

 Thoughts?

I think a LOG entry when an autovacuum process is actually canceled
has value just in case it is happening on a particular table so
frequently that the table starts to bloat.  I see no reason to log
anything if there is an intention to cancel an autovacuum process
but it actually completes before we can do so.  IMV the best
solution would be to proceed straight to the kill() and only log
something if it was found by kill().

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] more RLS oversights

2015-07-26 Thread Joe Conway
On 07/03/2015 10:03 AM, Noah Misch wrote:
 (2) CreatePolicy() and AlterPolicy() omit to create a pg_shdepend
 entry for each role in the TO clause.  Test case:

Please see the attached patch. Note that I used SHARED_DEPENDENCY_ACL
for this. It seems appropriate, but possibly we should invent a new
shared dependency type for this use case? Comments?
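
For reference, the scenario those pg_shdepend entries guard against looks
like this (object names invented):

    CREATE ROLE alice;
    CREATE TABLE t (c int);
    CREATE POLICY p ON t TO alice USING (true);
    DROP ROLE alice;  -- with the dependency recorded, this is now refused,
                      -- since policy p still references the role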

Thanks,

Joe

diff --git a/src/backend/commands/policy.c b/src/backend/commands/policy.c
index 17b48d4..20ac54e 100644
*** a/src/backend/commands/policy.c
--- b/src/backend/commands/policy.c
***************
*** 22,27 ****
--- 22,28 ----
  #include catalog/indexing.h
  #include catalog/namespace.h
  #include catalog/objectaccess.h
+ #include catalog/pg_authid.h
  #include catalog/pg_policy.h
  #include catalog/pg_type.h
  #include commands/policy.h
***************
*** 48,54 ****
  static void RangeVarCallbackForPolicy(const RangeVar *rv,
  		  Oid relid, Oid oldrelid, void *arg);
  static char parse_policy_command(const char *cmd_name);
! static ArrayType *policy_role_list_to_array(List *roles);
  
  /*
   * Callback to RangeVarGetRelidExtended().
--- 49,55 ----
  static void RangeVarCallbackForPolicy(const RangeVar *rv,
  		  Oid relid, Oid oldrelid, void *arg);
  static char parse_policy_command(const char *cmd_name);
! static Datum *policy_role_list_to_array(List *roles, int *num_roles);
  
  /*
   * Callback to RangeVarGetRelidExtended().
*************** parse_policy_command(const char *cmd_nam
*** 130,159 ****
  
  /*
   * policy_role_list_to_array
!  *	 helper function to convert a list of RoleSpecs to an array of role ids.
   */
! static ArrayType *
! policy_role_list_to_array(List *roles)
  {
! 	ArrayType  *role_ids;
! 	Datum	   *temp_array;
  	ListCell   *cell;
- 	int			num_roles;
  	int			i = 0;
  
  	/* Handle no roles being passed in as being for public */
  	if (roles == NIL)
  	{
! 		temp_array = (Datum *) palloc(sizeof(Datum));
! 		temp_array[0] = ObjectIdGetDatum(ACL_ID_PUBLIC);
  
! 		role_ids = construct_array(temp_array, 1, OIDOID, sizeof(Oid), true,
!    'i');
! 		return role_ids;
  	}
  
! 	num_roles = list_length(roles);
! 	temp_array = (Datum *) palloc(num_roles * sizeof(Datum));
  
  	foreach(cell, roles)
  	{
--- 131,158 ----
  
  /*
   * policy_role_list_to_array
!  *	 helper function to convert a list of RoleSpecs to an array of
!  *	 role id Datums.
   */
! static Datum *
! policy_role_list_to_array(List *roles, int *num_roles)
  {
! 	Datum	   *role_oids;
  	ListCell   *cell;
  	int			i = 0;
  
  	/* Handle no roles being passed in as being for public */
  	if (roles == NIL)
  	{
! 	   *num_roles = 1;
! 		role_oids = (Datum *) palloc(*num_roles * sizeof(Datum));
! 		role_oids[0] = ObjectIdGetDatum(ACL_ID_PUBLIC);
  
! 		return role_oids;
  	}
  
!*num_roles = list_length(roles);
! 	role_oids = (Datum *) palloc(*num_roles * sizeof(Datum));
  
  	foreach(cell, roles)
  	{
*************** policy_role_list_to_array(List *roles)
*** 164,187 ****
  		 */
  		if (spec->roletype == ROLESPEC_PUBLIC)
  		{
! 			if (num_roles != 1)
  ereport(WARNING,
  		(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
  		 errmsg("ignoring roles specified other than public"),
  	  errhint("All roles are members of the public role.")));
! 			temp_array[0] = ObjectIdGetDatum(ACL_ID_PUBLIC);
! 			num_roles = 1;
! 			break;
  		}
  		else
! 			temp_array[i++] =
  ObjectIdGetDatum(get_rolespec_oid((Node *) spec, false));
  	}
  
! 	role_ids = construct_array(temp_array, num_roles, OIDOID, sizeof(Oid), true,
! 			   'i');
! 
! 	return role_ids;
  }
  
  /*
--- 163,187 ----
  		 */
  		if (spec->roletype == ROLESPEC_PUBLIC)
  		{
! 			Datum   *tmp_role_oids;
! 
! 			if (*num_roles != 1)
  ereport(WARNING,
  		(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
  		 errmsg("ignoring roles specified other than public"),
  	  errhint("All roles are members of the public role.")));
! 		   *num_roles = 1;
! 			tmp_role_oids = (Datum *) palloc(*num_roles * sizeof(Datum));
! 			tmp_role_oids[0] = ObjectIdGetDatum(ACL_ID_PUBLIC);
! 
! 			return tmp_role_oids;
  		}
  		else