Re: [HACKERS] [DESIGN] ParallelAppend

2015-07-27 Thread Amit Langote

KaiGai-san,

On 2015-07-27 PM 11:07, Kouhei Kaigai wrote:
 
   Append
    --> Funnel
         --> PartialSeqScan on rel1 (num_workers = 4)
    --> Funnel
         --> PartialSeqScan on rel2 (num_workers = 8)
    --> SeqScan on rel3

  shall be rewritten to

   Funnel
    --> PartialSeqScan on rel1 (num_workers = 4)
    --> PartialSeqScan on rel2 (num_workers = 8)
    --> SeqScan on rel3 (num_workers = 1)
 

In the rewritten plan, are the respective scans (PartialSeq or Seq) on rel1,
rel2 and rel3 asynchronous with respect to each other? Or does each one wait
for the earlier one to finish? I would think they don't wait, because
otherwise it would be no different from the former case, right? The original
premise seems to be that the partitions rel1, rel2 and rel3 may be on
different volumes, so parallelism across volumes seems like a goal of
parallelizing Append.

From my understanding of the parallel seqscan patch, each worker's
PartialSeqScan asks for a block to scan using a shared parallel heap scan
descriptor that effectively keeps track of the division of work among
PartialSeqScans in terms of blocks. What if we invent a PartialAppend
that each worker would run in case of a parallelized Append? It would use
some kind of shared descriptor to pick a relation (Append member) to scan.
The shared structure could be the list of subplans, including a mutex for
concurrency (a minimal sketch appears just below). It doesn't sound as
effective as the proposed ParallelHeapScanDescData does for PartialSeqScan,
but anything more granular might be complicated. For example, consider a
(current_relation, current_block) pair. If there are more workers than
subplans/partitions, then multiple workers might start working on the same
relation after a round-robin assignment of relations (but of course, a
later worker would start scanning from a later block in the same relation).
I imagine that might help with parallelism across volumes, if that's the
case. MergeAppend parallelization might involve a bit more complication,
but may be feasible with a PartialMergeAppend and a slightly different
kind of coordination among workers. What do you think of such an approach?
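
A minimal sketch of what such a shared descriptor might look like, assuming
hypothetical names (ParallelAppendDescData, pa_next_plan) and PostgreSQL's
spinlock primitives; this only illustrates the idea above, it is not a
worked-out design:

    /* Hypothetical shared state for PartialAppend, living in dynamic
     * shared memory and attached by every worker. */
    typedef struct ParallelAppendDescData
    {
        slock_t     pa_mutex;       /* protects the fields below */
        int         pa_nplans;      /* number of Append subplans */
        int         pa_next_plan;   /* next subplan to hand out */
    } ParallelAppendDescData;

    /* Each worker calls this to claim the next subplan to execute;
     * returns -1 once every subplan has been handed out.  Wrapping
     * around instead of returning -1 would give the round-robin
     * assignment described above when workers outnumber subplans. */
    static int
    partial_append_next_plan(ParallelAppendDescData *pad)
    {
        int         result;

        SpinLockAcquire(&pad->pa_mutex);
        if (pad->pa_next_plan < pad->pa_nplans)
            result = pad->pa_next_plan++;
        else
            result = -1;
        SpinLockRelease(&pad->pa_mutex);

        return result;
    }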

Thanks,
Amit





Re: [HACKERS] [DESIGN] ParallelAppend

2015-07-27 Thread Amit Kapila
On Sun, Jul 26, 2015 at 8:43 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:

 Hello,

 I'm recently working on / investigating the ParallelAppend feature
 towards the next commitfest. Below is my design proposal.

 1. Concept
 --
 The concept is quite simple; anybody might have considered it more
 than once. A ParallelAppend node kicks background worker processes
 to execute its child nodes in parallel / asynchronously.
 It intends to improve the performance of scanning large partitioned
 tables from the standpoint of entire throughput; however, latency of
 the first several hundred rows is not in the scope of this project.
 From the standpoint of technology trends, it primarily tries to
 utilize the multi-core capability within a system, but it also
 enables expansion to distributed database environments using the
 foreign-table inheritance features.
 Its behavior is very similar to the Funnel node except for several
 points; thus, we can reuse the infrastructure we have discussed at
 length through the v9.5 development cycle.

 2. Problems to be solved
 -
 Typical OLAP workloads take tons of table joins and scans on large
 tables which are often partitioned, and their KPI is query response
 time, while only a very small number of sessions are active
 simultaneously. So, we are required to run a single query as rapidly
 as possible even if it consumes more computing resources than
 typical OLTP workloads.

 The current implementation of heap scan is painful when we look at
 its behavior from that standpoint - how many rows we can read within
 a certain time - because of its synchronous manner.
 In the worst case, when a SeqScan node tries to fetch the next tuple,
 heap_getnext() looks up a block in shared buffers, then ReadBuffer()
 calls the storage manager to read the target block from the
 filesystem if it is not in the buffers. Next, the operating system
 puts the caller process to sleep until the required I/O completes.
 Most cases are resolved at an earlier stage than the above worst
 case; however, the best scenario we can expect is that the next tuple
 already appears at the top of the message queue (with visibility
 checks already done, of course) without falling down to the buffer
 manager or deeper.
 If we can run multiple scans in parallel / asynchronously, a CPU core
 can be assigned to another process by the operating system; thus,
 it eventually improves the I/O density and enables higher processing
 throughput.
 The Append node is an ideal point to be parallelized because:
 - child nodes can have physically different locations via
   tablespaces, so further tuning is possible according to the system
   landscape.
 - it can control whether a subplan is actually executed on a
   background worker, on a per-subplan basis. If the subplans contain
   large tables and small tables, ParallelAppend may kick background
   workers to scan the large tables only, but scan the small tables
   by itself.
 - As with the Funnel node, we don't need to care about enhancement
   of individual node types. SeqScan, IndexScan, ForeignScan and
   others can perform as usual, but actually in parallel.


 3. Implementation
 --
 * Plan & Cost

 ParallelAppend shall appear where Append can appear, except for the
 usage as a dummy. So, I'll enhance set_append_rel_pathlist() to add
 both an AppendPath and a ParallelAppendPath with a cost for each.


Is there a real need to have a new node like ParallelAppendPath?
Can't we have a Funnel node beneath the Append node, and then each
worker will be responsible for a SeqScan on each inherited child
relation.  Something like

Append
   ---> Funnel
      --> SeqScan rel1
      --> SeqScan rel2


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


[HACKERS] 9.5a1 BUG FIX: pgbench negative latencies

2015-07-27 Thread Fabien COELHO


Under 9.5a1, pgbench -r reports negative latencies on meta commands,
probably as an oversight of commit 84f0ea3f.


This patch ensures that "now" is reset on each loop iteration inside
doCustom.

--
Fabien.

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 2c3e365..cce67e8 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -1146,16 +1146,19 @@ doCustom(TState *thread, CState *st, instr_time *conn_time, FILE *logfile, AggVa
 	bool		trans_needs_throttle = false;
 	instr_time	now;
 
+top:
 	/*
 	 * gettimeofday() isn't free, so we get the current timestamp lazily the
 	 * first time it's needed, and reuse the same value throughout this
 	 * function after that. This also ensures that e.g. the calculated latency
 	 * reported in the log file and in the totals are the same. Zero means
 	 * not set yet.
+	 *
+	 * now must also be reset on "goto top;" issued when interpreting meta
+	 * commands, otherwise the per-command measured latency is wrong.
 	 */
 	INSTR_TIME_SET_ZERO(now);
 
-top:
 	commands = sql_files[st->use_file];
 
 	/*



Re: [HACKERS] [DESIGN] ParallelAppend

2015-07-27 Thread Kouhei Kaigai
 On Sun, Jul 26, 2015 at 8:43 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:
 
  Hello,
 
  I'm recently working on / investigating the ParallelAppend feature
  towards the next commitfest. Below is my design proposal.
 
  1. Concept
  --
  The concept is quite simple; anybody might have considered it more
  than once. A ParallelAppend node kicks background worker processes
  to execute its child nodes in parallel / asynchronously.
  It intends to improve the performance of scanning large partitioned
  tables from the standpoint of entire throughput; however, latency of
  the first several hundred rows is not in the scope of this project.
  From the standpoint of technology trends, it primarily tries to
  utilize the multi-core capability within a system, but it also
  enables expansion to distributed database environments using the
  foreign-table inheritance features.
  Its behavior is very similar to the Funnel node except for several
  points; thus, we can reuse the infrastructure we have discussed at
  length through the v9.5 development cycle.
 
  2. Problems to be solved
  -
  Typical OLAP workloads take tons of table joins and scans on large
  tables which are often partitioned, and their KPI is query response
  time, while only a very small number of sessions are active
  simultaneously. So, we are required to run a single query as rapidly
  as possible even if it consumes more computing resources than
  typical OLTP workloads.
 
  The current implementation of heap scan is painful when we look at
  its behavior from that standpoint - how many rows we can read within
  a certain time - because of its synchronous manner.
  In the worst case, when a SeqScan node tries to fetch the next
  tuple, heap_getnext() looks up a block in shared buffers, then
  ReadBuffer() calls the storage manager to read the target block from
  the filesystem if it is not in the buffers. Next, the operating
  system puts the caller process to sleep until the required I/O
  completes.
  Most cases are resolved at an earlier stage than the above worst
  case; however, the best scenario we can expect is that the next
  tuple already appears at the top of the message queue (with
  visibility checks already done, of course) without falling down to
  the buffer manager or deeper.
  If we can run multiple scans in parallel / asynchronously, a CPU
  core can be assigned to another process by the operating system;
  thus, it eventually improves the I/O density and enables higher
  processing throughput.
  The Append node is an ideal point to be parallelized because:
  - child nodes can have physically different locations via
    tablespaces, so further tuning is possible according to the
    system landscape.
  - it can control whether a subplan is actually executed on a
    background worker, on a per-subplan basis. If the subplans
    contain large tables and small tables, ParallelAppend may kick
    background workers to scan the large tables only, but scan the
    small tables by itself.
  - As with the Funnel node, we don't need to care about enhancement
    of individual node types. SeqScan, IndexScan, ForeignScan and
    others can perform as usual, but actually in parallel.
 
 
  3. Implementation
  --
  * Plan & Cost
 
  ParallelAppend shall appear where Append can appear, except for the
  usage as a dummy. So, I'll enhance set_append_rel_pathlist() to add
  both an AppendPath and a ParallelAppendPath with a cost for each.
 
 
 Is there a real need to have a new node like ParallelAppendPath?
 Can't we have a Funnel node beneath the Append node, and then each
 worker will be responsible for a SeqScan on each inherited child
 relation.  Something like
 
 Append
    ---> Funnel
       --> SeqScan rel1
       --> SeqScan rel2

If Funnel can handle both horizontal and vertical parallelism,
it is a great simplification. I'm not sticking to a new node.

Once Funnel gets the capability to have multiple child nodes, the
Append node above will probably be gone. I expect
set_append_rel_pathlist() to add two paths, based on Append and on
Funnel, and then the planner will choose the cheaper one according
to its cost.

We will need to pay attention to other issues that arise when Funnel
kicks background workers towards asymmetric relations.

If the number of rows of the individual child nodes varies, we may
want to assign 10 background workers to scan rel1 with PartialSeqScan.
On the other hand, rel2 may have a very small number of rows, so
its total_cost may be smaller than the cost to launch a worker.
In this case, Funnel has child nodes to be executed both
asynchronously and synchronously.

If the cheapest path of a child relation is a pair of Funnel and
PartialSeqScan, we have to avoid stacking Funnel nodes. Probably,
a Funnel node that performs like Append needs to pull up the
underlying Funnel and assign an equivalent number of workers, as
follows.

  Append
   --> Funnel
        --> PartialSeqScan on rel1 (num_workers = 4)
   --> Funnel
        --> PartialSeqScan on rel2 (num_workers = 8)
   --> SeqScan on rel3

 shall be rewritten to

  Funnel
   --> PartialSeqScan on rel1 (num_workers = 4)
-- 

Re: [HACKERS] raw output from copy

2015-07-27 Thread Pavel Stehule
2015-07-27 10:41 GMT+02:00 Heikki Linnakangas hlinn...@iki.fi:

 On 07/27/2015 06:55 AM, Craig Ringer wrote:

 On 7 July 2015 at 14:32, Pavel Stehule pavel.steh...@gmail.com wrote:

 Hi

 previous patch was broken, and buggy

 Here is new version with fixed upload and more tests


 I routinely see people trying to use COPY ... FORMAT binary to export
 a single binary field (like an image, for example) and getting
 confused by the header PostgreSQL adds. Or using text-format COPY and
 struggling with the hex escaping. It's clearly something people have
 trouble with.

 It doesn't help that while lo_import and lo_export can read paths
 outside the datadir (and refuse to read from within it),
 pg_read_binary_file is superuser only and disallows absolute paths.
 There's no corresponding pg_write_binary_file. So users who want to
 import and export a single binary field tend to try to use COPY. We
 have functionality for large objects that has no equivalent for
 'bytea'.

 I don't love the use of COPY for this, but it gets us support for
 arbitrary clients pretty easily. Otherwise it'd be server-side only
 via local filesystem access, or require special psql-specific
 functionality like we have for lo_import etc.


 COPY seems like a strange interface for this. I can see the point that the
 syntax is almost there already, for both input and output. But even that's
 not quite there yet, we'd need the new RAW format. And as an input method,
 COPY is a bit awkward, because you cannot easily pass the file to a
 function, for example. I think this should be implemented in psql, along
 the lines of Andrew's original \bcopy patch.

 There are a couple of related psql-features here actually, that would be
 useful on their own. The first is being able to send the query result to a
 file, for a single query only. You can currently do:

 \o /tmp/foo
 SELECT ...;
 \o

 But more often than not, when I try to do that, I forget to do the last
 \o, and run another query, and the output still goes to the file. So it'd
 be nice to have a \o option that only affects the next query. Something
 like:

 \O /tmp/foo
 SELECT ...;

 The second feature needed is to write the output without any headers, row
 delimiters and such. Just the datum. And the third feature is to write it
 in binary. Perhaps something like:

 \O /tmp/foo binary
 SELECT blob FROM foo WHERE id = 10;

 What about input? This is a whole new feature, but it would be nice to be
 able to pass the file contents as a query parameter. Something like:

 \P /tmp/foo binary
 INSERT INTO foo VALUES (?);


The input example is a strong reason why we shouldn't do it via inserts:
just parsing some special "?" symbol needs a lot of new code.

In this case, I don't see any advantage in a psql-based solution. COPY is
the standard interface for input/output from/to files, and it should be
used here.

Regards

Pavel




 - Heikki




Re: [HACKERS] Proposal for CSN based snapshots

2015-07-27 Thread Alexander Korotkov
On Sat, Jul 25, 2015 at 11:39 AM, Simon Riggs si...@2ndquadrant.com wrote:

 On 24 July 2015 at 19:21, Robert Haas robertmh...@gmail.com wrote:

 On Fri, Jul 24, 2015 at 1:00 PM, Simon Riggs si...@2ndquadrant.com
 wrote:
  It depends on the exact design we use to get that. Certainly we do not
 want
  them if they cause a significant performance regression.

 Yeah.  I think the performance worries expressed so far are:

 - Currently, if you see an XID that is between the XMIN and XMAX of
 the snapshot, you hit CLOG only on first access.  After that, the
 tuple is hinted.  With this approach, the hint bit doesn't avoid
 needing to hit CLOG anymore, because it's not enough to know whether
 or not the tuple committed; you have to know the CSN at which it
 committed, which means you have to look that up in CLOG (or whatever
 SLRU stores this data).  Heikki mentioned adding some caching to
 ameliorate this problem, but it sounds like he was worried that the
 impact might still be significant.


 This seems like the heart of the problem. Changing a snapshot from a list
 of xids into one number is easy. Making XidInMVCCSnapshot() work is the
 hard part because there needs to be a translation/lookup from CSN to
 determine if it contains the xid.

 That turns CSN into a reference to a cached snapshot, or a reference by
 which a snapshot can be derived on demand.


I got the problem. Currently, once we set hint bits, we don't have to visit
CLOG anymore. With CSN snapshots that is not so: we still have to translate
an XID into a CSN in order to compare it with the snapshot's CSN. In the
version of the CSN patch in this thread we still have XMIN and XMAX in the
snapshot. AFAICS, with CSN snapshots XMIN and XMAX are not necessarily
required to express a snapshot; they were kept as an optimization. That
restricts usage of the XID => CSN map to the given range of XIDs. However,
with long-running transactions the [XMIN; XMAX] range could be very wide
and we could use the XID => CSN map heavily over a wide range of XIDs.
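
To make that concrete, here is a sketch of the visibility test under CSN
snapshots; XidToCsn() is a hypothetical stand-in for the XID => CSN map
lookup (CLOG/SLRU or the proposed Dense Map), and it is exactly the lookup
that hint bits no longer let us skip:

    typedef uint64 CommitSeqNo;                 /* hypothetical CSN type */
    #define InvalidCommitSeqNo ((CommitSeqNo) 0)

    /* An xid is visible to a snapshot iff it committed with a CSN
     * earlier than the snapshot's CSN. */
    static bool
    XidVisibleInCsnSnapshot(TransactionId xid, CommitSeqNo snapshotCsn)
    {
        CommitSeqNo csn = XidToCsn(xid);    /* shared memory / SLRU access */

        if (csn == InvalidCommitSeqNo)
            return false;               /* aborted or still in progress */

        return csn < snapshotCsn;
    }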

As I can see in the Huawei PGCon talk, a Dense Map in shared memory is
proposed for the XID => CSN transformation. Having a large enough Dense Map
we can do most XID => CSN transformations with a single shared memory
access. The PGCon talk gives us pgbench results. However, pgbench doesn't
run any long transactions. With long-running transactions we could run out
of the Dense Map for a significant part of the XID => CSN transformations.
Dilip, did you test your patch with long transactions?

I'm also thinking about a different option for optimizing this. When we set
hint bits we could also replace XMIN/XMAX with the CSN. In this case we
wouldn't need to do the XID => CSN transformation for this tuple anymore.
However, we have 32-bit XIDs for now. We could also have 32-bit CSNs, but
that would double our troubles with wraparound: we would have two counters
that could wrap around. That could return us to thoughts about 64-bit XIDs.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] REVOKE [ADMIN OPTION FOR] ROLE

2015-07-27 Thread Egor Rogov



On Thu, Jul 23, 2015 at 8:30 AM, Egor Rogov e.ro...@postgrespro.ru wrote:

So, the question: is it a documentation bug (as it seems to me), a code bug,
or did I miss something?

Your analysis looks right to me, but I don't know whether the code or
the documentation should be changed.  This claim was added by Tom Lane
in 2005 in commit 58d214e51fe50b10b4439da6ec263d54c155afbf.  It might
be worth checking whether the claim was true at that time and later
became false, or whether it was never true to begin with.

As far as I can see, the modern REVOKE syntax for revoking membership in a 
role (along with the admin option) was introduced in commit 7762619 (by 
Tom Lane, 2005). The code handling this command didn't pay attention to the 
RESTRICT/CASCADE keywords then, just as it does not now.
Before that, another syntax was in use: ALTER GROUP groupname DROP USER 
username [, ...]. It did not include any notion of CASCADE at all.
I guess that REVOKE role_name FROM role_name inherited the 
[CASCADE|RESTRICT] part from the general REVOKE command but never 
actually used it. And I see no point in changing this, because role 
membership is somewhat more static than privileges.

So I would propose the attached fix for documentation.

Thanks,
Egor Rogov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
diff --git a/doc/src/sgml/ref/revoke.sgml b/doc/src/sgml/ref/revoke.sgml
index 36c286b..8417abe 100644
--- a/doc/src/sgml/ref/revoke.sgml
+++ b/doc/src/sgml/ref/revoke.sgml
@@ -167,8 +167,10 @@ REVOKE [ ADMIN OPTION FOR ]
   </para>
 
   <para>
-   When revoking membership in a role, <literal>GRANT OPTION</> is instead
-   called <literal>ADMIN OPTION</>, but the behavior is similar.
+   If <literal>ADMIN OPTION FOR</> is specified, only the admin option is
+   revoked, not the group membership itself. When revoking membership in a
+   role or admin option, recursive revocation does not happen no matter
+   whether <literal>CASCADE</> is specified or not.
    Note also that this form of the command does not
    allow the noise word <literal>GROUP</>.
   </para>



Re: [HACKERS] creating extension including dependencies

2015-07-27 Thread Michael Paquier
On Sun, Jul 26, 2015 at 1:01 AM, Petr Jelinek wrote:
 Yes that's what I meant by the change of checking order in the explanation
 above. I did that because I thought code would be more complicated
 otherwise, but apparently I was stupid...

+In case the extension specifies schema in its control file, the schema
s/schema/<literal>schema</>/

+++ b/src/test/modules/test_extensions/.gitignore
@@ -0,0 +1,3 @@
+# Generated subdirectories
+/results/
+/tmp_check/
test_extensions/.gitignore is missing /log/.

Something also has not been discussed yet: what to do with new_version
and old_version (the options of CreateExtensionStmt)? As of now, if
those options are defined they are not passed down to the parent
extensions, but shouldn't we raise an error if they are used in
combination with CASCADE? In any case, I think that the chosen
behavior should be documented.
-- 
Michael




Re: [HACKERS] [DESIGN] ParallelAppend

2015-07-27 Thread Kouhei Kaigai
 Hello, can I ask some questions?

 I suppose we can take this as the analog of ParallelSeqScan.  I
 can see little distinction between Append(ParallelSeqScan) and
 ParallelAppend(SeqScan). What difference is there between them?

Append does not start to execute the second or later node until the
first node reaches the end of its scan.
On the other hand, ParallelAppend will kick all the child nodes
(almost) simultaneously.
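
To illustrate the first point, plain Append behaves roughly like this (a
simplified paraphrase of ExecAppend in nodeAppend.c, not the literal code):

    static TupleTableSlot *
    ExecAppendSketch(AppendState *node)
    {
        for (;;)
        {
            /* Keep pulling from the current child until it is exhausted. */
            TupleTableSlot *slot =
                ExecProcNode(node->appendplans[node->as_whichplan]);

            if (!TupIsNull(slot))
                return slot;

            /* Only now move on to the next child, strictly in order. */
            if (++node->as_whichplan >= node->as_nplans)
                return NULL;
        }
    }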

 If other nodes will have the same functionality as you mention at
 the last of this proposal, it might be better that some part of
 this feature is implemented as a part of the existing executor
 itself, rather than as a dedicated additional node, just as my
 asynchronous FDW execution patch partially does. (Although it
 lacks the planner part and bgworker launching..) If that is the
 case, it might be better that ExecProcNode is modified so that it
 supports both the in-process and inter-bgworker cases through a
 single API.
 
 What do you think about this?

Its downside is that we need to adjust all the existing nodes to
follow the new executor's capability. At this moment, we have 38
node types derived from Plan. I think it is not an easy job to
review a patch that changes multiple dozens of files.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com


 regards,
 
  Hello,
 
  I'm recently working on / investigating the ParallelAppend feature
  towards the next commitfest. Below is my design proposal.
 
  1. Concept
  --
  The concept is quite simple; anybody might have considered it more
  than once. A ParallelAppend node kicks background worker processes
  to execute its child nodes in parallel / asynchronously.
  It intends to improve the performance of scanning large partitioned
  tables from the standpoint of entire throughput; however, latency of
  the first several hundred rows is not in the scope of this project.
  From the standpoint of technology trends, it primarily tries to
  utilize the multi-core capability within a system, but it also
  enables expansion to distributed database environments using the
  foreign-table inheritance features.
  Its behavior is very similar to the Funnel node except for several
  points; thus, we can reuse the infrastructure we have discussed at
  length through the v9.5 development cycle.
 
  2. Problems to be solved
  -
  Typical OLAP workloads take tons of table joins and scans on large
  tables which are often partitioned, and their KPI is query response
  time, while only a very small number of sessions are active
  simultaneously. So, we are required to run a single query as rapidly
  as possible even if it consumes more computing resources than
  typical OLTP workloads.
 
  The current implementation of heap scan is painful when we look at
  its behavior from that standpoint - how many rows we can read within
  a certain time - because of its synchronous manner.
  In the worst case, when a SeqScan node tries to fetch the next
  tuple, heap_getnext() looks up a block in shared buffers, then
  ReadBuffer() calls the storage manager to read the target block from
  the filesystem if it is not in the buffers. Next, the operating
  system puts the caller process to sleep until the required I/O
  completes.
  Most cases are resolved at an earlier stage than the above worst
  case; however, the best scenario we can expect is that the next
  tuple already appears at the top of the message queue (with
  visibility checks already done, of course) without falling down to
  the buffer manager or deeper.
  If we can run multiple scans in parallel / asynchronously, a CPU
  core can be assigned to another process by the operating system;
  thus, it eventually improves the I/O density and enables higher
  processing throughput.
  The Append node is an ideal point to be parallelized because:
  - child nodes can have physically different locations via
    tablespaces, so further tuning is possible according to the
    system landscape.
  - it can control whether a subplan is actually executed on a
    background worker, on a per-subplan basis. If the subplans
    contain large tables and small tables, ParallelAppend may kick
    background workers to scan the large tables only, but scan the
    small tables by itself.
  - As with the Funnel node, we don't need to care about enhancement
    of individual node types. SeqScan, IndexScan, ForeignScan and
    others can perform as usual, but actually in parallel.
 
 
  3. Implementation
  --
  * Plan & Cost
 
  ParallelAppend shall appear where Append can appear, except for the
  usage as a dummy. So, I'll enhance set_append_rel_pathlist() to add
  both an AppendPath and a ParallelAppendPath with a cost for each.
  Cost estimation logic shall take further discussion; however,
  I expect the logic below to estimate the cost for ParallelAppend:
    1. Sum startup_cost and run_cost for each child pathnode, but
       distinguish according to synchronous or asynchronous.
       Probably, the total cost of the pathnode is less than:
         (parallel_setup_cost + its total cost / parallel_append_degree

[HACKERS] Buildfarm TAP testing is useless as currently implemented

2015-07-27 Thread Tom Lane
I challenge anybody to figure out what happened here:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2015-07-27%2010%3A25%3A17
or here:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamster&dt=2015-07-04%2016%3A00%3A23
or here:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2015-07-07%2016%3A35%3A06

With no visibility of pg_ctl's output, and no copy of the postmaster log,
there is no chance of debugging intermittent failures like these.
This isn't entirely the buildfarm's fault --- AFAICS, prove-based testing
has inadequate error reporting by design.  If "not ok" isn't enough
information for you, tough beans.  (It might help if the farm script
captured the postmaster log after a failure, but that would do nothing
for prove's unwillingness to pass through client-side messages.)

I think we should disable TAP testing in the buildfarm until there is
some credible form of error reporting for it.  I've grown tired of
looking into buildfarm failure reports only to meet a dead end.
Aside from the wasted investigation time, which admittedly isn't huge,
there's an opportunity cost in that subsequent test steps didn't get run.

regards, tom lane




Re: [HACKERS] Failing assertions in indxpath.c, placeholder.c and brin_minmax.c

2015-07-27 Thread Tom Lane
Andreas Seltenreich seltenre...@gmx.de writes:
 Tom Lane writes:
 I've fixed the first two of these --- thanks for the report!

 I let sqlsmith run during the night, and it no longer triggered the
 first two.  During roughly a million random queries it triggered the
 already-mentioned brin one 10 times, but there was also one instance of
 this new one in the log:

Oh goody, more fun.  I'll take a look.

 I'm a bit confused about this aspect of your report though, because in
 my hands that example fails clear back to 9.2.  It doesn't seem to require
 the predtest.c improvement to expose the fault.

 Hmm, I actually used a different, uglier query to trigger this assertion
 for the bisection run.

Ah, okay.  The triggering condition for both those cases is
provably-contradictory restriction clauses on an inheritance relation.
In what you showed yesterday, that was something like "x < x AND x IS
NULL", which the planner has been able to recognize as contradictory
for a long time because < is strict.  (It did not, and still doesn't,
notice that x < x all by itself is contradictory...)  But here it
looks like the trigger is

from public.b as rel4551420
where ( rel4551420.bb > rel4551420.bb ) and ( rel4551420.bb < rel4551420.bb )

It was the recent predtest improvements that allowed recognition that
bb > bb contradicts bb < bb.  So that's why this run started to fail
there, even though the bug it was tickling is much older.

regards, tom lane




Re: [HACKERS] Microvacuum for gist. Question about GISTPageOpaqueData flag

2015-07-27 Thread Teodor Sigaev

I need advice on what would be better:
- to add a new flag like F_HAS_GARBAGE,
- or to delete all mentions of F_TUPLES_DELETED and reuse it in gist microvacuum.


According to commit message:
commit 2effb72e682a7dbdc9a8a60a80c22ec1fa9d8079
Author: Heikki Linnakangas heikki.linnakan...@iki.fi
Date:   Fri Nov 7 15:03:46 2014 +0200
..
The code that generated a record to clear the F_TUPLES_DELETED flag hasn't
existed since we got rid of old-style VACUUM FULL. I kept the code that sets
the flag, although it's not used for anything anymore, because it might
still be interesting information for debugging purposes that some tuples
have been deleted from a page.
..

If Heikki doesn't change his opinion then introduce a new flag. Although I don't 
think that we need to keep F_TUPLES_DELETED.



Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




[HACKERS] Microvacuum for gist. Question about GISTPageOpaqueData flag

2015-07-27 Thread Anastasia Lubennikova
Hi,

I'm working on microvacuum for the gist access method.
Briefly, microvacuum includes two steps:
1. When a search tells us that a tuple is invisible to all transactions, it
is marked LP_DEAD and the page is marked as having dead tuples.
2. Then, when an insert touches a full page which has dead tuples, it calls
microvacuum instead of splitting the page.
You can find a kind of review here [1].

While writing the patch, I found a strange GISTPageOpaqueData flag -
F_TUPLES_DELETED
http://doxygen.postgresql.org/gist_8h.html#a23812efd70313b9b10ae61376e2594f6
.
Its description makes it look the same as BTP_HAS_GARBAGE
http://doxygen.postgresql.org/nbtree_8h.html#a3b7c77849276ff8617edc1f84441c230

#define F_TUPLES_DELETED   (1 << 2) /* some tuples on the page are dead */

#define BTP_HAS_GARBAGE   (1 << 6) /* page has LP_DEAD tuples */

But they are definitely not the same thing. I found only two mentions of
this flag.
The function GistMarkTuplesDeleted
http://doxygen.postgresql.org/gist_8h.html#a96fc3c6bb5aecfc8d2818b7010d68aac
sets the flag after dead tuples are deleted.

Does anyone need it at all? I found no place where this flag is checked.
Is this a correct use of the flag?

I need advice on what would be better:
- to add a new flag like F_HAS_GARBAGE,
- or to delete all mentions of F_TUPLES_DELETED and reuse it in gist
microvacuum.
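
For reference, the first option would amount to a one-line addition next to
the existing flags in gist.h; the bit position below is only illustrative,
not a claim about which bits are actually free:

    #define F_HAS_GARBAGE   (1 << 4)    /* page has LP_DEAD tuples (proposed) */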


[1]
http://www.google-melange.com/gsoc/proposal/public/google/gsoc2015/ivanitskiy_ilya/5629499534213120
-- 
Best regards,
Lubennikova Anastasia


Re: [HACKERS] pgbench - allow backslash-continuations in custom scripts

2015-07-27 Thread Fabien COELHO



Attached is the revised version of this patch.

The first patch is not changed from before.

The second fixes a kind of bug.

The third is a new one to allow backslash continuation for
backslash commands.


Ah, thanks:-)

Would you consider adding the patch to the next commitfest? I may put 
myself as a reviewer...



[...] I'm not satisfied with the design, but I don't see another way..


I'll try to have a look.


I have no idea how to deal with the copy of psqlscan.[lh] from
psql. Currently they are simply dead copies of those of psql.


I think that there should be no copies, but it should use relative
symbolic links so that the files are kept synchronized.


Yeah, I think so, but symlinks could cause harm with git and on Windows.
Another way would be to make the copies from the psql directory. They live 
next door to each other.


Indeed there are plenty of links already which are generated by makefiles 
(see src/bin/pg_xlogdump/*), and probably a copy is made on Windows. There 
should be no file duplication within the source tree.


--
Fabien.




Re: [HACKERS] All-zero page in GIN index causes assertion failure

2015-07-27 Thread Heikki Linnakangas

On 07/23/2015 07:43 PM, Jeff Janes wrote:

On Mon, Jul 20, 2015 at 1:23 PM, Heikki Linnakangas hlinn...@iki.fi wrote:


I came up with the attached, for GIN and SP-GiST. Instead of WAL-logging
the page initialization in SP-GiST vacuum, I changed it so that it simply
leaves the page as all-zeroes, and adds it to the FSM. The code that grabs
a target page from the FSM handles that, and initializes the page anyway,
so that was simpler. This made the SP-GiST is-deleted flag obsolete; it's
no longer set, although the code still checks for it for
backwards-compatibility. (Even that may actually be unnecessary, as a page
that's marked as deleted must also be empty, and wherever we check for the
deleted flag, we also check for PageIsEmpty().)


This patch, in conjunction with the LWLock deadlock patch, fixes all the
issues I was having with GIN indexes in 9.5.


Thanks, I've pushed this, as well as a fix to a similar failure from B-tree 
vacuum that Amit Langote reported in the other thread.


- Heikki





Re: [HACKERS] Buildfarm TAP testing is useless as currently implemented

2015-07-27 Thread Andrew Dunstan


On 07/27/2015 10:06 AM, Tom Lane wrote:

I challenge anybody to figure out what happened here:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2015-07-27%2010%3A25%3A17
or here:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamster&dt=2015-07-04%2016%3A00%3A23
or here:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2015-07-07%2016%3A35%3A06

With no visibility of pg_ctl's output, and no copy of the postmaster log,
there is no chance of debugging intermittent failures like these.
This isn't entirely the buildfarm's fault --- AFAICS, prove-based testing
has inadequate error reporting by design.  If "not ok" isn't enough
information for you, tough beans.  (It might help if the farm script
captured the postmaster log after a failure, but that would do nothing
for prove's unwillingness to pass through client-side messages.)

I think we should disable TAP testing in the buildfarm until there is
some credible form of error reporting for it.  I've grown tired of
looking into buildfarm failure reports only to meet a dead end.
Aside from the wasted investigation time, which admittedly isn't huge,
there's an opportunity cost in that subsequent test steps didn't get run.




Well, it does create a lot of files that we don't pick up. An example 
list is shown below, and I am attaching their contents in a single 
gzipped attachment. However, these are in the wrong location. This was a 
vpath build, and yet these tmp_check directories are all created in the 
source tree. Let's fix that and then I'll set about having the buildfarm 
collect them. That should get us further down the track.


cheers

andrew

   
/home/andrew/pgl/pg_head/src/bin/pg_controldata/tmp_check/log/regress_log_001_pg_controldata
/home/andrew/pgl/pg_head/src/bin/pg_basebackup/tmp_check/log/regress_log_020_pg_receivexlog
/home/andrew/pgl/pg_head/src/bin/pg_basebackup/tmp_check/log/regress_log_010_pg_basebackup
/home/andrew/pgl/pg_head/src/bin/pg_rewind/regress_log
/home/andrew/pgl/pg_head/src/bin/pg_rewind/tmp_check/log/regress_log_003_extrafiles
/home/andrew/pgl/pg_head/src/bin/pg_rewind/tmp_check/log/regress_log_001_basic
/home/andrew/pgl/pg_head/src/bin/pg_rewind/tmp_check/log/regress_log_002_databases
/home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_100_vacuumdb
/home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_091_reindexdb_all
/home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_050_dropdb
/home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_070_dropuser
/home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_020_createdb
/home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_102_vacuumdb_stages
/home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_030_createlang
/home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_060_droplang
/home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_040_createuser
/home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_010_clusterdb
/home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_011_clusterdb_all
/home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_101_vacuumdb_all
/home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_090_reindexdb
/home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_080_pg_isready
/home/andrew/pgl/pg_head/src/bin/pg_config/tmp_check/log/regress_log_001_pg_config
/home/andrew/pgl/pg_head/src/bin/pg_ctl/tmp_check/log/regress_log_001_start_stop
/home/andrew/pgl/pg_head/src/bin/pg_ctl/tmp_check/log/regress_log_002_status
/home/andrew/pgl/pg_head/src/bin/initdb/tmp_check/log/regress_log_001_initdb








Re: [HACKERS] Failing assertions in indxpath.c, placeholder.c and brin_minmax.c

2015-07-27 Thread Alvaro Herrera
Tom Lane wrote:

 Bottom line is that somebody failed to consider the possibility of a
 null comparison value reaching the BRIN index lookup machinery.
 The code stanza that's failing supposes that only IS NULL or IS NOT NULL
 tests could have SK_ISNULL set, but that's just wrong.

Hm, okay, will look into that.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Restore-reliability mode

2015-07-27 Thread Alvaro Herrera
Noah Misch wrote:
 On Thu, Jul 23, 2015 at 04:53:49PM -0300, Alvaro Herrera wrote:
  Peter Geoghegan wrote:
   On Sat, Jun 6, 2015 at 12:58 PM, Noah Misch n...@leadboat.com wrote:
   - Call VALGRIND_MAKE_MEM_NOACCESS() on a shared buffer when its local
     pin count falls to zero.  Under CLOBBER_FREED_MEMORY, wipe a shared
     buffer when its global pin count falls to zero.
   
   Did a patch for this ever materialize?
  
  I think the first part would be something like the attached.
 
 Neat.  Does it produce any new complaints during make installcheck?

I only tried a few tests, for lack of time, and it didn't produce any.
(To verify that the whole thing was working properly, I reduced the
range of memory made available during PinBuffer, and that resulted in
an immediate crash.)  I am not really familiar with valgrind TBH and just
copied a recipe to run postmaster under it, so if someone with more
valgrind-fu could verify this, it would be great.


This part:

Under CLOBBER_FREED_MEMORY, wipe a shared buffer when its
global pin count falls to zero.

can be done without any valgrind, I think.  Any takers?
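
A sketch of that CLOBBER_FREED_MEMORY half, assuming a hook at the point
where the buffer's global refcount reaches zero (placement and names are
approximate, not the actual bufmgr.c code):

    #ifdef CLOBBER_FREED_MEMORY
        /* Wipe the page so any later access through a stale pointer is
         * conspicuous, mirroring what wipe_mem() does for palloc chunks. */
        if (refcount == 0)
            memset(BufferGetPage(buffer), 0x7F, BLCKSZ);
    #endif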

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Microvacuum for gist. Question about GISTPageOpaqueData flag

2015-07-27 Thread Anastasia Lubennikova
2015-07-27 20:05 GMT+04:00 Heikki Linnakangas hlinn...@iki.fi:

 On 07/27/2015 06:46 PM, Teodor Sigaev wrote:

  I need advice on what would be better:
  - to add a new flag like F_HAS_GARBAGE,
  - or to delete all mentions of F_TUPLES_DELETED and reuse it in gist
  microvacuum.


 According to commit message:
 commit 2effb72e682a7dbdc9a8a60a80c22ec1fa9d8079
 Author: Heikki Linnakangas heikki.linnakan...@iki.fi
 Date:   Fri Nov 7 15:03:46 2014 +0200
 ..
   The code that generated a record to clear the F_TUPLES_DELETED flag hasn't
   existed since we got rid of old-style VACUUM FULL. I kept the code that sets
   the flag, although it's not used for anything anymore, because it might
   still be interesting information for debugging purposes that some tuples
   have been deleted from a page.
 ..

 If Heikki doesn't change his opinion then introduce a new flag. Although I
 don't think that we need to keep F_TUPLES_DELETED.


 It's certainly not needed for anything at the moment, although conceivably
 we might reintroduce code that needs it in the future. There are plenty of
 flag bits available, so let's use a new flag. If there was a shortage, I
 wouldn't blink at reusing F_TUPLES_DELETED.

 - Heikki


Thanks for the quick reply

-- 
Best regards,
Lubennikova Anastasia


Re: [HACKERS] Buildfarm failure from overly noisy warning message

2015-07-27 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 Kevin Grittner kgri...@ymail.com writes:

 I think a LOG entry when an autovacuum process is actually canceled
 has value just in case it is happening on a particular table so
 frequently that the table starts to bloat.  I see no reason to log
 anything if there is an intention to cancel an autovacuum process
 but it actually completes before we can do so.

 Hm.  By that logic, I'm not sure if we need anything to be logged here,
 because the autovacuum process will log something about having received
 a query cancel signal.

That seems sufficient to me for normal cases.

 If we're in the business of minimizing log chatter, I'd suggest that
 we remove the entirely-routine "sending cancel" log message, and only
 log something in the uncommon case where the kill() fails (but, per
 the original point, reduce that entry to LOG or so; or else print
 something only for not-ESRCH cases).

+1 for only printing for the non-ESRCH cases.
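
A sketch of what that would look like at the kill() call site (a paraphrase,
not the actual source; autovac_pid is an illustrative name):

    /* Only complain if the cancel could not be delivered for a reason
     * other than the target process already being gone. */
    if (kill(autovac_pid, SIGINT) < 0 && errno != ESRCH)
        ereport(LOG,
                (errmsg("could not send signal to autovacuum process %d: %m",
                        (int) autovac_pid)));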

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Reduce ProcArrayLock contention

2015-07-27 Thread Robert Haas
On Sat, Jul 25, 2015 at 12:42 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 I thought that the internal API would automatically take care of it;
 for example, for MSVC it uses _InterlockedCompareExchange64,
 which, if it doesn't work on 32-bit systems or is not defined, means
 we have to use the 32-bit version, but I am not certain about
 that fact.

Instead of using pg_atomic_uint64, how about using pg_atomic_uint32
and storing the pgprocno rather than the pointer directly?  Then it
can work the same way (and be the same size) on every platform.
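
A sketch of that suggestion, with illustrative names: since a pgprocno is
an index into ProcGlobal->allProcs, a 32-bit atomic plus a sentinel value
can stand in for the 64-bit pointer:

    #define INVALID_PGPROCNO_SKETCH ((uint32) 0xFFFFFFFF)  /* illustrative */

    static pg_atomic_uint32 groupHead;  /* holds a pgprocno, or the sentinel */

    static PGPROC *
    pgprocno_to_proc(uint32 procno)
    {
        if (procno == INVALID_PGPROCNO_SKETCH)
            return NULL;
        /* same size and semantics on 32-bit and 64-bit platforms */
        return &ProcGlobal->allProcs[procno];
    }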

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Buildfarm TAP testing is useless as currently implemented

2015-07-27 Thread Heikki Linnakangas

On 07/27/2015 05:06 PM, Tom Lane wrote:

I challenge anybody to figure out what happened here:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2015-07-27%2010%3A25%3A17
or here:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamster&dt=2015-07-04%2016%3A00%3A23
or here:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2015-07-07%2016%3A35%3A06

With no visibility of pg_ctl's output, and no copy of the postmaster log,
there is no chance of debugging intermittent failures like these.
This isn't entirely the buildfarm's fault --- AFAICS, prove-based testing
has inadequate error reporting by design.  If "not ok" isn't enough
information for you, tough beans.  (It might help if the farm script
captured the postmaster log after a failure, but that would do nothing
for prove's unwillingness to pass through client-side messages.)


Yep.


I think we should disable TAP testing in the buildfarm until there is
some credible form of error reporting for it.  I've grown tired of
looking into buildfarm failure reports only to meet a dead end.
Aside from the wasted investigation time, which admittedly isn't huge,
there's an opportunity cost in that subsequent test steps didn't get run.


Commit 1ea06203b - Improve logging of TAP tests - made it a lot better. 
The pg_ctl log should be in the log file now. The buildfarm doesn't seem 
to capture those logs at the moment, but that should be easy to fix.


- Heikki





Re: [HACKERS] All-zero page in GIN index causes assertion failure

2015-07-27 Thread Amit Langote
On Tue, Jul 28, 2015 at 12:21 AM, Heikki Linnakangas hlinn...@iki.fi wrote:

 Thanks, I've pushed this, as well as a fix to a similar failure from B-tree
 vacuum that Amit Langote reported in the other thread.


Thanks Heikki and sorry I didn't notice this new thread.

Regards,
Amit




Re: [HACKERS] optimizing vacuum truncation scans

2015-07-27 Thread Robert Haas
On Sat, Jul 25, 2015 at 4:46 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 22 July 2015 at 14:59, Robert Haas robertmh...@gmail.com wrote:
 Urgh.  So if we do this, that forever precludes having HOT pruning set
 the all-visible bit.

 What is the reason why we don't do that already? Surely it's a one-liner?

I wish!

It actually doesn't look simple to me to modify heap_page_prune to
know whether any not-all-visible items remain at the end.  It's
definitely true that, in order for marking the page all-visible to be
a possibility, we need to reach heap_page_prune_execute() with ndead
== 0.  But that's not enough: we also need to know that any tuple that
survived the prune operation (that is, it wasn't redirected or marked
unused) has a new-enough xmin, which isn't tracked anywhere.
vacuumlazy.c solves that problem by making a second pass over all the
tuples on the page to determine whether the page is all-visible, but
that seems pretty inefficient and I think heap_page_prune() is already
a CPU hog on some workloads.  (Didn't Heikki have a patch to improve
that?)

Another sticky wicket is that, as things stand today, we'd have to
insert a second WAL record to mark the page all-visible, which would
add probably-significant overhead.  We could probably modify the
format of the record we're already emitting for pruning to carry a
flag indicating whether to additionally set PD_ALL_VISIBLE.
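
A sketch of that record-format idea; the struct below is an illustrative
variant of the existing prune record, not the actual xl_heap_prune
definition:

    #define XLHP_SET_ALL_VISIBLE    0x01    /* illustrative flag bit */

    typedef struct xl_heap_prune_sketch
    {
        TransactionId   latestRemovedXid;
        uint16          nredirected;
        uint16          ndead;
        uint8           flags;      /* e.g. XLHP_SET_ALL_VISIBLE, telling
                                     * redo to also set PD_ALL_VISIBLE */
        /* offsets of redirected/dead/unused items follow */
    } xl_heap_prune_sketch;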

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Microvacuum for gist. Question about GISTPageOpaqueData flag

2015-07-27 Thread Heikki Linnakangas

On 07/27/2015 06:46 PM, Teodor Sigaev wrote:

I need advice on what would be better:
- to add a new flag like F_HAS_GARBAGE,
- or to delete all mentions of F_TUPLES_DELETED and reuse it in gist microvacuum.


According to commit message:
commit 2effb72e682a7dbdc9a8a60a80c22ec1fa9d8079
Author: Heikki Linnakangas heikki.linnakan...@iki.fi
Date:   Fri Nov 7 15:03:46 2014 +0200
..
  The code that generated a record to clear the F_TUPLES_DELETED flag hasn't
  existed since we got rid of old-style VACUUM FULL. I kept the code that sets
  the flag, although it's not used for anything anymore, because it might
  still be interesting information for debugging purposes that some tuples
  have been deleted from a page.
..

If Heikki doesn't change his opinion then introduce a new flag. Although I don't
think that we need to keep F_TUPLES_DELETED.


It's certainly not needed for anything at the moment, although 
conceivably we might reintroduce code that needs it in the future. There 
are plenty of flag bits available, so let's use a new flag. If there was 
a shortage, I wouldn't blink at reusing F_TUPLES_DELETED.


- Heikki





Re: [HACKERS] Several memory leaks for pg_rewind caused by missing PQclear calls

2015-07-27 Thread Heikki Linnakangas

On 07/23/2015 05:08 PM, Michael Paquier wrote:

Hi all,

After a run of valgrind on pg_rewind, I found a couple of code paths
missing some PQclear calls after running a query. Attached is a patch
to fix all those leaks.


Fixed, thanks.

- Heikki





Re: [HACKERS] A little RLS oversight?

2015-07-27 Thread Joe Conway

On 07/27/2015 10:03 AM, Joe Conway wrote:
 On 07/26/2015 07:59 AM, Joe Conway wrote:
 On 07/26/2015 07:19 AM, Dean Rasheed wrote:
 Attached is an updated patch (still needs some docs for the
 functions).
 
 Thanks for that. I'll add the docs.
 
 Documentation added. Also added comment to check_enable_rls about 
 passing InvalidOid versus GetUserId().
 
 I believe this is ready to go -- any other comments?

Strike that - now I really think it is ready to go :-)

In this patch I additionally changed instances of:
  check_enable_rls(indrelid, GetUserId(), true)
to
  check_enable_rls(indrelid, InvalidOid, true)
per Dean's earlier remark and my new comment.

-- 
Joe Conway
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 98d1497..fd82ea4 100644
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
*************** SET search_path TO <replaceable>schema</>
*** 15259,15264 ****
--- 15259,15270 ----
       <entry><type>boolean</type></entry>
       <entry>does current user have privilege for role</entry>
      </row>
+     <row>
+      <entry><literal><function>row_security_active</function>(<parameter>table</parameter>)</literal>
+      </entry>
+      <entry><type>boolean</type></entry>
+      <entry>does current user have row level security active for table</entry>
+     </row>
     </tbody>
    </tgroup>
  </table>
*************** SET search_path TO <replaceable>schema</>
*** 15299,15304 ****
--- 15305,15313 ----
     <indexterm>
      <primary>pg_has_role</primary>
     </indexterm>
+    <indexterm>
+     <primary>row_security_active</primary>
+    </indexterm>
  
    <para>
     <function>has_table_privilege</function> checks whether a user
*************** SELECT has_function_privilege('joeuser',
*** 15462,15467 ****
--- 15471,15483 ----
    are immediately available without doing <command>SET ROLE</>.
   </para>
  
+  <para>
+   <function>row_security_active</function> checks whether row level
+   security is active for the specified table in the context of the
+   <function>current_user</function> and environment. The table can
+   be specified by name or by OID.
+  </para>
+ 
   <para>
    <xref linkend="functions-info-schema-table"> shows functions that
    determine whether a certain object is <firstterm>visible</> in the
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 1043362..aa5b28c 100644
*** a/src/backend/access/index/genam.c
--- b/src/backend/access/index/genam.c
*************** BuildIndexValueDescription(Relation inde
*** 204,210 ****
  	Assert(indexrelid == idxrec->indexrelid);
  
  	/* RLS check- if RLS is enabled then we don't return anything. */
! 	if (check_enable_rls(indrelid, GetUserId(), true) == RLS_ENABLED)
  	{
  		ReleaseSysCache(ht_idx);
  		return NULL;
--- 204,210 ----
  	Assert(indexrelid == idxrec->indexrelid);
  
  	/* RLS check- if RLS is enabled then we don't return anything. */
! 	if (check_enable_rls(indrelid, InvalidOid, true) == RLS_ENABLED)
  	{
  		ReleaseSysCache(ht_idx);
  		return NULL;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index e82a53a..c0bd6fa 100644
*** a/src/backend/catalog/system_views.sql
--- b/src/backend/catalog/system_views.sql
*************** CREATE VIEW pg_indexes AS
*** 150,156 ****
   LEFT JOIN pg_tablespace T ON (T.oid = I.reltablespace)
  WHERE C.relkind IN ('r', 'm') AND I.relkind = 'i';
  
! CREATE VIEW pg_stats AS
  SELECT
  nspname AS schemaname,
  relname AS tablename,
--- 150,156 
   LEFT JOIN pg_tablespace T ON (T.oid = I.reltablespace)
  WHERE C.relkind IN ('r', 'm') AND I.relkind = 'i';
  
! CREATE VIEW pg_stats WITH (security_barrier) AS
  SELECT
  nspname AS schemaname,
  relname AS tablename,
*************** CREATE VIEW pg_stats AS
*** 211,217 ****
  FROM pg_statistic s JOIN pg_class c ON (c.oid = s.starelid)
   JOIN pg_attribute a ON (c.oid = attrelid AND attnum = s.staattnum)
   LEFT JOIN pg_namespace n ON (n.oid = c.relnamespace)
! WHERE NOT attisdropped AND has_column_privilege(c.oid, a.attnum, 'select');
  
  REVOKE ALL on 

Re: [HACKERS] A little RLS oversight?

2015-07-27 Thread Joe Conway
On 07/26/2015 07:59 AM, Joe Conway wrote:
 On 07/26/2015 07:19 AM, Dean Rasheed wrote:
 Attached is an updated patch (still needs some docs for the functions).
 
 Thanks for that. I'll add the docs.

Documentation added. Also added comment to check_enable_rls about
passing InvalidOid versus GetUserId().

I believe this is ready to go -- any other comments?

Joe

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 98d1497..fd82ea4 100644
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
*** SET search_path TO replaceableschema/
*** 15259,15264 
--- 15259,15270 
 entrytypeboolean/type/entry
 entrydoes current user have privilege for role/entry
/row
+   row
+entryliteralfunctionrow_security_active/function(parametertable/parameter)/literal
+/entry
+entrytypeboolean/type/entry
+entrydoes current user have row level security active for table/entry
+   /row
   /tbody
  /tgroup
 /table
*** SET search_path TO replaceableschema/
*** 15299,15304 
--- 15305,15313 
 indexterm
  primarypg_has_role/primary
 /indexterm
+indexterm
+ primaryrow_security_active/primary
+/indexterm
  
 para
  functionhas_table_privilege/function checks whether a user
*** SELECT has_function_privilege('joeuser',
*** 15462,15467 
--- 15471,15483 
  are immediately available without doing <command>SET ROLE</>.
 </para>
  
+    <para>
+     <function>row_security_active</function> checks whether row level
+     security is active for the specified table in the context of the
+     <function>current_user</function> and environment. The table can
+     be specified by name or by OID.
+    </para>
+ 
    <para>
     <xref linkend="functions-info-schema-table"> shows functions that
     determine whether a certain object is <firstterm>visible</> in the
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index e82a53a..c0bd6fa 100644
*** a/src/backend/catalog/system_views.sql
--- b/src/backend/catalog/system_views.sql
*** CREATE VIEW pg_indexes AS
*** 150,156 
   LEFT JOIN pg_tablespace T ON (T.oid = I.reltablespace)
  WHERE C.relkind IN ('r', 'm') AND I.relkind = 'i';
  
! CREATE VIEW pg_stats AS
  SELECT
  nspname AS schemaname,
  relname AS tablename,
--- 150,156 
   LEFT JOIN pg_tablespace T ON (T.oid = I.reltablespace)
  WHERE C.relkind IN ('r', 'm') AND I.relkind = 'i';
  
! CREATE VIEW pg_stats WITH (security_barrier) AS
  SELECT
  nspname AS schemaname,
  relname AS tablename,
*** CREATE VIEW pg_stats AS
*** 211,217 
  FROM pg_statistic s JOIN pg_class c ON (c.oid = s.starelid)
   JOIN pg_attribute a ON (c.oid = attrelid AND attnum = s.staattnum)
   LEFT JOIN pg_namespace n ON (n.oid = c.relnamespace)
! WHERE NOT attisdropped AND has_column_privilege(c.oid, a.attnum, 'select');
  
  REVOKE ALL on pg_statistic FROM public;
  
--- 211,219 
  FROM pg_statistic s JOIN pg_class c ON (c.oid = s.starelid)
   JOIN pg_attribute a ON (c.oid = attrelid AND attnum = s.staattnum)
   LEFT JOIN pg_namespace n ON (n.oid = c.relnamespace)
! WHERE NOT attisdropped
! AND has_column_privilege(c.oid, a.attnum, 'select')
! AND (c.relrowsecurity = false OR NOT row_security_active(c.oid));
  
  REVOKE ALL on pg_statistic FROM public;
  
diff --git a/src/backend/rewrite/rowsecurity.c b/src/backend/rewrite/rowsecurity.c
index aaf0061..2386cf0 100644
*** a/src/backend/rewrite/rowsecurity.c
--- b/src/backend/rewrite/rowsecurity.c
*** get_row_security_policies(Query *root, C
*** 107,113 
  
  	Relation	rel;
  	Oid			user_id;
- 	int			sec_context;
  	int			rls_status;
  	bool		defaultDeny = false;
  
--- 107,112 
*** get_row_security_policies(Query *root, C
*** 117,138 
  	*hasRowSecurity = false;
  	*hasSubLinks = false;
  
! 	/* This is just to get the security context */
! 	GetUserIdAndSecContext(&user_id, &sec_context);
  
  	/* Switch to checkAsUser if it's set */
  	user_id = rte->checkAsUser ? rte->checkAsUser : GetUserId();
  
- 	/*
- 	 * If this is not a normal relation, or we have been told to explicitly
- 	 * skip RLS (perhaps because this is an FK check) then just return
- 	 * immediately.
- 	 */
- 	if (rte->relid < FirstNormalObjectId
- 		|| rte->relkind != RELKIND_RELATION
- 		|| (sec_context & SECURITY_ROW_LEVEL_DISABLED))
- 		return;
- 
  	/* Determine the state of RLS for this, pass checkAsUser explicitly */
  	rls_status = check_enable_rls(rte->relid, rte->checkAsUser, false);
  
--- 116,128 
  	*hasRowSecurity = false;
  	*hasSubLinks = false;
  
! 	/* If this is not a normal relation, just return immediately */
! 	if (rte->relkind != RELKIND_RELATION)
! 		return;
  
  	/* Switch to checkAsUser if it's set */
  	user_id = rte->checkAsUser ? rte->checkAsUser : 

Re: [HACKERS] Reduce ProcArrayLock contention

2015-07-27 Thread Pavan Deolasee
On Sat, Jul 25, 2015 at 10:12 AM, Amit Kapila amit.kapil...@gmail.com
wrote:


 
 
  Numbers look impressive and definitely shows that the idea is worth
 pursuing. I tried patch on my laptop. Unfortunately, at least for 4 and 8
 clients, I did not see any improvement.
 

 I can't help in this because I think we need somewhat
 bigger m/c to test the impact of patch.


I understand. IMHO it will be a good idea though to ensure that the patch
does not cause regression for other setups such as a less powerful machine
or while running with lower number of clients.



 I was telling that fact even without my patch. Basically I have
 tried by commenting ProcArrayLock in ProcArrayEndTransaction.


I did not get that. You mean the TPS is the same if you run with commenting out
ProcArrayLock in ProcArrayEndTransaction? Is that safe to do?


  But those who don't get the lock will sleep and hence the contention is
 moved somewhere else, at least partially.
 

 Sure, if contention is reduced at one place it will move
 to next lock.


What I meant was that the lock may not show up in the contention because
all but one process now sleep for their work to be done by the group
leader. So the total time spent in the critical section remains the same,
but not shown in the profile. Sure, your benchmark numbers show this is
still better than all processes contending for the lock.




 No, autovacuum generates I/O due to which sometimes there
 is more variation in Write tests.


Sure, but on average does it still show similar improvements? Or does
the test becomes IO bound and hence the bottleneck shifts somewhere else?
Can you please post those numbers as well when you get chance?



 I can do something like that if others also agree with this new
 API in LWLock series, but personally I don't think LWLock.c is
 the right place to expose API for this work.  Broadly the work
 we are doing can be thought of as the below sub-tasks.

 1. Advertise each backend's xid.
 2. Push all backends except one onto a global list.
 3. wait till someone wakes and check if the xid is cleared,
repeat until the xid is clear
 4. Acquire the lock
 5. Pop all the backends and clear each one's xid, and use
their published xids to advance the global latestCompletedXid.
 6. Release Lock
 7. Wake all the processes waiting for their xid to be cleared
and before waking mark that Xid of the backend is clear.

 So among these only step 2 can be common among different
 algorithms; the others need some work specific to each optimization.


Right, but if we could encapsulate that work in a function that just needs
to work on some shared memory, I think we can build such infrastructure.
It's possible though that a more elaborate infrastructure is needed than just
one function pointer. For example, in this case, we also want to set the
latestCompletedXid after clearing xids for all pending processes.
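
To make step 2 concrete, here is a minimal sketch of pushing a backend onto
a lock-free pending list using PostgreSQL's atomics API. The list head and
the nextClearXidElem/SENTINEL_PGPROCNO names are invented for illustration;
the actual patch may structure this differently:

/*
 * Hypothetical sketch: link ourselves into a pending list with a
 * compare-and-exchange loop and report whether we became the leader.
 */
#include "port/atomics.h"
#include "storage/proc.h"

#define SENTINEL_PGPROCNO	0xFFFFFFFF	/* "list is empty" marker */

static pg_atomic_uint32 firstClearXidElem;	/* head: a pgprocno or sentinel */

static bool
push_self_and_check_leader(PGPROC *proc)
{
	uint32		head;

	while (true)
	{
		head = pg_atomic_read_u32(&firstClearXidElem);

		/* publish the old head behind us before swinging the head */
		proc->nextClearXidElem = head;	/* invented PGPROC field */

		if (pg_atomic_compare_exchange_u32(&firstClearXidElem,
										   &head,
										   (uint32) proc->pgprocno))
			break;
		/* CAS failed because someone else pushed first; retry */
	}

	/* whoever replaced the sentinel does the work for the whole group */
	return head == SENTINEL_PGPROCNO;
}

Note that linking via pgprocno indexes rather than raw pointers also
sidesteps the 32-bit versus 64-bit compare-and-exchange question raised
below, since an index always fits in 32 bits.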



 
  Regarding the patch, the compare-and-exchange function calls that you've
 used would work only for 64-bit machines, right? You would need to use
 equivalent 32-bit calls on a 32-bit machine.
 

 I thought that the internal API will automatically take care of it;
 for example for msvc it uses _InterlockedCompareExchange64,
 which, if it doesn't work on 32-bit systems or is not defined, means
 we have to use the 32-bit version, but I am not certain about
 that fact.


Hmm. The pointer will be a 32-bit field on a 32-bit machine. I don't know
how exchanging that with a 64-bit integer would be safe.

Thanks,
Pavan


-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


[HACKERS] Alpha 2 wrapping August 3

2015-07-27 Thread Josh Berkus
... so please get those fixes/overhauls in in the next week.

Thanks!

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: multiple psql option -c

2015-07-27 Thread Robert Haas
On Mon, Jul 27, 2015 at 2:37 PM, Pavel Stehule pavel.steh...@gmail.com wrote:
 2015-07-27 20:32 GMT+02:00 Robert Haas robertmh...@gmail.com:

 On Sat, Jul 25, 2015 at 5:27 AM, Pavel Stehule pavel.steh...@gmail.com
 wrote:
  It will be a nice side effect, but my primary problem was the impossibility
  of combining VACUUM and any other statement in one simple psql call.

 Seems like you can do that easily enough:

 [rhaas pgsql]$ (echo 'SELECT 1;'; echo 'VACUUM;'; echo 'SELECT 2;') | psql
  ?column?
 --
 1
 (1 row)

 VACUUM
  ?column?
 --
 2
 (1 row)


 how can I do it with xargs?

I don't know specifically what you're trying to do, but I bet it's not that hard.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-07-27 Thread Robert Haas
On Mon, Jul 27, 2015 at 2:43 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Robert Haas wrote:
 On Mon, Jul 27, 2015 at 2:32 PM, Alvaro Herrera
 alvhe...@2ndquadrant.com wrote:

  I think this is already possible, is it not?  You just have to look for
  an identically-identified pg_locks entry with granted=true.  That gives
  you a PID and vxid/xid.  You can self-join pg_locks with that, and join
  to pg_stat_activity.
 
  I remember we discussed having a layer of system views on top of
  pg_stat_activity and pg_locks, probably defined recursively, that would
  show the full graph of waiters/lockers.

 It isn't necessarily the case that A is waiting for a unique process
 B.  It could well be the case that A wants AccessExclusiveLock and
 many processes hold a variety of other lock types.

 Sure, but I don't think this makes it impossible to figure out who's
 locking who.  I think the only thing you need other than the data in
 pg_locks is the conflicts table, which is well documented.

 Oh, hmm, one thing missing is the ordering of the wait queue for each
 locked object.  If process A holds RowExclusive on some object, process
 B wants ShareLock (stalled waiting) and process C wants AccessExclusive
 (also stalled waiting), who of B and C is woken up first after A
 releases the lock depends on order of arrival.

Agreed - it would be nice to expose that somehow.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: multiple psql option -c

2015-07-27 Thread Robert Haas
On Sat, Jul 25, 2015 at 5:27 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
 It will be a nice side effect, but my primary problem was the impossibility
 of combining VACUUM and any other statement in one simple psql call.

Seems like you can do that easily enough:

[rhaas pgsql]$ (echo 'SELECT 1;'; echo 'VACUUM;'; echo 'SELECT 2;') | psql
 ?column?
--
1
(1 row)

VACUUM
 ?column?
--
2
(1 row)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-07-27 Thread Robert Haas
On Mon, Jul 27, 2015 at 2:32 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 David Rowley wrote:
 I've not looked into the feasibility of it, but if it were also possible to
 have a waiting_for column which would store the process ID of the process
 that's holding a lock that this process is waiting on, then it would be
 possible for some smart guy to write some code which draws beautiful
 graphs, perhaps in Pg Admin 4 of which processes are blocking other
 processes. I imagine this as a chart with an icon for each process.
 Processes waiting on locks being released would have an arrow pointing to
 their blocking process, if we clicked on that blocking process we could see
 the query that it's running and various other properties that are existing
 columns in pg_stat_activity.

 I think this is already possible, is it not?  You just have to look for
 an identically-identified pg_locks entry with granted=true.  That gives
 you a PID and vxid/xid.  You can self-join pg_locks with that, and join
 to pg_stat_activity.

 I remember we discussed having a layer of system views on top of
 pg_stat_activity and pg_locks, probably defined recursively, that would
 show the full graph of waiters/lockers.

It isn't necessarily the case that A is waiting for a unique process
B.  It could well be the case that A wants AccessExclusiveLock and
many processes hold a variety of other lock types.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-07-27 Thread Alvaro Herrera
Robert Haas wrote:
 On Mon, Jul 27, 2015 at 2:32 PM, Alvaro Herrera
 alvhe...@2ndquadrant.com wrote:

  I think this is already possible, is it not?  You just have to look for
  an identically-identified pg_locks entry with granted=true.  That gives
  you a PID and vxid/xid.  You can self-join pg_locks with that, and join
  to pg_stat_activity.
 
  I remember we discussed having a layer of system views on top of
  pg_stat_activity and pg_locks, probably defined recursively, that would
  show the full graph of waiters/lockers.
 
 It isn't necessarily the case that A is waiting for a unique process
 B.  It could well be the case that A wants AccessExclusiveLock and
 many processes hold a variety of other lock types.

Sure, but I don't think this makes it impossible to figure out who's
locking who.  I think the only thing you need other than the data in
pg_locks is the conflicts table, which is well documented.

Oh, hmm, one thing missing is the ordering of the wait queue for each
locked object.  If process A holds RowExclusive on some object, process
B wants ShareLock (stalled waiting) and process C wants AccessExclusive
(also stalled waiting), who of B and C is woken up first after A
releases the lock depends on order of arrival.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-07-27 Thread Alvaro Herrera
David Rowley wrote:

 I've not looked into the feasibility of it, but if it were also possible to
 have a waiting_for column which would store the process ID of the process
 that's holding a lock that this process is waiting on, then it would be
 possible for some smart guy to write some code which draws beautiful
 graphs, perhaps in Pg Admin 4 of which processes are blocking other
 processes. I imagine this as a chart with an icon for each process.
 Processes waiting on locks being released would have an arrow pointing to
 their blocking process, if we clicked on that blocking process we could see
 the query that it's running and various other properties that are existing
 columns in pg_stat_activity.

I think this is already possible, is it not?  You just have to look for
an identically-identified pg_locks entry with granted=true.  That gives
you a PID and vxid/xid.  You can self-join pg_locks with that, and join
to pg_stat_activity.
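
For what it's worth, a rough sketch of that self-join (untested; note it
pairs a waiter with every granted lock on the same object without applying
the lock conflicts matrix, and it ignores the queue-ordering issue raised
elsewhere in this thread):

SELECT waiting.pid          AS waiting_pid,
       waiting_act.query    AS waiting_query,
       blocking.pid         AS blocking_pid,
       blocking_act.query   AS blocking_query,
       blocking.mode        AS blocking_mode
FROM pg_locks waiting
JOIN pg_locks blocking
  ON blocking.granted
 AND NOT waiting.granted
 AND blocking.pid <> waiting.pid
 AND blocking.locktype = waiting.locktype
 AND blocking.database      IS NOT DISTINCT FROM waiting.database
 AND blocking.relation      IS NOT DISTINCT FROM waiting.relation
 AND blocking.page          IS NOT DISTINCT FROM waiting.page
 AND blocking.tuple         IS NOT DISTINCT FROM waiting.tuple
 AND blocking.transactionid IS NOT DISTINCT FROM waiting.transactionid
JOIN pg_stat_activity waiting_act  ON waiting_act.pid  = waiting.pid
JOIN pg_stat_activity blocking_act ON blocking_act.pid = blocking.pid;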

I remember we discussed having a layer of system views on top of
pg_stat_activity and pg_locks, probably defined recursively, that would
show the full graph of waiters/lockers.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] REVOKE [ADMIN OPTION FOR] ROLE

2015-07-27 Thread Stephen Frost
Egor,

* Egor Rogov (e.ro...@postgrespro.ru) wrote:
 On Thu, Jul 23, 2015 at 8:30 AM, Egor Rogov e.ro...@postgrespro.ru wrote:
 So, the question: is it a documentation bug (as it seems to me), code bug,
 or I missed something?
 Your analysis looks right to me, but I don't know whether the code or
 the documentation should be changed.  This claim was added by Tom Lane
 in 2005 in commit 58d214e51fe50b10b4439da6ec263d54c155afbf.  It might
 be worth checking whether the claim was true at that time and later
 became false, or whether it was never true to begin with.
 
 As far as I can see, modern revoke syntax for revoking membership in
 a role (along with admin option) was introduced in commit 7762619
 (by Tom Lane, 2005). Code for handling this command didn't pay
 attention to restrict/cascade keywords then, just as it does not now.
 Before that, another syntax was in use: "alter group groupname drop
 user username [, ...]". It did not include the notion of cascade at
 all.
 I guess that "revoke role_name from role_name" inherited the
 [cascade|restrict] section from the general revoke command but never
 actually used it. And I see no point in changing this, because role
 membership is somewhat more static than privileges.
 So I would propose the attached fix for documentation.

Have you looked at the SQL spec at all for this..?  That's what we
really should be looking at to determine if this is a documentation
issue or a code issue.

I'll take a look in a day or two after I've caught up on other things,
if no one beats me to it.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] spgist recovery assertion failure

2015-07-27 Thread Heikki Linnakangas

On 07/27/2015 04:24 PM, Michael Paquier wrote:

On Mon, Jul 27, 2015 at 2:33 PM, Piotr Stefaniak
postg...@piotr-stefaniak.me wrote:

On 07/27/2015 07:19 AM, Michael Paquier wrote:


On Mon, Jul 27, 2015 at 2:00 PM, Noah Misch n...@leadboat.com wrote:


When I caused a crash during the create_index regression test, recovery
hit an
assertion failure.  Minimal test case:

psql -X <<EOSQL
CREATE TABLE t (c text);
INSERT INTO t SELECT 'P0123456789abcdef' FROM generate_series(1,1000);
INSERT INTO t VALUES ('P0123456789abcdefF');
CREATE INDEX ON t USING spgist (c);
EOSQL
pg_ctl -m immediate -w restart



On which platform are you seeing the failure? I am afraid I could not
reproduce the failure on Linux and OSX after testing it on HEAD.



I'm having the same symptoms with
159cff58cf3b565be3c17901698a74238e9e23f8 on Ubuntu Linux 3.4.39 armv7l.


Yes, on armv7l this can be indeed reproduced.


Fixed, thanks everyone! The problem was that in the WAL format change 
patch, I had used "char" in a struct to hold -1, but "char" is unsigned 
on PowerPC and ARM.
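
The pitfall is easy to demonstrate in isolation; a minimal standalone
example:

#include <stdio.h>

int
main(void)
{
	char		c = -1;		/* plain char: signedness is implementation-defined */
	signed char sc = -1;	/* always signed */

	/*
	 * On x86 this prints "-1 -1".  On the usual PowerPC and ARM ABIs plain
	 * char is unsigned, so c holds 255 and a test like (c == -1) is always
	 * false there.
	 */
	printf("%d %d\n", (int) c, (int) sc);
	return 0;
}

Hence the usual fix is to use an explicitly signed type such as int8
(PostgreSQL's typedef for signed char) rather than plain char.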


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] more RLS oversights

2015-07-27 Thread Alvaro Herrera
Joe Conway wrote:
 
 On 07/03/2015 10:03 AM, Noah Misch wrote:
  (2) CreatePolicy() and AlterPolicy() omit to create a pg_shdepend
  entry for each role in the TO clause.  Test case:
 
 Please see the attached patch. Note that I used SHARED_DEPENDENCY_ACL
 for this. It seems appropriate, but possibly we should invent a new
 shared dependency type for this use case? Comments?

Hmm, these are not ACL objects, so conceptually it seems cleaner to use
a different symbol for this.  I think the catalog state and the error
messages would be a bit confusing otherwise.

    if (spec->roletype == ROLESPEC_PUBLIC)
    {
  ! 	Datum	   *tmp_role_oids;
  ! 
  ! 	if (*num_roles != 1)
    		ereport(WARNING,
    				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
    				 errmsg("ignoring roles specified other than public"),
    				 errhint("All roles are members of the public role.")));
  ! 	*num_roles = 1;
  ! 	tmp_role_oids = (Datum *) palloc(*num_roles * sizeof(Datum));
  ! 	tmp_role_oids[0] = ObjectIdGetDatum(ACL_ID_PUBLIC);

Isn't this leaking the previously allocated array?  Not sure it's all
that critical, but still.  (I don't think you really need to call palloc
at all here.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: multiple psql option -c

2015-07-27 Thread Andrew Dunstan


On 07/27/2015 02:53 PM, Pavel Stehule wrote:





I am trying to run parallel execution

psql -At -c "select datname from pg_database" postgres | xargs -n 1 -P 
3 psql -c "select current_database()"






I don't think it's going to be a hugely important feature, but I don't 
see a problem with creating a new option (-C seems fine) which would 
have the same effect as if the arguments were concatenated into a file 
which is then used with -f. IIRC -c has some special characteristics 
which means it's probably best not to try to extend it for this feature.
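
(Until then, bash process substitution already gives the "concatenate into
a file and use -f" behaviour without a temporary file, for example:

psql -f <(printf '%s\n' 'SELECT 1;' 'VACUUM;' 'SELECT 2;') postgres

though a real option would of course be more portable than relying on the
shell.)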


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: multiple psql option -c

2015-07-27 Thread Pavel Stehule
2015-07-27 20:32 GMT+02:00 Robert Haas robertmh...@gmail.com:

 On Sat, Jul 25, 2015 at 5:27 AM, Pavel Stehule pavel.steh...@gmail.com
 wrote:
  It will be a nice side effect, but my primary problem was the impossibility
  of combining VACUUM and any other statement in one simple psql call.

 Seems like you can do that easily enough:

 [rhaas pgsql]$ (echo 'SELECT 1;'; echo 'VACUUM;'; echo 'SELECT 2;') | psql
  ?column?
 --
 1
 (1 row)

 VACUUM
  ?column?
 --
 2
 (1 row)


how can I do it with xargs?

Regards

Pavel


 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company



Re: [HACKERS] anole: assorted stability problems

2015-07-27 Thread Robert Haas
On Sun, Jul 26, 2015 at 11:36 AM, Andres Freund and...@anarazel.de wrote:
 On 2015-07-07 13:25:24 +0200, Andres Freund wrote:
 So, it's starting to look good. Not exactly allowing for a lot of
 confidence yet, but still:
 http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=anolebr=HEAD

 Since there have not been any relevant failures since, I'm going to
 remove this issue from the open items list.

Woohoo!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: multiple psql option -c

2015-07-27 Thread Pavel Stehule
2015-07-27 20:47 GMT+02:00 Robert Haas robertmh...@gmail.com:

 On Mon, Jul 27, 2015 at 2:37 PM, Pavel Stehule pavel.steh...@gmail.com
 wrote:
  2015-07-27 20:32 GMT+02:00 Robert Haas robertmh...@gmail.com:
 
  On Sat, Jul 25, 2015 at 5:27 AM, Pavel Stehule pavel.steh...@gmail.com
 
  wrote:
   It will be a nice side effect, but my primary problem was the
   impossibility of combining VACUUM and any other statement in one
   simple psql call.
 
  Seems like you can do that easily enough:
 
  [rhaas pgsql]$ (echo 'SELECT 1;'; echo 'VACUUM;'; echo 'SELECT 2;') |
 psql
   ?column?
  --
  1
  (1 row)
 
  VACUUM
   ?column?
  --
  2
  (1 row)
 
 
  how can I do it with xargs?

 I don't know specifically what you're trying to do, but I bet it's not that
 hard.


I am trying to run parallel execution

psql -At -c "select datname from pg_database" postgres | xargs -n 1 -P 3
psql -c "select current_database()"

Pavel



 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company



[HACKERS] WIP: Make timestamptz_out less slow.

2015-07-27 Thread Andres Freund
Hi,

I recently once more noticed that timestamptz_out is really, really
slow. To quantify that, I created a bunch of similar sized tables:

CREATE TABLE tbl_timestamp AS SELECT NOW() FROM generate_series(1, 10) a, 
generate_series(1, 100) b;
CREATE TABLE tbl_int8 AS SELECT 1::bigint FROM generate_series(1,  10) a, 
generate_series(1, 100) b;
CREATE TABLE tbl_bytea AS SELECT '   '::bytea FROM generate_series(1, 10) 
a, generate_series(1, 100) b;

These all end up being 346MB large.

COPY tbl_bytea TO '/dev/null';
Time: 1173.484 ms
COPY tbl_int8 TO '/dev/null';
Time: 1030.756 ms
COPY tbl_timestamp TO '/dev/null';
Time: 6598.030

(all best of three)

Yes, timestamp outputs more data as a whole, but being 5 times as slow
is still pretty darn insane. To make sure that's not the cause, here's
another table:

CREATE TABLE tbl_timestamptext AS SELECT NOW()::text FROM generate_series(1, 
10) a, generate_series(1, 100) b;
COPY tbl_timestamptext TO '/dev/null';
Time: 1449.554 ms

So it's really just the timestamp code.


Profiling it shows:
  Overhead  Command Shared Object Symbol
  -   38.33%  postgres_stock  libc-2.19.so  [.] vfprintf
 - 97.92% vfprintf
  _IO_vsprintf
- sprintf
   + 70.25% EncodeDateTime
   + 29.75% AppendSeconds.constprop.10
 + 1.11% _IO_default_xsputn
  -   8.22%  postgres_stock  libc-2.19.so  [.] _IO_default_xsputn
 - 99.43% _IO_default_xsputn
- 90.09% vfprintf
 _IO_vsprintf
   - sprintf
  + 74.15% EncodeDateTime
  + 25.85% AppendSeconds.constprop.10
+ 9.72% _IO_padn
 + 0.57% vfprintf
  +   7.76%  postgres_stock  postgres_stock[.] CopyOneRowTo   

So nearly all the time is spent somewhere inside the sprintf calls. Not
nice.

The only thing I could come up with to make the sprintfs cheaper was to
combine them into one and remove some of the width specifiers that
aren't needed. That doesn't buy us very much.

I then proceeded to replace the sprintf call with hand-rolled
conversions. And wow, the benefit is far bigger than I'd assumed:
postgres[7236][1]=# COPY tbl_timestamp TO '/dev/null';
Time: 2430.521 ms

So, by hand-rolling the ISO conversion in EncodeDateTime() we got a
~250% performance improvement. I'd say that's worthwhile.

The attached patch shows what I did. While there's some polishing
possible, as a whole, it's pretty ugly. But I think timestamp data is so
common that it's worth the effort.

Does anybody have a fundamentally nicer idea than the attached to
improve this?

Greetings,

Andres Freund
From 1d7b6110f8864ee00c1fe4f54d8937347ade7d80 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Mon, 27 Jul 2015 23:09:33 +0200
Subject: [PATCH] WIP: faster timestamp[tz]_out

---
 src/backend/utils/adt/datetime.c| 108 
 src/test/regress/expected/horology.out  |  24 ---
 src/test/regress/expected/timestamp.out |  62 +++---
 src/test/regress/sql/timestamp.sql  |   1 +
 4 files changed, 164 insertions(+), 31 deletions(-)

diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index 2a44b6e..4c13f01 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -3968,7 +3968,115 @@ EncodeDateTime(struct pg_tm * tm, fsec_t fsec, bool print_tz, int tz, const char
 
 	switch (style)
 	{
+#ifdef HAVE_INT64_TIMESTAMP
+		case USE_ISO_DATES:
+			/*
+			 * Fastpath for most common format and range. Not using sprintf
+			 * provides significant performance benefits, and timestamp data
+			 * is pretty common. All sane use cases dealing with large amounts
+			 * of data use iso timestamps, so we can focus on optimizing
+			 * those, and keep the rest of the code maintainable.
+			 */
+			if (tm->tm_year > 0 && tm->tm_year < 10000)
+			{
+				int			year = (tm->tm_year > 0) ? tm->tm_year : -(tm->tm_year - 1);
+
+				*str++ = (year / 1000) + '0';
+				*str++ = (year / 100) % 10 + '0';
+				*str++ = (year / 10) % 10 + '0';
+				*str++ = year % 10 + '0';
+				*str++ = '-';
+				*str++ = (tm->tm_mon / 10) + '0';
+				*str++ = tm->tm_mon % 10 + '0';
+				*str++ = '-';
+				*str++ = (tm->tm_mday / 10) + '0';
+				*str++ = tm->tm_mday % 10 + '0';
+				*str++ = ' ';
+				*str++ = (tm->tm_hour / 10) + '0';
+				*str++ = tm->tm_hour % 10 + '0';
+				*str++ = ':';
+				*str++ = (tm->tm_min / 10) + '0';
+				*str++ = tm->tm_min % 10 + '0';
+				*str++ = ':';
+				*str++ = (tm->tm_sec / 10) + '0';
+				*str++ = tm->tm_sec % 10 + '0';
+
+/*
+ * Yes, this is darned ugly and would look nicer in a loop,
+ * but some versions of gcc can't convert the divisions into
+ * more efficient instructions unless manually unrolled.
+ */
+if (fsec != 0)
+{
+	int fseca = abs(fsec);
+
+	*str++ = '.';
+
+	if (fseca % 100 != 0)
+	{
+		*str++ = (fseca / 10) + '0';
+
+		if (fseca % 10 != 0)
+		{
+			*str++ = ((fseca / 

Re: [HACKERS] A little RLS oversight?

2015-07-27 Thread Dean Rasheed
On 27 July 2015 at 21:58, Stephen Frost sfr...@snowman.net wrote:
 Dean,

 * Dean Rasheed (dean.a.rash...@gmail.com) wrote:
 On 27 July 2015 at 18:13, Joe Conway m...@joeconway.com wrote:
 
  On 07/27/2015 10:03 AM, Joe Conway wrote:
  On 07/26/2015 07:59 AM, Joe Conway wrote:
  On 07/26/2015 07:19 AM, Dean Rasheed wrote:
  Attached is an updated patch (still needs some docs for the
  functions).
 
  Thanks for that. I'll add the docs.
 
  Documentation added. Also added comment to check_enable_rls about
  passing InvalidOid versus GetUserId().
 
  I believe this is ready to go -- any other comments?
 
  Strike that - now I really think it is ready to go :-)
 
  In this patch I additionally changed instances of:
check_enable_rls(indrelid, GetUserId(), true)
  to
check_enable_rls(indrelid, InvalidOid, true)
  per Dean's earlier remark and my new comment.

 Looks good to me, except I'm not sure about those latest changes
 because I don't understand the reasoning behind the logic in
 check_enable_rls() when row_security is set to OFF.

 I would expect that if the current user has permission to bypass RLS,
 and they have set row_security to OFF, then it should be off for all
 tables that they have access to, regardless of how they access those
 tables (directly or through a view). If it really is intentional that
 RLS remains active when querying through a view not owned by the
 table's owner, then the other calls to check_enable_rls() should
 probably be left as they were, since the table might have been updated
 through such a view and that code can't really tell at that point.

 Joe and I were discussing this earlier and it was certainly intentional
 that RLS still be enabled if you're querying through a view as the RLS
 rights of the view owner are used, not your own.  Note that we don't
 allow a user to assume the BYPASSRLS right of the view owner though,
 also intentionally.

 As a comparison to what we do today, even if you have access to a table,
 if you query it through a view, it's the view owner's permissions which
 are used to determine access to the table through the view, not your
 own.  I agree that can be a bit odd at times, as you can get a
 permission denied error when using the view even though you have access
 to the table which is complained about, but that's how views have worked
 for quite a long time.


OK, fair enough.

Regards,
Dean


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Optimization idea: merging multiple EXISTS(...) with constraint-based join removal

2015-07-27 Thread Frédéric TERRAZZONI
I've frequently encountered queries where the WHERE clause contains a lot
of EXISTS(...) subqueries linked by logical ANDs. They are typically
generated queries, and the EXISTS(...) fragments are composable pieces of
SQL code acting as filters. Moreover, all those filters end up being very
similar: the joins used to navigate the object graph often involve the
same tables.

Unfortunately, PostgreSQL doesn't seem to be smart enough to simplify them.
In fact, according to my experiments, even dead-simple uncorrelated
subqueries such as "SELECT * FROM t1 WHERE EXISTS(SELECT * FROM t2) AND
EXISTS(SELECT * FROM t2)" are not simplified. (The execution plan is rather
naive: t2 is checked twice for non-emptiness).

I've not seen anything related to that topic on the TODO list or in the
mailing list. Consequently, I would like to propose a method which can be
used to merge multiple EXISTS(...) semi-joins linked by logical ANDs.

For the sake of this example, the database schema is:

create table t1(id int primary key, t2_id int, t6_id int)
create table t2(id int primary key, t3_id int)
create table t3(id int primary key, t4_id int, t5_id int)
create table t4(id int primary key, val text)
create table t5(id int primary key, val text)
create table t6(id int primary key, val text)

Here is a machine-generated query. Notice that the path t1->t2->t3 is
used twice.

SELECT * FROM t1
WHERE EXISTS(
SELECT 1 FROM t2, t3, t4
WHERE t2.id = t1.t2_id
AND t3.id = t2.t3_id
AND t4.id = t3.t4_id
AND t4.val = 'XYZ'
) AND EXISTS(
SELECT 1 FROM t2, t3, t5
WHERE t2.id = t1.t2_id
AND t3.id = t2.t3_id
AND t5.id = t3.t5_id
AND t5.val = 'Blablabla'
) AND EXISTS(
SELECT 1 FROM t6
WHERE t6.id = t1.t6_id
AND t6.val = 'Hello'
)

The following transformation can be applied:

SELECT * FROM [...]
WHERE
EXISTS(SELECT 1 FROM [...tables_1...] WHERE [...conditions_1...])
AND EXISTS(SELECT 1 FROM [...tables_2...] WHERE [...conditions_2...])

=>

SELECT * FROM [...]
WHERE
EXISTS(
SELECT 1 FROM
(SELECT 1 FROM [...tables_1...] WHERE [...conditions_1...])
q1,
(SELECT 1 FROM [...tables_2...] WHERE [...conditions_2...])
q2
)

The resulting query is:

SELECT * FROM t1
WHERE EXISTS(
SELECT 1 FROM t2 t2_a, t3 t3_a, t4 t4_a, t2 t2_b, t3 t3_b, t5,
t6
WHERE t2_a.id = t1.t2_id
AND t3_a.id = t2_a.t3_id
AND t4_a.id = t3_a.t4_id
AND t4_a.val = 'XYZ'
AND t2_b.id = t1.t2_id
AND t3_b.id = t2_b.t3_id
AND t5.id = t3_b.t5_id
AND t5.val = 'Blablabla'
AND t6.id = t1.t6_id
AND t6.val = 'Hello'
)

Given the constraints, it can be shown that t2_a = t2_b AND t3_a = t3_b :

* t2_a.id = t1.t2_id AND t2_b.id = t1.t2_id => t2_a.id = t2_b.id
* t2_a.id = t2_b.id AND t2(id) is a pkey => t2_a = t2_b
* t2_a = t2_b => t2_a.t3_id = t2_b.t3_id
* t2_a.t3_id = t2_b.t3_id AND t3_b.id = t2_b.t3_id AND t3_a.id =
t2_a.t3_id => t3_b.id = t3_a.id
* t3_b.id = t3_a.id AND t3(id) is a pkey => t3_a = t3_b

It is then safe to remove the joins t2_b and t3_b in the subquery,
because they have been proved to be duplicates of t2_a and t3_a.

SELECT * FROM t1
WHERE EXISTS(
SELECT 1 FROM t2, t3, t4, t5, t6
WHERE t2.id = t1.t2_id
AND t3.id = t2.t3_id
AND t4.id = t3.t4_id
AND t5.id = t3.t5_id
AND t4.val = 'XYZ'
AND t5.val = 'Blablabla'
AND t6.id = t1.t6_id
AND t6.val = 'Hello'
)

Using a connected component analysis on join predicates, joined tables can
be separated and we get back our original query with the first two
EXISTS(...) semi-joins merged.

SELECT * FROM t1
WHERE EXISTS(
SELECT 1 FROM t2, t3, t4, t5
WHERE t2.id = t1.t2_id
AND t3.id = t2.t3_id
AND t4.id = t3.t4_id
AND t5.id = t3.t5_id
AND t4.val = 'XYZ'
AND t5.val = 'Blablabla'
) AND EXISTS(
SELECT 1 FROM t6
WHERE t6.id = t1.t6_id
AND t6.val = 'Hello'
)

My questions are:
- Does PostgreSQL already support this optimization? Maybe it's not
enabled in my case only?
- If yes, is my reasoning incorrect? Can you point me to my mistake?
- Otherwise, is there any plan to add this optimization to PostgreSQL?

Thank you !

Frédéric Terrazzoni


Re: [HACKERS] WIP: Make timestamptz_out less slow.

2015-07-27 Thread Andres Freund
On 2015-07-27 17:31:41 -0400, Tom Lane wrote:
 Andres Freund and...@anarazel.de writes:
  So nearly all the time is spent somewhere inside the sprintf calls. Not
  nice.
 
 What happens if you force use of port/snprintf.c instead of glibc's
 version?

Good question.

Even worse. 15900.014 ms.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A little RLS oversight?

2015-07-27 Thread Joe Conway

On 07/27/2015 01:25 PM, Dean Rasheed wrote:
 Looks good to me, except I'm not sure about those latest changes 
 because I don't understand the reasoning behind the logic in 
 check_enable_rls() when row_security is set to OFF.
 
 I would expect that if the current user has permission to bypass
 RLS, and they have set row_security to OFF, then it should be off
 for all tables that they have access to, regardless of how they
 access those tables (directly or through a view). If it really is
 intentional that RLS remains active when querying through a view
 not owned by the table's owner, then the other calls to
 check_enable_rls() should probably be left as they were, since the
 table might have been updated through such a view and that code
 can't really tell at that point.

After looking again at those three call sites, I'm now on the fence.
All three are in functions which are trying to avoid leaking info in
error messages. If we leave the original coding, then the messages
will remain the same for a given user regardless of the row_security
setting, whereas if we change them as in my latest patch, the messages
will be different depending on row_security for users with BYPASSRLS.

I think if we do the former (original coding) there should probably be
a comment added to indicate we are doing that intentionally.

Thoughts?

-- 
Joe Conway


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: multiple psql option -c

2015-07-27 Thread Robert Haas
On Mon, Jul 27, 2015 at 2:53 PM, Pavel Stehule pavel.steh...@gmail.com wrote:
 I am trying to run parallel execution

 psql -At -c select datname from pg_database postgres | xargs -n 1 -P 3
 psql -c select current_database()

Put this in a shell script called run-psql:

#!/bin/bash

test $# = 0 && exit
for f in "${@:1:$(($#-1))}"; do
echo "$f" \;
done | psql "${@:$#}"

Then:

psql -At -c "select datname from pg_database" postgres | xargs -n 1 -P
3 ./run-psql "select current_database()" "vacuum" "select 1"

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] optimizing vacuum truncation scans

2015-07-27 Thread Simon Riggs
On 22 July 2015 at 17:11, Jeff Janes jeff.ja...@gmail.com wrote:

 On Wed, Jul 22, 2015 at 6:59 AM, Robert Haas robertmh...@gmail.com
 wrote:

 On Mon, Jun 29, 2015 at 1:54 AM, Jeff Janes jeff.ja...@gmail.com wrote:
  Attached is a patch that implements the vm scan for truncation.  It
  introduces a variable to hold the last blkno which was skipped during
 the
  forward portion.  Any blocks after both this blkno and after the last
  inspected nonempty page (which the code is already tracking) must have
 been
  observed to be empty by the current vacuum.  Any other process
 rendering the
  page nonempty are required to clear the vm bit, and no other process
 can set
  the bit again during the vacuum's lifetime.  So if the bit is still
 set, the
  page is still empty without needing to inspect it.

 Urgh.  So if we do this, that forever precludes having HOT pruning set
 the all-visible bit.


 I wouldn't say forever, as it would be easy to revert the change if
 something more important came along that conflicted with it.


I think what is being said here is that someone is already using this
technique, or if not, then we actively want to encourage them to do so as
an extension or as a submission to core.

In that case, I think the rely-on-VM technique sinks again, sorry Jim,
Jeff. Probably needs code comments added.

That does still leave the prefetch technique, so all is not lost.

Can we see a patch with just prefetch, probably with a simple choice of
stride? Thanks.
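
To sketch what a simple stride might look like in the backward empty-page
scan (PREFETCH_STRIDE and the function shape are placeholders of mine, not
the actual patch):

#define PREFETCH_STRIDE 32

static void
truncate_scan_with_prefetch(Relation onerel, BlockNumber nblocks)
{
	BlockNumber blkno = nblocks;

	while (blkno > 0)
	{
		/* at each stride boundary, prefetch the window we visit next */
		if (blkno % PREFETCH_STRIDE == 0)
		{
			BlockNumber start = (blkno >= PREFETCH_STRIDE) ?
				blkno - PREFETCH_STRIDE : 0;
			BlockNumber pblkno;

			for (pblkno = start; pblkno < blkno; pblkno++)
				PrefetchBuffer(onerel, MAIN_FORKNUM, pblkno);
		}

		blkno--;
		/* ... read block blkno and check whether it is empty ... */
	}
}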

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Autonomous Transaction is back

2015-07-27 Thread Joel Jacobson
On Fri, Jul 24, 2015 at 9:39 PM, Merlin Moncure mmonc...@gmail.com wrote:

 On Thu, Jul 23, 2015 at 1:49 PM, Josh Berkus j...@agliodbs.com wrote:
  Batch Jobs: large data-manipulation tasks which need to be broken up
  into segments, with each segment committing separately.  Example:
  updating 1 million records in batches of 1000.

 Autonomous transactions are not a good fit for this case; stored
 procedures are a better way to go for any scenario where you don't
 want be be in a snapshot (for example, suppose you want to change
 isolation level on the fly).


Hm, you mean we need real stored procedures in PostgreSQL and not just
functions?

If not, I think it would be sufficient to add Autonomous Transaction
support to the type of functions we already have in pg, to allow writing a
batch-job function which would commit after every X modified rows,
instead of having to write a script in an external language such as Perl to
call the function in a while-loop and commit between each function call.

However, we should also add a way for the caller to protect against an
Autonomous Transaction in a function called by the caller. Imagine if
you're the author of function X() and within X() make use of some other
function Y() which has been written by some other author. If, within your
function X(), it's very important that either all of your work or none at
all gets committed, then you need to make sure none of the changes you made
before calling Y() gets committed, and thus we need a way to prevent Y()
from starting and committing an Autonomous Transaction; otherwise we would
increase the risk and complexity of working with functions and plpgsql in
PostgreSQL, as you would then need to be sure none of the functions you are
using within a function will start and commit an ATX.


Re: [HACKERS] A little RLS oversight?

2015-07-27 Thread Dean Rasheed
On 27 July 2015 at 18:13, Joe Conway m...@joeconway.com wrote:

 On 07/27/2015 10:03 AM, Joe Conway wrote:
 On 07/26/2015 07:59 AM, Joe Conway wrote:
 On 07/26/2015 07:19 AM, Dean Rasheed wrote:
 Attached is an updated patch (still needs some docs for the
 functions).

 Thanks for that. I'll add the docs.

 Documentation added. Also added comment to check_enable_rls about
 passing InvalidOid versus GetUserId().

 I believe this is ready to go -- any other comments?

 Strike that - now I really think it is ready to go :-)

 In this patch I additionally changed instances of:
   check_enable_rls(indrelid, GetUserId(), true)
 to
   check_enable_rls(indrelid, InvalidOid, true)
 per Dean's earlier remark and my new comment.


Looks good to me, except I'm not sure about those latest changes
because I don't understand the reasoning behind the logic in
check_enable_rls() when row_security is set to OFF.

I would expect that if the current user has permission to bypass RLS,
and they have set row_security to OFF, then it should be off for all
tables that they have access to, regardless of how they access those
tables (directly or through a view). If it really is intentional that
RLS remains active when querying through a view not owned by the
table's owner, then the other calls to check_enable_rls() should
probably be left as they were, since the table might have been updated
through such a view and that code can't really tell at that point.

Regards,
Dean


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A little RLS oversight?

2015-07-27 Thread Stephen Frost
Dean,

* Dean Rasheed (dean.a.rash...@gmail.com) wrote:
 On 27 July 2015 at 18:13, Joe Conway m...@joeconway.com wrote:
 
  On 07/27/2015 10:03 AM, Joe Conway wrote:
  On 07/26/2015 07:59 AM, Joe Conway wrote:
  On 07/26/2015 07:19 AM, Dean Rasheed wrote:
  Attached is an updated patch (still needs some docs for the
  functions).
 
  Thanks for that. I'll add the docs.
 
  Documentation added. Also added comment to check_enable_rls about
  passing InvalidOid versus GetUserId().
 
  I believe this is ready to go -- any other comments?
 
  Strike that - now I really think it is ready to go :-)
 
  In this patch I additionally changed instances of:
check_enable_rls(indrelid, GetUserId(), true)
  to
check_enable_rls(indrelid, InvalidOid, true)
  per Dean's earlier remark and my new comment.
 
 Looks good to me, except I'm not sure about those latest changes
 because I don't understand the reasoning behind the logic in
 check_enable_rls() when row_security is set to OFF.
 
 I would expect that if the current user has permission to bypass RLS,
 and they have set row_security to OFF, then it should be off for all
 tables that they have access to, regardless of how they access those
 tables (directly or through a view). If it really is intentional that
 RLS remains active when querying through a view not owned by the
 table's owner, then the other calls to check_enable_rls() should
 probably be left as they were, since the table might have been updated
 through such a view and that code can't really tell at that point.

Joe and I were discussing this earlier and it was certainly intentional
that RLS still be enabled if you're querying through a view as the RLS
rights of the view owner are used, not your own.  Note that we don't
allow a user to assume the BYPASSRLS right of the view owner though,
also intentionally.

As a comparison to what we do today, even if you have access to a table,
if you query it through a view, it's the view owner's permissions which
are used to determine access to the table through the view, not your
own.  I agree that can be a bit odd at times, as you can get a
permission denied error when using the view even though you have access
to the table which is complained about, but that's how views have worked
for quite a long time.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-07-27 Thread Jim Nasby

On 7/27/15 1:46 PM, Robert Haas wrote:

On Mon, Jul 27, 2015 at 2:43 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:

Robert Haas wrote:

On Mon, Jul 27, 2015 at 2:32 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:



I think this is already possible, is it not?  You just have to look for
an identically-identified pg_locks entry with granted=true.  That gives
you a PID and vxid/xid.  You can self-join pg_locks with that, and join
to pg_stat_activity.

I remember we discussed having a layer of system views on top of
pg_stat_activity and pg_locks, probably defined recursively, that would
show the full graph of waiters/lockers.


It isn't necessarily the case that A is waiting for a unique process
B.  It could well be the case that A wants AccessExclusiveLock and
many processes hold a variety of other lock types.


Sure, but I don't think this makes it impossible to figure out who's
locking who.  I think the only thing you need other than the data in
pg_locks is the conflicts table, which is well documented.

Oh, hmm, one thing missing is the ordering of the wait queue for each
locked object.  If process A holds RowExclusive on some object, process
B wants ShareLock (stalled waiting) and process C wants AccessExclusive
(also stalled waiting), who of B and C is woken up first after A
releases the lock depends on order of arrival.


Agreed - it would be nice to expose that somehow.


+1. It's very common to want to know who's blocking who, and not at all 
easy to do that today. We should at minimum have a canonical example of 
how to do it, but something built in would be even better.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autonomous Transaction is back

2015-07-27 Thread Josh Berkus
On 07/27/2015 02:41 PM, Joel Jacobson wrote:
 However, we should also add a way for the caller to protect against an
 Autonomous Transaction in a function called by the caller. Imagine if
 you're the author of function X() and within X() make use of some other
 function Y() which has been written by some other author. If, within
 your function X(), it's very important that either all of your work or none
 at all gets committed, then you need to make sure none of the changes
 you made before calling Y() gets committed, and thus we need a way to
 prevent Y() from starting and committing an Autonomous Transaction,
 otherwise we would increase the risk and complexity of working with
 functions and plpgsql in PostgreSQL as you would then need to be sure
 none of the functions you are using within a function will start and
 commit an ATX.

Ah, you're missing how commits in ATX are expected to work.  Let me
illustrate:

X (
   Data write A1
   call Y(
      Start ATX
      Data write B1
      Commit ATX
   )
   Data write A2
   Exception
)

In this workflow, B1 would be committed and persistent. Neither A1 nor
A2 would be committed, or visible to other users.  Depending on what
implementation we end up with, A1 might not even be visible to Y().

So that solves your use case without any need to block ATXs in called
functions.  However, it leads to some interesting cases involving
self-deadlocks; see the original post on this thread.
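
(Today the closest emulation is a second connection via dblink, which
commits independently of the caller and so behaves like the ATX above.
Assuming the dblink extension, a table audit(msg text), and an
illustrative connection string:

CREATE OR REPLACE FUNCTION x() RETURNS void AS $$
BEGIN
    INSERT INTO audit VALUES ('A1');                    -- rolled back below
    PERFORM dblink_exec('dbname=postgres',
                        $q$INSERT INTO audit VALUES ('B1')$q$);  -- commits
    INSERT INTO audit VALUES ('A2');                    -- rolled back below
    RAISE EXCEPTION 'abort outer transaction';
END;
$$ LANGUAGE plpgsql;

After SELECT x() fails, only B1 is in audit, which matches the flow above;
and the dblink session indeed never saw the uncommitted A1.)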

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] copy.c handling for RLS is insecure

2015-07-27 Thread Stephen Frost
All,

* Stephen Frost (sfr...@snowman.net) wrote:
 * Andres Freund (and...@anarazel.de) wrote:
  On 2015-07-09 01:28:28 -0400, Noah Misch wrote:
- Keep the OID check, shouldn't hurt to have it
   
   What benefit is left?
  
  A bit of defense in depth. We execute user defined code in COPY
  (e.g. BEFORE triggers). That user defined code could very well replace
  the relation. Now I think right now that'd happen late enough, so the
  second lookup already happened. But a bit more robust defense against
  that sounds good to me.
 
 Attached patch keeps the relation locked, fully qualifies it when
 building up the query, and uses list_member_oid() to check that the
 relation's OID ends up in the resulting relationOids list (to address
 Noah's point that the planner doesn't guarantee the ordering; I doubt
 that list will ever be more than a few entries long).
 
 Also removes the misguided Assert().
 
 Barring objections, I'll commit this (and backpatch to 9.5) tomorrow.

Apologies for not pushing this before I left on vacation.  I've done so
now.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] proposal: multiple psql option -c

2015-07-27 Thread Jim Nasby

On 7/27/15 2:57 PM, Andrew Dunstan wrote:


psql -At -c "select datname from pg_database" postgres | xargs -n 1 -P
3 psql -c "select current_database()"





I don't think it's going to be a hugely important feature, but I don't
see a problem with creating a new option (-C seems fine) which would
have the same effect as if the arguments were contatenated into a file
which is then used with -f. IIRC -c has some special characteristics
which means it's probably best not to try to extend it for this feature.


+1. I've occasionally wanted this as well.

I'd also vote for -C returning every state string as well.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Make timestamptz_out less slow.

2015-07-27 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 So nearly all the time is spent somewhere inside the sprintf calls. Not
 nice.

What happens if you force use of port/snprintf.c instead of glibc's
version?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autonomous Transaction is back

2015-07-27 Thread Josh Berkus
On 07/27/2015 02:47 AM, Rajeev rastogi wrote:
 Why have any fixed maximum?
 Since we are planning to have nested autonomous transaction, so it is 
 required to have limit on this so that resources can be controlled.

Is there a particular reason why this limit wouldn't just be
max_stack_depth?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Make timestamptz_out less slow.

2015-07-27 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 On 2015-07-27 17:31:41 -0400, Tom Lane wrote:
 What happens if you force use of port/snprintf.c instead of glibc's
 version?

 Even worse. 15900.014 ms.

Interesting.  So as a separate optimization problem, we might consider
trying to put snprintf.c at least on a par with glibc.  I'm kind of
surprised by this result really, since snprintf.c lacks a lot of the
bells and whistles that are in glibc.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] multivariate statistics / patch v7

2015-07-27 Thread Tomas Vondra

Hello Horiguchi-san,

On 07/27/2015 09:04 AM, Kyotaro HORIGUCHI wrote:

Hello,

At Sat, 25 Jul 2015 23:09:31 +0200, Tomas Vondra tomas.von...@2ndquadrant.com wrote 
in 55b3fb0b.7000...@2ndquadrant.com

Hi,

On 07/16/2015 01:51 PM, Kyotaro HORIGUCHI wrote:

Hi, I'd like to show you the modified constitution of
multivariate statistics application logic. Please find the
attached. They apply on your v7 patch.


Sadly I do have some trouble getting it to apply correctly :-(
So for now all my comments are based on just reading the code.


Ah. My modification to rebase it onto master at the time must be
the culprit. Sorry for the dirty patch.

# I would recreate the patch if you complained before struggling
# with the thing..

The core of the modification is in clausesel.c. I attached the
patched clausesel.c.


I don't see any attachment. Perhaps you forgot to actually attach it?



My concern about the code at the time was following,

- You embedded the logic of multivariate estimation into
   clauselist_selectivity. I think estimation using multivariate
   statistics is quite different from the ordinary estimation based
   on single-column stats, so they are logically separable and
   we should do so.


I don't see them as very different, actually quite the opposite. The two 
kinds of statistics are complementary and should naturally coexist. 
Perhaps the current code is not perfect and a refactoring would make the 
code more readable, but I don't think its primary aim should be to 
separate regular and multivariate stats.




- You are taking a top-down approach and it runs tree-walking to
   check applicability of mv-stats for every step down the
   clause tree. If a subtree is found to be mv-applicable, it is split
   into two parts - mv-compatible and non-compatible. These steps
   require expression tree walking, which looks like it uses too much
   CPU.


I'm taking top-down approach because that's what the regular stats do, 
and also because that's what allows implementing the features that I 
think are interesting - ability to combine multiple stats in an 
efficient way, pass conditions and such. I think those two features are 
very useful and allow very interesting things.


The bottom-up would work too, probably - I mean, we could start from 
leaves of the expression tree, and build the largest subtree 
compatible with multivariate stats and then try to estimate it. I don't 
see how we could pass conditions though, which works naturally in the 
top-down approach.


Or maybe a combination of both - identify the compatible subtrees 
first, then perform the top-down phase.
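
To make the idea concrete, here is a rough pseudocode sketch (mine, not
from either patch) of such a two-phase structure; collect_mv_compatible()
and estimate_mv_clauses() are hypothetical names:

    Selectivity
    clauselist_selectivity_sketch(PlannerInfo *root, List *clauses,
                                  int varRelid, JoinType jointype,
                                  SpecialJoinInfo *sjinfo)
    {
        List       *mvclauses = NIL;
        List       *remaining = NIL;
        Selectivity s = 1.0;
        ListCell   *lc;

        /* phase 1 (bottom-up): collect clauses covered by some mv statistic */
        collect_mv_compatible(root, clauses, &mvclauses, &remaining);

        /* phase 2 (top-down): estimate the mv-compatible group as a whole */
        if (mvclauses != NIL)
            s *= estimate_mv_clauses(root, mvclauses);

        /* ordinary single-column estimation for everything else */
        foreach(lc, remaining)
            s *= clause_selectivity(root, (Node *) lfirst(lc),
                                    varRelid, jointype, sjinfo);

        return s;
    }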



- You look to be considering cases where users create many
   multivariate statistics on attribute sets with overlaps. But
   that looks like too much to me. MV-stats are more
   resource-hungry, so we can assume they will be used minimally.


Not really. I don't expect huge numbers of multivariate stats to be 
built on the tables.


But I think restricting the users to use a single multivariate 
statistics per table would be a significant limitation. And once you 
allow using multiple multivariate statistics for the set of clauses, 
supporting over-lapping stats is not that difficult.


What it however makes possible is combining multiple small stats into 
a larger one in a very efficient way - it assumes the overlap is 
sufficient, of course. But if that's true you may build multiple small 
(and very accurate) stats instead of one huge (or very inaccurate) 
statistics.


This also makes it possible to handle complex combinations of clauses 
that are compatible and incompatible with multivariate statistics, by 
passing the conditions.




My suggestion in the patch is a bottom-up approach to find
mv-applicable portion(s) in the expression tree, which is the
basic way the planner works overall. The approach requires no repetitive
runs of the tree walker, that is, pull_varnos. It could fail to find
the 'optimal' solution for complex situations but needs far less
calculation for almost the same return (I think..).

Even though it doesn't consider functional dependencies, the
reduction in code shows the efficiency. It does nothing
tricky.


OK


On a conceptual level, I think the idea to split the estimation into
two phases - enrich the expression tree with nodes with details about
stats etc, and then actually do the estimation in the second phase
might be interesting. Not because it's somehow clearer, but because it
gives us a chance to see the expression tree as a whole, with details
about all the stats (with the current code we process/estimate the
tree incrementally). But I don't really know how useful that would be.


It is difficult to say which approach is better since it is
affected by what we consider more important than other things. However I
am concerned that your code substantially reconstructs the
expression (clause) tree in the midst of processing it. I believe it
should be a separate phase for simplicity. Of course the additional
resource required is also 

Re: [HACKERS] Autonomous Transaction is back

2015-07-27 Thread Merlin Moncure
On Mon, Jul 27, 2015 at 4:41 PM, Joel Jacobson j...@trustly.com wrote:
 On Fri, Jul 24, 2015 at 9:39 PM, Merlin Moncure mmonc...@gmail.com wrote:

 On Thu, Jul 23, 2015 at 1:49 PM, Josh Berkus j...@agliodbs.com wrote:
  Batch Jobs: large data-manipulation tasks which need to be broken up
  into segments, with each segment committing separately.  Example:
  updating 1 million records in batches of 1000.

 Autonomous transactions are not a good fit for this case; stored
 procedures are a better way to go for any scenario where you don't
 want to be in a snapshot (for example, suppose you want to change
 isolation level on the fly).


 Hm, you mean we need real stored procedures in PostgreSQL and not just
 functions?

Yes, exactly.

Autonomous transactions aren't really set up for cases where the
function runs for a very long time or indefinitely.  This is the
'advancing xmin' problem as Josh puts it but I think the problem is
much bigger than that.  Anyways, this is mostly irrelevant to
autonomous transactions as long as the design isn't extended to try
and cover that case.

Is the Autonomous Transaction feature only going to be exposed through pl/pgsql?

merlin




Re: [HACKERS] A little RLS oversight?

2015-07-27 Thread Dean Rasheed
On 27 July 2015 at 22:36, Joe Conway m...@joeconway.com wrote:

 On 07/27/2015 01:25 PM, Dean Rasheed wrote:
 Looks good to me, except I'm not sure about those latest changes
 because I don't understand the reasoning behind the logic in
 check_enable_rls() when row_security is set to OFF.

 I would expect that if the current user has permission to bypass
 RLS, and they have set row_security to OFF, then it should be off
 for all tables that they have access to, regardless of how they
 access those tables (directly or through a view). If it really is
 intentional that RLS remains active when querying through a view
 not owned by the table's owner, then the other calls to
 check_enable_rls() should probably be left as they were, since the
 table might have been updated through such a view and that code
 can't really tell at that point.

 After looking again at those three call sites, I'm now on the fence.
 All three are in functions which are trying to avoid leaking info in
 error messages. If we leave the original coding, then the messages
 will remain the same for a given user regardless of the row_security
 setting, whereas if we change them as in my latest patch, the messages
 will be different depending on row_security for users with BYPASSRLS.

 I think if we do the former (original coding) there should probably be
 a comment added to indicate we are doing that intentionally.

 Thoughts?


I could go either way on that, but I don't think it's too serious to
worry about leaking information to users with BYPASSRLS.

Regards,
Dean




Re: [HACKERS] Optimization idea: merging multiple EXISTS(...) with constraint-based join removal

2015-07-27 Thread David Rowley
On 28 July 2015 at 09:37, Frédéric TERRAZZONI frederic.terrazz...@gmail.com
 wrote:


 SELECT * FROM t1
 WHERE EXISTS(
     SELECT 1 FROM t2, t3, t4
     WHERE t2.id = t1.t2_id
     AND t3.id = t2.t3_id
     AND t4.id = t3.t4_id
     AND t4.val = 'XYZ'
 ) AND EXISTS(
     SELECT 1 FROM t2, t3, t5
     WHERE t2.id = t1.t2_id
     AND t3.id = t2.t3_id
     AND t5.id = t3.t5_id
     AND t5.val = 'Blablabla'
 ) AND EXISTS(
     SELECT 1 FROM t6
     WHERE t6.id = t1.t6_id
     AND t6.val = 'Hello'
 )

 ...



 The resulting query is:

 SELECT * FROM t1
 WHERE EXISTS(
     SELECT 1 FROM t2 t2_a, t3 t3_a, t4 t4_a, t2 t2_b, t3 t3_b, t5, t6
     WHERE t2_a.id = t1.t2_id
     AND t3_a.id = t2_a.t3_id
     AND t4_a.id = t3_a.t4_id
     AND t4_a.val = 'XYZ'
     AND t2_b.id = t1.t2_id
     AND t3_b.id = t2_b.t3_id
     AND t5.id = t3_b.t5_id
     AND t5.val = 'Blablabla'
     AND t6.id = t1.t6_id
     AND t6.val = 'Hello'
 )

 My questions are:
 - Does PostgreSQL already support this optimization? Maybe it's just not
 enabled in my case?


No, there's nothing which supports that currently.


 - If yes, is my reasoning incorrect? Can you point out my mistake?


It sounds reasonable to me.


 - Otherwise, is there any plan to add this optimization to PostgreSQL?


I did propose the idea here
http://www.postgresql.org/message-id/CAApHDvopmWq4i2BCf0VqU4mYmxzHCYwfnUMat9TWuKzdvo7=8...@mail.gmail.com
but I didn't focus just on semi-joins. Without re-reading, I think I was
saying that any join that could be proved not to duplicate rows could be
consolidated.

I do believe that this optimisation would be useful in more cases than most
people might think, for example:

UPDATE t1 SET col1 = (SELECT col1 FROM t2 WHERE t1.id=t2.id), col2 =
(SELECT col2 FROM t2 WHERE t1.id=t2.id), ...;

Of course, this query could have been written using UPDATE/FROM,
(non-standard), or UPDATE t1 SET (col1,col2) = (SELECT ...), which was only
added recently.

There's also the case of the view which just has 1 column missing, so the
consumer joins a table that's already been joined to in the view.

I think it would be quite nice to have this, and I don't think it would be
all that expensive for the planner to detect this.

I think the planner would have to do something like:

1. Scan simple_rte_array looking for relids which are the same as another
entry's.
2. If found, is the join condition the same as the other one?
3. Is there a unique index to prove that joining to this does not duplicate
rows, or is it a semi-join?
4. Remove relation and replace all Vars from the removed relation with the
one from the other table, mark relation as REL_DEAD.

I think 1 is quite cheap to perform, so normal queries wouldn't suffer much
of a slow-down from these extra checks, as most queries won't have self
joins.
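
As a rough sketch (mine, not from any patch), the scan in step 1 might look
something like this, with steps 2-4 left as a stub:

    int     i,
            j;

    for (i = 1; i < root->simple_rel_array_size; i++)
    {
        RangeTblEntry *rte1 = root->simple_rte_array[i];

        if (rte1 == NULL || rte1->rtekind != RTE_RELATION)
            continue;

        for (j = i + 1; j < root->simple_rel_array_size; j++)
        {
            RangeTblEntry *rte2 = root->simple_rte_array[j];

            if (rte2 != NULL && rte2->rtekind == RTE_RELATION &&
                rte2->relid == rte1->relid)
            {
                /*
                 * Candidate self-join pair: here we would verify that the
                 * join conditions match (step 2), prove uniqueness or a
                 * semi-join (step 3), and replace the Vars and mark the
                 * relation REL_DEAD (step 4).
                 */
            }
        }
    }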

Are you thinking of working on it?

Regards

David Rowley

--
 David Rowley   http://www.2ndQuadrant.com/
http://www.2ndquadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


Re: [HACKERS] Autonomous Transaction is back

2015-07-27 Thread Jim Nasby

On 7/27/15 5:56 PM, Josh Berkus wrote:

Can you explain what use case you have where simply telling the staff
"if you use ATX without clearing it, you'll be fired" is not sufficient?
  Possibly there's something we failed to account for in the unconference
discussion.


That there's no way to enforce that, short of hand-auditing code? 
There are already enough things that are difficult/impossible to enforce, 
I'd rather not add another one.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Autonomous Transaction is back

2015-07-27 Thread Joel Jacobson
On Mon, Jul 27, 2015 at 11:49 PM, Josh Berkus j...@agliodbs.com wrote:

 Ah, you're missing how commits in ATX are expected to work.  Let me
 illustrate:

 X (
Data write A1
call Y(
 Start ATX
 Data write B1
 Commit ATX
)
Data write A2
Exception
 )

 In this workflow, B1 would be committed and persistent. Neither A1 nor
 A2 would be committed, or visible to other users.  Depending on what
 implementation we end up with, A1 might not even be visible to Y().

 So that solves your use case without any need to block ATXs in called
 functions.  However, it leads to some interesting cases involving
 self-deadlocks; see the original post on this thread.


I don't follow. In your example above, if I'm X(), how do I ensure Y()
won't have committed anything at all when I later, at Exception, decide to
roll back everything from Data write A1 to Data write A2, including any
writes made by Y() (in the example, Data write B1)?

I understand the Exception will take care of rolling back my (X's) writes,
but that's not sufficient if you want to make sure you roll back
*everything*, including any writes made by functions you call.

Right now, when writing a function, if you raise an exception, you can be
sure all writes you have made will be rolled back, but your caller function
might catch the exception and decide to carry on and commit work made
before your function was called; at least you can be confident your
writes won't be committed as long as you don't catch the exception you
raised in your own function. If I understand it correctly, that would
change with the addition of Autonomous Transactions, unless given a way to
prevent a function you call from starting and committing an Autonomous
Transaction. Wrong? If so, then please show how to prevent Y() from
committing the Data write B1 in your example, I don't get it.


Re: [HACKERS] A little RLS oversight?

2015-07-27 Thread Stephen Frost
On Monday, July 27, 2015, Dean Rasheed dean.a.rash...@gmail.com wrote:

 On 27 July 2015 at 22:36, Joe Conway m...@joeconway.com
 wrote:
 
  On 07/27/2015 01:25 PM, Dean Rasheed wrote:
  Looks good to me, except I'm not sure about those latest changes
  because I don't understand the reasoning behind the logic in
  check_enable_rls() when row_security is set to OFF.
 
  I would expect that if the current user has permission to bypass
  RLS, and they have set row_security to OFF, then it should be off
  for all tables that they have access to, regardless of how they
  access those tables (directly or through a view). If it really is
  intentional that RLS remains active when querying through a view
  not owned by the table's owner, then the other calls to
  check_enable_rls() should probably be left as they were, since the
  table might have been updated through such a view and that code
  can't really tell at that point.
 
  After looking again at those three call sites, I'm now on the fence.
  All three are in functions which are trying to avoid leaking info in
  error messages. If we leave the original coding, then the messages
  will remain the same for a given user regardless of the row_security
  setting, whereas if we change them as in my latest patch, the messages
  will be different depending on row_security for users with BYPASSRLS.
 
  I think if we do the former (original coding) there should probably be
  a comment added to indicate we are doing that intentionally.
 
  Thoughts?
 

 I could go either way on that, but I don't think it's too serious to
 worry about leaking information to users with BYPASSRLS.


AFK at the moment, but my thinking was that we should avoid having the
error message change based on what a GUC is set to. I agree that there
should be comments which explain that.

Thanks!

Stephen


Re: [HACKERS] Autonomous Transaction is back

2015-07-27 Thread Josh Berkus
On 07/27/2015 03:12 PM, Joel Jacobson wrote:
 On Mon, Jul 27, 2015 at 11:49 PM, Josh Berkus j...@agliodbs.com
 mailto:j...@agliodbs.com wrote:
 
 Ah, you're missing how commits in ATX are expected to work.  Let me
 illustrate:
 
 X (
Data write A1
call Y(
 Start ATX
 Data write B1
 Commit ATX
)
Data write A2
Exception
 )
 
 In this workflow, B1 would be committed and persistent. Neither A1 nor
 A2 would be committed, or visible to other users.  Depending on what
 implementation we end up with, A1 might not even be visible to Y().
 
 So that solves your use case without any need to block ATXs in called
 functions.  However, it leads to some interesting cases involving
 self-deadlocks; see the original post on this thread.
 
 
 I don't follow. In your example above, if I'm X(), how do I ensure Y()
 won't have committed anything at all when I later, at Exception, decide
 to roll back everything from Data write A1 to Data write A2, including
 any writes made by Y() (in the example, Data write B1)?

Ah, ok.  The goal of the project is that the writer of X() *cannot*
prevent Y() from writing its data (B1) and committing it.

One of the primary use cases for ATX is audit triggers.  If a function
writer could override ATX and prevent the audit triggers from
committing, then that use case would be violated.

Can you explain what use case you have where simply telling the staff
"if you use ATX without clearing it, you'll be fired" is not sufficient?
 Possibly there's something we failed to account for in the unconference
discussion.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] optimizing vacuum truncation scans

2015-07-27 Thread Jim Nasby

On 7/27/15 10:39 AM, Robert Haas wrote:

But that's not enough: we also need to know that any tuple that
survived the prune operation (that is, it wasn't redirected or marked
unused) has a new-enough xmin, which isn't tracked anywhere.


Wouldn't that be old-enough xmin?

heap_prune_chain() is already calling HeapTupleSatisfiesVacuum, so it 
should be able to figure out if the page is all visible without a lot of 
extra work, and pass that info back to heap_page_prune (which would then 
pass it down to _execute()).


Definitely not a one-liner though.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] WIP: Make timestamptz_out less slow.

2015-07-27 Thread David Rowley
On 28 July 2015 at 09:17, Andres Freund and...@anarazel.de wrote:

 Hi,

 I recently once more noticed that timestamptz_out is really, really
 slow. To quantify that, I created a bunch of similar sized tables:

 CREATE TABLE tbl_timestamp AS SELECT NOW() FROM generate_series(1, 10)
 a, generate_series(1, 100) b;
 CREATE TABLE tbl_int8 AS SELECT 1::bigint FROM generate_series(1,  10)
 a, generate_series(1, 100) b;
 CREATE TABLE tbl_bytea AS SELECT '   '::bytea FROM generate_series(1,
 10) a, generate_series(1, 100) b;

 These all end up being 346MB large.

 COPY tbl_bytea TO '/dev/null';
 Time: 1173.484 ms
 COPY tbl_int8 TO '/dev/null';
 Time: 1030.756 ms
 COPY tbl_timestamp TO '/dev/null';
 Time: 6598.030

 (all best of three)

 Yes, timestamp outputs more data as a whole, but being 5 times as slow
 is still pretty darn insane. To make sure that's not the cause, here's
 another table:

 CREATE TABLE tbl_timestamptext AS SELECT NOW()::text FROM
 generate_series(1, 10) a, generate_series(1, 100) b;
 COPY tbl_timestamptext TO '/dev/null';
 Time: 1449.554 ms

 So it's really just the timestamp code.


 Profiling it shows:
   Overhead  Command Shared Object Symbol
   -   38.33%  postgres_stock  libc-2.19.so  [.] vfprintf
  - 97.92% vfprintf
   _IO_vsprintf
 - sprintf
+ 70.25% EncodeDateTime
+ 29.75% AppendSeconds.constprop.10
  + 1.11% _IO_default_xsputn
   -8.22%  postgres_stock  libc-2.19.so  [.] _IO_default_xsputn
  - 99.43% _IO_default_xsputn
 - 90.09% vfprintf
  _IO_vsprintf
- sprintf
   + 74.15% EncodeDateTime
   + 25.85% AppendSeconds.constprop.10
 + 9.72% _IO_padn
  + 0.57% vfprintf
   +   7.76%  postgres_stock  postgres_stock[.] CopyOneRowTo

 So nearly all the time is spent somewhere inside the sprintf calls. Not
 nice.

 The only thing I could come up to make the sprintfs cheaper was to
 combine them into one and remove some of the width specifiers that
 aren't needed. That doesn't buy us very much.

 I then proceeded to replace the sprintf call with hand-rolled
 conversions. And wow, the benefit is far bigger than I'd assumed:
 postgres[7236][1]=# COPY tbl_timestamp TO '/dev/null';
 Time: 2430.521 ms

 So, by hand-rolling the ISO conversion in EncodeDateTime() we got a
 ~250% performance improvement. I'd say that's worthwhile.

 The attached patch shows what I did. While there's some polishing
 possible, as a whole, it's pretty ugly. But I think timestamp data is so
 common that it's worth the effort.

 Does anybody have a fundamentally nicer idea than the attached to
 improvide this?


It won't be quite as fast as what you've written, but I think it will be
much neater and more likely to be used in other places if we invent a
function like pg_ltoa() which returns a pointer to the new end of the string.

Also, if we're specifying padding with zeros, then we can skip the reverse
part that's in pg_ltoa() (normally needed since the numeric string is
built in reverse).

The code could then be written as:

str = pg_int2str_pad(str, year, 4);
*str++ = '-';
str = pg_int2str_pad(str, tm->tm_mon, 2);
*str++ = '-';
str = pg_int2str_pad(str, tm->tm_mday, 2);

etc

I've used this method before and found it to be about 10 times faster than
snprintf(), but I was reversing the string, so quite likely it will be more
than 10 times here.
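
For what it's worth, a minimal sketch of such a helper (the name
pg_int2str_pad and its exact signature are assumptions here, not existing
code; it assumes num >= 0 and that num fits in padlen digits):

    /*
     * Write num as exactly padlen decimal digits, zero padded, and return
     * a pointer just past the last digit written.  Because the width is
     * fixed, we can fill the buffer from the end and skip the reverse
     * step that pg_ltoa() needs.
     */
    static char *
    pg_int2str_pad(char *str, int num, int padlen)
    {
        char   *end = str + padlen;
        char   *p = end;

        while (p > str)
        {
            *--p = '0' + (num % 10);
            num /= 10;
        }
        return end;
    }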

I'm interested to see how much you're really gaining by manually unrolling
the part that builds the fractional part of the second.

We could just build that part with: (untested)

if (fsec != 0)
{
    int fseca = abs(fsec);

    while (fseca % 10 == 0 && fseca > 0)
        fseca /= 10;

    *str++ = '.';
    str = pg_int2str(str, fseca);
}

Regards

David Rowley

--
 David Rowley   http://www.2ndQuadrant.com/
http://www.2ndquadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


Re: [HACKERS] Autonomous Transaction is back

2015-07-27 Thread Jim Nasby

On 7/27/15 5:12 PM, Joel Jacobson wrote:

Right now, when writing a function, if you raise an exception, you can
be sure all writes you have made will be rollbacked, but your caller
function might caught the exception and decide to carry on and commit
work made before your function was called, but at least you can be
confident your writes won't be committed as long as you don't caught the
exception you raised in your own function. If I understand it correctly,
that would change with the addition of Autonomous Transaction, unless
given a way to prevent a function you call from starting and commiting
a Autonomous Transaction. Wrong? If so, then please show how to prevent
Y() from commiting the Data write B1 in your example, I don't get it.


What's being described here doesn't make sense in either use case ([1] and 
[2]), but I do understand the concern about what 3rd party software is 
doing. It would be nice to have the ability to disallow and/or disable 
autonomous transactions, but I don't see a practical way of doing that 
other than introducing a new GUC. I'm not sure if it's worth that effort.


[1] the batch process use case: batches that still hold their own 
transaction open don't gain anything.


[2] the audit logging case. If you didn't care about auditing 
surviving regardless of a rollback then you wouldn't go to the extra 
work of an autonomous transaction to begin with.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Planner debug views

2015-07-27 Thread Qingqing Zhou
On Thu, Jul 23, 2015 at 4:11 PM, Tatsuo Ishii is...@postgresql.org wrote:
 Sounds like a great feature!


Thanks!

Attached is a draft patch implementing the idea. To play with it, you
need to create the following two foreign tables:
CREATE EXTENSION file_fdw;
CREATE SERVER pglog FOREIGN DATA WRAPPER file_fdw;
create foreign table pg_planner_rels(rel text, content text) server
pglog options(filename 'your_install/data/debug_planner_relopt.csv',
format 'csv');
create foreign table pg_planner_paths(rel text, path text, replacedby
text, reason int, startupcost float, totalcost float, cheapest text,
innerp text, outerp text, content text) server pglog options(filename
'your_install/data/debug_planner_paths.csv', format 'csv');
Example output attached.

Questions:
1. Which document shall we update? This is more than the existing
debug_print_ knobs.
2. GEQO is not supported yet. I would suggest we do that as a
separate check-in.
3. Where do we want to put the csv files? Currently I just put them under /data.
4. Do we want to push these two foreign tables into system_view.sql?
One problem is that a foreign table needs an absolute path. Any way to
handle this?
5. As the output is a csv file: I wrap strings with quotes, but I'm not sure
about quotes within the string itself, if there are any. Do we have any guarantee here?

Thanks,
Qingqing

---

postgres=# select p.rel, p.path, p.replacedby, p.reason,
p.startupcost, p.totalcost, p.cheapest, p.innerp, p.outerp,
substr(p.content, 1,30),r.content from pg_planner_paths p join
pg_planner_rels r on p.rel=r.rel;
rel|   path| replacedby | reason | startupcost | totalcost
|   cheapest   |  innerp   |  outerp   | substr
 |content
---+---+++-+---+--+---+---++
 0x2791a10 | 0x279d4b0 |||   0 |  40.1
| +total+startup+param |   |   | ForeignScan(1)
rows=301 cost=0 | RELOPTINFO (1): rows=301 width=244
 0x279f998 | 0x27a2238 |||   0 |   1.1
| +total+startup+param |   |   | ForeignScan(1) rows=1
cost=0.0 | RELOPTINFO (1): rows=1 width=244
 0x279fbd0 | 0x27a28b8 |||   0 |   1.1
| +total+startup+param |   |   | ForeignScan(2) rows=1
cost=0.0 | RELOPTINFO (2): rows=1 width=64
 0x27a2ab0 | 0x27a3c68 |||   0 |  2.21
| +total+startup+param | 0x27a28b8 | 0x27a2238 | NestLoop(1 2) rows=1
cost=0.00 | RELOPTINFO (1 2): rows=1 width=308
 0x27a2ab0 | 0x27a4608 | 0x27a4608  |  2 |1.11 |  2.23
|  | 0x27a2238 | 0x27a28b8 | HashJoin(1 2) rows=1
cost=1.11 | RELOPTINFO (1 2): rows=1 width=308
 0x27a2ab0 | 0x27a4498 | 0x27a4498  |  0 |   0 |  2.22
|  | 0x27a4330 | 0x27a28b8 | NestLoop(1 2) rows=1
cost=0.00 | RELOPTINFO (1 2): rows=1 width=308
 0x27a2ab0 | 0x27a4388 | 0x27a4388  |  0 |   0 |  2.21
|  | 0x27a2238 | 0x27a28b8 | NestLoop(1 2) rows=1
cost=0.00 | RELOPTINFO (1 2): rows=1 width=308
 0x27a2ab0 | 0x27a4220 | 0x27a4220  |  2 |2.22 |  2.25
|  | 0x27a2238 | 0x27a28b8 | MergeJoin(1 2) rows=1
cost=2.2 | RELOPTINFO (1 2): rows=1 width=308
 0x27a2ab0 | 0x27a3f90 | 0x27a3f90  |  2 |1.11 |  2.23
|  | 0x27a28b8 | 0x27a2238 | HashJoin(1 2) rows=1
cost=1.11 | RELOPTINFO (1 2): rows=1 width=308
 0x27a2ab0 | 0x27a3e20 | 0x27a3e20  |  0 |   0 |  2.22
|  | 0x27a3c10 | 0x27a2238 | NestLoop(1 2) rows=1
cost=0.00 | RELOPTINFO (1 2): rows=1 width=308
 0x27a2ab0 | 0x27a3b18 | 0x27a3c68  |  1 |2.22 |  2.25
|  | 0x27a28b8 | 0x27a2238 | MergeJoin(1 2) rows=1
cost=2.2 | RELOPTINFO (1 2): rows=1 width=308


0002-local-change.patch
Description: Binary data



Re: [HACKERS] Sharing aggregate states between different aggregate functions

2015-07-27 Thread David Rowley
On 27 July 2015 at 20:11, Heikki Linnakangas hlinn...@iki.fi wrote:

 On 07/27/2015 08:34 AM, David Rowley wrote:

 - * agg_input_types, agg_state_type, agg_result_type identify the input,
 - * transition, and result types of the aggregate.  These should all be
 - * resolved to actual types (ie, none should ever be ANYELEMENT etc).
 + * agg_input_types identifies the input types of the aggregate.  These should
 + * be resolved to actual types (ie, none should ever be ANYELEMENT etc).

 I'm not sure I understand why agg_state_type and agg_result_type have
 changed here.


 The function no longer has the agg_result_type argument, but the removal
 of agg_state_type from the comment was a mistake.


I've put agg_state_type back in the attached delta which is again based on
your version of the patch.




  + peraggstate->sortstates = (Tuplesortstate **)
  +     palloc0(sizeof(Tuplesortstate *) * numGroupingSets);
  + for (currentsortno = 0; currentsortno < numGroupingSets; currentsortno++)
  +     peraggstate->sortstates[currentsortno] = NULL;

 This was not you, but this NULL setting looks unneeded due to the
 palloc0().


 Yeah, I noticed that too. Ok, let's take it out.


Removed in attached.


  In this function I also wasn't quite sure whether it was worth comparing both
 non-NULL INITCONDs here. I believe my code comments may slightly
 contradict what the code actually does, as the comments talk about them
 having to match, but the code just bails if any are non-NULL. The reason I
 didn't check them was because it seems inevitable that some duplicate work
 needs to be done when setting up the INITCOND. Perhaps it's worth it?


 It would be nice to handle non-NULL initconds. I think you'll have to
 check that the input function isn't volatile. Or perhaps just call the
 input function, and check that the resulting Datum is byte-per-byte
 identical, although that might be awkward to do with the current code
 structure.


I've not done anything with this.
I'd not thought of an input function being volatile before, but I guess
it's possible, which makes me a bit scared that we could be treading on
ground we shouldn't be. I know it's more of an output function thing than
an input function thing, but a GUC like extra_float_digits could cause
problems here.

In summary, I'm much less confident it's safe to enable the optimisation in
this case.



  BTW, the name of the AggStatePerAggStateData struct is pretty horrible.
 The repeated AggState feels awkward. Now that I've stared at the patch
 for some time, it doesn't bother me anymore, but it took me quite a while
 to get over that. I'm sure it will for others too. And it's not just that
 struct, the comments talk about aggregate state, which could be confused
 to mean AggState, but it actually means AggStatePerAggStateData. I don't
 have any great suggestions, but can you come up with a better naming scheme?


  I agree, they're horrible. The thing that's causing the biggest problem is
  the struct named AggState, which carries state for *all* aggregates, and
  has nothing to do with transition state, so it seems there are two
  different meanings of the word state, and I've used both meanings in the
  name for AggStatePerAggStateData.

  Perhaps just renaming AggStatePerAggStateData to AggStateTransStateData
  would be good enough?


 Hmm. I think it should be AggStatePerTransData then, to keep the same
 pattern as AggStatePerAggData and AggStatePerGroupData.


Sounds good. I've renamed it to that in the attached delta patch.

Regards

David Rowley

--
 David Rowley   http://www.2ndQuadrant.com/
http://www.2ndquadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


sharing_aggstate-heikki-1_delta2.patch
Description: Binary data



Re: [HACKERS] Buildfarm TAP testing is useless as currently implemented

2015-07-27 Thread Michael Paquier
On Tue, Jul 28, 2015 at 1:15 AM, Andrew Dunstan and...@dunslane.net wrote:
 Well, it does create a lot of files that we don't pick up. An example list
 is shown below, and I am attaching their contents in a single gzipped
 attachment. However, these are in the wrong location. This was a vpath build
 and yet these tmp_check directories are all created in the source tree.
 Let's fix that and then I'll set about having the buildfarm collect them.
 That should get us further down the track.

 [log list]

The patch attached fixes that. I suggest that we use env{TESTDIR}/log
as the location for the logs, so that even a vpath build will locate
the log files correctly.
-- 
Michael


20150728_tap_logs_vpath.patch
Description: binary/octet-stream



Re: [HACKERS] A little RLS oversight?

2015-07-27 Thread Joe Conway

On 07/27/2015 03:05 PM, Stephen Frost wrote:
 AFK at the moment, but my thinking was that we should avoid having
 the error message change based on what a GUC is set to. I agree
 that there should be comments which explain that.

I changed back to using GetUserId() for the call to check_enable_rls()
at those three call sites, and added to the comments to explain why.

While looking at ri_ReportViolation() I spotted what I believe to be a
bug in the current logic -- namely, has_perm is initialized to true,
and when check_enable_rls() returns RLS_ENABLED we never reset
has_perm to false, and thus leak info even though the comments claim
we don't. I fixed that here, but someone please take a look and
confirm I am reading that correctly.

Beyond that, any additional comments?

Thanks,

Joe

-- 
Joe Conway
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 98d1497..fd82ea4 100644
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
*** SET search_path TO <replaceable>schema</replaceable>
*** 15259,15264 ****
--- 15259,15270 ----
       <entry><type>boolean</type></entry>
       <entry>does current user have privilege for role</entry>
      </row>
+     <row>
+      <entry><literal><function>row_security_active</function>(<parameter>table</parameter>)</literal>
+      </entry>
+      <entry><type>boolean</type></entry>
+      <entry>does current user have row level security active for table</entry>
+     </row>
    </tbody>
   </tgroup>
  </table>
*** SET search_path TO <replaceable>schema</replaceable>
*** 15299,15304 ****
--- 15305,15313 ----
     <indexterm>
      <primary>pg_has_role</primary>
     </indexterm>
+    <indexterm>
+     <primary>row_security_active</primary>
+    </indexterm>
  
     <para>
      <function>has_table_privilege</function> checks whether a user
*** SELECT has_function_privilege('joeuser',
*** 15462,15467 ****
--- 15471,15483 ----
     are immediately available without doing <command>SET ROLE</>.
    </para>
  
+    <para>
+     <function>row_security_active</function> checks whether row level
+     security is active for the specified table in the context of the
+     <function>current_user</function> and environment. The table can
+     be specified by name or by OID.
+    </para>
+ 
    <para>
     <xref linkend="functions-info-schema-table"> shows functions that
     determine whether a certain object is <firstterm>visible</> in the
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 1043362..2797c56 100644
*** a/src/backend/access/index/genam.c
--- b/src/backend/access/index/genam.c
*** BuildIndexValueDescription(Relation inde
*** 203,209 ****
  	indrelid = idxrec->indrelid;
  	Assert(indexrelid == idxrec->indexrelid);
  
! 	/* RLS check - if RLS is enabled then we don't return anything. */
  	if (check_enable_rls(indrelid, GetUserId(), true) == RLS_ENABLED)
  	{
  		ReleaseSysCache(ht_idx);
--- 203,213 ----
  	indrelid = idxrec->indrelid;
  	Assert(indexrelid == idxrec->indexrelid);
  
! 	/*
! 	 * RLS check - if RLS is enabled then we don't return anything.
! 	 * Explicitly pass GetUserId() to ensure the result is not
! 	 * dependent on the value of row_security for users with BYPASSRLS.
! 	 */
  	if (check_enable_rls(indrelid, GetUserId(), true) == RLS_ENABLED)
  	{
  		ReleaseSysCache(ht_idx);
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index e82a53a..c0bd6fa 100644
*** a/src/backend/catalog/system_views.sql
--- b/src/backend/catalog/system_views.sql
*** CREATE VIEW pg_indexes AS
*** 150,156 ****
   LEFT JOIN pg_tablespace T ON (T.oid = I.reltablespace)
  WHERE C.relkind IN ('r', 'm') AND I.relkind = 'i';
  
! CREATE VIEW pg_stats AS
  SELECT
  nspname AS schemaname,
  relname AS tablename,
--- 150,156 
   LEFT JOIN pg_tablespace T ON (T.oid = I.reltablespace)
  WHERE C.relkind IN ('r', 'm') AND I.relkind = 'i';
  
! CREATE VIEW pg_stats WITH (security_barrier) AS
  SELECT
  nspname AS schemaname,
  relname AS tablename,
*** CREATE VIEW pg_stats AS
*** 211,217 ****
  

Re: [HACKERS] Autonomous Transaction is back

2015-07-27 Thread Jim Nasby

On 7/27/15 6:40 PM, Jim Nasby wrote:

On 7/27/15 5:12 PM, Joel Jacobson wrote:

Right now, when writing a function, if you raise an exception, you can
be sure all writes you have made will be rolled back, but your caller
function might catch the exception and decide to carry on and commit
work made before your function was called; at least you can be
confident your writes won't be committed as long as you don't catch the
exception you raised in your own function. If I understand it correctly,
that would change with the addition of Autonomous Transactions, unless
given a way to prevent a function you call from starting and committing
an Autonomous Transaction. Wrong? If so, then please show how to prevent
Y() from committing the Data write B1 in your example, I don't get it.


What's being described here doesn't make sense in either use case ([1] and
[2]), but I do understand the concern about what 3rd party software is
doing. It would be nice to have the ability to disallow and/or disable
autonomous transactions, but I don't see a practical way of doing that
other than introducing a new GUC. I'm not sure if it's worth that effort.


It just occurred to me that another option would be to have an event 
trigger for beginning an autonomous transaction.



[1] the batch process use case: batches that still hold their own
transaction open don't gain anything.

[2] the audit logging case. If you didn't care about auditing
surviving regardless of a rollback then you wouldn't go to the extra
work of an autonomous transaction to begin with.



--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Several memory leaks for pg_rewind caused by missing PQclear calls

2015-07-27 Thread Michael Paquier
On Tue, Jul 28, 2015 at 2:45 AM, Heikki Linnakangas hlinn...@iki.fi wrote:
 On 07/23/2015 05:08 PM, Michael Paquier wrote:

 Hi all,

 After a run of valgrind on pg_rewind, I found a couple of code paths
 missing some PQclear calls after running a query. Attached is a patch
 to fix all those leaks.


 Fixed, thanks.

Thanks for taking care of this.
-- 
Michael




Re: [HACKERS] more RLS oversights

2015-07-27 Thread Joe Conway

On 07/27/2015 01:13 PM, Alvaro Herrera wrote:
 Hmm, these are not ACL objects, so conceptually it seems cleaner to
 use a different symbol for this.  I think the catalog state and the
 error messages would be a bit confusing otherwise.

Ok -- done

 Isn't this leaking the previously allocated array?  Not sure it's
 all that critical, but still.  (I don't think you really need to
 call palloc at all here.)

Agreed -- I think the attached is a bit cleaner.

Any other comments or concerns?

-- 
Joe Conway
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 9096ee5..7781c56 100644
*** a/doc/src/sgml/catalogs.sgml
--- b/doc/src/sgml/catalogs.sgml
***
*** 5793,5798 ****
--- 5793,5808 ----
      </varlistentry>
  
      <varlistentry>
+      <term><symbol>SHARED_DEPENDENCY_POLICY</> (<literal>r</>)</term>
+      <listitem>
+       <para>
+        The referenced object (which must be a role) is mentioned as the
+        target of a dependent policy object.
+       </para>
+      </listitem>
+     </varlistentry>
+ 
+     <varlistentry>
       <term><symbol>SHARED_DEPENDENCY_PIN</> (<literal>p</>)</term>
       <listitem>
        <para>
diff --git a/src/backend/catalog/pg_shdepend.c b/src/backend/catalog/pg_shdepend.c
index 34fe4e2..43076c9 100644
*** a/src/backend/catalog/pg_shdepend.c
--- b/src/backend/catalog/pg_shdepend.c
*** storeObjectDescription(StringInfo descs,
*** 1083,1088 ****
--- 1083,1090 ----
  				appendStringInfo(descs, _("owner of %s"), objdesc);
  			else if (deptype == SHARED_DEPENDENCY_ACL)
  				appendStringInfo(descs, _("privileges for %s"), objdesc);
+ 			else if (deptype == SHARED_DEPENDENCY_POLICY)
+ 				appendStringInfo(descs, _("target of %s"), objdesc);
  			else
  				elog(ERROR, "unrecognized dependency type: %d",
  					 (int) deptype);
diff --git a/src/backend/commands/policy.c b/src/backend/commands/policy.c
index 17b48d4..9544f75 100644
*** a/src/backend/commands/policy.c
--- b/src/backend/commands/policy.c
***
*** 22,27 
--- 22,28 
  #include "catalog/indexing.h"
  #include "catalog/namespace.h"
  #include "catalog/objectaccess.h"
+ #include "catalog/pg_authid.h"
  #include "catalog/pg_policy.h"
  #include "catalog/pg_type.h"
  #include "commands/policy.h"
***
*** 48,54 
  static void RangeVarCallbackForPolicy(const RangeVar *rv,
  		  Oid relid, Oid oldrelid, void *arg);
  static char parse_policy_command(const char *cmd_name);
! static ArrayType *policy_role_list_to_array(List *roles);
  
  /*
   * Callback to RangeVarGetRelidExtended().
--- 49,55 
  static void RangeVarCallbackForPolicy(const RangeVar *rv,
  		  Oid relid, Oid oldrelid, void *arg);
  static char parse_policy_command(const char *cmd_name);
! static Datum *policy_role_list_to_array(List *roles, int *num_roles);
  
  /*
   * Callback to RangeVarGetRelidExtended().
*** parse_policy_command(const char *cmd_nam
*** 130,159 
  
  /*
   * policy_role_list_to_array
!  *	 helper function to convert a list of RoleSpecs to an array of role ids.
   */
! static ArrayType *
! policy_role_list_to_array(List *roles)
  {
! 	ArrayType  *role_ids;
! 	Datum	   *temp_array;
  	ListCell   *cell;
- 	int			num_roles;
  	int			i = 0;
  
  	/* Handle no roles being passed in as being for public */
  	if (roles == NIL)
  	{
! 		temp_array = (Datum *) palloc(sizeof(Datum));
! 		temp_array[0] = ObjectIdGetDatum(ACL_ID_PUBLIC);
  
! 		role_ids = construct_array(temp_array, 1, OIDOID, sizeof(Oid), true,
!    'i');
! 		return role_ids;
  	}
  
! 	num_roles = list_length(roles);
! 	temp_array = (Datum *) palloc(num_roles * sizeof(Datum));
  
  	foreach(cell, roles)
  	{
--- 131,158 
  
  /*
   * policy_role_list_to_array
!  *	 helper function to convert a list of RoleSpecs to an array of
!  *	 role id Datums.
   */
! static Datum *
! policy_role_list_to_array(List *roles, int *num_roles)
  {
! 	Datum	   *role_oids;
  	ListCell   *cell;
  	int			i = 0;
  
  	/* Handle no roles being passed in as being for public */
  	if (roles == NIL)
  	{
! 	   *num_roles = 1;
! 		role_oids = (Datum *) palloc(*num_roles * 

Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-07-27 Thread Ildus Kurbangaliev

Hello.
In the attached patch I've made a refactoring of the tranches.
The prefix for them was extended, and I've split LWLockAssign into two
functions (one with a tranche and a second for user-defined LWLocks).

--
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 0eb991c..e75ca4d 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -480,7 +480,7 @@ pgss_shmem_startup(void)
 	if (!found)
 	{
 		/* First time through ... */
-		pgss->lock = LWLockAssign();
+		pgss->lock = LWLockUserAssign();
 		pgss->cur_median_usage = ASSUMED_MEDIAN_INIT;
 		pgss->mean_query_len = ASSUMED_LENGTH_INIT;
 		SpinLockInit(&pgss->mutex);
diff --git a/doc/src/sgml/xfunc.sgml b/doc/src/sgml/xfunc.sgml
index 9c15950..b365565 100644
--- a/doc/src/sgml/xfunc.sgml
+++ b/doc/src/sgml/xfunc.sgml
@@ -3356,7 +3356,7 @@ if (!ptr)
 {
 initialize contents of shmem area;
 acquire any requested LWLocks using:
-ptr->mylockid = LWLockAssign();
+ptr->mylockid = LWLockUserAssign();
 }
 LWLockRelease(AddinShmemInitLock);
 }
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 3a58f1e..3a55511 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -457,7 +457,7 @@ CLOGShmemInit(void)
 {
 	ClogCtl-PagePrecedes = CLOGPagePrecedes;
 	SimpleLruInit(ClogCtl, CLOG Ctl, CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
-  CLogControlLock, pg_clog);
+  CLogControlLock, pg_clog, LW_TRANCHE_CLOG_BUFFERS);
 }
 
 /*
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 5ad35c0..2571203 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -466,7 +466,8 @@ CommitTsShmemInit(void)
 
 	CommitTsCtl->PagePrecedes = CommitTsPagePrecedes;
 	SimpleLruInit(CommitTsCtl, "CommitTs Ctl", CommitTsShmemBuffers(), 0,
-				  CommitTsControlLock, "pg_commit_ts");
+				  CommitTsControlLock, "pg_commit_ts",
+				  LW_TRANCHE_COMMITTS_BUFFERS);
 
 	commitTsShared = ShmemInitStruct("CommitTs shared",
 									 sizeof(CommitTimestampShared),
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 1933a87..e8d2474 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1842,10 +1842,12 @@ MultiXactShmemInit(void)
 
 	SimpleLruInit(MultiXactOffsetCtl,
 				  "MultiXactOffset Ctl", NUM_MXACTOFFSET_BUFFERS, 0,
-				  MultiXactOffsetControlLock, "pg_multixact/offsets");
+				  MultiXactOffsetControlLock, "pg_multixact/offsets",
+				  LW_TRANCHE_MULTIXACT_BUFFERS);
 	SimpleLruInit(MultiXactMemberCtl,
 				  "MultiXactMember Ctl", NUM_MXACTMEMBER_BUFFERS, 0,
-				  MultiXactMemberControlLock, "pg_multixact/members");
+				  MultiXactMemberControlLock, "pg_multixact/members",
+				  LW_TRANCHE_MULTIXACT_BUFFERS);
 
 	/* Initialize our shared state struct */
 	MultiXactState = ShmemInitStruct("Shared MultiXact State",
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 5fcea11..7709f6d 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -161,7 +161,7 @@ SimpleLruShmemSize(int nslots, int nlsns)
 
 void
 SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-			  LWLock *ctllock, const char *subdir)
+			  LWLock *ctllock, const char *subdir, int lwlocktranche)
 {
 	SlruShared	shared;
 	bool		found;
@@ -218,7 +218,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 			shared->page_status[slotno] = SLRU_PAGE_EMPTY;
 			shared->page_dirty[slotno] = false;
 			shared->page_lru_count[slotno] = 0;
-			shared->buffer_locks[slotno] = LWLockAssign();
+			shared->buffer_locks[slotno] = LWLockAssign(lwlocktranche);
 			ptr += BLCKSZ;
 		}
 	}
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 6b70982..80690bc 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -179,7 +179,8 @@ SUBTRANSShmemInit(void)
 {
 	SubTransCtl->PagePrecedes = SubTransPagePrecedes;
 	SimpleLruInit(SubTransCtl, "SUBTRANS Ctl", NUM_SUBTRANS_BUFFERS, 0,
-				  SubtransControlLock, "pg_subtrans");
+				  SubtransControlLock, "pg_subtrans",
+				  LW_TRANCHE_SUBTRANS_BUFFERS);
 	/* Override default assumption that writes should be fsync'd */
 	SubTransCtl->do_fsync = false;
 }
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index e82a53a..fd04258 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -621,6 +621,7 @@ CREATE VIEW pg_stat_activity AS
 S.query_start,
 S.state_change,
 S.waiting,
+

Re: [HACKERS] Feature - Index support on an lquery field (from the ltree module)

2015-07-27 Thread Heikki Linnakangas

On 07/28/2015 06:40 AM, David Belle wrote:

I have a requirement for a project that I am working on and was
hoping to gain some discussion on the idea of implementing an index
type for lquery fields (see ltree’s
http://www.postgresql.org/docs/9.4/static/ltree.html
http://www.postgresql.org/docs/9.4/static/ltree.html)
Currently ltree fields can have GIST/GIN indexes, but you cannot
create an index for lquery or ltxtquery. In my scope, an index for
lquery would be enough, although if it wasn't too difficult for it
to be expanded to ltxtquery fields, then this could also be
implemented.


Hmm. My first thought is to extract labels from the query which are 
required for any matches, and put those in the GIN index. When querying, 
find all lqueries that have at least one label in common with the search 
key.


- Heikki





Re: [HACKERS] [DESIGN] ParallelAppend

2015-07-27 Thread Kouhei Kaigai
 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org
 [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kouhei Kaigai
 Sent: Monday, July 27, 2015 11:07 PM
 To: Amit Kapila
 Cc: pgsql-hackers@postgresql.org; Robert Haas; Kyotaro HORIGUCHI
 Subject: Re: [HACKERS] [DESIGN] ParallelAppend
 
  On Sun, Jul 26, 2015 at 8:43 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:
  
   Hello,
  
   I'm recently working/investigating on ParallelAppend feature
   towards the next commit fest. Below is my design proposal.
  
   1. Concept
   --
   Its concept is quite simple anybody might consider more than once.
   ParallelAppend node kicks background worker process to execute
   child nodes in parallel / asynchronous.
   It intends to improve the performance to scan a large partitioned
   tables from standpoint of entire throughput, however, latency of
   the first multi-hundred rows are not scope of this project.
   From standpoint of technology trend, it primarily tries to utilize
   multi-cores capability within a system, but also enables to expand
   distributed database environment using foreign-tables inheritance
   features.
   Its behavior is very similar to Funnel node except for several
   points, thus, we can reuse its infrastructure we have had long-
   standing discussion through the v9.5 development cycle.
  
   2. Problems to be solved
   -
   Typical OLAP workloads takes tons of tables join and scan on large
   tables which are often partitioned, and its KPI is query response
   time but very small number of sessions are active simultaneously.
   So, we are required to run a single query as rapid as possible even
   if it consumes larger computing resources than typical OLTP workloads.
  
   Current implementation to scan heap is painful when we look at its
   behavior from the standpoint - how many rows we can read within a
   certain time, because of synchronous manner.
   In the worst case, when SeqScan node tries to fetch the next tuple,
   heap_getnext() looks up a block on shared buffer, then ReadBuffer()
   calls storage manager to read the target block from the filesystem
   if not on the buffer. Next, operating system makes the caller
   process slept until required i/o get completed.
   Most of the cases are helped in earlier stage than the above worst
   case, however, the best scenario we can expect is: the next tuple
   already appear on top of the message queue (of course visibility
   checks are already done also) with no fall down to buffer manager
   or deeper.
   If we can run multiple scans in parallel / asynchronous, CPU core
   shall be assigned to another process by operating system, thus,
   it eventually improves the i/o density and enables higher processing
   throughput.
   Append node is an ideal point to be parallelized because
   - child nodes can have physically different location by tablespace,
 so further tuning is possible according to the system landscape.
   - it can control whether subplan is actually executed on background
 worker, per subplan basis. If subplan contains large tables and
 small tables, ParallelAppend may kick background worker to scan
 large tables only, but scan on small tables are by itself.
   - Like as Funnel node, we don't need to care about enhancement of
 individual node types. SeqScan, IndexScan, ForeignScan or others
 can perform as usual, but actually in parallel.
  
  
   3. Implementation
   --
   * Plan  Cost
  
    ParallelAppend shall appear where Append can appear except for the
   usage for dummy. So, I'll enhance set_append_rel_pathlist() to add
   both of AppendPath and ParallelAppendPath with cost for each.
  
 
  Is there a real need to have a new node like ParallelAppendPath?
  Can't we have a Funnel node beneath the Append node, and then each
  worker will be responsible for a SeqScan on an inherited child
  relation?  Something like
 
  Append
   --- Funnel
        -- SeqScan rel1
        -- SeqScan rel2
 
  If Funnel can handle both horizontal and vertical parallelism,
  it is a great simplification. I don't insist on a new node.
 
  Once Funnel gets the capability to have multiple child nodes, probably,
  the Append node above will be gone. I expect set_append_rel_pathlist()
  to add two paths based on Append and Funnel, then the planner will choose
  the cheaper one according to its cost.

In the latest v16 patch, Funnel is declared as follows:

  typedef struct Funnel
  {
      Scan        scan;
      int         num_workers;
  } Funnel;

If we try to add Append capability here, I expects the structure will
be adjusted as follows, for example:

  typedef struct Funnel
  {
      Scan        scan;
      List       *funnel_plans;
      List       *funnel_num_workers;
  } Funnel;

As the names suggest, funnel_plans stores the underlying Plan nodes
instead of the lefttree. Also, funnel_num_workers stores the number of
expected workers to be assigned to the individual child plans.
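
For example (a sketch only, nothing like this exists in the v16 patch;
LaunchFunnelWorkers() and ExecLocalSubPlan() are made-up names), the
executor side could then walk both lists in lockstep:

    ListCell   *lc1,
               *lc2;

    forboth(lc1, node->funnel_plans, lc2, node->funnel_num_workers)
    {
        Plan   *subplan = (Plan *) lfirst(lc1);
        int     nworkers = lfirst_int(lc2);

        if (nworkers > 0)
            LaunchFunnelWorkers(subplan, nworkers); /* run on workers */
        else
            ExecLocalSubPlan(subplan);              /* run in the backend */
    }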

Even 

Re: [HACKERS] MultiXact member wraparound protections are now enabled

2015-07-27 Thread Noah Misch
On Mon, Jul 27, 2015 at 07:59:40AM +0100, Simon Riggs wrote:
 On 26 July 2015 at 20:15, Noah Misch n...@leadboat.com wrote:
  On Fri, Jul 24, 2015 at 09:14:09PM -0400, Peter Eisentraut wrote:
   On 7/22/15 4:45 PM, Robert Haas wrote:
But it seemed to me that this could be rather confusing.  I thought it
would be better to be explicit about whether the protections are
enabled in all cases.  That way, (1) if you see the message saying
they are enabled, they are enabled; (2) if you see the message saying
they are disabled, they are disabled; and (3) if you see neither
message, your version does not have those protections.
  
   But this is not documented, AFAICT, so I don't think anyone is going to
   be able to follow that logic.  I don't see anything in the release notes
   saying, "look for this message to see how this applies to you", or
  whatever.
 
  I supported inclusion of the message, because it has good potential to help
  experts studying historical logs to find the root cause of data corruption.
  The complex histories of clusters showing corruption from this series of
  bugs
  have brought great expense to the task of debugging new reports.  Given a
  cluster having full mxact wraparound protections since last corruption-free
  backup (or since initdb), one can rule out some causes.
 
 
 Would it be better to replace it with a less specific and more generally
 useful message?
 
  For example, "Server started with release X.y.z",
  from which we could infer various useful things.

That message does sound generally useful, but we couldn't infer $subject from
it.  While the $subject message appears at startup in simple cases, autovacuum
prerequisite work can delay it indefinitely.




Re: [HACKERS] CustomScan and readfuncs.c

2015-07-27 Thread Kouhei Kaigai
 2. Reproduce method table on background worker
 --
 The method field of CustomPath/Scan/ScanState is expected to be
 a reference to a static structure. Thus, copyObject() does not
 copy the entire table, but only pointers.
 However, we have no way to guarantee that the callback functions have
 the same entrypoint addresses on the background workers. So, we may need
 an infrastructure to reproduce the same CustomScan node with the same
 callback function table, probably identified by name.
 We may have a few ways to solve the problem.
 
 * Add system catalog, function returns pointer
 The simplest way, like FDW. System catalog has a name and function
 to return callback pointers. It also needs SQL statement support,
 even if that is a bit of a downside.

I tried to design a DDL statement and relevant system catalog as follows.

  #define CustomPlanRelationId3999
  
  CATALOG(pg_custom_plan,3999)
  {
   NameData    custom_name;
  regproc custom_handler;
  } FormData_pg_custom_plan;

This simple catalog saves a pair of name and handler function for a custom
plan provider. Like FDW, this handler function returns pointers to the
entrypoints to be called by set_(rel|join)_pathlist_hook and the relevant
CustomXXXMethods tables.

User can register a custom plan provider using the following statement:
  CREATE CUSTOM PLAN name HANDLER function_name;

And unregister:
  DROP CUSTOM PLAN name;

This enhancement allows background workers to reproduce a CustomScan node
that was serialized by nodeToString(), as long as the provider is
identified by name.
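
For concreteness, a provider-side handler under this interface could look
like the sketch below. Everything here (the CustomPlanHandlers struct,
my_scan_methods, the header locations) is an illustrative assumption,
since no such API exists in PostgreSQL today:

  /* Hypothetical sketch of a custom plan provider handler under the
   * proposed CREATE CUSTOM PLAN interface; all names are illustrative. */
  #include "postgres.h"
  #include "fmgr.h"
  #include "nodes/plannodes.h"   /* CustomScanMethods (9.5-era location) */

  typedef struct CustomPlanHandlers
  {
      const CustomScanMethods *scan_methods;  /* lets a worker rebuild a
                                               * CustomScan node by name */
      /* entrypoints for set_(rel|join)_pathlist_hook would go here too */
  } CustomPlanHandlers;

  /* method table assumed to be defined by the provider elsewhere */
  extern const CustomScanMethods my_scan_methods;

  PG_FUNCTION_INFO_V1(my_custom_plan_handler);

  Datum
  my_custom_plan_handler(PG_FUNCTION_ARGS)
  {
      static CustomPlanHandlers handlers;

      handlers.scan_methods = &my_scan_methods;
      PG_RETURN_POINTER(&handlers);
  }

A background worker that deserializes a CustomScan could then look up
pg_custom_plan by the name carried in the node, call the handler, and
install the returned method table.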

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com


 -Original Message-
 From: Kaigai Kouhei(海外 浩平)
 Sent: Monday, July 27, 2015 8:42 AM
 To: 'Tom Lane'
 Cc: pgsql-hackers@postgresql.org
 Subject: RE: [HACKERS] CustomScan and readfuncs.c
 
  Kouhei Kaigai kai...@ak.jp.nec.com writes:
   Under the investigation of ParallelAppend, I noticed there are a few
   problems in CustomScan that prevent reproducing an equivalent
   plan node on a background worker from the serialized string.
  
   1. CustomScanMethods->TextOutCustomScan callback
   
   This callback allows outputting custom information from nodeToString.
   Originally, we intended to use this callback for debugging only,
   because CustomScan must be copyObject() safe, thus all the private
   data also must be stored in custom_exprs or custom_private.
   However, it will lead to another problem when we try to reproduce
   a CustomScan node from the string form generated by outfuncs.c.
   If TextOutCustomScan prints something, the upcoming _readCustomScan
   has to deal with an unexpected number of tokens in an unexpected format.
 
  Um ... wait a second.  There is no support in readfuncs for any
  plan node type, and never has been, and I seriously doubt that there
  ever should be.  I do not think it makes sense to ship plans around
  in the way you seem to have in mind.  (Also, I don't think the
  problems you mention are exactly unique to CustomScan.  There's no
  reason to assume that FDW plans could survive this treatment either,
  since we do not know what's in the fdw_private stuff; certainly no
  one has ever suggested that it should not contain pointers to static
  data.)
 
 Yep, no Plan node types are supported at this moment; however, they
 will appear soon with the Funnel + PartialSeqScan nodes.
 That patch serializes a partial plan subtree using nodeToString() then
 gives the flattened PlannedStmt to background workers.
 I'm now investigating applying the same structure to Append, to kick
 its child nodes in parallel.
 Once various plan node types appear in readfuncs.c, we have to care
 about this problem, don't we? I'm working on the patch submission
 of ParallelAppend for the next commit-fest, so I'd like to reach a
 consensus on how to treat this matter.
 
   I'd like to propose omitting this callback prior to the v9.5 release,
   to minimize compatibility issues.
 
  I regard our commitment to cross-version compatibility for the
  custom scan APIs as being essentially zero, for reasons previously
  discussed.  So if this goes away in 9.6 it will not matter, but we
  might as well leave it in for now for debug support.
 
 I don't argue this point strongly. If TextOutCustomScan is to be
 obsoleted in v9.6, it is simply a kindness to developers to warn
 them not to use this callback.
 
 Thanks,
 --
 NEC Business Creation Division / PG-Strom Project
 KaiGai Kohei kai...@ak.jp.nec.com





Re: [HACKERS] proposal: multiple psql option -c

2015-07-27 Thread Pavel Stehule
2015-07-27 21:57 GMT+02:00 Andrew Dunstan and...@dunslane.net:


 On 07/27/2015 02:53 PM, Pavel Stehule wrote:

 I am trying to run parallel execution

 psql -At -c "select datname from pg_database" postgres | xargs -n 1 -P 3
 psql -c "select current_database()"




 I don't think it's going to be a hugely important feature, but I don't see
 a problem with creating a new option (-C seems fine) which would have the
 same effect as if the arguments were concatenated into a file which is then
 used with -f. IIRC -c has some special characteristics which mean it's
 probably best not to try to extend it for this feature.


ok, I'll try to write patch.

Pavel



 cheers

 andrew



Re: [HACKERS] more RLS oversights

2015-07-27 Thread Joe Conway
On 07/03/2015 10:03 AM, Noah Misch wrote:
 +static void
 +dumpPolicy(Archive *fout, PolicyInfo *polinfo)
 ...
 +    if (polinfo->polqual != NULL)
 +        appendPQExpBuffer(query, " USING %s", polinfo->polqual);
 
 (3) The USING clause needs parentheses; a dump+reload failed like so:

Also needed for WITH CHECK. Fix pushed to 9.5 and HEAD.
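
Presumably the fix adds the parentheses in both places, along these lines
(a sketch inferred from the diff above, not the exact committed hunk; the
polwithcheck field name is assumed):

  /* parenthesize both clauses so the dumped CREATE POLICY parses on reload */
  if (polinfo->polqual != NULL)
      appendPQExpBuffer(query, " USING (%s)", polinfo->polqual);
  if (polinfo->polwithcheck != NULL)
      appendPQExpBuffer(query, " WITH CHECK (%s)", polinfo->polwithcheck);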

Joe







Re: [HACKERS] Failing assertions in indxpath.c, placeholder.c and brin_minmax.c

2015-07-27 Thread Tom Lane
Andreas Seltenreich seltenre...@gmx.de writes:
 I let sqlsmith run during the night, and it no longer triggered the
 first two.  During roughly a million random queries it triggered the
 already-mentioned brin one 10 times, but there was also one instance of
 this new one in the log:

 TRAP: FailedAssertion(!(join_clause_is_movable_into(rinfo, joinrel->relids,
 join_and_req)), File: "relnode.c", Line: 987)
 LOG:  server process (PID 12851) was terminated by signal 6: Aborted
 DETAIL:  Failed process was running: select  
 rel65543066.tmplname as c0, 
 rel65543064.umuser as c1
   from 
 public.dept as rel65543059
   inner join pg_catalog.pg_user_mappings as rel65543064
   left join pg_catalog.pg_enum as rel65543065
   on (rel65543064.srvname = rel65543065.enumlabel )
 inner join pg_catalog.pg_ts_template as rel65543066
 on (rel65543065.enumtypid = rel65543066.tmplnamespace )
   on (rel65543059.dname = rel65543064.srvname )
   where ((rel65543059.mgrname = rel65543059.mgrname) 
   and (rel65543064.usename = rel65543066.tmplname)) 
 and (rel65543059.mgrname ~~ rel65543059.mgrname)
   fetch first 128 rows only;

I looked into this one.  What seems to be the story is that
join_clause_is_movable_into() is approximate in the conservative
direction, that is it sometimes can return false when a strict analysis
would conclude true (a fact that is undocumented in its comments;
I shall fix that).  This is acceptable, as long as the answers are
consistent across different jointree levels, since the worst consequence
is that we might fail to push a join clause as far down the jointree as it
could be pushed.  However, the Assert that's failing here supposes that
the answer is exact.  I think a sufficient fix in the near term is to
disable that Assert and add suitable commentary.  In the long run it might
be nice if the answers were exact, but that would require more information
than is currently passed to join_clause_is_movable_into(), which would
imply non-back-patchable API changes.

 ,----[ git bisect ]
 |   first bad commit: [3f8c23c4d31d4a0e801041733deb2c7cfa577b32] Improve
 |   predtest.c's ability to reason about operator expressions.
 `----

There seems to be something odd going on with your bisection tests,
because once again the query fails clear back to 9.2 for me, which is what
I'd expect based on the above analysis --- the movable-join-clause logic
all came in with parameterized paths in 9.2.

regards, tom lane




[HACKERS] Feature - Index support on an lquery field (from the ltree module)

2015-07-27 Thread David Belle
Hi 

Forgive me if this is not the right place to discuss this. I am new to the 
postgresql community.

I have a requirement for a project that I am working on and was hoping to gain
some discussion on the idea of implementing an index type for lquery fields
(see ltree: http://www.postgresql.org/docs/9.4/static/ltree.html).

Currently ltree fields can have GiST/GIN indexes, but you cannot create an
index for lquery or ltxtquery. For my purposes, an index for lquery would be
enough, although if it were not too difficult to expand it to ltxtquery
fields, that could also be implemented.

I have not contributed to postgres yet so I’m not really sure how to get 
started. 

Thoughts?


Re: [HACKERS] Autonomous Transaction is back

2015-07-27 Thread Pavel Stehule
2015-07-27 23:59 GMT+02:00 Merlin Moncure mmonc...@gmail.com:

 On Mon, Jul 27, 2015 at 4:41 PM, Joel Jacobson j...@trustly.com wrote:
  On Fri, Jul 24, 2015 at 9:39 PM, Merlin Moncure mmonc...@gmail.com
 wrote:
 
  On Thu, Jul 23, 2015 at 1:49 PM, Josh Berkus j...@agliodbs.com wrote:
   Batch Jobs: large data-manipulation tasks which need to be broken up
   into segments, with each segment committing separately.  Example:
   updating 1 million records in batches of 1000.
 
  Autonomous transactions are not a good fit for this case; stored
  procedures are a better way to go for any scenario where you don't
  want be be in a snapshot (for example, suppose you want to change
  isolation level on the fly).
 
 
  Hm, you mean we need real stored procedures in PostgreSQL and not just
  functions?

 Yes, exactly.

 Autonomous transactions aren't really set up for cases where the
 function runs for a very long time or indefinitely.  This is the
 'advancing xmin' problem as Josh puts it but I think the problem is
 much bigger than that.  Anyways, this is mostly irrelevant to
 autonomous transactions as long as the design isn't extended to try
 and cover that case.

 Is the Autonomous Transaction feature only going to be exposed through
 pl/pgsql?


I hope not.

The integration with plpgsql can be a secondary question. In this case I
prefer tying it to a block statement, with no possibility of an explicit
COMMIT - at least in functions.

something like:

BEGIN
  BEGIN AUTONOMOUS
   ...
  END;
END;

This is consistent with current subtransaction support, and disallows some
corner cases like a forgotten COMMIT.

Regards

Pavel


 merlin





Re: [HACKERS] proposal: multiple psql option -c

2015-07-27 Thread Pavel Stehule
2015-07-28 5:24 GMT+02:00 Pavel Stehule pavel.steh...@gmail.com:



 2015-07-27 21:57 GMT+02:00 Andrew Dunstan and...@dunslane.net:


 On 07/27/2015 02:53 PM, Pavel Stehule wrote:

 I am trying to run parallel execution

  psql -At -c "select datname from pg_database" postgres | xargs -n 1 -P 3
  psql -c "select current_database()"




  I don't think it's going to be a hugely important feature, but I don't
  see a problem with creating a new option (-C seems fine) which would have
  the same effect as if the arguments were concatenated into a file which is
  then used with -f. IIRC -c has some special characteristics which mean
  it's probably best not to try to extend it for this feature.


 ok, I'll try to write patch.


I have a question. Can the -C option be given multiple times?

The SQL is no problem, but what about backslash commands?

postgres=# \dt \dn select 10;
No relations found.
List of schemas
┌──┬───┐
│ Name │ Owner │
╞══╪═══╡
└──┴───┘
(0 rows)

\dn: extra argument 10; ignored


something like:

psql -C "\dt \dn" -C "select 10"

It looks better than psql -c "\dt \dn \n select 10".
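
As a self-contained sketch of the option-handling side (illustrative only,
not the eventual psql patch), repeated -C arguments could simply be queued
in order and later replayed as if they had been concatenated into a file
passed to -f:

  /* mini_c_opts.c - queue repeated -C arguments in order, as psql might */
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <unistd.h>

  typedef struct Action
  {
      char           *commands;   /* one -C argument */
      struct Action  *next;
  } Action;

  int
  main(int argc, char **argv)
  {
      Action *head = NULL, **tail = &head;
      int     c;

      while ((c = getopt(argc, argv, "C:")) != -1)
      {
          if (c == 'C')
          {
              Action *a = malloc(sizeof(Action));

              a->commands = strdup(optarg);
              a->next = NULL;
              *tail = a;
              tail = &a->next;
          }
      }

      /* psql would feed each entry to its script interpreter in turn */
      for (Action *a = head; a != NULL; a = a->next)
          printf("would execute: %s\n", a->commands);
      return 0;
  }

Invoked as mini_c_opts -C "\dt \dn" -C "select 10", it prints the two
batches in the order given.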

Regards

Pavel




 Pavel



 cheers

 andrew





Re: [HACKERS] [DESIGN] ParallelAppend

2015-07-27 Thread Amit Kapila
On Tue, Jul 28, 2015 at 7:59 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:

  -Original Message-
  From: pgsql-hackers-ow...@postgresql.org
  [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kouhei Kaigai
  Sent: Monday, July 27, 2015 11:07 PM
  To: Amit Kapila
  
   Is there a real need to have a new node like ParallelAppendPath?
   Can't we have a Funnel node beneath the Append node, with each
   worker responsible for a SeqScan on each inherited child
   relation?  Something like
  
   Append
  --- Funnel
 -- SeqScan rel1
 -- SeqScan rel2
  
  If Funnel can handle both horizontal and vertical parallelism,
  it is a great simplification. I don't insist on a new node.
 
  Once Funnel gets the capability to have multiple child nodes, the
  Append node above will probably be gone. I expect set_append_rel_pathlist()
  to add two paths, based on Append and Funnel, and then the planner will
  choose the cheaper one according to its cost.
 
 In the latest v16 patch, Funnel is declared as follows:

    typedef struct Funnel
    {
        Scan    scan;
        int     num_workers;
    } Funnel;

  If we try to add Append capability here, I expect the structure will
  be adjusted as follows, for example:
 
    typedef struct Funnel
    {
        Scan    scan;
        List   *funnel_plans;
        List   *funnel_num_workers;
    } Funnel;

  As the field names suggest, funnel_plans stores the underlying Plan nodes
  instead of the lefttree, and funnel_num_workers stores the number of
  workers expected to be assigned to each individual child plan.


or shall we have a node like the above and name it FunnelAppend or
AppendFunnel?

  Even though create_parallelscan_paths() in v16 sets num_workers no
  larger than parallel_seqscan_degree, the total number of concurrent
  background workers may exceed this configuration if more than two
  PartialSeqScan nodes are underlying.
  It is a different configuration from max_worker_processes, so it is
  not a problem as long as we have another restriction.
  However, how do we control the cap on the number of worker processes
  per appendable Funnel node? For example, if a parent table has 200
  child tables but max_worker_processes is configured to 50,
  it is obviously impossible to launch all the background workers
  simultaneously. One idea I have is to suspend the launch of some plans
  until earlier ones are completed.


Okay, but I think with that idea you would need to re-launch the workers
for each new set of relation scans, which could turn out to be costly. How
about designing some way where workers, after completing their assigned
work, check for a new task (which in this case would be to scan a new
relation) and then execute it?  I think in this way we can achieve dynamic
allocation of work and achieve maximum parallelism with the available set
of workers.  We have achieved this in ParallelSeqScan by scanning at the
block level: once a worker finishes a block, it checks for a new block to
scan.


  We will need to pay attention to other issues as well when Funnel
  kicks background workers towards asymmetric relations.
 
  If the number of rows varies across child nodes, we may
  want to assign 10 background workers to scan rel1 with PartialSeqScan.
  On the other hand, rel2 may have so few rows that its
  total_cost is smaller than the cost of launching a worker.
  In this case, Funnel has child nodes to be executed both asynchronously
  and synchronously.
 

I think this might turn out to be slightly tricky; for example, how do we
know how many workers are sufficient for a relation of a given size?
Another way to divide the work in this case could be in terms of chunks of
blocks: once a worker finishes its current set of blocks, it should be
able to get a new set of blocks to scan.  So let us assume we decide on a
chunk size of 32 and the total number of blocks in the whole inheritance
hierarchy is 3200; then the maximum number of workers we should allocate
to this scan is 100, and if we have parallel_seqscan_degree less than
that, we can use that many workers and let them scan 32 blocks at a time.
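
To illustrate, here is a self-contained sketch of that chunk-of-blocks
bookkeeping. Everything in it is illustrative only - the names, the flat
array standing in for the inheritance hierarchy, and the C11 atomics
standing in for PostgreSQL shared memory - none of it is from the parallel
seqscan patch:

  /* chunkdemo.c - workers claim fixed-size chunks of a global block space */
  #include <stdatomic.h>
  #include <stdbool.h>
  #include <stdio.h>

  #define CHUNK_SIZE 32

  typedef struct ChunkDispenser
  {
      int         nrels;      /* number of child relations */
      const int  *nblocks;    /* blocks in each child relation */
      int         total;      /* sum of nblocks[] */
      atomic_int  cursor;     /* next unclaimed global block number */
  } ChunkDispenser;

  /* Claim the next chunk of global block numbers; false when exhausted. */
  static bool
  claim_chunk(ChunkDispenser *d, int *first, int *count)
  {
      int pos = atomic_fetch_add(&d->cursor, CHUNK_SIZE);

      if (pos >= d->total)
          return false;
      *first = pos;
      *count = (pos + CHUNK_SIZE <= d->total) ? CHUNK_SIZE : d->total - pos;
      return true;
  }

  /* Map a global block number onto (child relation, block within it). */
  static void
  locate_block(const ChunkDispenser *d, int global, int *rel, int *block)
  {
      int r = 0;

      while (global >= d->nblocks[r])
      {
          global -= d->nblocks[r];
          r++;
      }
      *rel = r;
      *block = global;
  }

  int
  main(void)
  {
      const int      blocks[] = {100, 6, 3094};  /* asymmetric children */
      ChunkDispenser d = {3, blocks, 3200, 0};
      int            first, count;

      /* each parallel worker would loop like this until nothing remains;
       * a chunk that crosses a relation boundary simply continues into
       * the next child relation */
      while (claim_chunk(&d, &first, &count))
      {
          int rel, block;

          locate_block(&d, first, &rel, &block);
          printf("chunk of %d blocks starting at rel %d, block %d\n",
                 count, rel, block);
      }
      return 0;
  }

With 3200 total blocks and a chunk size of 32, exactly 100 chunks are
handed out, matching the arithmetic above.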


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Failing assertions in indxpath.c, placeholder.c and brin_minmax.c

2015-07-27 Thread Andreas Seltenreich
Tom Lane writes:

 Andreas Seltenreich seltenre...@gmx.de writes:
 when running my random query generator contraption[1] against the
 regression database of 9.5 or master, it occasionally triggers one of
 the following three assertions.

 I've fixed the first two of these --- thanks for the report!

I let sqlsmith run during the night, and it no longer triggered the
first two.  During roughly a million random queries it triggered the
already-mentioned brin one 10 times, but there was also one instance of
this new one in the log:

TRAP: FailedAssertion(!(join_clause_is_movable_into(rinfo, joinrel->relids,
join_and_req)), File: "relnode.c", Line: 987)
LOG:  server process (PID 12851) was terminated by signal 6: Aborted
DETAIL:  Failed process was running: select  
  rel65543066.tmplname as c0, 
  rel65543064.umuser as c1
from 
  public.dept as rel65543059
inner join pg_catalog.pg_user_mappings as rel65543064
left join pg_catalog.pg_enum as rel65543065
on (rel65543064.srvname = rel65543065.enumlabel )
  inner join pg_catalog.pg_ts_template as rel65543066
  on (rel65543065.enumtypid = rel65543066.tmplnamespace )
on (rel65543059.dname = rel65543064.srvname )
where ((rel65543059.mgrname = rel65543059.mgrname) 
and (rel65543064.usename = rel65543066.tmplname)) 
  and (rel65543059.mgrname ~~ rel65543059.mgrname)
fetch first 128 rows only;

 ,----[ git bisect ]
 |   first bad commit: [3f8c23c4d31d4a0e801041733deb2c7cfa577b32] Improve
 |   predtest.c's ability to reason about operator expressions.
 `----

 I'm a bit confused about this aspect of your report though, because in
 my hands that example fails clear back to 9.2.  It doesn't seem to require
 the predtest.c improvement to expose the fault.

Hmm, I actually used a different, uglier query to trigger this assertion
for the bisection run.  I'll attach it[1] along with the complete git
bisect log[2].

regards,
andreas

Footnotes: 
[1]  select  subq_717608.c3 as c0, rel4551421.inhrelid as c1, 
rel4551421.inhrelid as c2, subq_717608.c3 as c3 from 
information_schema.foreign_tables as rel4551363 right join public.hash_f8_heap 
as rel4551366 inner join pg_catalog.pg_constraint as rel4551419 inner join 
(select  rel4551420.bb as c0, rel4551420.aa as c1, rel4551420.aa as c2, 
rel4551420.aa as c3 from public.b as rel4551420 where ( 
rel4551420.bbrel4551420.bb ) and ( rel4551420.bbrel4551420.bb ) ) as 
subq_717608 on (rel4551419.coninhcount = subq_717608.c1 ) left join 
pg_catalog.pg_inherits as rel4551421 on (subq_717608.c1 = rel4551421.inhseqno ) 
on (rel4551366.seqno = subq_717608.c1 ) on (rel4551363.foreign_table_schema = 
rel4551419.conname ) where ( ( rel4551419.contypidrel4551419.connamespace ) 
and ( rel4551419.connamespace=rel4551419.conrelid ) ) and ( 
rel4551421.inhparentrel4551419.contypid )  fetch first 9 rows only ;

[2] git bisect start
# bad: [3b5a89c4820fb11c337838c1ad71e8e93f2937d1] Fix resource leak pointed out 
by Coverity.
git bisect bad 3b5a89c4820fb11c337838c1ad71e8e93f2937d1
# good: [e6df2e1be6330660ba4d81daa726ae4a71535aa9] Stamp 9.4beta1.
git bisect good e6df2e1be6330660ba4d81daa726ae4a71535aa9
# bad: [68e66923ff629c324e219090860dc9e0e0a6f5d6] Add missing volatile 
qualifier.
git bisect bad 68e66923ff629c324e219090860dc9e0e0a6f5d6
# bad: [a16bac36eca8158cbf78987e95376f610095f792] Remove dependency on 
wsock32.lib in favor of ws2_32
git bisect bad a16bac36eca8158cbf78987e95376f610095f792
# good: [55d5b3c08279b487cfa44d4b6e6eea67a0af89e4] Remove unnecessary output 
expressions from unflattened subqueries.
git bisect good 55d5b3c08279b487cfa44d4b6e6eea67a0af89e4
# bad: [1cbc9480106241aaa8db112331e93d0a265b6db0] Check interrupts during 
logical decoding more frequently.
git bisect bad 1cbc9480106241aaa8db112331e93d0a265b6db0
# bad: [686f362bee126e50280bcd3b35807b02f18a8966] Fix 
contrib/pg_upgrade/test.sh for $PWD containing spaces.
git bisect bad 686f362bee126e50280bcd3b35807b02f18a8966
# bad: [be76a6d39e2832d4b88c0e1cc381aa44a7f86881] Secure Unix-domain sockets of 
make check temporary clusters.
git bisect bad be76a6d39e2832d4b88c0e1cc381aa44a7f86881
# good: [6554656ea2043c5bb877b427237dc5ddd7c5e5c8] Improve tuplestore's error 
messages for I/O failures.
git bisect good 6554656ea2043c5bb877b427237dc5ddd7c5e5c8
# bad: [a7205d81573cb0c979f2d463a1d9edb6f97c94aa] Adjust 9.4 release notes.
git bisect bad a7205d81573cb0c979f2d463a1d9edb6f97c94aa
# bad: [3f8c23c4d31d4a0e801041733deb2c7cfa577b32] Improve predtest.c's ability 
to reason about operator expressions.
git bisect bad 3f8c23c4d31d4a0e801041733deb2c7cfa577b32
# good: [c81e63d85f0c2c39d3fdfd8b95fc1ead6fdcb89f] Fix pg_restore's processing 
of old-style BLOB COMMENTS data.
git bisect good c81e63d85f0c2c39d3fdfd8b95fc1ead6fdcb89f
# first bad commit: [3f8c23c4d31d4a0e801041733deb2c7cfa577b32] Improve
predtest.c's ability to reason about operator expressions.

Re: [HACKERS] Sharing aggregate states between different aggregate functions

2015-07-27 Thread Haribabu Kommi
On Thu, Jul 9, 2015 at 7:44 PM, David Rowley
david.row...@2ndquadrant.com wrote:
 On 15 June 2015 at 12:05, David Rowley david.row...@2ndquadrant.com wrote:


 This basically allows an aggregate's state to be shared between other
 aggregate functions when both aggregates' transition functions (and a few
 other things) match
 There's quite a number of aggregates in our standard set which will
 benefit from this optimisation.


 After compiling the original patch with another compiler, I noticed a couple
 of warnings.

 The attached fixes these.

I did some performance tests on the patch. The patch shows a good
improvement for same-column aggregates. With int or bigint datatype columns,
the patch doesn't show any visible performance difference, but with the
numeric datatype it shows a good improvement.

select sum(x), avg(y) from test where x < $1;

Different columns:

selectivity    Head    patch
(millions)
0.1            315     322
0.3            367     376
0.5            419     427
1              551     558
2              824     826

select sum(x), avg(x) from test where x < $1;

Same column:

selectivity    Head    patch
(millions)
0.1            314     314
0.3            363     343
0.5            412     373
1              536     440
2              795     586

Regards,
Hari Babu
Fujitsu Australia




Re: [HACKERS] MultiXact member wraparound protections are now enabled

2015-07-27 Thread Simon Riggs
On 26 July 2015 at 20:15, Noah Misch n...@leadboat.com wrote:

 On Fri, Jul 24, 2015 at 09:14:09PM -0400, Peter Eisentraut wrote:
  On 7/22/15 4:45 PM, Robert Haas wrote:
   But it seemed to me that this could be rather confusing.  I thought it
   would be better to be explicit about whether the protections are
   enabled in all cases.  That way, (1) if you see the message saying
   they are enabled, they are enabled; (2) if you see the message saying
   they are disabled, they are disabled; and (3) if you see neither
   message, your version does not have those protections.
 
  But this is not documented, AFAICT, so I don't think anyone is going to
  be able to follow that logic.  I don't see anything in the release notes
  saying, look for this message to see how this applies to you, or
 whatever.

 I supported inclusion of the message, because it has good potential to help
 experts studying historical logs to find the root cause of data corruption.
 The complex histories of clusters showing corruption from this series of
 bugs
 have brought great expense to the task of debugging new reports.  Given a
 cluster having full mxact wraparound protections since last corruption-free
 backup (or since initdb), one can rule out some causes.


Would it be better to replace it with a less specific and more generally
useful message?

For example, "Server started with release X.y.z"
from which we could infer various useful things.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Sharing aggregate states between different aggregate functions

2015-07-27 Thread David Rowley
On 27 July 2015 at 18:15, Haribabu Kommi kommi.harib...@gmail.com wrote:

 On Thu, Jul 9, 2015 at 7:44 PM, David Rowley
 david.row...@2ndquadrant.com wrote:
  On 15 June 2015 at 12:05, David Rowley david.row...@2ndquadrant.com
 wrote:
 
 
  This basically allows an aggregate's state to be shared between other
  aggregate functions when both aggregates' transition functions (and a
 few
  other things) match
  There's quite a number of aggregates in our standard set which will
  benefit from this optimisation.
 
 
  After compiling the original patch with another compiler, I noticed a
 couple
  of warnings.
 
  The attached fixes these.

  I did some performance tests on the patch. The patch shows a good
  improvement for same-column aggregates. With int or bigint datatype
  columns,
  the patch doesn't show any visible performance difference, but with the
  numeric
  datatype it shows a good improvement.


Thanks for testing this.

You should only see an improvement on the aggregates listed here:

select aggfnoid::oid,
       aggfnoid || '(' || typname || ')',
       aggtransfn,
       agginitval
from pg_aggregate ag
inner join pg_proc pr on aggfnoid = pr.oid
inner join pg_type tp on pr.proargtypes[0] = tp.oid
where ag.aggtransfn in (select aggtransfn
                        from pg_aggregate
                        group by aggtransfn
                        having count(*) > 1)
  and ag.agginitval is null
order by ag.aggtransfn;

Regards

David Rowley

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] multivariate statistics / patch v7

2015-07-27 Thread Kyotaro HORIGUCHI
Hello,

At Sat, 25 Jul 2015 23:09:31 +0200, Tomas Vondra tomas.von...@2ndquadrant.com 
wrote in 55b3fb0b.7000...@2ndquadrant.com
 Hi,
 
 On 07/16/2015 01:51 PM, Kyotaro HORIGUCHI wrote:
  Hi, I'd like to show you the modified structure of the
  multivariate statistics application logic. Please find the
  attached. The patches apply on top of your v7 patch.
 
 Sadly I do have some trouble getting it to apply correctly :-(
 So for now all my comments are based on just reading the code.

Ah. My interim modification to rebase it onto master is probably the
culprit. Sorry for the dirty patch.

# I would have recreated the patch if you had complained before
# struggling with the thing..

The core of the modification is in clausesel.c. I attached the
patched clausesel.c.

 FWIW I've rebased my patch to the current master, it's available on
 github as usual:
 
 https://github.com/tvondra/postgres/commits/mvstats

Thanks.

  The code to find mv-applicable clauses is moved out of the main
  flow of clauselist_selectivity. As I said in the previous mail,
  the new function transformRestrictInfoForEstimate (a bad name,
  but it's just for the PoC) scans the clause list and generates a
  RestrictStatData struct which drives mv-aware selectivity
  calculation. This struct isolates MV and non-MV estimation.
 
  The struct RestrictStatData mainly consists of the following
  three parts,
 
 - clauses to be estimated by the current logic (MV not applicable)
 - clauses to be estimated by MV-statistics.
 - a list of child RestrictStatDatas, which are to be processed
   recursively.
 
  mvclause_selectivty() is the topmost function where mv stats
  work. This structure effectively prevents the main estimation flow
  from being broken by modifications to the mvstats part. Although I
  haven't measured it, I'm positive the code is far smaller than yours.
 
  I attached two patches to this message. The first one rebases the
  v7 patch onto the current (maybe) master and the second applies
  the refactoring.
 
  I'm a little anxious about performance, but I think this makes the
  process of applying mv-stats far clearer. The regression tests for
  mvstats succeed as-is, except for fdep, which is not implemented in
  this patch.
 
  What do you think about this?
 
 I'm not sure, at this point. I'm having a hard time understanding how
 exactly the code works - there are pretty much no comments explaining
 the implementation, so it takes time to understand the code. This is
 especially true about transformRestrictInfoForEstimate which is also
 quite long. I understand it's a PoC, but comments would really help.

The patch itself is hardly readable because it's not based on master
but on your last patch plus something else.

My concerns about the code at the time were the following:

- You embedded the logic of multivariate estimation into
  clauselist_selectivity. I think estimation using multivariate
  statistics is quite different from the ordinary estimation based
  on single-column stats, so they are logically separable and
  we should separate them.

- You take a top-down approach that walks the tree to
  check the applicability of mv-stats at every step down the
  clause tree. If a subtree is found to be mv-applicable, it is split
  into two parts - mv-compatible and non-compatible. These steps
  require repeated expression tree walking, which looks too
  CPU-hungry.

- You look to be considering cases where users create many
  multivariate statistics on overlapping attribute sets. But that
  looks like overkill to me. MV-stats are more resource-hungry, so
  we can assume they will be used sparingly.

My suggestion in the patch is a bottom-up approach to find the
mv-applicable portion(s) of the expression tree, which is the
planner's basic way of working overall. The approach requires no
repetitive runs of the tree walker, that is, pull_varnos. It could
fail to find the 'optimal' solution for complex situations, but it
needs far less calculation for almost the same return (I think..).

Even though it doesn't handle functional dependencies, the reduction
in code shows the efficiency. It does nothing tricky.
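
For reference, the three-part shape described earlier would look roughly
like this (field names here are illustrative guesses for exposition; the
PoC patch is authoritative):

  /* illustrative shape only - not the PoC's actual definition */
  #include "nodes/pg_list.h"

  typedef struct RestrictStatData
  {
      List   *plain_clauses;   /* estimated by the existing per-column logic */
      List   *mv_clauses;      /* estimated using multivariate statistics */
      List   *children;        /* child RestrictStatData nodes, processed
                                * recursively */
  } RestrictStatData;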

 On a conceptual level, I think the idea to split the estimation into
 two phases - enrich the expression tree with nodes with details about
 stats etc, and then actually do the estimation in the second phase
 might be interesting. Not because it's somehow clearer, but because it
 gives us a chance to see the expression tree as a whole, with details
 about all the stats (with the current code we process/estimate the
 tree incrementally). But I don't really know how useful that would be.

It is difficult to say which approach is better, since it depends on
what we consider most important. However, I am concerned that your
code substantially reconstructs the expression (clause) tree in the
midst of processing it. I believe that should be a separate phase,
for simplicity. Of course the additional resources required should
also be considered, but they are rather reduced in this case.

 I don't think the proposed change makes the 

Re: [HACKERS] We need to support ForeignRecheck for late row locking, don't we?

2015-07-27 Thread Etsuro Fujita
KaiGai-san,

On 2015/07/24 23:51, Kouhei Kaigai wrote:
 On 2015/07/22 19:10, Etsuro Fujita wrote:
 While working on the issue Foreign join pushdown vs EvalPlanQual, I
 happened to notice odd behaviors of late row locking in FDWs.

 I think the reason for that is that we don't check pushed-down quals
 during EPQ testing even if what was fetched by RefetchForeignRow was
 an updated version of the tuple rather than the same version previously
 obtained.  So, to fix this, I'd like to propose that pushed-down quals
 be checked in ForeignRecheck.

 * I've modified ForeignRecheck so as to check pushed-down quals whether
 doing late locking or early locking.

 Isn't it an option to put a new callback in ForeignRecheck?
 
 The FDW driver knows that its private data structure includes the
 expression nodes that were pushed down to the remote side. So, it seems
 to me the best way is to consult the FDW driver on whether the supplied
 tuple should be visible according to the pushed-down qualifier.
 
 More or less, this fix needs a new interface contract around the
 EvalPlanQual logic. It is better to give the FDW driver more flexibility
 in its private data structure and in the way it processes the recheck
 logic, rather than a special-purpose variable.
 
 If an FDW driver manages the pushed-down expressions in its own format,
 the pushedDownQual requirement makes it keep the qualifier redundantly.
 The callback approach does not have that concern.

That might be an idea, but is there any performance disadvantage, as
discussed in [1]?  It looks like that would require performing another
remote query to see if the supplied tuple satisfies the pushed-down
quals during EPQ testing.

Best regards,
Etsuro Fujita

[1] http://www.postgresql.org/message-id/5590ed5c.2040...@lab.ntt.co.jp




Re: [HACKERS] Sharing aggregate states between different aggregate functions

2015-07-27 Thread Heikki Linnakangas

On 07/27/2015 08:34 AM, David Rowley wrote:

- * agg_input_types, agg_state_type, agg_result_type identify the input,
- * transition, and result types of the aggregate.  These should all be
- * resolved to actual types (ie, none should ever be ANYELEMENT etc).
+ * agg_input_types identifies the input types of the aggregate.  These should
+ * be resolved to actual types (ie, none should ever be ANYELEMENT etc).

I'm not sure I understand why agg_state_type and agg_result_type have
changed here.


The function no longer has the agg_result_type argument, but the removal 
of agg_state_type from the comment was a mistake.



+ peraggstate->sortstates = (Tuplesortstate **)
+     palloc0(sizeof(Tuplesortstate *) * numGroupingSets);
+ for (currentsortno = 0; currentsortno < numGroupingSets; currentsortno++)
+     peraggstate->sortstates[currentsortno] = NULL;

This was not you, but this NULL setting looks unneeded due to the palloc0().


Yeah, I noticed that too. Ok, let's take it out.


In this function I also wasn't quite sure if it was worth comparing both
non-NULL INITCONDs here. I believe my code comments may slightly
contradict what the code actually does, as the comments talk about them
having to match, but the code just bails if any are non-NULL. The reason I
didn't check them was that it seems inevitable that some duplicate work
needs to be done when setting up the INITCOND. Perhaps it's worth it?


It would be nice to handle non-NULL initconds. I think you'll have to 
check that the input function isn't volatile. Or perhaps just call the 
input function, and check that the resulting Datum is byte-per-byte 
identical, although that might be awkward to do with the current code 
structure.



BTW, the name of the AggStatePerAggStateData struct is pretty horrible.
The repeated AggState feels awkward. Now that I've stared at the patch
for some time, it doesn't bother me anymore, but it took me quite a while
to get over it. I'm sure it will for others too. And it's not just that
struct; the comments talk about aggregate state, which could be confused
to mean AggState, but it actually means AggStatePerAggStateData. I don't
have any great suggestions, but can you come up with a better naming scheme?


I agree, they're horrible. The thing that's causing the biggest problem is
the struct named AggState, which carries state for *all* aggregates and
has nothing to do with transition state. So there are two different
meanings of the word "state", and I've used both meanings in the name
AggStatePerAggStateData.

Perhaps just renaming AggStatePerAggStateData to AggStateTransStateData
would be good enough?


Hmm. I think it should be AggStatePerTransData then, to keep the same 
pattern as AggStatePerAggData and AggStatePerGroupData.



I've attached a delta patch based on your patch, in which I've:

1. Renamed AggStatePerAggStateData to AggStateTransStateData, and renamed
all variables using it to suit.
2. Removed the surplus peraggstate->sortstates[currentsortno] = NULL; (not
related to this patch, but since we're moving that part of the code, we'd
better fix it)
3. Put back the missing aggfnoid in the error message.
4. Changed initialize_aggregates() to not pass the states. They're already
in AggState and we're using aggstate->numstates to get the count of the
items in that array, so it seems wrong to allow a different array to ever
be passed in.
5. Changed the wording of a few comments to try to reduce confusion between
'state' and 'transition state'.
6. Renamed AggState.peraggstate to transstates. I pluralised this to try to
reduce confusion with the single-state pointers named 'transstate' in the
functions in nodeAgg.c. I did think that peragg should also become peraggs
and pergroup should become pergroups, but didn't change those.

Anything else I changed is self explanatory.

What do you think?


Looks good, thanks!

- Heikki





Re: [HACKERS] pg_dump -Fd and compression level

2015-07-27 Thread Marc Mamin


 As per attached patch.

 Comments?

 It seems that the first test on the compression in pg_backup_tar.c is now 
 obsolete.
 It didn't make much sense anyway.



    if (AH->compression < 0 || AH->compression > 9)
        AH->compression = Z_DEFAULT_COMPRESSION;

    /* Don't compress into tar files unless asked to do so */
    if (AH->compression == Z_DEFAULT_COMPRESSION)
        AH->compression = 0;

    /*
     * We don't support compression because reading the files back is not
     * possible since gzdopen uses buffered IO which totally screws file
     * positioning.
     */
    if (AH->compression != 0)
        exit_horribly(modulename,
                      "compression is not supported by tar archive format\n");
}



In fact, the first two tests look unnecessary. Neither condition should
be possible now.


Hello,

Isn't the second test still required if you call pg_dump -Ft without setting
-Z0 explicitly?
(i.e. AH->compression == Z_DEFAULT_COMPRESSION)

There are still a few suspicious places in pg_backup_tar.c that refer to
compression although it is not supported (except for blobs?).
(C programming is beyond my capabilities; I can only roughly read simple code ...)
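
If that reading is right, the surviving logic would presumably be just the
normalization plus the hard check, something like this sketch (not a
proposed patch):

  /* -Ft without an explicit -Z leaves compression at Z_DEFAULT_COMPRESSION;
   * normalize that to "none" before the hard check below */
  if (AH->compression == Z_DEFAULT_COMPRESSION)
      AH->compression = 0;

  /* tar offers no readable way to store compressed members */
  if (AH->compression != 0)
      exit_horribly(modulename,
                    "compression is not supported by tar archive format\n");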

regards,
Marc Mamin





Re: [HACKERS] Sharing aggregate states between different aggregate functions

2015-07-27 Thread David Rowley
On 27 July 2015 at 20:11, Heikki Linnakangas hlinn...@iki.fi wrote:

 On 07/27/2015 08:34 AM, David Rowley wrote:

  In this function I also wasn't quite sure if it was worth comparing both
  non-NULL INITCONDs here. I believe my code comments may slightly
  contradict what the code actually does, as the comments talk about them
  having to match, but the code just bails if any are non-NULL. The reason I
  didn't check them was that it seems inevitable that some duplicate work
  needs to be done when setting up the INITCOND. Perhaps it's worth it?


 It would be nice to handle non-NULL initconds. I think you'll have to
 check that the input function isn't volatile. Or perhaps just call the
 input function, and check that the resulting Datum is byte-per-byte
 identical, although that might be awkward to do with the current code
 structure.


Yeah, it's awkward, as I just managed to remind myself:

The aggtranstype needs to be known before we can call GetAggInitVal() on
the initval datum.

That currently happens in build_transstate_for_aggref() only when we've
decided to create a new state.

transstate->initValue = GetAggInitVal(textInitVal,
                                      transstate->aggtranstype);

And to get the aggtranstype:

transstate->aggtranstype =
    resolve_aggregate_transtype(aggref->aggfnoid,
                                aggform->aggtranstype,
                                inputTypes,
                                numArguments);

Of course, not impossible, but lots of code gets duplicated.
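
For what it's worth, once that duplicated work has produced both initvals
as Datums of the shared transition type, the byte-per-byte check Heikki
suggests could be as small as this sketch (the function name and framing
are illustrative, not from the patch):

  /* Sketch: do two initcond Datums of the same transition type match
   * byte-per-byte?  Assumes both have already been through GetAggInitVal. */
  #include "postgres.h"
  #include "utils/datum.h"
  #include "utils/lsyscache.h"

  static bool
  initvals_match(Datum a, bool a_isnull, Datum b, bool b_isnull, Oid transtype)
  {
      int16   typlen;
      bool    typbyval;

      if (a_isnull || b_isnull)
          return a_isnull && b_isnull;

      get_typlenbyval(transtype, &typlen, &typbyval);
      return datumIsEqual(a, b, typbyval, typlen);
  }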

Regards

David Rowley

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] We need to support ForeignRecheck for late row locking, don't we?

2015-07-27 Thread Kouhei Kaigai
 On 2015/07/24 23:51, Kouhei Kaigai wrote:
  On 2015/07/22 19:10, Etsuro Fujita wrote:
  While working on the issue Foreign join pushdown vs EvalPlanQual, I
  happened to notice odd behaviors of late row locking in FDWs.
 
  I think the reason for that is that we don't check pushed-down quals
  during EPQ testing even if what was fetched by RefetchForeignRow was
  an updated version of the tuple rather than the same version previously
  obtained.  So, to fix this, I'd like to propose that pushed-down quals
  be checked in ForeignRecheck.
 
  * I've modified ForeignRecheck so as to check pushed-down quals whether
  doing late locking or early locking.
 
  Isn't it an option to put a new callback in ForeignRecheck?
 
  The FDW driver knows that its private data structure includes the
  expression nodes that were pushed down to the remote side. So, it seems
  to me the best way is to consult the FDW driver on whether the supplied
  tuple should be visible according to the pushed-down qualifier.
 
  More or less, this fix needs a new interface contract around the
  EvalPlanQual logic. It is better to give the FDW driver more flexibility
  in its private data structure and in the way it processes the recheck
  logic, rather than a special-purpose variable.
 
  If an FDW driver manages the pushed-down expressions in its own format,
  the pushedDownQual requirement makes it keep the qualifier redundantly.
  The callback approach does not have that concern.
 
 That might be an idea, but is there any performance disadvantage, as
 discussed in [1]?  It looks like that would require performing another
 remote query to see if the supplied tuple satisfies the pushed-down
 quals during EPQ testing.

I expect the ForeignRecheck callback to run ExecQual() on the pushed-down
qualifier expression saved in the private data of the ForeignScanState.
It does not need to kick off another remote query (unless the FDW driver
is designed that way), so the performance disadvantage is zero or quite
limited.

Also, let's consider the case when scanrelid == 0 (join pushdown).
It is easy to add a special code path if scanrelid == 0, which
implies the ScanState is either a ForeignScan or a CustomScan.
If ForeignRecheck (= recheckMtd) is called instead of the if-
block below the Assert() in ExecScanFetch, the FDW driver will be
able to take its own special code path to run an alternative sub-plan.
How does this alternative sub-plan work? It walks down the sub-plan
tree, which typically consists of NestLoop + ForeignScan for
example; ExecScanFetch() is then called again on a ScanState
with scanrelid > 0.
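
A minimal sketch of what the FDW-side routine could look like under this
callback idea, using the 9.5-era three-argument ExecQual(). The callback
name and signature and the ExampleFdwState struct are assumptions for
illustration, not a committed API; the quals are assumed to be ExprStates
initialized at BeginForeignScan time from the pushed-down clauses:

  #include "postgres.h"
  #include "executor/executor.h"
  #include "nodes/execnodes.h"

  typedef struct ExampleFdwState
  {
      List   *pushed_quals;    /* ExprStates of remotely-evaluated quals */
  } ExampleFdwState;

  static bool
  exampleRecheckForeignScan(ForeignScanState *node, TupleTableSlot *slot)
  {
      ExampleFdwState *fstate = (ExampleFdwState *) node->fdw_state;
      ExprContext *econtext = node->ss.ps.ps_ExprContext;

      /* Evaluate the pushed-down quals locally against the refetched
       * tuple; no extra remote round-trip is needed. */
      econtext->ecxt_scantuple = slot;
      return ExecQual(fstate->pushed_quals, econtext, false);
  }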

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com



