Re: [HACKERS] [DESIGN] ParallelAppend
KaiGai-san,

On 2015-07-27 PM 11:07, Kouhei Kaigai wrote:

  Append
   --> Funnel
        --> PartialSeqScan on rel1 (num_workers = 4)
   --> Funnel
        --> PartialSeqScan on rel2 (num_workers = 8)
   --> SeqScan on rel3

shall be rewritten to

  Funnel
   --> PartialSeqScan on rel1 (num_workers = 4)
   --> PartialSeqScan on rel2 (num_workers = 8)
   --> SeqScan on rel3 (num_workers = 1)

In the rewritten plan, are the respective scans (PartialSeq or Seq) on rel1, rel2 and rel3 asynchronous with respect to each other, or does each one wait for the earlier one to finish? I would think they do not wait, because otherwise this would be no different from the former case, right? The original premise seems to be that the partitions rel1, rel2 and rel3 may be on different volumes, so parallelism across volumes looks like a goal of parallelizing Append.

From my understanding of the parallel seqscan patch, each worker's PartialSeqScan asks for a block to scan using a shared parallel heap scan descriptor that effectively keeps track of the division of work among PartialSeqScans in terms of blocks.

What if we invent a PartialAppend, which each worker would run in the case of a parallelized Append? It would use some kind of shared descriptor to pick a relation (Append member) to scan. The shared structure could be the list of subplans, including a mutex for concurrency. It doesn't sound as effective as the proposed ParallelHeapScanDescData is for PartialSeqScan, but anything more granular might get complicated. For example, consider a (current_relation, current_block) pair. If there are more workers than subplans/partitions, then multiple workers might start working on the same relation after a round-robin assignment of relations (but of course, a later worker would start scanning from a later block in the same relation). I imagine that might help with parallelism across volumes, if that's the case.

MergeAppend parallelization might involve a bit more complication, but may be feasible with a PartialMergeAppend and a slightly different kind of coordination among workers.
What do you think of such an approach?

Thanks,
Amit

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [DESIGN] ParallelAppend
On Sun, Jul 26, 2015 at 8:43 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:

Hello, I've recently been working on and investigating a ParallelAppend feature for the next commitfest. Below is my design proposal.

1. Concept
--
The concept is quite simple; anybody might have considered it more than once. A ParallelAppend node kicks off background worker processes to execute its child nodes in parallel / asynchronously. It intends to improve the performance of scanning large partitioned tables from the standpoint of overall throughput; the latency of the first few hundred rows is not in the scope of this project. From the standpoint of technology trends, it primarily tries to utilize the multi-core capability within a system, but it also enables expansion to a distributed database environment using the foreign-table inheritance feature. Its behavior is very similar to the Funnel node except for several points, so we can reuse the infrastructure we have built through the long-standing discussion during the v9.5 development cycle.

2. Problems to be solved
-
Typical OLAP workloads involve many table joins and scans of large tables, which are often partitioned; the KPI is query response time, while only a very small number of sessions are active simultaneously. So we are required to run a single query as quickly as possible, even if it consumes more computing resources than typical OLTP workloads would.

The current implementation of heap scanning is painful when we look at its behavior from that standpoint - how many rows we can read within a certain time - because of its synchronous manner. In the worst case, when a SeqScan node tries to fetch the next tuple, heap_getnext() looks up a block in shared buffers, then ReadBuffer() calls the storage manager to read the target block from the filesystem if it is not in the buffer. Next, the operating system puts the calling process to sleep until the required I/O is completed.

Most cases are helped at an earlier stage than this worst case; still, the best scenario we can expect is that the next tuple already appears on top of the message queue (with visibility checks already done, of course) with no fall-through to the buffer manager or deeper. If we can run multiple scans in parallel / asynchronously, the CPU core can be assigned to another process by the operating system; that eventually improves the I/O density and enables higher processing throughput.

The Append node is an ideal point to parallelize because:
- child nodes can be in physically different locations via tablespaces, so further tuning is possible according to the system landscape.
- it can control whether a subplan is actually executed on a background worker, on a per-subplan basis. If the subplans cover both large and small tables, ParallelAppend may kick off background workers to scan the large tables only, while scanning the small tables itself.
- as with the Funnel node, we don't need to care about enhancing individual node types. SeqScan, IndexScan, ForeignScan and the others can perform as usual, yet actually run in parallel.

3. Implementation
--
* Plan Cost

ParallelAppend shall appear wherever Append can appear, except for its usage as a dummy. So I'll enhance set_append_rel_pathlist() to add both an AppendPath and a ParallelAppendPath, with a cost for each.

Is there a real need for a new node like ParallelAppendPath? Can't we have a Funnel node beneath the Append node, with each worker responsible for a SeqScan on one inherited child relation? Something like:

  Append
   --> Funnel
        --> SeqScan on rel1
        --> SeqScan on rel2

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
[HACKERS] 9.5a1 BUG FIX: pgbench negative latencies
Under 9.5a1, pgbench -r reports negative latencies on meta commands, probably as an oversight of 84f0ea3f. This patch ensures that "now" is reset on each loop iteration inside doCustom.

-- Fabien.

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 2c3e365..cce67e8 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -1146,16 +1146,19 @@ doCustom(TState *thread, CState *st, instr_time *conn_time, FILE *logfile, AggVa
 	bool		trans_needs_throttle = false;
 	instr_time	now;
 
+top:
 	/*
 	 * gettimeofday() isn't free, so we get the current timestamp lazily the
 	 * first time it's needed, and reuse the same value throughout this
 	 * function after that. This also ensures that e.g. the calculated latency
 	 * reported in the log file and in the totals are the same. Zero means
 	 * not set yet.
+	 *
+	 * now must also be reset on "goto top", issued when interpreting meta
+	 * commands, otherwise the per-command measured latency is wrong.
 	 */
 	INSTR_TIME_SET_ZERO(now);
 
-top:
 	commands = sql_files[st->use_file];

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [DESIGN] ParallelAppend
On Sun, Jul 26, 2015 at 8:43 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:

[... KaiGai-san's design proposal, quoted in full above ...]
[...]

Is there a real need for a new node like ParallelAppendPath? Can't we have a Funnel node beneath the Append node, with each worker responsible for a SeqScan on one inherited child relation? Something like:

  Append
   --> Funnel
        --> SeqScan on rel1
        --> SeqScan on rel2

If Funnel can handle both horizontal and vertical parallelism, that is a great simplification; I don't insist on a new node. Once Funnel gets the capability to have multiple child nodes, the Append node above it will probably go away. I expect set_append_rel_pathlist() to add two paths, based on Append and Funnel, and the planner will then choose the cheaper one according to its cost.
We will also need to pay attention to other issues that arise when Funnel kicks off background workers towards asymmetric relations. If the number of rows varies across individual child nodes, we may want to assign 10 background workers to scan rel1 with PartialSeqScan, while rel2 may have so few rows that its total_cost is smaller than the cost of launching a worker. In that case, Funnel has child nodes to be executed both asynchronously and synchronously.

If the cheapest path of a child relation is a pair of Funnel and PartialSeqScan, we have to avoid stacking Funnel nodes. Probably, a Funnel node that performs like Append needs to pull up the underlying Funnels and assign an equivalent number of workers, as follows:

  Append
   --> Funnel
        --> PartialSeqScan on rel1 (num_workers = 4)
   --> Funnel
        --> PartialSeqScan on rel2 (num_workers = 8)
   --> SeqScan on rel3

shall be rewritten to

  Funnel
   --> PartialSeqScan on rel1 (num_workers = 4)
   -->
Re: [HACKERS] raw output from copy
2015-07-27 10:41 GMT+02:00 Heikki Linnakangas hlinn...@iki.fi:

On 07/27/2015 06:55 AM, Craig Ringer wrote:

On 7 July 2015 at 14:32, Pavel Stehule pavel.steh...@gmail.com wrote: Hi, the previous patch was broken and buggy. Here is a new version with a fixed upload and more tests.

I routinely see people trying to use COPY ... FORMAT binary to export a single binary field (an image, for example) and getting confused by the header PostgreSQL adds, or using text-format COPY and struggling with the hex escaping. It's clearly something people have trouble with. It doesn't help that while lo_import and lo_export can read paths outside the datadir (and refuse to read from within it), pg_read_binary_file is superuser-only and disallows absolute paths, and there's no corresponding pg_write_binary_file. So users who want to import and export a single binary field tend to try to use COPY. We have functionality for large objects that has no equivalent for bytea. I don't love the use of COPY for this, but it gets us support for arbitrary clients pretty easily. Otherwise it'd be server-side only via local filesystem access, or require special psql-specific functionality like we have for lo_import etc.

COPY seems like a strange interface for this. I can see the point that the syntax is almost there already, for both input and output. But even that's not quite there yet; we'd need the new RAW format. And as an input method, COPY is a bit awkward, because you cannot easily pass the file to a function, for example. I think this should be implemented in psql, along the lines of Andrew's original \bcopy patch. There are actually a couple of related psql features here that would be useful on their own.

The first is being able to send the query result to a file for a single query only. You can currently do:

  \o /tmp/foo
  SELECT ...;
  \o

But more often than not, when I try to do that, I forget the final \o, run another query, and the output still goes to the file. So it'd be nice to have a \o variant that only affects the next query. Something like:

  \O /tmp/foo
  SELECT ...;

The second feature needed is to write the output without any headers, row delimiters and such - just the datum. And the third feature is to write it in binary. Perhaps something like:

  \O /tmp/foo binary
  SELECT blob FROM foo WHERE id = 10;

What about input? This is a whole new feature, but it would be nice to be able to pass the file contents as a query parameter. Something like:

  \P /tmp/foo binary
  INSERT INTO foo VALUES (?);

- Heikki

The input example is a strong reason why we shouldn't do it via INSERT: just parsing some special '?' symbol needs a lot of new code. In that case, I don't see any advantage in a psql-based solution. COPY is the standard interface for input/output from/to files, and it should be used here.

Regards,
Pavel
Re: [HACKERS] Proposal for CSN based snapshots
On Sat, Jul 25, 2015 at 11:39 AM, Simon Riggs si...@2ndquadrant.com wrote:

On 24 July 2015 at 19:21, Robert Haas robertmh...@gmail.com wrote:

On Fri, Jul 24, 2015 at 1:00 PM, Simon Riggs si...@2ndquadrant.com wrote: It depends on the exact design we use to get that. Certainly we do not want them if they cause a significant performance regression.

Yeah. I think the performance worries expressed so far are: currently, if you see an XID that is between the XMIN and XMAX of the snapshot, you hit CLOG only on first access. After that, the tuple is hinted. With this approach, the hint bit doesn't avoid needing to hit CLOG anymore, because it's not enough to know whether or not the tuple committed; you have to know the CSN at which it committed, which means you have to look that up in CLOG (or whatever SLRU stores this data). Heikki mentioned adding some caching to ameliorate this problem, but it sounds like he was worried that the impact might still be significant.

This seems like the heart of the problem. Changing a snapshot from a list of xids into one number is easy. Making XidInMVCCSnapshot() work is the hard part, because there needs to be a translation/lookup from CSN to determine whether it contains the xid. That turns a CSN into a reference to a cached snapshot, or a reference by which a snapshot can be derived on demand.

I see the problem. Currently, once hint bits are set, we don't have to visit CLOG anymore. With CSN snapshots that is not so: we still have to translate the XID into a CSN in order to compare it with the snapshot CSN. The version of the CSN patch in this thread still keeps XMIN and XMAX in the snapshot. AFAICS, with CSN snapshots XMIN and XMAX are not strictly required to express the snapshot; they were kept as an optimization that restricts usage of the XID => CSN map to the given range of XIDs. However, with long-running transactions the [XMIN; XMAX] range could be very wide, and we could end up using the XID => CSN map heavily over a wide range of XIDs.

As I can see from the Huawei PGCon talk, a "Dense Map" in shared memory is proposed for the XID => CSN transformation. With a large enough Dense Map we can do most XID => CSN transformations with a single shared memory access. The PGCon talk gives pgbench results; however, pgbench doesn't run any long transactions. With a long-running transaction we could run out of the Dense Map for a significant fraction of XID => CSN transformations. Dilip, did you test your patch with long transactions?

I'm also thinking about a different option for optimizing this: when we set hint bits, we could also replace XMIN/XMAX with the CSN. In that case we wouldn't need to do the XID => CSN transformation for that tuple anymore. However, we have 32-bit XIDs for now. We could also have 32-bit CSNs, but that would double our troubles with wraparound: we would have two counters that could wrap around. That could bring us back to thoughts about 64-bit XIDs.

-- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] REVOKE [ADMIN OPTION FOR] ROLE
On Thu, Jul 23, 2015 at 8:30 AM, Egor Rogov e.ro...@postgrespro.ru wrote: So, the question: is it a documentation bug (as it seems to me), a code bug, or did I miss something?

Your analysis looks right to me, but I don't know whether the code or the documentation should be changed. This claim was added by Tom Lane in 2005 in commit 58d214e51fe50b10b4439da6ec263d54c155afbf. It might be worth checking whether the claim was true at that time and later became false, or whether it was never true to begin with.

As far as I can see, the modern REVOKE syntax for revoking membership in a role (along with the admin option) was introduced in commit 7762619 (by Tom Lane, 2005). The code handling this command didn't pay attention to the RESTRICT/CASCADE keywords then, and it does not now. Before that, another syntax was in use: ALTER GROUP groupname DROP USER username [, ...]. It did not include any notion of CASCADE at all. I guess that "REVOKE role_name FROM role_name" inherited the [CASCADE | RESTRICT] clause from the general REVOKE command but never actually used it. And I see no point in changing this, because role membership is somewhat more static than privileges. So I would propose the attached documentation fix.

Thanks, Egor Rogov
Postgres Professional: http://www.postgrespro.com Russian Postgres Company

diff --git a/doc/src/sgml/ref/revoke.sgml b/doc/src/sgml/ref/revoke.sgml
index 36c286b..8417abe 100644
--- a/doc/src/sgml/ref/revoke.sgml
+++ b/doc/src/sgml/ref/revoke.sgml
@@ -167,8 +167,10 @@ REVOKE [ ADMIN OPTION FOR ]
  /para
  para
-  When revoking membership in a role, <literal>GRANT OPTION</> is instead
-  called <literal>ADMIN OPTION</>, but the behavior is similar.
+  If <literal>ADMIN OPTION FOR</> is specified, only the admin option is
+  revoked, not the group membership itself. When revoking membership in a
+  role or admin option, recursive revocation does not happen no matter
+  whether <literal>CASCADE</> is specified or not.
   Note also that this form of the command does not allow the noise word
   <literal>GROUP</>.
  /para
Re: [HACKERS] creating extension including dependencies
On Sun, Jul 26, 2015 at 1:01 AM, Petr Jelinek wrote: Yes, that's what I meant by the change of checking order in the explanation above. I did that because I thought the code would be more complicated otherwise, but apparently I was stupid...

+In case the extension specifies schema in its control file, the schema

s/schema/<literal>schema</>/

+++ b/src/test/modules/test_extensions/.gitignore
@@ -0,0 +1,3 @@
+# Generated subdirectories
+/results/
+/tmp_check/

test_extensions/.gitignore is missing /log/.

Something else has not been discussed yet: what to do with new_version and old_version (the options of CreateExtensionStmt)? As of now, if those options are defined they are not passed down to the parent extensions, but shouldn't we raise an error if they are used in combination with CASCADE? In any case, I think the chosen behavior should be documented.

-- Michael
Re: [HACKERS] [DESIGN] ParallelAppend
Hello, can I ask some questions?

I suppose we can take this as an analog of ParallelSeqScan. I see little distinction between Append(ParallelSeqScan) and ParallelAppend(SeqScan). What difference is there between them?

Append does not start executing the second or later node until the first node reaches the end of its scan. On the other hand, ParallelAppend will kick off all the child nodes (almost) simultaneously.

If other nodes will have the same functionality, as you mention at the end of this proposal, it might be better for some part of this feature to be implemented as part of the existing executor itself, rather than as a dedicated additional node - just as my asynchronous FDW execution patch partially does. (Although it lacks the planner part and background worker launching...) If that is the case, it might be better to modify ExecProcNode so that it supports both the in-process and inter-bgworker cases through a single API. What do you think about this?

Its downside is that we would need to adjust all the existing nodes to follow the new executor capability. At this moment, we have 38 node types derived from Plan. I think it is not an easy job to review a patch that changes multiple dozens of files.

Thanks,
-- NEC Business Creation Division / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com

regards,

[... KaiGai-san's design proposal, quoted in full above ...]
[...]
[...]

Cost estimation logic shall take further discussion; however, I expect the logic below to estimate the cost for ParallelAppend:

1. Sum startup_cost and run_cost for each child pathnode, but distinguish between synchronous and asynchronous ones. Probably, a pathnode's total cost is less than:

  (parallel_setup_cost + its total cost / parallel_append_degree
[HACKERS] Buildfarm TAP testing is useless as currently implemented
I challenge anybody to figure out what happened here:

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2015-07-27%2010%3A25%3A17

or here:

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamster&dt=2015-07-04%2016%3A00%3A23

or here:

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2015-07-07%2016%3A35%3A06

With no visibility of pg_ctl's output, and no copy of the postmaster log, there is no chance of debugging intermittent failures like these. This isn't entirely the buildfarm's fault --- AFAICS, prove-based testing has inadequate error reporting by design. If "not ok" isn't enough information for you, tough beans. (It might help if the farm script captured the postmaster log after a failure, but that would do nothing for prove's unwillingness to pass through client-side messages.)

I think we should disable TAP testing in the buildfarm until there is some credible form of error reporting for it. I've grown tired of looking into buildfarm failure reports only to meet a dead end. Aside from the wasted investigation time, which admittedly isn't huge, there's an opportunity cost in that subsequent test steps didn't get run.

regards, tom lane
Re: [HACKERS] Failing assertions in indxpath.c, placeholder.c and brin_minmax.c
Andreas Seltenreich seltenre...@gmx.de writes:

Tom Lane writes: I've fixed the first two of these --- thanks for the report!

I let sqlsmith run during the night, and it no longer triggered the first two. During roughly a million random queries it triggered the already-mentioned brin one 10 times, but there was also one instance of this new one in the log:

Oh goody, more fun. I'll take a look. I'm a bit confused about this aspect of your report, though, because in my hands that example fails clear back to 9.2. It doesn't seem to require the predtest.c improvement to expose the fault.

Hmm, I actually used a different, uglier query to trigger this assertion for the bisection run.

Ah, okay. The triggering condition for both those cases is provably-contradictory restriction clauses on an inheritance relation. In what you showed yesterday, that was something like "x > x AND x IS NULL", which the planner has been able to recognize as contradictory for a long time, because > is strict. (It did not, and still doesn't, notice that "x > x" all by itself is contradictory...) But here it looks like the trigger is

  from public.b as rel4551420 where ( rel4551420.bb > rel4551420.bb ) and ( rel4551420.bb < rel4551420.bb )

It was the recent predtest improvements that allowed recognition that "bb > bb" contradicts "bb < bb". So that's why this run started to fail there, even though the bug it was tickling is much older.

regards, tom lane
Re: [HACKERS] Microvacuum for gist. Question about GISTPageOpaqueData flag
I need some advice on what would be better:
- to add a new flag like F_HAS_GARBAGE, or
- to delete all mentions of F_TUPLES_DELETED and use it for gist microvacuum.

According to the commit message:

commit 2effb72e682a7dbdc9a8a60a80c22ec1fa9d8079
Author: Heikki Linnakangas heikki.linnakan...@iki.fi
Date: Fri Nov 7 15:03:46 2014 +0200

.. The code that generated a record to clear the F_TUPLES_DELETED flag hasn't existed since we got rid of old-style VACUUM FULL. I kept the code that sets the flag, although it's not used for anything anymore, because it might still be interesting information for debugging purposes that some tuples have been deleted from a page. ..

If Heikki doesn't change his opinion, then introduce a new flag. Although I don't think that we need to keep F_TUPLES_DELETED.

Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/
[HACKERS] Microvacuum for gist. Question about GISTPageOpaqueData flag
Hi, I'm working on microvacuum for the gist access method. Briefly, microvacuum includes two steps:
1. When a search tells us that a tuple is invisible to all transactions, it is marked LP_DEAD and the page is marked as having dead tuples.
2. Then, when an insert touches a full page which has dead tuples, it calls microvacuum instead of splitting the page.

You can find a kind of review here [1].

While writing the patch, I found a strange GISTPageOpaqueData flag - F_TUPLES_DELETED. Its description looks the same as that of BTP_HAS_GARBAGE:

#define F_TUPLES_DELETED (1 << 2)  /* some tuples on the page are dead */
#define BTP_HAS_GARBAGE  (1 << 6)  /* page has LP_DEAD tuples */

But they are definitely not the same thing. I found only two mentions of this flag. The function GistMarkTuplesDeleted sets the flag after dead-tuple deletion. Does anyone need it at all? I found no place where this flag is checked. Is this a correct use of the flag?

I need some advice on what would be better:
- to add a new flag like F_HAS_GARBAGE, or
- to delete all mentions of F_TUPLES_DELETED and use it for gist microvacuum.

[1] http://www.google-melange.com/gsoc/proposal/public/google/gsoc2015/ivanitskiy_ilya/5629499534213120

-- Best regards, Lubennikova Anastasia
Re: [HACKERS] pgbench - allow backslash-continuations in custom scripts
Attached is the revised version of this patch. The first patch is unchanged from before. The second has a kind of bug fixed. The third is new: it allows backslash continuation for backslash commands.

Ah, thanks :-) Would you consider adding the patch to the next commitfest? I may put myself down as a reviewer... [...]

I'm not satisfied by the design but I don't see another way...

I'll try to have a look.

I have no idea how to deal with the copies of psqlscan.[lh] from psql. Currently they are simply dead copies of those in psql.

I think there should be no copies; it should use relative symbolic links so that the files are kept synchronized.

Yeah, I think so too, but symlinks could cause harm with git and on Windows. Another way would be to copy them from the psql directory; they live next door to each other.

Indeed, there are plenty of links already which are generated by makefiles (see src/bin/pg_xlogdump/*), and presumably a copy is made on Windows. There should be no file duplication within the source tree.

-- Fabien.
Re: [HACKERS] All-zero page in GIN index causes assertion failure
On 07/23/2015 07:43 PM, Jeff Janes wrote: On Mon, Jul 20, 2015 at 1:23 PM, Heikki Linnakangas hlinn...@iki.fi wrote: I came up with the attached, for GIN and SP-GiST. Instead of WAL-logging the page initialization in SP-GiST vacuum, I changed it so that it simply leaves the page as all-zeroes, and adds it to the FSM. The code that grabs a target page from the FSM handles that, and initializes the page anyway, so that was simpler. This made the SP-GiST is-deleted flag obsolete; it's no longer set, although the code still checks for it for backwards-compatibility. (Even that may actually be unnecessary, as a page that's marked as deleted must also be empty, and wherever we check for the deleted-flag, we also check for PageIsEmpty().) This patch, in conjunction with the LWLock deadlock patch, fixes all the issues I was having with GIN indexes in 9.5. Thanks, I've pushed this, as well as a fix to a similar failure from B-tree vacuum that Amit Langote reported in the other thread. - Heikki
Re: [HACKERS] Buildfarm TAP testing is useless as currently implemented
On 07/27/2015 10:06 AM, Tom Lane wrote: I challenge anybody to figure out what happened here: http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2015-07-27%2010%3A25%3A17 or here: http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamster&dt=2015-07-04%2016%3A00%3A23 or here: http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2015-07-07%2016%3A35%3A06 With no visibility of pg_ctl's output, and no copy of the postmaster log, there is no chance of debugging intermittent failures like this one. This isn't entirely the buildfarm's fault --- AFAICS, prove-based testing has inadequate error reporting by design. If "not ok" isn't enough information for you, tough beans. (It might help if the farm script captured the postmaster log after a failure, but that would do nothing for prove's unwillingness to pass through client-side messages.) I think we should disable TAP testing in the buildfarm until there is some credible form of error reporting for it. I've grown tired of looking into buildfarm failure reports only to meet a dead end. Aside from the wasted investigation time, which admittedly isn't huge, there's an opportunity cost in that subsequent test steps didn't get run. Well, it does create a lot of files that we don't pick up. An example list is shown below, and I am attaching their contents in a single gzipped attachment. However, these are in the wrong location. This was a vpath build and yet these tmp_check directories are all created in the source tree. Let's fix that and then I'll set about having the buildfarm collect them. That should get us further down the track.
cheers

andrew

/home/andrew/pgl/pg_head/src/bin/pg_controldata/tmp_check/log/regress_log_001_pg_controldata
/home/andrew/pgl/pg_head/src/bin/pg_basebackup/tmp_check/log/regress_log_020_pg_receivexlog
/home/andrew/pgl/pg_head/src/bin/pg_basebackup/tmp_check/log/regress_log_010_pg_basebackup
/home/andrew/pgl/pg_head/src/bin/pg_rewind/regress_log
/home/andrew/pgl/pg_head/src/bin/pg_rewind/tmp_check/log/regress_log_003_extrafiles
/home/andrew/pgl/pg_head/src/bin/pg_rewind/tmp_check/log/regress_log_001_basic
/home/andrew/pgl/pg_head/src/bin/pg_rewind/tmp_check/log/regress_log_002_databases
/home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_100_vacuumdb
/home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_091_reindexdb_all
/home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_050_dropdb
/home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_070_dropuser
/home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_020_createdb
/home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_102_vacuumdb_stages
/home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_030_createlang
/home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_060_droplang
/home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_040_createuser
/home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_010_clusterdb
/home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_011_clusterdb_all
/home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_101_vacuumdb_all
/home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_090_reindexdb
/home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_080_pg_isready
/home/andrew/pgl/pg_head/src/bin/pg_config/tmp_check/log/regress_log_001_pg_config
/home/andrew/pgl/pg_head/src/bin/pg_ctl/tmp_check/log/regress_log_001_start_stop
/home/andrew/pgl/pg_head/src/bin/pg_ctl/tmp_check/log/regress_log_002_status
/home/andrew/pgl/pg_head/src/bin/initdb/tmp_check/log/regress_log_001_initdb

binlog.gz Description: GNU Zip compressed data
Re: [HACKERS] Failing assertions in indxpath.c, placeholder.c and brin_minmax.c
Tom Lane wrote: Bottom line is that somebody failed to consider the possibility of a null comparison value reaching the BRIN index lookup machinery. The code stanza that's failing supposes that only IS NULL or IS NOT NULL tests could have SK_ISNULL set, but that's just wrong. Hm, okay, will look into that. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] Restore-reliability mode
Noah Misch wrote: On Thu, Jul 23, 2015 at 04:53:49PM -0300, Alvaro Herrera wrote: Peter Geoghegan wrote: On Sat, Jun 6, 2015 at 12:58 PM, Noah Misch n...@leadboat.com wrote: - Call VALGRIND_MAKE_MEM_NOACCESS() on a shared buffer when its local pin count falls to zero. Under CLOBBER_FREED_MEMORY, wipe a shared buffer when its global pin count falls to zero. Did a patch for this ever materialize? I think the first part would be something like the attached. Neat. Does it produce any new complaints during make installcheck? I only tried a few tests, for lack of time, and it didn't produce any. (To verify that the whole thing was working properly, I reduced the range of memory made available during PinBuffer, and that resulted in an immediate crash.) I am not really familiar with valgrind, TBH, and just copied a recipe to run the postmaster under it, so if someone with more valgrind-fu could verify this, it would be great. This part: Under CLOBBER_FREED_MEMORY, wipe a shared buffer when its global pin count falls to zero. can be done without any valgrind, I think. Any takers? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] Microvacuum for gist. Question about GISTPageOpaqueData flag
2015-07-27 20:05 GMT+04:00 Heikki Linnakangas hlinn...@iki.fi: On 07/27/2015 06:46 PM, Teodor Sigaev wrote: I need advice on what would be better: - to add a new flag like F_HAS_GARBAGE, - or to delete all mentions of F_TUPLES_DELETED and use it in gist microvacuum. According to the commit message: commit 2effb72e682a7dbdc9a8a60a80c22ec1fa9d8079 Author: Heikki Linnakangas heikki.linnakan...@iki.fi Date: Fri Nov 7 15:03:46 2014 +0200 .. The code that generated a record to clear the F_TUPLES_DELETED flag hasn't existed since we got rid of old-style VACUUM FULL. I kept the code that sets the flag, although it's not used for anything anymore, because it might still be interesting information for debugging purposes that some tuples have been deleted from a page. .. If Heikki doesn't change his opinion then introduce a new flag. Although I don't think that we need to keep F_TUPLES_DELETED. It's certainly not needed for anything at the moment, although conceivably we might reintroduce code that needs it in the future. There are plenty of flag bits available, so let's use a new flag. If there was a shortage, I wouldn't blink at reusing F_TUPLES_DELETED. - Heikki Thanks for the quick reply -- Best regards, Lubennikova Anastasia
Re: [HACKERS] Buildfarm failure from overly noisy warning message
Tom Lane t...@sss.pgh.pa.us wrote: Kevin Grittner kgri...@ymail.com writes: I think a LOG entry when an autovacuum process is actually canceled has value, just in case it is happening on a particular table so frequently that the table starts to bloat. I see no reason to log anything if there is an intention to cancel an autovacuum process but it actually completes before we can do so. Hm. By that logic, I'm not sure if we need anything to be logged here, because the autovacuum process will log something about having received a query cancel signal. That seems sufficient to me for normal cases. If we're in the business of minimizing log chatter, I'd suggest that we remove the entirely-routine "sending cancel" log message, and only log something in the uncommon case where the kill() fails (but, per the original point, reduce that entry to LOG or so; or else print something only for not-ESRCH cases). +1 for only printing for the non-ESRCH cases. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Reduce ProcArrayLock contention
On Sat, Jul 25, 2015 at 12:42 AM, Amit Kapila amit.kapil...@gmail.com wrote: I thought that the internal API would automatically take care of it; for example, for MSVC it uses _InterlockedCompareExchange64, which, if it doesn't work on 32-bit systems or is not defined, means we have to use the 32-bit version, but I am not certain about that fact. Instead of using pg_atomic_uint64, how about using pg_atomic_uint32 and storing the pgprocno rather than the pointer directly? Then it can work the same way (and be the same size) on every platform. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Buildfarm TAP testing is useless as currently implemented
On 07/27/2015 05:06 PM, Tom Lane wrote: I challenge anybody to figure out what happened here: http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2015-07-27%2010%3A25%3A17 or here: http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamster&dt=2015-07-04%2016%3A00%3A23 or here: http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2015-07-07%2016%3A35%3A06 With no visibility of pg_ctl's output, and no copy of the postmaster log, there is no chance of debugging intermittent failures like this one. This isn't entirely the buildfarm's fault --- AFAICS, prove-based testing has inadequate error reporting by design. If "not ok" isn't enough information for you, tough beans. (It might help if the farm script captured the postmaster log after a failure, but that would do nothing for prove's unwillingness to pass through client-side messages.) Yep. I think we should disable TAP testing in the buildfarm until there is some credible form of error reporting for it. I've grown tired of looking into buildfarm failure reports only to meet a dead end. Aside from the wasted investigation time, which admittedly isn't huge, there's an opportunity cost in that subsequent test steps didn't get run. Commit 1ea06203b - "Improve logging of TAP tests" - made it a lot better. The pg_ctl log should be in the log file now. The buildfarm doesn't seem to capture those logs at the moment, but that should be easy to fix. - Heikki
Re: [HACKERS] All-zero page in GIN index causes assertion failure
On Tue, Jul 28, 2015 at 12:21 AM, Heikki Linnakangas hlinn...@iki.fi wrote: Thanks, I've pushed this, as well as a fix to a similar failure from B-tree vacuum that Amit Langote reported in the other thread. Thanks Heikki, and sorry I didn't notice this new thread. Regards, Amit
Re: [HACKERS] optimizing vacuum truncation scans
On Sat, Jul 25, 2015 at 4:46 AM, Simon Riggs si...@2ndquadrant.com wrote: On 22 July 2015 at 14:59, Robert Haas robertmh...@gmail.com wrote: Urgh. So if we do this, that forever precludes having HOT pruning set the all-visible bit. What is the reason why we don't do that already? Surely it's a one-liner? I wish! It actually doesn't look simple to me to modify heap_page_prune to know whether any not-all-visible items remain at the end. It's definitely true that, in order for marking the page all-visible to be a possibility, we need to reach heap_page_prune_execute() with ndead == 0. But that's not enough: we also need to know that any tuple that survived the prune operation (that is, it wasn't redirected or marked unused) has an old-enough xmin, which isn't tracked anywhere. vacuumlazy.c solves that problem by making a second pass over all the tuples on the page to determine whether the page is all-visible, but that seems pretty inefficient, and I think heap_page_prune() is already a CPU hog on some workloads. (Didn't Heikki have a patch to improve that?) Another sticky wicket is that, as things stand today, we'd have to insert a second WAL record to mark the page all-visible, which would add probably-significant overhead. We could probably modify the format of the record we're already emitting for pruning to carry a flag indicating whether to additionally set PD_ALL_VISIBLE. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Microvacuum for gist. Question about GISTPageOpaqueData flag
On 07/27/2015 06:46 PM, Teodor Sigaev wrote: I need advice on what would be better: - to add a new flag like F_HAS_GARBAGE, - or to delete all mentions of F_TUPLES_DELETED and use it in gist microvacuum. According to the commit message: commit 2effb72e682a7dbdc9a8a60a80c22ec1fa9d8079 Author: Heikki Linnakangas heikki.linnakan...@iki.fi Date: Fri Nov 7 15:03:46 2014 +0200 .. The code that generated a record to clear the F_TUPLES_DELETED flag hasn't existed since we got rid of old-style VACUUM FULL. I kept the code that sets the flag, although it's not used for anything anymore, because it might still be interesting information for debugging purposes that some tuples have been deleted from a page. .. If Heikki doesn't change his opinion then introduce a new flag. Although I don't think that we need to keep F_TUPLES_DELETED. It's certainly not needed for anything at the moment, although conceivably we might reintroduce code that needs it in the future. There are plenty of flag bits available, so let's use a new flag. If there was a shortage, I wouldn't blink at reusing F_TUPLES_DELETED. - Heikki
Re: [HACKERS] Several memory leaks for pg_rewind caused by missing PQclear calls
On 07/23/2015 05:08 PM, Michael Paquier wrote: Hi all, After a run of valgrind on pg_rewind, I found a couple of code paths missing some PQclear calls after running a query. Attached is a patch to fix all those leaks. Fixed, thanks. - Heikki
Re: [HACKERS] A little RLS oversight?
On 07/27/2015 10:03 AM, Joe Conway wrote: On 07/26/2015 07:59 AM, Joe Conway wrote: On 07/26/2015 07:19 AM, Dean Rasheed wrote: Attached is an updated patch (still needs some docs for the functions). Thanks for that. I'll add the docs. Documentation added. Also added comment to check_enable_rls about passing InvalidOid versus GetUserId(). I believe this is ready to go -- any other comments? Strike that - now I really think it is ready to go :-) In this patch I additionally changed instances of: check_enable_rls(indrelid, GetUserId(), true) to check_enable_rls(indrelid, InvalidOid, true) per Dean's earlier remark and my new comment. -- Joe Conway diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 98d1497..fd82ea4 100644 *** a/doc/src/sgml/func.sgml --- b/doc/src/sgml/func.sgml *** SET search_path TO replaceableschema/ *** 15259,15264 --- 15259,15270 entrytypeboolean/type/entry entrydoes current user have privilege for role/entry /row + row +entryliteralfunctionrow_security_active/function(parametertable/parameter)/literal +/entry +entrytypeboolean/type/entry +entrydoes current user have row level security
active for table/entry + /row /tbody /tgroup /table *** SET search_path TO replaceableschema/ *** 15299,15304 --- 15305,15313 indexterm primarypg_has_role/primary /indexterm +indexterm + primaryrow_security_active/primary +/indexterm para functionhas_table_privilege/function checks whether a user *** SELECT has_function_privilege('joeuser', *** 15462,15467 --- 15471,15483 are immediately available without doing commandSET ROLE/. /para +para + functionrow_security_active/function checks whether row level + security is active for the specified table in the context of the + functioncurrent_user/function and environment. The table can + be specified by name or by OID. +/para + para xref linkend=functions-info-schema-table shows functions that determine whether a certain object is firsttermvisible/ in the diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c index 1043362..aa5b28c 100644 *** a/src/backend/access/index/genam.c --- b/src/backend/access/index/genam.c *** BuildIndexValueDescription(Relation inde *** 204,210 Assert(indexrelid == idxrec-indexrelid); /* RLS check- if RLS is enabled then we don't return anything. */ ! if (check_enable_rls(indrelid, GetUserId(), true) == RLS_ENABLED) { ReleaseSysCache(ht_idx); return NULL; --- 204,210 Assert(indexrelid == idxrec-indexrelid); /* RLS check- if RLS is enabled then we don't return anything. */ ! if (check_enable_rls(indrelid, InvalidOid, true) == RLS_ENABLED) { ReleaseSysCache(ht_idx); return NULL; diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index e82a53a..c0bd6fa 100644 *** a/src/backend/catalog/system_views.sql --- b/src/backend/catalog/system_views.sql *** CREATE VIEW pg_indexes AS *** 150,156 LEFT JOIN pg_tablespace T ON (T.oid = I.reltablespace) WHERE C.relkind IN ('r', 'm') AND I.relkind = 'i'; ! 
CREATE VIEW pg_stats AS SELECT nspname AS schemaname, relname AS tablename, --- 150,156 LEFT JOIN pg_tablespace T ON (T.oid = I.reltablespace) WHERE C.relkind IN ('r', 'm') AND I.relkind = 'i'; ! CREATE VIEW pg_stats WITH (security_barrier) AS SELECT nspname AS schemaname, relname AS tablename, *** CREATE VIEW pg_stats AS *** 211,217 FROM pg_statistic s JOIN pg_class c ON (c.oid = s.starelid) JOIN pg_attribute a ON (c.oid = attrelid AND attnum = s.staattnum) LEFT JOIN pg_namespace n ON (n.oid = c.relnamespace) ! WHERE NOT attisdropped AND has_column_privilege(c.oid, a.attnum, 'select'); REVOKE ALL on
Re: [HACKERS] A little RLS oversight?
On 07/26/2015 07:59 AM, Joe Conway wrote: On 07/26/2015 07:19 AM, Dean Rasheed wrote: Attached is an updated patch (still needs some docs for the functions). Thanks for that. I'll add the docs. Documentation added. Also added comment to check_enable_rls about passing InvalidOid versus GetUserId(). I believe this is ready to go -- any other comments? Joe diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 98d1497..fd82ea4 100644 *** a/doc/src/sgml/func.sgml --- b/doc/src/sgml/func.sgml *** SET search_path TO replaceableschema/ *** 15259,15264 --- 15259,15270 entrytypeboolean/type/entry entrydoes current user have privilege for role/entry /row + row +entryliteralfunctionrow_security_active/function(parametertable/parameter)/literal +/entry +entrytypeboolean/type/entry +entrydoes current user have row level security active for table/entry + /row /tbody /tgroup /table *** SET search_path TO replaceableschema/ *** 15299,15304 --- 15305,15313 indexterm primarypg_has_role/primary /indexterm +indexterm + primaryrow_security_active/primary +/indexterm para functionhas_table_privilege/function checks whether a user *** SELECT has_function_privilege('joeuser', *** 15462,15467 --- 15471,15483 are immediately available without doing commandSET ROLE/. /para +para + functionrow_security_active/function checks whether row level + security is active for the specified table in the context of the + functioncurrent_user/function and environment. The table can + be specified by name or by OID. 
+/para + para xref linkend=functions-info-schema-table shows functions that determine whether a certain object is firsttermvisible/ in the diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index e82a53a..c0bd6fa 100644 *** a/src/backend/catalog/system_views.sql --- b/src/backend/catalog/system_views.sql *** CREATE VIEW pg_indexes AS *** 150,156 LEFT JOIN pg_tablespace T ON (T.oid = I.reltablespace) WHERE C.relkind IN ('r', 'm') AND I.relkind = 'i'; ! CREATE VIEW pg_stats AS SELECT nspname AS schemaname, relname AS tablename, --- 150,156 LEFT JOIN pg_tablespace T ON (T.oid = I.reltablespace) WHERE C.relkind IN ('r', 'm') AND I.relkind = 'i'; ! CREATE VIEW pg_stats WITH (security_barrier) AS SELECT nspname AS schemaname, relname AS tablename, *** CREATE VIEW pg_stats AS *** 211,217 FROM pg_statistic s JOIN pg_class c ON (c.oid = s.starelid) JOIN pg_attribute a ON (c.oid = attrelid AND attnum = s.staattnum) LEFT JOIN pg_namespace n ON (n.oid = c.relnamespace) ! WHERE NOT attisdropped AND has_column_privilege(c.oid, a.attnum, 'select'); REVOKE ALL on pg_statistic FROM public; --- 211,219 FROM pg_statistic s JOIN pg_class c ON (c.oid = s.starelid) JOIN pg_attribute a ON (c.oid = attrelid AND attnum = s.staattnum) LEFT JOIN pg_namespace n ON (n.oid = c.relnamespace) ! WHERE NOT attisdropped ! AND has_column_privilege(c.oid, a.attnum, 'select') ! AND (c.relrowsecurity = false OR NOT row_security_active(c.oid)); REVOKE ALL on pg_statistic FROM public; diff --git a/src/backend/rewrite/rowsecurity.c b/src/backend/rewrite/rowsecurity.c index aaf0061..2386cf0 100644 *** a/src/backend/rewrite/rowsecurity.c --- b/src/backend/rewrite/rowsecurity.c *** get_row_security_policies(Query *root, C *** 107,113 Relation rel; Oid user_id; - int sec_context; int rls_status; bool defaultDeny = false; --- 107,112 *** get_row_security_policies(Query *root, C *** 117,138 *hasRowSecurity = false; *hasSubLinks = false; ! 
/* This is just to get the security context */ ! GetUserIdAndSecContext(user_id, sec_context); /* Switch to checkAsUser if it's set */ user_id = rte-checkAsUser ? rte-checkAsUser : GetUserId(); - /* - * If this is not a normal relation, or we have been told to explicitly - * skip RLS (perhaps because this is an FK check) then just return - * immediately. - */ - if (rte-relid FirstNormalObjectId - || rte-relkind != RELKIND_RELATION - || (sec_context SECURITY_ROW_LEVEL_DISABLED)) - return; - /* Determine the state of RLS for this, pass checkAsUser explicitly */ rls_status = check_enable_rls(rte-relid, rte-checkAsUser, false); --- 116,128 *hasRowSecurity = false; *hasSubLinks = false; ! /* If this is not a normal relation, just return immediately */ ! if (rte-relkind != RELKIND_RELATION) ! return; /* Switch to checkAsUser if it's set */ user_id = rte-checkAsUser ? rte-checkAsUser :
Re: [HACKERS] Reduce ProcArrayLock contention
On Sat, Jul 25, 2015 at 10:12 AM, Amit Kapila amit.kapil...@gmail.com wrote: Numbers look impressive and definitely show that the idea is worth pursuing. I tried the patch on my laptop. Unfortunately, at least for 4 and 8 clients, I did not see any improvement. I can't help with this because I think we need a somewhat bigger machine to test the impact of the patch. I understand. IMHO it will be a good idea though to ensure that the patch does not cause regression for other setups, such as a less powerful machine or running with a lower number of clients. I was making that point even without my patch. Basically I have tried by commenting out ProcArrayLock in ProcArrayEndTransaction. I did not get that. You mean the TPS is the same if you run with ProcArrayLock commented out in ProcArrayEndTransaction? Is that safe to do? But those who don't get the lock will sleep and hence the contention is moved somewhere else, at least partially. Sure, if contention is reduced at one place it will move to the next lock. What I meant was that the lock may not show up in the contention profile, because all but one of the processes now sleep while their work is done by the group leader. So the total time spent in the critical section remains the same, but is not shown in the profile. Sure, your benchmark numbers show this is still better than all processes contending for the lock. No, autovacuum generates I/O, due to which sometimes there is more variation in write tests. Sure, but on average does it still show similar improvements? Or does the test become IO bound and hence the bottleneck shifts somewhere else? Can you please post those numbers as well when you get a chance? I can do something like that if others also agree with this new API in the LWLock series, but personally I don't think lwlock.c is the right place to expose an API for this work. Broadly, the work we are doing can be thought of as the sub-tasks below:

1. Advertise each backend's xid.
2. Push all backends except one onto a global list.
3. Wait till someone wakes us, check if the xid is cleared, and repeat until the xid is clear.
4. Acquire the lock.
5. Pop all the backends, clear each one's xid, and use their published xids to advance the global latestCompletedXid.
6. Release the lock.
7. Wake all the processes waiting for their xid to be cleared, marking each backend's xid as clear before waking it.

Among these, only step 2 can be common among different algorithms; the others need work specific to each optimization. Right, but if we could encapsulate that work in a function that just needs to work on some shared memory, I think we can build such infrastructure. It's possible, though, that a more elaborate infrastructure is needed than just one function pointer. For example, in this case, we also want to set latestCompletedXid after clearing the xids of all pending processes. Regarding the patch, the compare-and-exchange function calls that you've used would work only for 64-bit machines, right? You would need to use equivalent 32-bit calls on a 32-bit machine. I thought that the internal API would automatically take care of it; for example, for MSVC it uses _InterlockedCompareExchange64, which, if it doesn't work on 32-bit systems or is not defined, means we have to use the 32-bit version, but I am not certain about that fact. Hmm. The pointer will be a 32-bit field on a 32-bit machine. I don't know how exchanging that with a 64-bit integer would be safe. Thanks, Pavan -- Pavan Deolasee http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
[HACKERS] Alpha 2 wrapping August 3
... so please get those fixes/overhauls in in the next week. Thanks! -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] proposal: multiple psql option -c
On Mon, Jul 27, 2015 at 2:37 PM, Pavel Stehule pavel.steh...@gmail.com wrote: 2015-07-27 20:32 GMT+02:00 Robert Haas robertmh...@gmail.com: On Sat, Jul 25, 2015 at 5:27 AM, Pavel Stehule pavel.steh...@gmail.com wrote: It will be a nice side effect, but my primary problem was the impossibility of combining VACUUM and any other statement into one simple psql call. Seems like you can do that easily enough: [rhaas pgsql]$ (echo 'SELECT 1;'; echo 'VACUUM;'; echo 'SELECT 2;') | psql ?column? -- 1 (1 row) VACUUM ?column? -- 2 (1 row) How can I do it with xargs? I don't know specifically what you're trying to do, but I bet it's not that hard. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive
On Mon, Jul 27, 2015 at 2:43 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Robert Haas wrote: On Mon, Jul 27, 2015 at 2:32 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: I think this is already possible, is it not? You just have to look for an identically-identified pg_locks entry with granted=true. That gives you a PID and vxid/xid. You can self-join pg_locks with that, and join to pg_stat_activity. I remember we discussed having a layer of system views on top of pg_stat_activity and pg_locks, probably defined recursively, that would show the full graph of waiters/lockers. It isn't necessarily the case that A is waiting for a unique process B. It could well be the case that A wants AccessExclusiveLock and many processes hold a variety of other lock types. Sure, but I don't think this makes it impossible to figure out who's locking whom. I think the only thing you need other than the data in pg_locks is the conflicts table, which is well documented. Oh, hmm, one thing missing is the ordering of the wait queue for each locked object. If process A holds RowExclusive on some object, process B wants ShareLock (stalled waiting) and process C wants AccessExclusive (also stalled waiting), which of B and C is woken up first after A releases the lock depends on order of arrival. Agreed - it would be nice to expose that somehow. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] proposal: multiple psql option -c
On Sat, Jul 25, 2015 at 5:27 AM, Pavel Stehule pavel.steh...@gmail.com wrote: It will be a nice side effect, but my primary problem was the impossibility of combining VACUUM and any other statement into one simple psql call. Seems like you can do that easily enough: [rhaas pgsql]$ (echo 'SELECT 1;'; echo 'VACUUM;'; echo 'SELECT 2;') | psql ?column? -- 1 (1 row) VACUUM ?column? -- 2 (1 row) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive
On Mon, Jul 27, 2015 at 2:32 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: David Rowley wrote: I've not looked into the feasibility of it, but if it were also possible to have a waiting_for column which would store the process ID of the process that's holding a lock that this process is waiting on, then it would be possible for some smart guy to write some code which draws beautiful graphs, perhaps in Pg Admin 4 of which processes are blocking other processes. I imagine this as a chart with an icon for each process. Processes waiting on locks being released would have an arrow pointing to their blocking process, if we clicked on that blocking process we could see the query that it's running and various other properties that are existing columns in pg_stat_activity. I think this is already possible, is it not? You just have to look for an identically-identified pg_locks entry with granted=true. That gives you a PID and vxid/xid. You can self-join pg_locks with that, and join to pg_stat_activity. I remember we discussed having a layer of system views on top of pg_stat_activity and pg_locks, probably defined recursively, that would show the full graph of waiters/lockers. It isn't necessarily the case that A is waiting for a unique process B. It could well be the case that A wants AccessExclusiveLock and many processes hold a variety of other lock types. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive
Robert Haas wrote: On Mon, Jul 27, 2015 at 2:32 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: I think this is already possible, is it not? You just have to look for an identically-identified pg_locks entry with granted=true. That gives you a PID and vxid/xid. You can self-join pg_locks with that, and join to pg_stat_activity. I remember we discussed having a layer of system views on top of pg_stat_activity and pg_locks, probably defined recursively, that would show the full graph of waiters/lockers. It isn't necessarily the case that A is waiting for a unique process B. It could well be the case that A wants AccessExclusiveLock and many processes hold a variety of other lock types. Sure, but I don't think this makes it impossible to figure out who's locking who. I think the only thing you need other than the data in pg_locks is the conflicts table, which is well documented. Oh, hmm, one thing missing is the ordering of the wait queue for each locked object. If process A holds RowExclusive on some object, process B wants ShareLock (stalled waiting) and process C wants AccessExclusive (also stalled waiting), who of B and C is woken up first after A releases the lock depends on order of arrival. -- Álvaro Herrera  http://www.2ndQuadrant.com/  PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive
David Rowley wrote: I've not looked into the feasibility of it, but if it were also possible to have a waiting_for column which would store the process ID of the process that's holding a lock that this process is waiting on, then it would be possible for some smart guy to write some code which draws beautiful graphs, perhaps in Pg Admin 4 of which processes are blocking other processes. I imagine this as a chart with an icon for each process. Processes waiting on locks being released would have an arrow pointing to their blocking process, if we clicked on that blocking process we could see the query that it's running and various other properties that are existing columns in pg_stat_activity. I think this is already possible, is it not? You just have to look for an identically-identified pg_locks entry with granted=true. That gives you a PID and vxid/xid. You can self-join pg_locks with that, and join to pg_stat_activity. I remember we discussed having a layer of system views on top of pg_stat_activity and pg_locks, probably defined recursively, that would show the full graph of waiters/lockers. -- Álvaro Herrera  http://www.2ndQuadrant.com/  PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] REVOKE [ADMIN OPTION FOR] ROLE
Egor, * Egor Rogov (e.ro...@postgrespro.ru) wrote: On Thu, Jul 23, 2015 at 8:30 AM, Egor Rogov e.ro...@postgrespro.ru wrote: So, the question: is it a documentation bug (as it seems to me), code bug, or did I miss something? Your analysis looks right to me, but I don't know whether the code or the documentation should be changed. This claim was added by Tom Lane in 2005 in commit 58d214e51fe50b10b4439da6ec263d54c155afbf. It might be worth checking whether the claim was true at that time and later became false, or whether it was never true to begin with. As far as I can see, the modern revoke syntax for revoking membership in a role (along with admin option) was introduced in commit 7762619 (by Tom Lane, 2005). The code for handling this command didn't pay attention to the restrict/cascade keywords then, just as it does not now. Before that, another syntax was in use: alter group groupname drop user username [, ...]. It did not include any notion of cascade at all. I guess that revoke role_name from role_name inherited the [cascade|restrict] section from the general revoke command but never actually used it. And I see no point in changing this, because role membership is somewhat more static than privileges. So I would propose the attached fix for the documentation. Have you looked at the SQL spec at all for this..? That's what we really should be looking at to determine if this is a documentation issue or a code issue. I'll take a look in a day or two after I've caught up on other things, if no one beats me to it. Thanks! Stephen
Re: [HACKERS] spgist recovery assertion failure
On 07/27/2015 04:24 PM, Michael Paquier wrote: On Mon, Jul 27, 2015 at 2:33 PM, Piotr Stefaniak postg...@piotr-stefaniak.me wrote: On 07/27/2015 07:19 AM, Michael Paquier wrote: On Mon, Jul 27, 2015 at 2:00 PM, Noah Misch n...@leadboat.com wrote: When I caused a crash during the create_index regression test, recovery hit an assertion failure. Minimal test case:

psql -X <<EOSQL
CREATE TABLE t (c text);
INSERT INTO t SELECT 'P0123456789abcdef' FROM generate_series(1,1000);
INSERT INTO t VALUES ('P0123456789abcdefF');
CREATE INDEX ON t USING spgist (c);
EOSQL
pg_ctl -m immediate -w restart

On which platform are you seeing the failure? I am afraid I could not reproduce the failure on Linux and OSX after testing it on HEAD. I'm having the same symptoms with 159cff58cf3b565be3c17901698a74238e9e23f8 on Ubuntu Linux 3.4.39 armv7l. Yes, on armv7l this can be indeed reproduced. Fixed, thanks everyone! The problem was that in the WAL format change patch, I had used char in a struct to hold -1, but char is unsigned on PowerPC and ARM.

- Heikki
Re: [HACKERS] more RLS oversights
Joe Conway wrote: On 07/03/2015 10:03 AM, Noah Misch wrote: (2) CreatePolicy() and AlterPolicy() omit to create a pg_shdepend entry for each role in the TO clause. Test case: Please see the attached patch. Note that I used SHARED_DEPENDENCY_ACL for this. It seems appropriate, but possibly we should invent a new shared dependency type for this use case? Comments? Hmm, these are not ACL objects, so conceptually it seems cleaner to use a different symbol for this. I think the catalog state and the error messages would be a bit confusing otherwise.

  if (spec->roletype == ROLESPEC_PUBLIC)
  {
!     Datum *tmp_role_oids;
!
!     if (*num_roles != 1)
          ereport(WARNING,
                  (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                   errmsg("ignoring roles specified other than public"),
                   errhint("All roles are members of the public role.")));
!     *num_roles = 1;
!     tmp_role_oids = (Datum *) palloc(*num_roles * sizeof(Datum));
!     tmp_role_oids[0] = ObjectIdGetDatum(ACL_ID_PUBLIC);

Isn't this leaking the previously allocated array? Not sure it's all that critical, but still. (I don't think you really need to call palloc at all here.) -- Álvaro Herrera  http://www.2ndQuadrant.com/  PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] proposal: multiple psql option -c
On 07/27/2015 02:53 PM, Pavel Stehule wrote: I am trying to run parallel execution psql -At -c "select datname from pg_database" postgres | xargs -n 1 -P 3 psql -c "select current_database()" I don't think it's going to be a hugely important feature, but I don't see a problem with creating a new option (-C seems fine) which would have the same effect as if the arguments were concatenated into a file which is then used with -f. IIRC -c has some special characteristics which means it's probably best not to try to extend it for this feature. cheers andrew
Re: [HACKERS] proposal: multiple psql option -c
2015-07-27 20:32 GMT+02:00 Robert Haas robertmh...@gmail.com: On Sat, Jul 25, 2015 at 5:27 AM, Pavel Stehule pavel.steh...@gmail.com wrote: It will be nice side effect, but my primary problem was the impossibility of combining VACUUM and any other statement into one simple psql call. Seems like you can do that easily enough:

[rhaas pgsql]$ (echo 'SELECT 1;'; echo 'VACUUM;'; echo 'SELECT 2;') | psql
 ?column?
----------
        1
(1 row)

VACUUM
 ?column?
----------
        2
(1 row)

how can I do it with xargs? Regards Pavel -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] anole: assorted stability problems
On Sun, Jul 26, 2015 at 11:36 AM, Andres Freund and...@anarazel.de wrote: On 2015-07-07 13:25:24 +0200, Andres Freund wrote: So, it's starting to look good. Not exactly allowing for a lot of confidence yet, but still: http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=anole&br=HEAD Since there have not been any relevant failures since, I'm going to remove this issue from the open items list. Woohoo! -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] proposal: multiple psql option -c
2015-07-27 20:47 GMT+02:00 Robert Haas robertmh...@gmail.com: On Mon, Jul 27, 2015 at 2:37 PM, Pavel Stehule pavel.steh...@gmail.com wrote: 2015-07-27 20:32 GMT+02:00 Robert Haas robertmh...@gmail.com: On Sat, Jul 25, 2015 at 5:27 AM, Pavel Stehule pavel.steh...@gmail.com wrote: It will be nice side effect, but my primary problem was the impossibility of combining VACUUM and any other statement into one simple psql call. Seems like you can do that easily enough: [rhaas pgsql]$ (echo 'SELECT 1;'; echo 'VACUUM;'; echo 'SELECT 2;') | psql ... how can I do it with xargs? I don't know specifically what you're trying to do, but I bet it's not that hard. I am trying to run parallel execution psql -At -c "select datname from pg_database" postgres | xargs -n 1 -P 3 psql -c "select current_database()" Pavel -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] WIP: Make timestamptz_out less slow.
Hi, I recently once more noticed that timestamptz_out is really, really slow. To quantify that, I created a bunch of similar sized tables:

CREATE TABLE tbl_timestamp AS SELECT NOW() FROM generate_series(1, 10) a, generate_series(1, 100) b;
CREATE TABLE tbl_int8 AS SELECT 1::bigint FROM generate_series(1, 10) a, generate_series(1, 100) b;
CREATE TABLE tbl_bytea AS SELECT ' '::bytea FROM generate_series(1, 10) a, generate_series(1, 100) b;

These all end up being 346MB large.

COPY tbl_bytea TO '/dev/null';
Time: 1173.484 ms
COPY tbl_int8 TO '/dev/null';
Time: 1030.756 ms
COPY tbl_timestamp TO '/dev/null';
Time: 6598.030 ms

(all best of three)

Yes, timestamp outputs more data as a whole, but being 5 times as slow is still pretty darn insane. To make sure that's not the cause, here's another table:

CREATE TABLE tbl_timestamptext AS SELECT NOW()::text FROM generate_series(1, 10) a, generate_series(1, 100) b;
COPY tbl_timestamptext TO '/dev/null';
Time: 1449.554 ms

So it's really just the timestamp code. Profiling it shows:

Overhead  Command         Shared Object   Symbol
- 38.33%  postgres_stock  libc-2.19.so    [.] vfprintf
   - 97.92% vfprintf
        _IO_vsprintf
      - sprintf
         + 70.25% EncodeDateTime
         + 29.75% AppendSeconds.constprop.10
   + 1.11% _IO_default_xsputn
-  8.22%  postgres_stock  libc-2.19.so    [.] _IO_default_xsputn
   - 99.43% _IO_default_xsputn
      - 90.09% vfprintf
           _IO_vsprintf
         - sprintf
            + 74.15% EncodeDateTime
            + 25.85% AppendSeconds.constprop.10
      + 9.72% _IO_padn
   + 0.57% vfprintf
+  7.76%  postgres_stock  postgres_stock  [.] CopyOneRowTo

So nearly all the time is spent somewhere inside the sprintf calls. Not nice. The only thing I could come up with to make the sprintfs cheaper was to combine them into one and remove some of the width specifiers that aren't needed. That doesn't buy us very much. I then proceeded to replace the sprintf call with hand-rolled conversions.
And wow, the benefit is far bigger than I'd assumed:

postgres[7236][1]=# COPY tbl_timestamp TO '/dev/null';
Time: 2430.521 ms

So, by hand-rolling the ISO conversion in EncodeDateTime() we got a ~250% performance improvement. I'd say that's worthwhile. The attached patch shows what I did. While there's some polishing possible, as a whole, it's pretty ugly. But I think timestamp data is so common that it's worth the effort. Does anybody have a fundamentally nicer idea than the attached to improve this?

Greetings,

Andres Freund

From 1d7b6110f8864ee00c1fe4f54d8937347ade7d80 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Mon, 27 Jul 2015 23:09:33 +0200
Subject: [PATCH] WIP: faster timestamp[tz]_out
---
 src/backend/utils/adt/datetime.c        | 108
 src/test/regress/expected/horology.out  |  24 ---
 src/test/regress/expected/timestamp.out |  62 +++---
 src/test/regress/sql/timestamp.sql      |   1 +
 4 files changed, 164 insertions(+), 31 deletions(-)

diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index 2a44b6e..4c13f01 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -3968,7 +3968,115 @@ EncodeDateTime(struct pg_tm * tm, fsec_t fsec, bool print_tz, int tz, const char
 	switch (style)
 	{
+#ifdef HAVE_INT64_TIMESTAMP
+		case USE_ISO_DATES:
+			/*
+			 * Fastpath for most common format and range. Not using sprintf
+			 * provides significant performance benefits, and timestamp data
+			 * is pretty common. All sane use cases dealing with large amounts
+			 * of data use iso timestamps, so we can focus on optimizing
+			 * those, and keep the rest of the code maintainable.
+			 */
+			if (tm->tm_year > 0 && tm->tm_year < 10000)
+			{
+				int year = (tm->tm_year > 0) ? tm->tm_year : -(tm->tm_year - 1);
+
+				*str++ = (year / 1000) + '0';
+				*str++ = (year / 100) % 10 + '0';
+				*str++ = (year / 10) % 10 + '0';
+				*str++ = year % 10 + '0';
+				*str++ = '-';
+				*str++ = (tm->tm_mon / 10) + '0';
+				*str++ = tm->tm_mon % 10 + '0';
+				*str++ = '-';
+				*str++ = (tm->tm_mday / 10) + '0';
+				*str++ = tm->tm_mday % 10 + '0';
+				*str++ = ' ';
+				*str++ = (tm->tm_hour / 10) + '0';
+				*str++ = tm->tm_hour % 10 + '0';
+				*str++ = ':';
+				*str++ = (tm->tm_min / 10) + '0';
+				*str++ = tm->tm_min % 10 + '0';
+				*str++ = ':';
+				*str++ = (tm->tm_sec / 10) + '0';
+				*str++ = tm->tm_sec % 10 + '0';
+
+				/*
+				 * Yes, this is darned ugly and would look nicer in a loop,
+				 * but some versions of gcc can't convert the divisions into
+				 * more efficient instructions unless manually unrolled.
+				 */
+				if (fsec != 0)
+				{
+					int fseca = abs(fsec);
+
+					*str++ = '.';
+
+					if (fseca % 100 != 0)
+					{
+						*str++ = (fseca / 10) + '0';
+
+						if (fseca % 10 != 0)
+						{
+							*str++ = ((fseca /
Re: [HACKERS] A little RLS oversight?
On 27 July 2015 at 21:58, Stephen Frost sfr...@snowman.net wrote: Dean, * Dean Rasheed (dean.a.rash...@gmail.com) wrote: On 27 July 2015 at 18:13, Joe Conway m...@joeconway.com wrote: On 07/27/2015 10:03 AM, Joe Conway wrote: On 07/26/2015 07:59 AM, Joe Conway wrote: On 07/26/2015 07:19 AM, Dean Rasheed wrote: Attached is an updated patch (still needs some docs for the functions). Thanks for that. I'll add the docs. Documentation added. Also added comment to check_enable_rls about passing InvalidOid versus GetUserId(). I believe this is ready to go -- any other comments? Strike that - now I really think it is ready to go :-) In this patch I additionally changed instances of: check_enable_rls(indrelid, GetUserId(), true) to check_enable_rls(indrelid, InvalidOid, true) per Dean's earlier remark and my new comment. Looks good to me, except I'm not sure about those latest changes because I don't understand the reasoning behind the logic in check_enable_rls() when row_security is set to OFF. I would expect that if the current user has permission to bypass RLS, and they have set row_security to OFF, then it should be off for all tables that they have access to, regardless of how they access those tables (directly or through a view). If it really is intentional that RLS remains active when querying through a view not owned by the table's owner, then the other calls to check_enable_rls() should probably be left as they were, since the table might have been updated through such a view and that code can't really tell at that point. Joe and I were discussing this earlier and it was certainly intentional that RLS still be enabled if you're querying through a view as the RLS rights of the view owner are used, not your own. Note that we don't allow a user to assume the BYPASSRLS right of the view owner though, also intentionally.
As a comparison to what we do today, even if you have access to a table, if you query it through a view, it's the view owner's permissions which are used to determine access to the table through the view, not your own. I agree that can be a bit odd at times, as you can get a permission denied error when using the view even though you have access to the table which is complained about, but that's how views have worked for quite a long time. OK, fair enough. Regards, Dean
[HACKERS] Optimization idea: merging multiple EXISTS(...) with constraint-based join removal
I've frequently encountered queries where the WHERE clause contains a lot of EXISTS(...) subqueries linked by logical ANDs. They are typically generated queries, and the EXISTS(...) fragments are composable pieces of SQL code acting as filters. Moreover, all those filters end up being very similar: the joins used to navigate in the objects graph often involve the same tables. Unfortunately, PostgreSQL doesn't seem to be smart enough to simplify them. In fact, according to my experiments, even dead-simple uncorrelated subqueries such as SELECT * FROM t1 WHERE EXISTS(SELECT * FROM t2) AND EXISTS(SELECT * FROM t2) are not simplified. (The execution plan is rather naive: t2 is checked twice for non-emptiness.) I've not seen anything related to that topic on the TODO list or in the mailing list. Consequently, I would like to propose a method which can be used to merge multiple EXISTS(...) semi-joins linked by logical ANDs. For the sake of this example, the database schema is:

create table t1(id int primary key, t2_id int, t6_id int)
create table t2(id int primary key, t3_id int)
create table t3(id int primary key, t4_id int, t5_id int)
create table t4(id int primary key, val text)
create table t5(id int primary key, val text)
create table t6(id int primary key, val text)

Here is a machine-generated query. Notice that the path t1-t2-t3 is used twice.

SELECT * FROM t1
WHERE EXISTS(
    SELECT 1 FROM t2, t3, t4
    WHERE t2.id = t1.t2_id AND t3.id = t2.t3_id AND t4.id = t3.t4_id
      AND t4.val = 'XYZ'
) AND EXISTS(
    SELECT 1 FROM t2, t3, t5
    WHERE t2.id = t1.t2_id AND t3.id = t2.t3_id AND t5.id = t3.t5_id
      AND t5.val = 'Blablabla'
) AND EXISTS(
    SELECT 1 FROM t6
    WHERE t6.id = t1.t6_id AND t6.val = 'Hello'
)

The following transformation can be applied:

SELECT * FROM [...]
WHERE EXISTS(SELECT 1 FROM [...tables_1...] WHERE [...conditions_1...])
  AND EXISTS(SELECT 1 FROM [...tables_2...] WHERE [...conditions_2...])
=>
SELECT * FROM [...]
WHERE EXISTS(
    SELECT 1
    FROM (SELECT 1 FROM [...tables_1...] WHERE [...conditions_1...]) q1,
         (SELECT 1 FROM [...tables_2...] WHERE [...conditions_2...]) q2
)

The resulting query is:

SELECT * FROM t1
WHERE EXISTS(
    SELECT 1 FROM t2 t2_a, t3 t3_a, t4 t4_a, t2 t2_b, t3 t3_b, t5, t6
    WHERE t2_a.id = t1.t2_id AND t3_a.id = t2_a.t3_id AND t4_a.id = t3_a.t4_id
      AND t4_a.val = 'XYZ'
      AND t2_b.id = t1.t2_id AND t3_b.id = t2_b.t3_id AND t5.id = t3_b.t5_id
      AND t5.val = 'Blablabla'
      AND t6.id = t1.t6_id AND t6.val = 'Hello'
)

Given the constraints, it can be shown that t2_a = t2_b AND t3_a = t3_b:

* t2_a.id = t1.t2_id AND t2_b.id = t1.t2_id => t2_a.id = t2_b.id
* t2_a.id = t2_b.id AND t2(id) is a pkey => t2_a = t2_b
* t2_a = t2_b => t2_a.t3_id = t2_b.t3_id
* t2_a.t3_id = t2_b.t3_id AND t3_b.id = t2_b.t3_id AND t3_a.id = t2_a.t3_id => t3_b.id = t3_a.id
* t3_b.id = t3_a.id AND t3(id) is a pkey => t3_a = t3_b

It is then safe to remove the joins t2_b and t3_b in the subquery, because they have been proved to be duplicates of t2_a and t3_a.

SELECT * FROM t1
WHERE EXISTS(
    SELECT 1 FROM t2, t3, t4, t5, t6
    WHERE t2.id = t1.t2_id AND t3.id = t2.t3_id AND t4.id = t3.t4_id
      AND t5.id = t3.t5_id AND t4.val = 'XYZ' AND t5.val = 'Blablabla'
      AND t6.id = t1.t6_id AND t6.val = 'Hello'
)

Using a connected component analysis on join predicates, joined tables can be separated and we get back our original query with the first two EXISTS(...) semi-joins merged.

SELECT * FROM t1
WHERE EXISTS(
    SELECT 1 FROM t2, t3, t4, t5
    WHERE t2.id = t1.t2_id AND t3.id = t2.t3_id AND t4.id = t3.t4_id
      AND t5.id = t3.t5_id AND t4.val = 'XYZ' AND t5.val = 'Blablabla'
) AND EXISTS(
    SELECT 1 FROM t6
    WHERE t6.id = t1.t6_id AND t6.val = 'Hello'
)

My questions are:
* Does PostgreSQL already support this optimization? Maybe it's just not enabled in my case?
* If yes, is my reasoning incorrect? Can you point me to my mistake?
* Otherwise, is there any plan to add this optimization to PostgreSQL?

Thank you!
Frédéric Terrazzoni
Re: [HACKERS] WIP: Make timestamptz_out less slow.
On 2015-07-27 17:31:41 -0400, Tom Lane wrote: Andres Freund and...@anarazel.de writes: So nearly all the time is spent somewhere inside the sprintf calls. Not nice. What happens if you force use of port/snprintf.c instead of glibc's version? Good question. Even worse. 15900.014 ms. Andres
Re: [HACKERS] A little RLS oversight?
On 07/27/2015 01:25 PM, Dean Rasheed wrote: Looks good to me, except I'm not sure about those latest changes because I don't understand the reasoning behind the logic in check_enable_rls() when row_security is set to OFF. I would expect that if the current user has permission to bypass RLS, and they have set row_security to OFF, then it should be off for all tables that they have access to, regardless of how they access those tables (directly or through a view). If it really is intentional that RLS remains active when querying through a view not owned by the table's owner, then the other calls to check_enable_rls() should probably be left as they were, since the table might have been updated through such a view and that code can't really tell at that point. After looking again at those three call sites, I'm now on the fence. All three are in functions which are trying to avoid leaking info in error messages. If we leave the original coding, then the messages will remain the same for a given user regardless of the row_security setting, whereas if we change them as in my latest patch, the messages will be different depending on row_security for users with BYPASSRLS. I think if we do the former (original coding) there should probably be a comment added to indicate we are doing that intentionally. Thoughts?
-- Joe Conway
Re: [HACKERS] proposal: multiple psql option -c
On Mon, Jul 27, 2015 at 2:53 PM, Pavel Stehule pavel.steh...@gmail.com wrote: I am trying to run parallel execution psql -At -c "select datname from pg_database" postgres | xargs -n 1 -P 3 psql -c "select current_database()" Put this in a shell script called run-psql:

#!/bin/bash
test $# = 0 && exit
for f in "${@:1:$(($#-1))}"; do
    echo "$f" \;
done | psql "${@:$#}"

Then:

psql -At -c "select datname from pg_database" postgres | xargs -n 1 -P 3 ./run-psql "select current_database()" vacuum "select 1"

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] optimizing vacuum truncation scans
On 22 July 2015 at 17:11, Jeff Janes jeff.ja...@gmail.com wrote: On Wed, Jul 22, 2015 at 6:59 AM, Robert Haas robertmh...@gmail.com wrote: On Mon, Jun 29, 2015 at 1:54 AM, Jeff Janes jeff.ja...@gmail.com wrote: Attached is a patch that implements the vm scan for truncation. It introduces a variable to hold the last blkno which was skipped during the forward portion. Any blocks after both this blkno and after the last inspected nonempty page (which the code is already tracking) must have been observed to be empty by the current vacuum. Any other process rendering the page nonempty are required to clear the vm bit, and no other process can set the bit again during the vacuum's lifetime. So if the bit is still set, the page is still empty without needing to inspect it. Urgh. So if we do this, that forever precludes having HOT pruning set the all-visible bit. I wouldn't say forever, as it would be easy to revert the change if something more important came along that conflicted with it. I think what is being said here is that someone is already using this technique, or if not, then we actively want to encourage them to do so as an extension or as a submission to core. In that case, I think the rely-on-VM technique sinks again, sorry Jim, Jeff. Probably needs code comments added. That does still leave the prefetch technique, so all is not lost. Can we see a patch with just prefetch, probably with a simple choice of stride? Thanks. -- Simon Riggs  http://www.2ndQuadrant.com/  PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] Autonomous Transaction is back
On Fri, Jul 24, 2015 at 9:39 PM, Merlin Moncure mmonc...@gmail.com wrote: On Thu, Jul 23, 2015 at 1:49 PM, Josh Berkus j...@agliodbs.com wrote: Batch Jobs: large data-manipulation tasks which need to be broken up into segments, with each segment committing separately. Example: updating 1 million records in batches of 1000. Autonomous transactions are not a good fit for this case; stored procedures are a better way to go for any scenario where you don't want to be in a snapshot (for example, suppose you want to change isolation level on the fly). Hm, you mean we need real stored procedures in PostgreSQL and not just functions? If not, I think it would be sufficient to add autonomous transaction support to the type of functions we already have in pg. That would allow writing a batch-job function which commits after every X modified rows, instead of having to write a script in an external language such as Perl to call the function in a while-loop and commit between each function call. However, we should also add a way for a caller to protect itself against an autonomous transaction in a function it calls. Imagine you're the author of function X() and within X() you make use of some other function Y() written by a different author. If it's very important that either all of your work or none of it gets committed, you need to make sure none of the changes you made before calling Y() get committed behind your back. Thus we need a way to prevent Y() from starting and committing an autonomous transaction; otherwise we would increase the risk and complexity of working with functions and plpgsql in PostgreSQL, as you would then need to be sure none of the functions used within a function will start and commit an ATX.
Re: [HACKERS] A little RLS oversight?
On 27 July 2015 at 18:13, Joe Conway m...@joeconway.com wrote: On 07/27/2015 10:03 AM, Joe Conway wrote: On 07/26/2015 07:59 AM, Joe Conway wrote: On 07/26/2015 07:19 AM, Dean Rasheed wrote: Attached is an updated patch (still needs some docs for the functions). Thanks for that. I'll add the docs. Documentation added. Also added comment to check_enable_rls about passing InvalidOid versus GetUserId(). I believe this is ready to go -- any other comments? Strike that - now I really think it is ready to go :-) In this patch I additionally changed instances of: check_enable_rls(indrelid, GetUserId(), true) to check_enable_rls(indrelid, InvalidOid, true) per Dean's earlier remark and my new comment. Looks good to me, except I'm not sure about those latest changes because I don't understand the reasoning behind the logic in check_enable_rls() when row_security is set to OFF. I would expect that if the current user has permission to bypass RLS, and they have set row_security to OFF, then it should be off for all tables that they have access to, regardless of how they access those tables (directly or through a view). If it really is intentional that RLS remains active when querying through a view not owned by the table's owner, then the other calls to check_enable_rls() should probably be left as they were, since the table might have been updated through such a view and that code can't really tell at that point. Regards, Dean
Re: [HACKERS] A little RLS oversight?
Dean, * Dean Rasheed (dean.a.rash...@gmail.com) wrote: On 27 July 2015 at 18:13, Joe Conway m...@joeconway.com wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 07/27/2015 10:03 AM, Joe Conway wrote: On 07/26/2015 07:59 AM, Joe Conway wrote: On 07/26/2015 07:19 AM, Dean Rasheed wrote: Attached is an updated patch (still needs some docs for the functions). Thanks for that. I'll add the docs. Documentation added. Also added comment to check_enable_rls about passing InvalidOid versus GetUserId(). I believe this is ready to go -- any other comments? Strike that - now I really think it is ready to go :-) In this patch I additionally changed instances of: check_enable_rls(indrelid, GetUserId(), true) to check_enable_rls(indrelid, InvalidOid, true) per Dean's earlier remark and my new comment. Looks good to me, except I'm not sure about those latest changes because I don't understand the reasoning behind the logic in check_enable_rls() when row_security is set to OFF. I would expect that if the current user has permission to bypass RLS, and they have set row_security to OFF, then it should be off for all tables that they have access to, regardless of how they access those tables (directly or through a view). If it really is intentional that RLS remains active when querying through a view not owned by the table's owner, then the other calls to check_enable_rls() should probably be left as they were, since the table might have been updated through such a view and that code can't really tell at that point. Joe and I were discussing this earlier and it was certainly intentional that RLS still be enabled if you're querying through a view as the RLS rights of the view owner are used, not your own. Note that we don't allow a user to assume the BYPASSRLS right of the view owner though, also intentionally. 
As a comparison to what we do today, even if you have access to a table, if you query it through a view, it's the view owner's permissions which are used to determine access to the table through the view, not your own. I agree that can be a bit odd at times, as you can get a permission denied error when using the view even though you have access to the table which is complained about, but that's how views have worked for quite a long time. Thanks! Stephen
Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive
On 7/27/15 1:46 PM, Robert Haas wrote: On Mon, Jul 27, 2015 at 2:43 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Robert Haas wrote: On Mon, Jul 27, 2015 at 2:32 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: I think this is already possible, is it not? You just have to look for an identically-identified pg_locks entry with granted=true. That gives you a PID and vxid/xid. You can self-join pg_locks with that, and join to pg_stat_activity. I remember we discussed having a layer of system views on top of pg_stat_activity and pg_locks, probably defined recursively, that would show the full graph of waiters/lockers. It isn't necessarily the case that A is waiting for a unique process B. It could well be the case that A wants AccessExclusiveLock and many processes hold a variety of other lock types. Sure, but I don't think this makes it impossible to figure out who's locking who. I think the only thing you need other than the data in pg_locks is the conflicts table, which is well documented. Oh, hmm, one thing missing is the ordering of the wait queue for each locked object. If process A holds RowExclusive on some object, process B wants ShareLock (stalled waiting) and process C wants AccessExclusive (also stalled waiting), who of B and C is woken up first after A releases the lock depends on order of arrival. Agreed - it would be nice to expose that somehow. +1. It's very common to want to know who's blocking who, and not at all easy to do that today. We should at minimum have a canonical example of how to do it, but something built in would be even better. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Autonomous Transaction is back
On 07/27/2015 02:41 PM, Joel Jacobson wrote: However, we should also add a way for the caller to protect against an Autonomous Transaction in a function called by the caller. Imagine if you're the author of function X() and within X() make use of some other function Y() which has been written by some other author, and within your function X(), it's very important either all of your work or none at all gets committed, then you need to make sure none of the changes you made before calling Y() gets committed, and thus we need a way to prevent Y() from starting and committing an Autonomous Transaction, otherwise we would increase the risk and complexity of working with functions and plpgsql in PostgreSQL as you would then need to be sure none of the functions you are using within a function will start and commit an ATX. Ah, you're missing how commits in ATX are expected to work. Let me illustrate: X ( Data write A1 call Y( Start ATX Data write B1 Commit ATX ) Data write A2 Exception ) In this workflow, B1 would be committed and persistent. Neither A1 nor A2 would be committed, or visible to other users. Depending on what implementation we end up with, A1 might not even be visible to Y(). So that solves your use case without any need to block ATXs in called functions. However, it leads to some interesting cases involving self-deadlocks; see the original post on this thread. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] copy.c handling for RLS is insecure
All, * Stephen Frost (sfr...@snowman.net) wrote: * Andres Freund (and...@anarazel.de) wrote: On 2015-07-09 01:28:28 -0400, Noah Misch wrote: - Keep the OID check, shouldn't hurt to have it What benefit is left? A bit of defense in depth. We execute user defined code in COPY (e.g. BEFORE triggers). That user defined code could very well replace the relation. Now I think right now that'd happen late enough, so the second lookup already happened. But a bit more robust defense against that sounds good to me. Attached patch keeps the relation locked, fully qualifies it when building up the query, and uses list_member_oid() to check that the relation's OID ends up in the resulting relationOids list (to address Noah's point that the planner doesn't guarantee the ordering; I doubt that list will ever be more than a few entries long). Also removes the misguided Assert(). Barring objections, I'll commit this (and backpatch to 9.5) tomorrow. Apologies for not pushing this before I left on vacation. I've done so now. Thanks! Stephen
Re: [HACKERS] proposal: multiple psql option -c
On 7/27/15 2:57 PM, Andrew Dunstan wrote: psql -At -c "select datname from pg_database" postgres | xargs -n 1 -P 3 psql -c "select current_database()" I don't think it's going to be a hugely important feature, but I don't see a problem with creating a new option (-C seems fine) which would have the same effect as if the arguments were concatenated into a file which is then used with -f. IIRC -c has some special characteristics which means it's probably best not to try to extend it for this feature. +1. I've occasionally wanted this as well. I'd also vote for -C returning every state string as well. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Data in Trouble? Get it in Treble! http://BlueTreble.com
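The xargs fan-out in the one-liner above can be tried without a running server: in this sketch, printf stands in for the inner psql that lists databases and echo stands in for the per-database psql invocation, so only the parallel-dispatch pattern is demonstrated, not any psql behavior.

```shell
# Stand-in for: psql -At -c "select datname from pg_database" postgres \
#                 | xargs -n 1 -P 3 psql -c "select current_database()"
# printf plays the inner query, echo plays the per-database command;
# sort is added because -P 3 makes the completion order nondeterministic.
printf 'db1\ndb2\ndb3\n' | xargs -n 1 -P 3 echo "connected to" | sort
```

With -P 3, up to three child processes run concurrently, which is what makes the quoted one-liner a poor man's parallel batch runner.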
Re: [HACKERS] WIP: Make timestamptz_out less slow.
Andres Freund and...@anarazel.de writes: So nearly all the time is spent somewhere inside the sprintf calls. Not nice. What happens if you force use of port/snprintf.c instead of glibc's version? regards, tom lane
Re: [HACKERS] Autonomous Transaction is back
On 07/27/2015 02:47 AM, Rajeev rastogi wrote: Why have any fixed maximum? Since we are planning to have nested autonomous transactions, a limit is required so that resources can be controlled. Is there a particular reason why this limit wouldn't just be max_stack_depth? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] WIP: Make timestamptz_out less slow.
Andres Freund and...@anarazel.de writes: On 2015-07-27 17:31:41 -0400, Tom Lane wrote: What happens if you force use of port/snprintf.c instead of glibc's version? Even worse. 15900.014 ms. Interesting. So as a separate optimization problem, we might consider trying to put snprintf.c at least on a par with glibc. I'm kind of surprised by this result really, since snprintf.c lacks a lot of the bells and whistles that are in glibc. regards, tom lane
Re: [HACKERS] multivariate statistics / patch v7
Hello Horiguchi-san, On 07/27/2015 09:04 AM, Kyotaro HORIGUCHI wrote: Hello, At Sat, 25 Jul 2015 23:09:31 +0200, Tomas Vondra tomas.von...@2ndquadrant.com wrote in 55b3fb0b.7000...@2ndquadrant.com Hi, On 07/16/2015 01:51 PM, Kyotaro HORIGUCHI wrote: Hi, I'd like to show you the modified constitution of multivariate statistics application logic. Please find the attached. They apply on your v7 patch. Sadly I do have some trouble getting it to apply correctly :-( So for now all my comments are based on just reading the code. Ah. My modification to rebase to the master for the time should be culprit. Sorry for the dirty patch. # I would recreate the patch if you complained before struggling # with the thing.. The core of the modification is on closesel.c. I attached the patched closesel.c. I don't see any attachment. Perhaps you forgot to actually attach it? My concern about the code at the time was following, - You embedded the logic of multivariate estimation into clauselist_selectivity. I think estimate using multivariate statistics is quite different from the ordinary estimate based on single column stats then they are logically separatable and we should do so. I don't see them as very different, actually quite the opposite. The two kinds of statistics are complementary and should naturally coexist. Perhaps the current code is not perfect and a refactoring would make the code more readable, but I don't think it's primary aim should be to separate regular and multivariate stats. - You are taking top-down approach and it runs tree-walking to check appliability of mv-stats for every stepping down in clause tree. If the subtree found to be mv-applicable, split it to two parts - mv-compatible and non-compatible. These steps requires expression tree walking, which looks using too-much CPU. 
I'm taking top-down approach because that's what the regular stats do, and also because that's what allows implementing the features that I think are interesting - ability to combine multiple stats in an efficient way, pass conditions and such. I think those two features are very useful and allow very interesting things. The bottom-up would work too, probably - I mean, we could start from leaves of the expression tree, and build the largest subtree compatible with multivariate stats and then try to estimate it. I don't see how we could pass conditions though, which works naturally in the top-down approach. Or maybe a combination of both - identify the compatible subtrees first, then perform the top-down phase. - You look to be considering the cases when users create many multivariate statistics on attribute sets having duplications. But it looks too-much for me. MV-stats are more resource-eating so we can assume the minimum usage of that. Not really. I don't expect huge numbers of multivariate stats to be built on the tables. But I think restricting the users to use a single multivariate statistics per table would be a significant limitation. And once you allow using multiple multivariate statistics for the set of clauses, supporting over-lapping stats is not that difficult. What it however makes possible is combining multiple small stats into a larger one in a very efficient way - it assumes the overlap is sufficient, of course. But if that's true you may build multiple small (and very accurate) stats instead of one huge (or very inaccurate) statistics. This also makes it possible to handle complex combinations of clauses that are compatible and incompatible with multivariate statistics, by passing the conditions. My suggestion in the patch is a bottom-up approach to find mv-applicable portion(s) in the expression tree, which is the basic way of planner overall. The approach requires no repetitive run of tree walker, that is, pull_varnos. 
It could fail to find the 'optimal' solution for complex situations but needs far less calculation for almost the same return (I think..). Even though it doesn't consider the functional dependency, the reduction in code shows the efficiency. It does nothing tricky. OK On a conceptual level, I think the idea to split the estimation into two phases - enrich the expression tree with nodes with details about stats etc, and then actually do the estimation in the second phase - might be interesting. Not because it's somehow clearer, but because it gives us a chance to see the expression tree as a whole, with details about all the stats (with the current code we process/estimate the tree incrementally). But I don't really know how useful that would be. It is difficult to say which approach is better, since that depends on what we consider more important than other things. However, I am concerned that your code substantially reconstructs the expression (clause) tree in the midst of processing it. I believe it should be a separate phase for simplicity. Of course additional required resource is also
Re: [HACKERS] Autonomous Transaction is back
On Mon, Jul 27, 2015 at 4:41 PM, Joel Jacobson j...@trustly.com wrote: On Fri, Jul 24, 2015 at 9:39 PM, Merlin Moncure mmonc...@gmail.com wrote: On Thu, Jul 23, 2015 at 1:49 PM, Josh Berkus j...@agliodbs.com wrote: Batch Jobs: large data-manipulation tasks which need to be broken up into segments, with each segment committing separately. Example: updating 1 million records in batches of 1000. Autonomous transactions are not a good fit for this case; stored procedures are a better way to go for any scenario where you don't want be be in a snapshot (for example, suppose you want to change isolation level on the fly). Hm, you mean we need real stored procedures in PostgreSQL and not just functions? Yes, exactly. Autonomous transactions aren't really set up for cases where the function runs for a very long time or indefinitely. This is the 'advancing xmin' problem as Josh puts it but I think the problem is much bigger than that. Anyways, this is mostly irrelevant to autonomous transactions as long as the design isn't extended to try and cover that case. Is the Autonomous Transaction feature only going to be exposed through pl/pgsql? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] A little RLS oversight?
On 27 July 2015 at 22:36, Joe Conway m...@joeconway.com wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 07/27/2015 01:25 PM, Dean Rasheed wrote: Looks good to me, except I'm not sure about those latest changes because I don't understand the reasoning behind the logic in check_enable_rls() when row_security is set to OFF. I would expect that if the current user has permission to bypass RLS, and they have set row_security to OFF, then it should be off for all tables that they have access to, regardless of how they access those tables (directly or through a view). If it really is intentional that RLS remains active when querying through a view not owned by the table's owner, then the other calls to check_enable_rls() should probably be left as they were, since the table might have been updated through such a view and that code can't really tell at that point. After looking again at those three call sites, I'm now on the fence. All three are in functions which are trying to avoid leaking info in error messages. If we leave the original coding, then the messages will remain the same for a given user regardless of the row_security setting, whereas if we change them as in my latest patch, the messages will be different depending on row_security for users with BYPASSRLS. I think if we do the former (original coding) there should probably be a comment added to indicate we are doing that intentionally. Thoughts? I could go either way on that, but I don't think it's too serious to worry about leaking information to users with BYPASSRLS. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Optimization idea: merging multiple EXISTS(...) with constraint-based join removal
On 28 July 2015 at 09:37, Frédéric TERRAZZONI frederic.terrazz...@gmail.com wrote: SELECT * FROM t1 WHERE EXISTS( SELECT 1 FROM t2, t3, t4 WHERE t2.id = t1.t2_id AND t3.id = t2.t3_id AND t4.id = t3.t4_id AND t4.val = 'XYZ' ) AND EXISTS( SELECT 1 FROM t2, t3, t5 WHERE t2.id = t1.t2_id AND t3.id = t2.t3_id AND t5.id = t3.t5_id AND t5.val = 'Blablabla' ) AND EXISTS( SELECT 1 FROM t6 WHERE t6.id = t1.t6_id AND t6.val = 'Hello' ) ... The resulting query is: SELECT * FROM t1 WHERE EXISTS( SELECT 1 FROM t2 t2_a, t3 t3_a, t4 t4_a, t2 t2_b, t3 t3_b, t5, t6 WHERE t2_a.id = t1.t2_id AND t3_a.id = t2_a.t3_id AND t4_a.id = t3_a.t4_id AND t4_a.val = 'XYZ' AND t2_b.id = t1.t2_id AND t3_b.id = t2_b.t3_id AND t5.id = t3_b.t5_id AND t5.val = 'Blablabla' AND t6.id = t1.t6_id AND t6.val = 'Hello' ) My questions are: - Does PostgreSQL already supports this optimization ? Maybe it's not enabled in my case only? No, there's nothing which supports that currently. - If yes, is my reasoning incorrect ? Can you point me my mistake ? It sounds reasonable to me. - Otherwise is there any plan to add this optimization to PostgreSQL ? I did propose the idea here http://www.postgresql.org/message-id/CAApHDvopmWq4i2BCf0VqU4mYmxzHCYwfnUMat9TWuKzdvo7=8...@mail.gmail.com but I didn't focus just with semi-joins. Without re-reading, I think I was talking about any join that could be proved to not duplicate rows could be consolidated. I do believe that this optimisation would be useful in more cases than most people might think, for example: UPDATE t1 SET col1 = (SELECT col1 FROM t2 WHERE t1.id=t2.id), col2 = (SELECT col2 FROM t2 WHERE t1.id=t2.id), ...; Of course, this query could have been written using UPDATE/FROM, (non-standard), or UPDATE t1 SET (col1,col2) = (SELECT ...), which was only added recently. There's also the case of the view which just has 1 column missing, so the consumer joins a table that's already been joined to in the view. 
I think it would be quite nice to have this, and I don't think it would be all that expensive for the planner to detect this. I think the planner would have to do something like: 1. Scan simple_rte_array looking for relids which are the same as another entry's. 2. If found, is the join condition the same as the other one? 3. Is there a unique index to prove that joining to this does not duplicate rows, or is it a semi-join? 4. Remove relation and replace all Vars from the removed relation with the one from the other table, mark relation as REL_DEAD. I think 1 is quite cheap to perform, so normal queries wouldn't suffer much of a slow-down from these extra checks, as most queries won't have self joins. Are you thinking of working on it? Regards David Rowley -- David Rowley http://www.2ndQuadrant.com/ http://www.2ndquadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Autonomous Transaction is back
On 7/27/15 5:56 PM, Josh Berkus wrote: Can you explain what use case you have where simply telling the staff if you use ATX without clearing it, you'll be fired is not sufficient? Possibly there's something we failed to account for in the unconference discussion. That there's no way to enforce that, short of hand-auditing code? There are already enough things that are difficult/impossible to enforce, I'd rather not add another one. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Autonomous Transaction is back
On Mon, Jul 27, 2015 at 11:49 PM, Josh Berkus j...@agliodbs.com wrote: Ah, you're missing how commits in ATX are expected to work. Let me illustrate: X ( Data write A1 call Y( Start ATX Data write B1 Commit ATX ) Data write A2 Exception ) In this workflow, B1 would be committed and persistent. Neither A1 nor A2 would be committed, or visible to other users. Depending on what implementation we end up with, A1 might not even be visible to Y(). So that solves your use case without any need to block ATXs in called functions. However, it leads to some interesting cases involving self-deadlocks; see the original post on this thread. I don't follow. In your example above, if I'm X(), how do I ensure Y() won't have committed anything at all when I later at Exception decide to roll back everything from Data write A1 to Data write A2, including any writes made by Y() (in the example Data write B1)? I understand the Exception will take care of rolling back my (X's) writes, but that's not sufficient if you want to make sure you roll back *everything*, including any writes made by functions you call. Right now, when writing a function, if you raise an exception, you can be sure all writes you have made will be rolled back. Your caller function might catch the exception and decide to carry on and commit work made before your function was called, but at least you can be confident your writes won't be committed as long as you don't catch the exception you raised in your own function. If I understand it correctly, that would change with the addition of Autonomous Transactions, unless we are given a way to prevent a function we call from starting and committing an Autonomous Transaction. Wrong? If so, then please show how to prevent Y() from committing the Data write B1 in your example, I don't get it.
Re: [HACKERS] A little RLS oversight?
On Monday, July 27, 2015, Dean Rasheed dean.a.rash...@gmail.com wrote: On 27 July 2015 at 22:36, Joe Conway m...@joeconway.com javascript:; wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 07/27/2015 01:25 PM, Dean Rasheed wrote: Looks good to me, except I'm not sure about those latest changes because I don't understand the reasoning behind the logic in check_enable_rls() when row_security is set to OFF. I would expect that if the current user has permission to bypass RLS, and they have set row_security to OFF, then it should be off for all tables that they have access to, regardless of how they access those tables (directly or through a view). If it really is intentional that RLS remains active when querying through a view not owned by the table's owner, then the other calls to check_enable_rls() should probably be left as they were, since the table might have been updated through such a view and that code can't really tell at that point. After looking again at those three call sites, I'm now on the fence. All three are in functions which are trying to avoid leaking info in error messages. If we leave the original coding, then the messages will remain the same for a given user regardless of the row_security setting, whereas if we change them as in my latest patch, the messages will be different depending on row_security for users with BYPASSRLS. I think if we do the former (original coding) there should probably be a comment added to indicate we are doing that intentionally. Thoughts? I could go either way on that, but I don't think it's too serious to worry about leaking information to users with BYPASSRLS. AFK at the moment, but my thinking was that we should avoid having the error message change based on what a GUC is set to. I agree that there should be comments which explain that. Thanks! Stephen
Re: [HACKERS] Autonomous Transaction is back
On 07/27/2015 03:12 PM, Joel Jacobson wrote: On Mon, Jul 27, 2015 at 11:49 PM, Josh Berkus j...@agliodbs.com mailto:j...@agliodbs.com wrote: Ah, you're missing how commits in ATX are expected to work. Let me illustrate: X ( Data write A1 call Y( Start ATX Data write B1 Commit ATX ) Data write A2 Exception ) In this workflow, B1 would be committed and persistent. Neither A1 nor A2 would be committed, or visible to other users. Depending on what implementation we end up with, A1 might not even be visible to Y(). So that solves your use case without any need to block ATXs in called functions. However, it leads to some interesting cases involving self-deadlocks; see the original post on this thread. I don't follow. In your example above, if I'm X(), how do I ensure Y() won't have committed anyting at all when I later at Exception decide to rollback everything from Data write A1 to Data write A2 including any writes made by Y() (in the example Data write B1)? Ah, ok. The goal of the project is that the writer of X() *cannot* prevent Y() from writing its data (B1) and committing it. One of the primary use cases for ATX is audit triggers. If a function writer could override ATX and prevent the audit triggers from committing, then that use case would be violated. Can you explain what use case you have where simply telling the staff if you use ATX without clearing it, you'll be fired is not sufficient? Possibly there's something we failed to account for in the unconference discussion. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] optimizing vacuum truncation scans
On 7/27/15 10:39 AM, Robert Haas wrote: But that's not enough: we also need to know that any tuple that survived the prune operation (that is, it wasn't redirected or marked unused) has a new-enough xmin, which isn't tracked anywhere. Wouldn't that be old-enough xmin? heap_prune_chain() is already calling HeapTupleSatisfiesVacuum, so it should be able to figure out if the page is all visible without a lot of extra work, and pass that info back to heap_page_prune (which would then pass it down to _execute()). Definitely not a one-liner though. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] WIP: Make timestamptz_out less slow.
On 28 July 2015 at 09:17, Andres Freund and...@anarazel.de wrote: Hi, I recently once more noticed that timestamptz_out is really, really slow. To quantify that, I created a bunch of similar sized tables: CREATE TABLE tbl_timestamp AS SELECT NOW() FROM generate_series(1, 10) a, generate_series(1, 100) b; CREATE TABLE tbl_int8 AS SELECT 1::bigint FROM generate_series(1, 10) a, generate_series(1, 100) b; CREATE TABLE tbl_bytea AS SELECT ' '::bytea FROM generate_series(1, 10) a, generate_series(1, 100) b; These all end up being 346MB large. COPY tbl_bytea TO '/dev/null'; Time: 1173.484 ms COPY tbl_int8 TO '/dev/null'; Time: 1030.756 ms COPY tbl_timestamp TO '/dev/null'; Time: 6598.030 (all best of three) Yes, timestamp outputs more data as a whole, but being 5 times as slow is still pretty darn insane. To make sure that's not the cause, here's another table: CREATE TABLE tbl_timestamptext AS SELECT NOW()::text FROM generate_series(1, 10) a, generate_series(1, 100) b; COPY tbl_timestamptext TO '/dev/null'; Time: 1449.554 ms So it's really just the timestamp code. Profiling it shows: Overhead Command Shared Object Symbol - 38.33% postgres_stock libc-2.19.so [.] vfprintf - 97.92% vfprintf _IO_vsprintf - sprintf + 70.25% EncodeDateTime + 29.75% AppendSeconds.constprop.10 + 1.11% _IO_default_xsputn -8.22% postgres_stock libc-2.19.so [.] _IO_default_xsputn - 99.43% _IO_default_xsputn - 90.09% vfprintf _IO_vsprintf - sprintf + 74.15% EncodeDateTime + 25.85% AppendSeconds.constprop.10 + 9.72% _IO_padn + 0.57% vfprintf + 7.76% postgres_stock postgres_stock[.] CopyOneRowTo So nearly all the time is spent somewhere inside the sprintf calls. Not nice. The only thing I could come up to make the sprintfs cheaper was to combine them into one and remove some of the width specifiers that aren't needed. That doesn't buy us very much. I then proceeded to replace the sprintf call with hand-rolled conversions. 
And wow, the benefit is far bigger than I'd assumed: postgres[7236][1]=# COPY tbl_timestamp TO '/dev/null'; Time: 2430.521 ms So, by hand-rolling the ISO conversion in EncodeDateTime() we got a ~250% performance improvement. I'd say that's worthwhile. The attached patch shows what I did. While there's some polishing possible, as a whole, it's pretty ugly. But I think timestamp data is so common that it's worth the effort. Does anybody have a fundamentally nicer idea than the attached to improve this? It won't be quite as fast as what you've written, but I think it will be much neater and more likely to be used in other places if we invent a function like pg_ltoa() which returns a pointer to the new end of string. Also, if we're specifying padding with zeros then we can skip the reverse part that's in pg_ltoa() (normally needed since the numeric string is built in reverse). The code could then be written as: str = pg_int2str_pad(str, year, 4); *str++ = '-'; str = pg_int2str_pad(str, tm->tm_mon, 2); *str++ = '-'; str = pg_int2str_pad(str, tm->tm_mday, 2); etc. I've used this method before and found it to be about 10 times faster than snprintf(), but I was reversing the string, so quite likely it'd be more than 10 times. I'm interested to see how much you're really gaining by manually unrolling the part that builds the fractional part of the second. We could just build that part with: (untested) if (fsec != 0) { int fseca = abs(fsec); while (fseca % 10 == 0 && fseca > 0) fseca /= 10; *str++ = '.'; str = pg_int2str(str, fseca); } Regards David Rowley -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Autonomous Transaction is back
On 7/27/15 5:12 PM, Joel Jacobson wrote: Right now, when writing a function, if you raise an exception, you can be sure all writes you have made will be rolled back, but your caller function might catch the exception and decide to carry on and commit work made before your function was called; at least you can be confident your writes won't be committed as long as you don't catch the exception you raised in your own function. If I understand it correctly, that would change with the addition of Autonomous Transactions, unless given a way to prevent a function you call from starting and committing an Autonomous Transaction. Wrong? If so, then please show how to prevent Y() from committing the Data write B1 in your example, I don't get it. What's being described here doesn't make sense in either use case ([1] [2]), but I do understand the concern about what 3rd party software is doing. It would be nice to have the ability to disallow and/or disable autonomous transactions, but I don't see a practical way of doing that other than introducing a new GUC. I'm not sure if it's worth that effort. [1] the batch process use case: batches that still hold their own transaction open don't gain anything. [2] the audit logging case. If you didn't care about auditing surviving regardless of a rollback then you wouldn't go to the extra work of an autonomous transaction to begin with. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Planner debug views
On Thu, Jul 23, 2015 at 4:11 PM, Tatsuo Ishii is...@postgresql.org wrote: Sounds like a great feature! Thanks! Attached is a draft patch implementing the idea. To play with it, you shall create the following two foreign tables: CREATE EXTENSION file_fdw; CREATE SERVER pglog FOREIGN DATA WRAPPER file_fdw; create foreign table pg_planner_rels(rel text, content text) server pglog options(filename 'your_install/data/debug_planner_relopt.csv', format 'csv'); create foreign table pg_planner_paths(rel text, path text, replacedby text, reason int, startupcost float, totalcost float, cheapest text, innerp text, outerp text, content text) server pglog options(filename 'your_install/data/debug_planner_paths.csv', format 'csv'); Example output attached. Questions: 1. Which document shall we update? This is more than the existing debug_print_* knobs. 2. GEQO is not supported yet. I would suggest we do that in a separate check-in. 3. Where do we want to put the csv files? Currently I just put them under /data. 4. Do we want to push these two foreign tables into system_views.sql? One problem is that a foreign table needs an absolute path. Any way to handle this? 5. As the output is a csv file: I wrap strings in quotes, but I'm not sure about quotes within the strings themselves, if there are any. Do we have any guarantee here?
Thanks, Qingqing

---
postgres=# select p.rel, p.path, p.replacedby, p.reason, p.startupcost, p.totalcost, p.cheapest, p.innerp, p.outerp, substr(p.content, 1, 30), r.content
             from pg_planner_paths p join pg_planner_rels r on p.rel = r.rel;
    rel    |   path    | replacedby | reason | startupcost | totalcost |       cheapest       |  innerp   |  outerp   |             substr             |              content
-----------+-----------+------------+--------+-------------+-----------+----------------------+-----------+-----------+--------------------------------+------------------------------------
 0x2791a10 | 0x279d4b0 |            |        |           0 |      40.1 | +total+startup+param |           |           | ForeignScan(1) rows=301 cost=0 | RELOPTINFO (1): rows=301 width=244
 0x279f998 | 0x27a2238 |            |        |           0 |       1.1 | +total+startup+param |           |           | ForeignScan(1) rows=1 cost=0.0 | RELOPTINFO (1): rows=1 width=244
 0x279fbd0 | 0x27a28b8 |            |        |           0 |       1.1 | +total+startup+param |           |           | ForeignScan(2) rows=1 cost=0.0 | RELOPTINFO (2): rows=1 width=64
 0x27a2ab0 | 0x27a3c68 |            |        |           0 |      2.21 | +total+startup+param | 0x27a28b8 | 0x27a2238 | NestLoop(1 2) rows=1 cost=0.00 | RELOPTINFO (1 2): rows=1 width=308
 0x27a2ab0 | 0x27a4608 | 0x27a4608  |      2 |        1.11 |      2.23 |                      | 0x27a2238 | 0x27a28b8 | HashJoin(1 2) rows=1 cost=1.11 | RELOPTINFO (1 2): rows=1 width=308
 0x27a2ab0 | 0x27a4498 | 0x27a4498  |      0 |           0 |      2.22 |                      | 0x27a4330 | 0x27a28b8 | NestLoop(1 2) rows=1 cost=0.00 | RELOPTINFO (1 2): rows=1 width=308
 0x27a2ab0 | 0x27a4388 | 0x27a4388  |      0 |           0 |      2.21 |                      | 0x27a2238 | 0x27a28b8 | NestLoop(1 2) rows=1 cost=0.00 | RELOPTINFO (1 2): rows=1 width=308
 0x27a2ab0 | 0x27a4220 | 0x27a4220  |      2 |        2.22 |      2.25 |                      | 0x27a2238 | 0x27a28b8 | MergeJoin(1 2) rows=1 cost=2.2 | RELOPTINFO (1 2): rows=1 width=308
 0x27a2ab0 | 0x27a3f90 | 0x27a3f90  |      2 |        1.11 |      2.23 |                      | 0x27a28b8 | 0x27a2238 | HashJoin(1 2) rows=1 cost=1.11 | RELOPTINFO (1 2): rows=1 width=308
 0x27a2ab0 | 0x27a3e20 | 0x27a3e20  |      0 |           0 |      2.22 |                      | 0x27a3c10 | 0x27a2238 | NestLoop(1 2) rows=1 cost=0.00 | RELOPTINFO (1 2): rows=1 width=308
 0x27a2ab0 | 0x27a3b18 | 0x27a3c68  |      1 |        2.22 |      2.25 |                      | 0x27a28b8 | 0x27a2238 | MergeJoin(1 2) rows=1 cost=2.2 | RELOPTINFO (1 2): rows=1 width=308

0002-local-change.patch Description: Binary data
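Regarding question 5 above: file_fdw's csv format follows the usual CSV convention (as in RFC 4180), so the writer only has to wrap each string field in double quotes and double any quote character inside it. A hedged sketch of that escaping, as a hypothetical helper rather than code from the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * RFC 4180-style CSV escaping: wrap the field in double quotes and
 * double every embedded quote.  Crude bounds handling for the sketch;
 * the caller is expected to size dst generously.
 */
static void
csv_escape(const char *src, char *dst, size_t dstlen)
{
	size_t		n = 0;

	dst[n++] = '"';
	for (; *src != '\0' && n + 3 < dstlen; src++)
	{
		if (*src == '"')
			dst[n++] = '"';		/* double the embedded quote */
		dst[n++] = *src;
	}
	dst[n++] = '"';
	dst[n] = '\0';
}
```

With this convention the reader side (file_fdw's csv parser) can reconstruct the original string even when path descriptions contain quotes.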
Re: [HACKERS] Sharing aggregate states between different aggregate functions
On 27 July 2015 at 20:11, Heikki Linnakangas hlinn...@iki.fi wrote: On 07/27/2015 08:34 AM, David Rowley wrote: - * agg_input_types, agg_state_type, agg_result_type identify the input, - * transition, and result types of the aggregate. These should all be - * resolved to actual types (ie, none should ever be ANYELEMENT etc). + * agg_input_types identifies the input types of the aggregate. These should + * be resolved to actual types (ie, none should ever be ANYELEMENT etc). I'm not sure I understand why agg_state_type and agg_result_type have changed here. The function no longer has the agg_result_type argument, but the removal of agg_state_type from the comment was a mistake. I've put agg_state_type back in the attached delta, which is again based on your version of the patch. + peraggstate->sortstates = (Tuplesortstate **) + palloc0(sizeof(Tuplesortstate *) * numGroupingSets); + for (currentsortno = 0; currentsortno < numGroupingSets; currentsortno++) + peraggstate->sortstates[currentsortno] = NULL; This was not you, but this NULL setting looks unneeded due to the palloc0(). Yeah, I noticed that too. Ok, let's take it out. Removed in attached. In this function I also wasn't quite sure if it was worth comparing both non-NULL INITCONDs here. I believe my code comments may slightly contradict what the code actually does, as the comments talk about them having to match, but the code just bails if any are non-NULL. The reason I didn't check them was that it seems inevitable that some duplicate work needs to be done when setting up the INITCOND. Perhaps it's worth it? It would be nice to handle non-NULL initconds. I think you'll have to check that the input function isn't volatile. Or perhaps just call the input function, and check that the resulting Datum is byte-per-byte identical, although that might be awkward to do with the current code structure. I've not done anything with this.
I'd not thought of an input function being volatile before, but I guess it's possible, which makes me a bit scared that we could be treading on ground we shouldn't be. I know it's more of an output function thing than an input function thing, but a GUC like extra_float_digits could cause problems here. In summary, I'm much less confident it's safe to enable the optimisation in this case. BTW, the name of the AggStatePerAggStateData struct is pretty horrible. The repeated AggState feels awkward. Now that I've stared at the patch for some time, it doesn't bother me anymore, but it took me quite a while to get over that. I'm sure it will for others too. And it's not just that struct; the comments talk about aggregate state, which could be confused to mean AggState, but it actually means AggStatePerAggStateData. I don't have any great suggestions, but can you come up with a better naming scheme? I agree, they're horrible. The thing that's causing the biggest problem is the struct named AggState, which carries state for *all* aggregates and has nothing to do with transition state, so there are two different meanings of the word state and I've used both meanings in the name AggStatePerAggStateData. Perhaps just renaming AggStatePerAggStateData to AggStateTransStateData would be good enough? Hmm. I think it should be AggStatePerTransData then, to keep the same pattern as AggStatePerAggData and AggStatePerGroupData. Sounds good. I've renamed it to that in the attached delta patch. Regards David Rowley -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services sharing_aggstate-heikki-1_delta2.patch Description: Binary data
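To make the sharing rule being discussed concrete, here is a toy model of the decision: two aggregates may share one transition state when their transition function and transition type match, and, per the conservative rule described above, both INITCONDs are NULL. AggSpec is a hypothetical stand-in for the relevant catalog fields, not a PostgreSQL struct:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-aggregate catalog data. */
typedef struct AggSpec
{
	unsigned int transfn_oid;	/* OID of the transition function */
	unsigned int transtype_oid; /* OID of the transition state type */
	const char *initcond;		/* textual initial value, or NULL */
} AggSpec;

static bool
can_share_transition_state(const AggSpec *a, const AggSpec *b)
{
	if (a->transfn_oid != b->transfn_oid ||
		a->transtype_oid != b->transtype_oid)
		return false;

	/*
	 * Comparing non-NULL initconds would mean running the input function
	 * and checking the resulting Datums byte-for-byte (and worrying about
	 * volatile input functions), so the conservative rule just bails
	 * unless both are NULL.
	 */
	return a->initcond == NULL && b->initcond == NULL;
}
```

This is why, e.g., avg(x) and sum(x) over the same column are the sharing case the patch targets: same transition function, same transition type, no initcond.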
Re: [HACKERS] Buildfarm TAP testing is useless as currently implemented
On Tue, Jul 28, 2015 at 1:15 AM, Andrew Dunstan and...@dunslane.net wrote: Well, it does create a lot of files that we don't pick up. An example list is shown below, and I am attaching their contents in a single gzipped attachment. However, these are in the wrong location. This was a vpath build and yet these tmp_check directories are all created in the source tree. Let's fix that and then I'll set about having the buildfarm collect them. That should get us further down the track. [log list] The attached patch fixes that. I suggest that we use $ENV{TESTDIR}/log as the location for the logs, so that even a vpath build will locate the log files correctly. -- Michael 20150728_tap_logs_vpath.patch Description: binary/octet-stream
Re: [HACKERS] A little RLS oversight?
On 07/27/2015 03:05 PM, Stephen Frost wrote: AFK at the moment, but my thinking was that we should avoid having the error message change based on what a GUC is set to. I agree that there should be comments which explain that. I changed back to using GetUserId() for the call to check_enable_rls() at those three call sites, and added to the comments to explain why. While looking at ri_ReportViolation() I spotted what I believe to be a bug in the current logic -- namely, has_perm is initialized to true, and when check_enable_rls() returns RLS_ENABLED we never reset has_perm to false, and thus leak info even though the comments claim we don't. I fixed that here, but someone please take a look and confirm I am reading that correctly. Beyond that, any additional comments? Thanks, Joe -- Joe Conway diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 98d1497..fd82ea4 100644 *** a/doc/src/sgml/func.sgml --- b/doc/src/sgml/func.sgml *** SET search_path TO replaceableschema/ *** 15259,15264 --- 15259,15270 entrytypeboolean/type/entry entrydoes current user have privilege for role/entry /row + row
+entryliteralfunctionrow_security_active/function(parametertable/parameter)/literal +/entry +entrytypeboolean/type/entry +entrydoes current user have row level security active for table/entry + /row /tbody /tgroup /table *** SET search_path TO replaceableschema/ *** 15299,15304 --- 15305,15313 indexterm primarypg_has_role/primary /indexterm +indexterm + primaryrow_security_active/primary +/indexterm para functionhas_table_privilege/function checks whether a user *** SELECT has_function_privilege('joeuser', *** 15462,15467 --- 15471,15483 are immediately available without doing commandSET ROLE/. /para +para + functionrow_security_active/function checks whether row level + security is active for the specified table in the context of the + functioncurrent_user/function and environment. The table can + be specified by name or by OID. +/para + para xref linkend=functions-info-schema-table shows functions that determine whether a certain object is firsttermvisible/ in the diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c index 1043362..2797c56 100644 *** a/src/backend/access/index/genam.c --- b/src/backend/access/index/genam.c *** BuildIndexValueDescription(Relation inde *** 203,209 indrelid = idxrec-indrelid; Assert(indexrelid == idxrec-indexrelid); ! /* RLS check- if RLS is enabled then we don't return anything. */ if (check_enable_rls(indrelid, GetUserId(), true) == RLS_ENABLED) { ReleaseSysCache(ht_idx); --- 203,213 indrelid = idxrec-indrelid; Assert(indexrelid == idxrec-indexrelid); ! /* ! * RLS check - if RLS is enabled then we don't return anything. ! * Explicitly pass GetUserId() to ensure the result is not ! * dependent on the value of row_security for users with BYPASSRLS. ! 
*/ if (check_enable_rls(indrelid, GetUserId(), true) == RLS_ENABLED) { ReleaseSysCache(ht_idx); diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index e82a53a..c0bd6fa 100644 *** a/src/backend/catalog/system_views.sql --- b/src/backend/catalog/system_views.sql *** CREATE VIEW pg_indexes AS *** 150,156 LEFT JOIN pg_tablespace T ON (T.oid = I.reltablespace) WHERE C.relkind IN ('r', 'm') AND I.relkind = 'i'; ! CREATE VIEW pg_stats AS SELECT nspname AS schemaname, relname AS tablename, --- 150,156 LEFT JOIN pg_tablespace T ON (T.oid = I.reltablespace) WHERE C.relkind IN ('r', 'm') AND I.relkind = 'i'; ! CREATE VIEW pg_stats WITH (security_barrier) AS SELECT nspname AS schemaname, relname AS tablename, *** CREATE VIEW pg_stats AS *** 211,217
Re: [HACKERS] Autonomous Transaction is back
On 7/27/15 6:40 PM, Jim Nasby wrote: On 7/27/15 5:12 PM, Joel Jacobson wrote: Right now, when writing a function, if you raise an exception, you can be sure all writes you have made will be rolled back, but your caller function might catch the exception and decide to carry on and commit work made before your function was called; at least you can be confident your writes won't be committed as long as you don't catch the exception you raised in your own function. If I understand it correctly, that would change with the addition of Autonomous Transactions, unless you are given a way to prevent a function you call from starting and committing an Autonomous Transaction. Wrong? If so, then please show how to prevent Y() from committing the Data write B1 in your example; I don't get it. What's being described here doesn't make sense in either use case ([1] [2]), but I do understand the concern about what 3rd party software is doing. It would be nice to have the ability to disallow and/or disable autonomous transactions, but I don't see a practical way of doing that other than introducing a new GUC. I'm not sure if it's worth that effort. It just occurred to me that another option would be to have an event trigger for beginning an autonomous transaction. [1] the batch process use case: batches that still hold their own transaction open don't gain anything. [2] the audit logging case: if you didn't care about auditing surviving regardless of a rollback, then you wouldn't go to the extra work of an autonomous transaction to begin with. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Several memory leaks for pg_rewind caused by missing PQclear calls
On Tue, Jul 28, 2015 at 2:45 AM, Heikki Linnakangas hlinn...@iki.fi wrote: On 07/23/2015 05:08 PM, Michael Paquier wrote: Hi all, After a run of valgrind on pg_rewind, I found a couple of code paths missing some PQclear calls after running a query. Attached is a patch to fix all those leaks. Fixed, thanks. Thanks for taking care of this. -- Michael
Re: [HACKERS] more RLS oversights
On 07/27/2015 01:13 PM, Alvaro Herrera wrote: Hmm, these are not ACL objects, so conceptually it seems cleaner to use a different symbol for this. I think the catalog state and the error messages would be a bit confusing otherwise. Ok -- done Isn't this leaking the previously allocated array? Not sure it's all that critical, but still. (I don't think you really need to call palloc at all here.) Agreed -- I think the attached is a bit cleaner. Any other comments or concerns? -- Joe Conway diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 9096ee5..7781c56 100644 *** a/doc/src/sgml/catalogs.sgml --- b/doc/src/sgml/catalogs.sgml *** *** 5793,5798 --- 5793,5808 /varlistentry varlistentry + termsymbolSHARED_DEPENDENCY_POLICY/ (literalr/)/term + listitem + para +The referenced object (which must be a role) is mentioned as the +target of a dependent policy object.
+ /para + /listitem + /varlistentry + + varlistentry termsymbolSHARED_DEPENDENCY_PIN/ (literalp/)/term listitem para diff --git a/src/backend/catalog/pg_shdepend.c b/src/backend/catalog/pg_shdepend.c index 34fe4e2..43076c9 100644 *** a/src/backend/catalog/pg_shdepend.c --- b/src/backend/catalog/pg_shdepend.c *** storeObjectDescription(StringInfo descs, *** 1083,1088 --- 1083,1090 appendStringInfo(descs, _(owner of %s), objdesc); else if (deptype == SHARED_DEPENDENCY_ACL) appendStringInfo(descs, _(privileges for %s), objdesc); + else if (deptype == SHARED_DEPENDENCY_POLICY) + appendStringInfo(descs, _(target of %s), objdesc); else elog(ERROR, unrecognized dependency type: %d, (int) deptype); diff --git a/src/backend/commands/policy.c b/src/backend/commands/policy.c index 17b48d4..9544f75 100644 *** a/src/backend/commands/policy.c --- b/src/backend/commands/policy.c *** *** 22,27 --- 22,28 #include catalog/indexing.h #include catalog/namespace.h #include catalog/objectaccess.h + #include catalog/pg_authid.h #include catalog/pg_policy.h #include catalog/pg_type.h #include commands/policy.h *** *** 48,54 static void RangeVarCallbackForPolicy(const RangeVar *rv, Oid relid, Oid oldrelid, void *arg); static char parse_policy_command(const char *cmd_name); ! static ArrayType *policy_role_list_to_array(List *roles); /* * Callback to RangeVarGetRelidExtended(). --- 49,55 static void RangeVarCallbackForPolicy(const RangeVar *rv, Oid relid, Oid oldrelid, void *arg); static char parse_policy_command(const char *cmd_name); ! static Datum *policy_role_list_to_array(List *roles, int *num_roles); /* * Callback to RangeVarGetRelidExtended(). *** parse_policy_command(const char *cmd_nam *** 130,159 /* * policy_role_list_to_array ! * helper function to convert a list of RoleSpecs to an array of role ids. */ ! static ArrayType * ! policy_role_list_to_array(List *roles) { ! ArrayType *role_ids; ! 
Datum *temp_array; ListCell *cell; - int num_roles; int i = 0; /* Handle no roles being passed in as being for public */ if (roles == NIL) { ! temp_array = (Datum *) palloc(sizeof(Datum)); ! temp_array[0] = ObjectIdGetDatum(ACL_ID_PUBLIC); ! role_ids = construct_array(temp_array, 1, OIDOID, sizeof(Oid), true, ! 'i'); ! return role_ids; } ! num_roles = list_length(roles); ! temp_array = (Datum *) palloc(num_roles * sizeof(Datum)); foreach(cell, roles) { --- 131,158 /* * policy_role_list_to_array ! * helper function to convert a list of RoleSpecs to an array of ! * role id Datums. */ ! static Datum * ! policy_role_list_to_array(List *roles, int *num_roles) { ! Datum *role_oids; ListCell *cell; int i = 0; /* Handle no roles being passed in as being for public */ if (roles == NIL) { ! *num_roles = 1; ! role_oids = (Datum *) palloc(*num_roles *
Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive
Hello. In the attached patch I've made a refactoring for tranches. The prefix for them was extended, and I split LWLockAssign into two functions (one with a tranche and a second for user-defined LWLocks). -- Ildus Kurbangaliev Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index 0eb991c..e75ca4d 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -480,7 +480,7 @@ pgss_shmem_startup(void) if (!found) { /* First time through ... */ - pgss->lock = LWLockAssign(); + pgss->lock = LWLockUserAssign(); pgss->cur_median_usage = ASSUMED_MEDIAN_INIT; pgss->mean_query_len = ASSUMED_LENGTH_INIT; SpinLockInit(&pgss->mutex); diff --git a/doc/src/sgml/xfunc.sgml b/doc/src/sgml/xfunc.sgml index 9c15950..b365565 100644 --- a/doc/src/sgml/xfunc.sgml +++ b/doc/src/sgml/xfunc.sgml @@ -3356,7 +3356,7 @@ if (!ptr) { initialize contents of shmem area; acquire any requested LWLocks using: -ptr->mylockid = LWLockAssign(); +ptr->mylockid = LWLockUserAssign(); } LWLockRelease(AddinShmemInitLock); } diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c index 3a58f1e..3a55511 100644 --- a/src/backend/access/transam/clog.c +++ b/src/backend/access/transam/clog.c @@ -457,7 +457,7 @@ CLOGShmemInit(void) { ClogCtl->PagePrecedes = CLOGPagePrecedes; SimpleLruInit(ClogCtl, "CLOG Ctl", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE, - CLogControlLock, "pg_clog"); + CLogControlLock, "pg_clog", LW_TRANCHE_CLOG_BUFFERS); } /* diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c index 5ad35c0..2571203 100644 --- a/src/backend/access/transam/commit_ts.c +++ b/src/backend/access/transam/commit_ts.c @@ -466,7 +466,8 @@ CommitTsShmemInit(void) CommitTsCtl->PagePrecedes = CommitTsPagePrecedes; SimpleLruInit(CommitTsCtl, "CommitTs Ctl",
CommitTsShmemBuffers(), 0, - CommitTsControlLock, "pg_commit_ts"); + CommitTsControlLock, "pg_commit_ts", + LW_TRANCHE_COMMITTS_BUFFERS); commitTsShared = ShmemInitStruct("CommitTs shared", sizeof(CommitTimestampShared), diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c index 1933a87..e8d2474 100644 --- a/src/backend/access/transam/multixact.c +++ b/src/backend/access/transam/multixact.c @@ -1842,10 +1842,12 @@ MultiXactShmemInit(void) SimpleLruInit(MultiXactOffsetCtl, "MultiXactOffset Ctl", NUM_MXACTOFFSET_BUFFERS, 0, - MultiXactOffsetControlLock, "pg_multixact/offsets"); + MultiXactOffsetControlLock, "pg_multixact/offsets", + LW_TRANCHE_MULTIXACT_BUFFERS); SimpleLruInit(MultiXactMemberCtl, "MultiXactMember Ctl", NUM_MXACTMEMBER_BUFFERS, 0, - MultiXactMemberControlLock, "pg_multixact/members"); + MultiXactMemberControlLock, "pg_multixact/members", + LW_TRANCHE_MULTIXACT_BUFFERS); /* Initialize our shared state struct */ MultiXactState = ShmemInitStruct("Shared MultiXact State", diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c index 5fcea11..7709f6d 100644 --- a/src/backend/access/transam/slru.c +++ b/src/backend/access/transam/slru.c @@ -161,7 +161,7 @@ SimpleLruShmemSize(int nslots, int nlsns) void SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns, - LWLock *ctllock, const char *subdir) + LWLock *ctllock, const char *subdir, int lwlocktranche) { SlruShared shared; bool found; @@ -218,7 +218,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns, shared->page_status[slotno] = SLRU_PAGE_EMPTY; shared->page_dirty[slotno] = false; shared->page_lru_count[slotno] = 0; - shared->buffer_locks[slotno] = LWLockAssign(); + shared->buffer_locks[slotno] = LWLockAssign(lwlocktranche); ptr += BLCKSZ; } } diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c index 6b70982..80690bc 100644 --- a/src/backend/access/transam/subtrans.c +++
b/src/backend/access/transam/subtrans.c @@ -179,7 +179,8 @@ SUBTRANSShmemInit(void) { SubTransCtl->PagePrecedes = SubTransPagePrecedes; SimpleLruInit(SubTransCtl, "SUBTRANS Ctl", NUM_SUBTRANS_BUFFERS, 0, - SubtransControlLock, "pg_subtrans"); + SubtransControlLock, "pg_subtrans", + LW_TRANCHE_SUBTRANS_BUFFERS); /* Override default assumption that writes should be fsync'd */ SubTransCtl->do_fsync = false; } diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index e82a53a..fd04258 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -621,6 +621,7 @@ CREATE VIEW pg_stat_activity AS S.query_start, S.state_change, S.waiting, +
Re: [HACKERS] Feature - Index support on an lquery field (from the ltree module)
On 07/28/2015 06:40 AM, David Belle wrote: I have a requirement for a project that I am working on and was hoping to gain some discussion on the idea of implementing an index type for lquery fields (see ltree's documentation: http://www.postgresql.org/docs/9.4/static/ltree.html). Currently ltree fields can have GIST/GIN indexes, but you cannot create an index for lquery or ltxtquery. In my scope, an index for lquery would be enough, although if it wasn't too difficult for it to be expanded to ltxtquery fields, then that could also be implemented. Hmm. My first thought is to extract labels from the query which are required for any matches, and put those in the GIN index. When querying, find all lqueries that have at least one label in common with the search key. - Heikki
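Heikki's suggestion can be sketched concretely: walk the lquery, keep only the plain labels (the ones every matching ltree must contain), and skip any item carrying lquery metacharacters, since those items do not guarantee a label. This is a simplified, self-contained illustration, not the ltree module's actual parser:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Extract the labels an lquery such as "Top.*.Science.!Astronomy"
 * requires in every match.  Items containing star levels, modifiers,
 * alternatives, or negation ('*', '%', '@', '|', '!', '{') are skipped
 * conservatively: they may match without contributing a fixed label.
 */
static int
extract_required_labels(const char *lquery, char out[][64], int maxlabels)
{
	int			nlabels = 0;
	const char *p = lquery;

	while (*p != '\0' && nlabels < maxlabels)
	{
		const char *start = p;
		size_t		len;
		bool		plain = true;

		while (*p != '\0' && *p != '.')
		{
			if (strchr("*%@|!{", *p) != NULL)
				plain = false;	/* metacharacter: label not guaranteed */
			p++;
		}
		len = (size_t) (p - start);
		if (plain && len > 0 && len < 64)
		{
			memcpy(out[nlabels], start, len);
			out[nlabels][len] = '\0';
			nlabels++;
		}
		if (*p == '.')
			p++;
	}
	return nlabels;
}
```

The extracted labels would be the GIN keys for the lquery column; a search key ltree then probes the index with its own labels to find candidate lqueries, with a recheck doing the full match.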
Re: [HACKERS] [DESIGN] ParallelAppend
-----Original Message----- From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kouhei Kaigai Sent: Monday, July 27, 2015 11:07 PM To: Amit Kapila Cc: pgsql-hackers@postgresql.org; Robert Haas; Kyotaro HORIGUCHI Subject: Re: [HACKERS] [DESIGN] ParallelAppend On Sun, Jul 26, 2015 at 8:43 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote: Hello, I'm recently working/investigating on the ParallelAppend feature towards the next commit fest. Below is my design proposal. 1. Concept -- Its concept is quite simple; anybody might have considered it more than once. A ParallelAppend node kicks background worker processes to execute its child nodes in parallel / asynchronously. It intends to improve the performance of scanning large partitioned tables from the standpoint of entire throughput; however, latency of the first multi-hundred rows is not in the scope of this project. From the standpoint of technology trends, it primarily tries to utilize the multi-core capability within a system, but also enables expanding to a distributed database environment using the foreign-table inheritance feature. Its behavior is very similar to the Funnel node except for several points; thus, we can reuse the infrastructure we have had long-standing discussion about through the v9.5 development cycle. 2. Problems to be solved - Typical OLAP workloads take joins over tons of tables and scans on large tables which are often partitioned, and their KPI is query response time, but only a very small number of sessions are active simultaneously. So, we are required to run a single query as rapidly as possible even if it consumes more computing resources than typical OLTP workloads. The current implementation to scan a heap is painful when we look at its behavior from that standpoint - how many rows we can read within a certain time - because of its synchronous manner.
In the worst case, when a SeqScan node tries to fetch the next tuple, heap_getnext() looks up a block in the shared buffers, then ReadBuffer() calls the storage manager to read the target block from the filesystem if it is not in the buffers. Next, the operating system puts the caller process to sleep until the required I/O completes. Most cases are resolved at an earlier stage than this worst case; however, the best scenario we can expect is that the next tuple already appears on top of the message queue (with visibility checks already done, of course) with no fall-through to the buffer manager or deeper. If we can run multiple scans in parallel / asynchronously, the CPU core will be assigned to another process by the operating system; thus, it eventually improves the I/O density and enables higher processing throughput. The Append node is an ideal point to be parallelized because: - child nodes can be in physically different locations by tablespace, so further tuning is possible according to the system landscape. - it can control whether a subplan is actually executed on a background worker, on a per-subplan basis. If the subplans contain large tables and small tables, ParallelAppend may kick background workers to scan the large tables only, but scan the small tables by itself. - like the Funnel node, we don't need to care about enhancement of individual node types. SeqScan, IndexScan, ForeignScan and others can perform as usual, but actually in parallel. 3. Implementation -- * Plan Cost ParallelAppend shall appear wherever Append can appear, except for its usage as a dummy. So, I'll enhance set_append_rel_pathlist() to add both an AppendPath and a ParallelAppendPath, with a cost for each. Is there a real need to have a new node like ParallelAppendPath? Can't we have a Funnel node beneath the Append node, with each worker responsible for a SeqScan on one inherited child relation?
Something like Append --- Funnel -- SeqScan rel1 -- SeqScan rel2 If Funnel can handle both horizontal and vertical parallelism, it is a great simplification. I don't insist on a new node. Once Funnel gets the capability to have multiple child nodes, the Append node above will probably be gone. I expect set_append_rel_pathlist() to add two paths based on Append and Funnel, and then the planner will choose the cheaper one according to its cost. In the latest v16 patch, Funnel is declared as follows: typedef struct Funnel { Scan scan; int num_workers; } Funnel; If we try to add Append capability here, I expect the structure will be adjusted as follows, for example: typedef struct Funnel { Scan scan; List *funnel_plans; List *funnel_num_workers; } Funnel; As the names suggest, funnel_plans saves the underlying Plan nodes instead of the lefttree. Also, funnel_num_workers saves the number of workers expected to be assigned to the individual child plans. Even
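One way to picture the worker coordination this thread keeps circling (whether under a multi-child Funnel or a PartialAppend): a descriptor in shared memory from which every worker picks the next child subplan under a lock, so the member scans proceed asynchronously with respect to each other. A toy model, using plain pthreads in place of PostgreSQL's shared memory and LWLocks:

```c
#include <pthread.h>

/*
 * Toy shared descriptor: workers call shared_append_next() to claim
 * the next not-yet-started child subplan.  In a real implementation
 * this would live in dynamic shared memory and could track a
 * (current_relation, current_block) pair for finer granularity.
 */
typedef struct SharedAppendDesc
{
	pthread_mutex_t mutex;
	int			next_subplan;	/* next child not yet handed out */
	int			nsubplans;		/* total number of child subplans */
} SharedAppendDesc;

/* Returns a subplan index for the caller to execute, or -1 when done. */
static int
shared_append_next(SharedAppendDesc *desc)
{
	int			result = -1;

	pthread_mutex_lock(&desc->mutex);
	if (desc->next_subplan < desc->nsubplans)
		result = desc->next_subplan++;
	pthread_mutex_unlock(&desc->mutex);
	return result;
}
```

With more workers than subplans, a finer-grained descriptor (per-relation block counters, as the parallel seqscan patch does within one relation) would let several workers cooperate on the same large member while others drain the rest.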
Re: [HACKERS] MultiXact member wraparound protections are now enabled
On Mon, Jul 27, 2015 at 07:59:40AM +0100, Simon Riggs wrote: On 26 July 2015 at 20:15, Noah Misch n...@leadboat.com wrote: On Fri, Jul 24, 2015 at 09:14:09PM -0400, Peter Eisentraut wrote: On 7/22/15 4:45 PM, Robert Haas wrote: But it seemed to me that this could be rather confusing. I thought it would be better to be explicit about whether the protections are enabled in all cases. That way, (1) if you see the message saying they are enabled, they are enabled; (2) if you see the message saying they are disabled, they are disabled; and (3) if you see neither message, your version does not have those protections. But this is not documented, AFAICT, so I don't think anyone is going to be able to follow that logic. I don't see anything in the release notes saying, look for this message to see how this applies to you, or whatever. I supported inclusion of the message, because it has good potential to help experts studying historical logs to find the root cause of data corruption. The complex histories of clusters showing corruption from this series of bugs have brought great expense to the task of debugging new reports. Given a cluster having had full mxact wraparound protections since the last corruption-free backup (or since initdb), one can rule out some causes. Would it be better to replace it with a less specific and more generally useful message? For example, "Server started with release X.y.z", from which we could infer various useful things. That message does sound generally useful, but we couldn't infer $subject from it. While the $subject message appears at startup in simple cases, autovacuum prerequisite work can delay it indefinitely.
Re: [HACKERS] CustomScan and readfuncs.c
2. Reproduce method table on background worker -- The method field of CustomPath/Scan/ScanState is expected to be a reference to a static structure. Thus, copyObject() does not copy the entire table, only pointers. However, we have no way to guarantee the callback functions have the same entrypoint addresses on background workers. So, we may need an infrastructure to reproduce the same CustomScan node with the same callback function table, probably identified by name. We may have a few ways to solve the problem. * Add a system catalog, with a function that returns pointers The simplest way, like FDW. A system catalog has a name and a function to return callback pointers. It also needs SQL statement support, which is a small downside. I tried to design a DDL statement and relevant system catalog as follows. #define CustomPlanRelationId 3999 CATALOG(pg_custom_plan,3999) { NameData custom_name; regproc custom_handler; } FormData_pg_custom_plan; This simple catalog saves a pair of name and handler function of a custom plan provider. Like FDW, this handler function returns pointers to the entrypoints to be called by set_(rel|join)_pathlist_hook and the relevant CustomXXXMethods table. Users can register a custom plan provider using the following statement: CREATE CUSTOM PLAN name HANDLER function_name; And unregister: DROP CUSTOM PLAN name; This enhancement allows background workers to reproduce a CustomScan node that was serialized by nodeToString(), as long as the provider is identified by name. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com -Original Message- From: Kaigai Kouhei(海外 浩平) Sent: Monday, July 27, 2015 8:42 AM To: 'Tom Lane' Cc: pgsql-hackers@postgresql.org Subject: RE: [HACKERS] CustomScan and readfuncs.c Kouhei Kaigai kai...@ak.jp.nec.com writes: In the course of investigating ParallelAppend, I noticed a few problems in CustomScan that prevent reproducing an equivalent plan node on the background worker from the serialized string. 1.
CustomScanMethods->TextOutCustomScan callback This callback allows outputting custom information in nodeToString. Originally, we intended to use this callback for debugging only, because CustomScan must be copyObject() safe, and thus all the private data must also be stored in custom_exprs or custom_private. However, it leads to another problem when we try to reproduce a CustomScan node from the string form generated by outfuncs.c. If TextOutCustomScan prints something, the upcoming _readCustomScan has to deal with an unexpected number of tokens in an unexpected format. Um ... wait a second. There is no support in readfuncs for any plan node type, and never has been, and I seriously doubt that there ever should be. I do not think it makes sense to ship plans around in the way you seem to have in mind. (Also, I don't think the problems you mention are exactly unique to CustomScan. There's no reason to assume that FDW plans could survive this treatment either, since we do not know what's in the fdw_private stuff; certainly no one has ever suggested that it should not contain pointers to static data.) Yep, no Plan node types are supported at this moment; however, they will appear soon with the Funnel + PartialSeqScan nodes. It serializes a partial plan subtree using nodeToString() then gives the flattened PlannedStmt to background workers. I'm now investigating applying the same structure to Append to kick child nodes in parallel. Once various plan node types appear in readfuncs.c, we have to care about this problem, don't we? I'm working on the patch submission of ParallelAppend for the next commit-fest, so I'd like to reach a consensus on how to treat this matter. I'd like to propose omitting this callback prior to the v9.5 release, for the fewest compatibility issues. I regard our commitment to cross-version compatibility for the custom scan APIs as being essentially zero, for reasons previously discussed. So if this goes away in 9.6 it will not matter, but we might as well leave it in for now for debug support.
I don't argue this point strongly. If TextOutCustomScan is to be obsoleted in v9.6, it is simply a kindness to developers to advise them not to use this callback. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com
Re: [HACKERS] proposal: multiple psql option -c
2015-07-27 21:57 GMT+02:00 Andrew Dunstan and...@dunslane.net: On 07/27/2015 02:53 PM, Pavel Stehule wrote: I am trying to run parallel execution psql -At -c select datname from pg_database postgres | xargs -n 1 -P 3 psql -c select current_database() I don't think it's going to be a hugely important feature, but I don't see a problem with creating a new option (-C seems fine) which would have the same effect as if the arguments were concatenated into a file which is then used with -f. IIRC -c has some special characteristics which means it's probably best not to try to extend it for this feature. ok, I'll try to write a patch. Pavel cheers andrew
Re: [HACKERS] more RLS oversights
On 07/03/2015 10:03 AM, Noah Misch wrote: +static void +dumpPolicy(Archive *fout, PolicyInfo *polinfo) ... +if (polinfo->polqual != NULL) +appendPQExpBuffer(query, USING %s, polinfo->polqual); (3) The USING clause needs parentheses; a dump+reload failed like so: Also needed for WITH CHECK. Fix pushed to 9.5 and HEAD. Joe
Re: [HACKERS] Failing assertions in indxpath.c, placeholder.c and brin_minmax.c
Andreas Seltenreich seltenre...@gmx.de writes: I let sqlsmith run during the night, and it did no longer trigger the first two. During roughly a million random queries it triggered the already mentioned brin one 10 times, but there was also one instance of this new one in the log: TRAP: FailedAssertion(!(join_clause_is_movable_into(rinfo, joinrel-relids, join_and_req)), File: relnode.c, Line: 987) LOG: server process (PID 12851) was terminated by signal 6: Aborted DETAIL: Failed process was running: select rel65543066.tmplname as c0, rel65543064.umuser as c1 from public.dept as rel65543059 inner join pg_catalog.pg_user_mappings as rel65543064 left join pg_catalog.pg_enum as rel65543065 on (rel65543064.srvname = rel65543065.enumlabel ) inner join pg_catalog.pg_ts_template as rel65543066 on (rel65543065.enumtypid = rel65543066.tmplnamespace ) on (rel65543059.dname = rel65543064.srvname ) where ((rel65543059.mgrname = rel65543059.mgrname) and (rel65543064.usename = rel65543066.tmplname)) and (rel65543059.mgrname ~~ rel65543059.mgrname) fetch first 128 rows only; I looked into this one. What seems to be the story is that join_clause_is_movable_into() is approximate in the conservative direction, that is it sometimes can return false when a strict analysis would conclude true (a fact that is undocumented in its comments; I shall fix that). This is acceptable, as long as the answers are consistent across different jointree levels, since the worst consequence is that we might fail to push a join clause as far down the jointree as it could be pushed. However, the Assert that's failing here supposes that the answer is exact. I think a sufficient fix in the near term is to disable that Assert and add suitable commentary. In the long run it might be nice if the answers were exact, but that would require more information than is currently passed to join_clause_is_movable_into(), which would imply non-back-patchable API changes. 
,[ git bisect ] | first bad commit: [3f8c23c4d31d4a0e801041733deb2c7cfa577b32] Improve | predtest.c's ability to reason about operator expressions. ` There seems to be something odd going on with your bisection tests, because once again the query fails clear back to 9.2 for me, which is what I'd expect based on the above analysis --- the movable-join-clause logic all came in with parameterized paths in 9.2. regards, tom lane
[HACKERS] Feature - Index support on an lquery field (from the ltree module)
Hi Forgive me if this is not the right place to discuss this. I am new to the postgresql community. I have a requirement for a project that I am working on and was hoping to start some discussion on the idea of implementing an index type for lquery fields (see ltree’s http://www.postgresql.org/docs/9.4/static/ltree.html) Currently ltree fields can have GIST/GIN indexes, but you cannot create an index for lquery or ltxtquery. In my scope, an index for lquery would be enough, although if it wasn’t too difficult for it to be expanded to ltxtquery fields, then that could also be implemented. I have not contributed to postgres yet so I’m not really sure how to get started. Thoughts?
Re: [HACKERS] Autonomous Transaction is back
2015-07-27 23:59 GMT+02:00 Merlin Moncure mmonc...@gmail.com: On Mon, Jul 27, 2015 at 4:41 PM, Joel Jacobson j...@trustly.com wrote: On Fri, Jul 24, 2015 at 9:39 PM, Merlin Moncure mmonc...@gmail.com wrote: On Thu, Jul 23, 2015 at 1:49 PM, Josh Berkus j...@agliodbs.com wrote: Batch Jobs: large data-manipulation tasks which need to be broken up into segments, with each segment committing separately. Example: updating 1 million records in batches of 1000. Autonomous transactions are not a good fit for this case; stored procedures are a better way to go for any scenario where you don't want to be in a snapshot (for example, suppose you want to change isolation level on the fly). Hm, you mean we need real stored procedures in PostgreSQL and not just functions? Yes, exactly. Autonomous transactions aren't really set up for cases where the function runs for a very long time or indefinitely. This is the 'advancing xmin' problem as Josh puts it but I think the problem is much bigger than that. Anyways, this is mostly irrelevant to autonomous transactions as long as the design isn't extended to try and cover that case. Is the Autonomous Transaction feature only going to be exposed through pl/pgsql? I hope not. The integration with plpgsql can be a secondary question. In this case I prefer relating it to a block statement without the possibility of an explicit COMMIT, at least in functions. Something like BEGIN BEGIN AUTONOMOUS ... END; END; This is consistent with current subtransaction support, and disallows some corner cases like a forgotten COMMIT. Regards Pavel merlin
Re: [HACKERS] proposal: multiple psql option -c
2015-07-28 5:24 GMT+02:00 Pavel Stehule pavel.steh...@gmail.com: 2015-07-27 21:57 GMT+02:00 Andrew Dunstan and...@dunslane.net: On 07/27/2015 02:53 PM, Pavel Stehule wrote: I am trying to run parallel execution psql -At -c select datname from pg_database postgres | xargs -n 1 -P 3 psql -c select current_database() I don't think it's going to be a hugely important feature, but I don't see a problem with creating a new option (-C seems fine) which would have the same effect as if the arguments were concatenated into a file which is then used with -f. IIRC -c has some special characteristics which means it's probably best not to try to extend it for this feature. ok, I'll try to write a patch. I have a question. Can the -C option be given multiple times? SQL is no problem, but what about \x commands? postgres=# \dt \dn select 10; No relations found. List of schemas ┌──┬───┐ │ Name │ Owner │ ╞══╪═══╡ └──┴───┘ (0 rows) \dn: extra argument 10; ignored something like psql -C \dt \dn -C select 10 It looks better than psql -c \dt \dn \n select 10 Regards Pavel Pavel cheers andrew
Re: [HACKERS] [DESIGN] ParallelAppend
On Tue, Jul 28, 2015 at 7:59 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote: -Original Message- From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kouhei Kaigai Sent: Monday, July 27, 2015 11:07 PM To: Amit Kapila Is there a real need to have a new node like ParallelAppendPath? Can't we have a Funnel node beneath the Append node, with each worker then responsible for a SeqScan on each inherited child relation? Something like Append --- Funnel -- SeqScan rel1 -- SeqScan rel2 If Funnel can handle both horizontal and vertical parallelism, it is a great simplification. I don't insist on a new node. Once Funnel gets the capability to have multiple child nodes, the Append node above will probably be gone. I expect set_append_rel_pathlist() to add two paths based on Append and Funnel; the planner will then choose the cheaper one according to its cost. In the latest v16 patch, Funnel is declared as follows: typedef struct Funnel { Scan scan; int num_workers; } Funnel; If we try to add Append capability here, I expect the structure will be adjusted as follows, for example: typedef struct Funnel { Scan scan; List *funnel_plans; List *funnel_num_workers; } Funnel; As the names suggest, funnel_plans saves the underlying Plan nodes instead of the lefttree, and funnel_num_workers saves the number of workers expected to be assigned to the individual child plans. Or shall we have a node like the above and name it FunnelAppend or AppendFunnel? Even though create_parallelscan_paths() in v16 sets num_workers no larger than parallel_seqscan_degree, the total number of concurrent background workers may exceed this configuration if more than two PartialSeqScan nodes are underlying. It is a different configuration from max_worker_processes, so it is not a problem as long as we have another restriction. However, how do we control the cap on the number of worker processes per appendable Funnel node?
For example, suppose a parent table has 200 child tables but max_worker_processes is configured to 50. It is obviously impossible to launch all the background workers simultaneously. One idea I have is to suspend the launch of some plans until earlier ones are completed. Okay, but I think with that idea you need to re-launch the workers again for each new set of relation scans, which could turn out to be costly. How about designing some way where workers, after completing their assigned work, check for a new set of tasks (which in this case would be to scan a new relation) and then execute the same? I think in this way we can achieve dynamic allocation of work and maximum parallelism with the available set of workers. We have achieved this in ParallelSeqScan by scanning at the block level: once a worker finishes a block, it checks for a new block to scan. We will need to pay attention to other issues when Funnel kicks background workers towards asymmetric relations. If the number of rows of the individual child nodes varies, we may want to assign 10 background workers to scan rel1 with PartialSeqScan. On the other hand, rel2 may have a very small number of rows, thus its total_cost may be smaller than the cost to launch a worker. In this case, Funnel has child nodes to be executed asynchronously and synchronously. I think this might turn out to be slightly tricky; for example, how do we know how many workers are sufficient for what size of relation? Another way to look at dividing the work in this case could be in terms of chunks-of-blocks: once a worker finishes its current set of blocks, it should be able to get a new set of blocks to scan. So let us assume we decide on a chunk size of 32 and the total number of blocks in the whole inheritance hierarchy is 3200; then the maximum number of workers we should allocate to this scan is 100, and if we have parallel_seqscan_degree less than that, we can use that many workers and let them scan 32 blocks at a time. With Regards, Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Failing assertions in indxpath.c, placeholder.c and brin_minmax.c
Tom Lane writes: Andreas Seltenreich seltenre...@gmx.de writes: when running my random query generator contraption[1] against the regression database of 9.5 or master, it occasionally triggers one of the following three assertions. I've fixed the first two of these --- thanks for the report! I let sqlsmith run during the night, and it did no longer trigger the first two. During roughly a million random queries it triggered the already mentioned brin one 10 times, but there was also one instance of this new one in the log: TRAP: FailedAssertion(!(join_clause_is_movable_into(rinfo, joinrel-relids, join_and_req)), File: relnode.c, Line: 987) LOG: server process (PID 12851) was terminated by signal 6: Aborted DETAIL: Failed process was running: select rel65543066.tmplname as c0, rel65543064.umuser as c1 from public.dept as rel65543059 inner join pg_catalog.pg_user_mappings as rel65543064 left join pg_catalog.pg_enum as rel65543065 on (rel65543064.srvname = rel65543065.enumlabel ) inner join pg_catalog.pg_ts_template as rel65543066 on (rel65543065.enumtypid = rel65543066.tmplnamespace ) on (rel65543059.dname = rel65543064.srvname ) where ((rel65543059.mgrname = rel65543059.mgrname) and (rel65543064.usename = rel65543066.tmplname)) and (rel65543059.mgrname ~~ rel65543059.mgrname) fetch first 128 rows only; ,[ git bisect ] | first bad commit: [3f8c23c4d31d4a0e801041733deb2c7cfa577b32] Improve | predtest.c's ability to reason about operator expressions. ` I'm a bit confused about this aspect of your report though, because in my hands that example fails clear back to 9.2. It doesn't seem to require the predtest.c improvement to expose the fault. Hmm, I actually used a different, uglier query to trigger this assertion for the bisection run. I'll attach it[1] along with the complete git bisect log[2]. 
regards, andreas Footnotes: [1] select subq_717608.c3 as c0, rel4551421.inhrelid as c1, rel4551421.inhrelid as c2, subq_717608.c3 as c3 from information_schema.foreign_tables as rel4551363 right join public.hash_f8_heap as rel4551366 inner join pg_catalog.pg_constraint as rel4551419 inner join (select rel4551420.bb as c0, rel4551420.aa as c1, rel4551420.aa as c2, rel4551420.aa as c3 from public.b as rel4551420 where ( rel4551420.bbrel4551420.bb ) and ( rel4551420.bbrel4551420.bb ) ) as subq_717608 on (rel4551419.coninhcount = subq_717608.c1 ) left join pg_catalog.pg_inherits as rel4551421 on (subq_717608.c1 = rel4551421.inhseqno ) on (rel4551366.seqno = subq_717608.c1 ) on (rel4551363.foreign_table_schema = rel4551419.conname ) where ( ( rel4551419.contypidrel4551419.connamespace ) and ( rel4551419.connamespace=rel4551419.conrelid ) ) and ( rel4551421.inhparentrel4551419.contypid ) fetch first 9 rows only ; [2] git bisect start # bad: [3b5a89c4820fb11c337838c1ad71e8e93f2937d1] Fix resource leak pointed out by Coverity. git bisect bad 3b5a89c4820fb11c337838c1ad71e8e93f2937d1 # good: [e6df2e1be6330660ba4d81daa726ae4a71535aa9] Stamp 9.4beta1. git bisect good e6df2e1be6330660ba4d81daa726ae4a71535aa9 # bad: [68e66923ff629c324e219090860dc9e0e0a6f5d6] Add missing volatile qualifier. git bisect bad 68e66923ff629c324e219090860dc9e0e0a6f5d6 # bad: [a16bac36eca8158cbf78987e95376f610095f792] Remove dependency on wsock32.lib in favor of ws2_32 git bisect bad a16bac36eca8158cbf78987e95376f610095f792 # good: [55d5b3c08279b487cfa44d4b6e6eea67a0af89e4] Remove unnecessary output expressions from unflattened subqueries. git bisect good 55d5b3c08279b487cfa44d4b6e6eea67a0af89e4 # bad: [1cbc9480106241aaa8db112331e93d0a265b6db0] Check interrupts during logical decoding more frequently. git bisect bad 1cbc9480106241aaa8db112331e93d0a265b6db0 # bad: [686f362bee126e50280bcd3b35807b02f18a8966] Fix contrib/pg_upgrade/test.sh for $PWD containing spaces. 
git bisect bad 686f362bee126e50280bcd3b35807b02f18a8966 # bad: [be76a6d39e2832d4b88c0e1cc381aa44a7f86881] Secure Unix-domain sockets of make check temporary clusters. git bisect bad be76a6d39e2832d4b88c0e1cc381aa44a7f86881 # good: [6554656ea2043c5bb877b427237dc5ddd7c5e5c8] Improve tuplestore's error messages for I/O failures. git bisect good 6554656ea2043c5bb877b427237dc5ddd7c5e5c8 # bad: [a7205d81573cb0c979f2d463a1d9edb6f97c94aa] Adjust 9.4 release notes. git bisect bad a7205d81573cb0c979f2d463a1d9edb6f97c94aa # bad: [3f8c23c4d31d4a0e801041733deb2c7cfa577b32] Improve predtest.c's ability to reason about operator expressions. git bisect bad 3f8c23c4d31d4a0e801041733deb2c7cfa577b32 # good: [c81e63d85f0c2c39d3fdfd8b95fc1ead6fdcb89f] Fix pg_restore's processing of old-style BLOB COMMENTS data. git bisect good c81e63d85f0c2c39d3fdfd8b95fc1ead6fdcb89f # first bad commit: [3f8c23c4d31d4a0e801041733deb2c7cfa577b32] Improve predtest.c's ability to reason about operator
Re: [HACKERS] Sharing aggregate states between different aggregate functions
On Thu, Jul 9, 2015 at 7:44 PM, David Rowley david.row...@2ndquadrant.com wrote: On 15 June 2015 at 12:05, David Rowley david.row...@2ndquadrant.com wrote: This basically allows an aggregate's state to be shared between other aggregate functions when both aggregates' transition functions (and a few other things) match There's quite a number of aggregates in our standard set which will benefit from this optimisation. After compiling the original patch with another compiler, I noticed a couple of warnings. The attached fixes these. I did some performance tests on the patch. This patch shows good improvement for same-column aggregates. With int or bigint datatype columns, this patch doesn't show any visible performance difference. But with the numeric datatype it shows good improvement.

Different columns: select sum(x), avg(y) from test where x $1;

selectivity (millions)   Head   patch
0.1                      315    322
0.3                      367    376
0.5                      419    427
1                        551    558
2                        824    826

Same column: select sum(x), avg(x) from test where x $1;

selectivity (millions)   Head   patch
0.1                      314    314
0.3                      363    343
0.5                      412    373
1                        536    440
2                        795    586

Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] MultiXact member wraparound protections are now enabled
On 26 July 2015 at 20:15, Noah Misch n...@leadboat.com wrote: On Fri, Jul 24, 2015 at 09:14:09PM -0400, Peter Eisentraut wrote: On 7/22/15 4:45 PM, Robert Haas wrote: But it seemed to me that this could be rather confusing. I thought it would be better to be explicit about whether the protections are enabled in all cases. That way, (1) if you see the message saying they are enabled, they are enabled; (2) if you see the message saying they are disabled, they are disabled; and (3) if you see neither message, your version does not have those protections. But this is not documented, AFAICT, so I don't think anyone is going to be able to follow that logic. I don't see anything in the release notes saying, look for this message to see how this applies to you, or whatever. I supported inclusion of the message, because it has good potential to help experts studying historical logs to find the root cause of data corruption. The complex histories of clusters showing corruption from this series of bugs have brought great expense to the task of debugging new reports. Given a cluster having full mxact wraparound protections since last corruption-free backup (or since initdb), one can rule out some causes. Would it be better to replace it with a less specific and more generally useful message? For example, Server started with release X.y.z from which we could infer various useful things. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] Sharing aggregate states between different aggregate functions
On 27 July 2015 at 18:15, Haribabu Kommi kommi.harib...@gmail.com wrote: On Thu, Jul 9, 2015 at 7:44 PM, David Rowley david.row...@2ndquadrant.com wrote: On 15 June 2015 at 12:05, David Rowley david.row...@2ndquadrant.com wrote: This basically allows an aggregate's state to be shared between other aggregate functions when both aggregates' transition functions (and a few other things) match There's quite a number of aggregates in our standard set which will benefit from this optimisation. After compiling the original patch with another compiler, I noticed a couple of warnings. The attached fixes these. I did some performance tests on the patch. This patch shows good improvement for same-column aggregates. With int or bigint datatype columns, this patch doesn't show any visible performance difference. But with the numeric datatype it shows good improvement. Thanks for testing this. You should only see an improvement on the aggregates listed here: select aggfnoid::oid, aggfnoid || '(' || typname || ')',aggtransfn,agginitval from pg_aggregate ag inner join pg_proc pr on aggfnoid = pr.oid inner join pg_type tp on pr.proargtypes[0] = tp.oid where ag.aggtransfn in (select aggtransfn from pg_aggregate group by aggtransfn having count(*) > 1) and ag.agginitval is null order by ag.aggtransfn; Regards David Rowley -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] multivariate statistics / patch v7
Hello, At Sat, 25 Jul 2015 23:09:31 +0200, Tomas Vondra tomas.von...@2ndquadrant.com wrote in 55b3fb0b.7000...@2ndquadrant.com Hi, On 07/16/2015 01:51 PM, Kyotaro HORIGUCHI wrote: Hi, I'd like to show you the modified constitution of multivariate statistics application logic. Please find the attached. They apply on your v7 patch. Sadly I do have some trouble getting it to apply correctly :-( So for now all my comments are based on just reading the code. Ah. My modification to rebase to the master for the time should be culprit. Sorry for the dirty patch. # I would recreate the patch if you complained before struggling # with the thing.. The core of the modification is on closesel.c. I attached the patched closesel.c. FWIW I've rebased my patch to the current master, it's available on github as usual: https://github.com/tvondra/postgres/commits/mvstats Thanks. The code to find mv-applicable clause is moved out of the main flow of clauselist_selectivity. As I said in the previous mail, the new function transformRestrictInfoForEstimate (too bad name but just for PoC:) scans clauselist and generates RestrictStatsData struct which drives mv-aware selectivity calculation. This struct isolates MV and non-MV estimation. The struct RestrictStatData mainly consists of the following three parts, - clause to be estimated by current logic (MV is not applicable) - clause to be estimated by MV-staistics. - list of child RestrictStatDatas, which are to be run recursively. mvclause_selectivty() is the topmost function where mv stats works. This structure effectively prevents main estimation flow from being broken by modifying mvstats part. Although I haven't measured but I'm positive the code is far reduced from yours. I attached two patches to this message. The first one is to rebase v7 patch to current(maybe) master and the second applies the refactoring. I'm a little anxious about performance but I think this makes the process to apply mv-stats far clearer. 
Regression tests for mvstats succeeded as-is except for fdep, which is not implemented in this patch. What do you think about this? I'm not sure, at this point. I'm having a hard time understanding how exactly the code works - there are pretty much no comments explaining the implementation, so it takes time to understand the code. This is especially true of transformRestrictInfoForEstimate, which is also quite long. I understand it's a PoC, but comments would really help. The patch itself should be hardly readable because it's not against master but against your last patch plus something. My concerns about the code at the time were the following: - You embedded the logic of multivariate estimation into clauselist_selectivity. I think estimation using multivariate statistics is quite different from the ordinary estimation based on single-column stats, so they are logically separable and we should separate them. - You are taking a top-down approach that runs tree-walking to check the applicability of mv-stats for every step down the clause tree. If a subtree is found to be mv-applicable, it is split into two parts - mv-compatible and non-compatible. These steps require expression tree walking, which looks like it uses too much CPU. - You seem to be considering the cases where users create many multivariate statistics on attribute sets having duplications. But that looks like too much to me. MV-stats are more resource-hungry, so we can assume minimal usage of them. My suggestion in the patch is a bottom-up approach to find the mv-applicable portion(s) in the expression tree, which is the basic way of the planner overall. The approach requires no repetitive runs of a tree walker, that is, pull_varnos. It could fail to find the 'optimal' solution for complex situations but needs far less calculation for almost the same return (I think..). Even though it doesn't consider functional dependency, the reduction in the code shows the efficiency. It does nothing tricky.
On a conceptual level, I think the idea to split the estimation into two phases - enriching the expression tree with nodes carrying details about stats etc., and then actually doing the estimation in the second phase - might be interesting. Not because it's somehow clearer, but because it gives us a chance to see the expression tree as a whole, with details about all the stats (with the current code we process/estimate the tree incrementally). But I don't really know how useful that would be. It is difficult to say which approach is better, since it is affected by what we think is more important than other things. However, I'm concerned that your code substantially reconstructs the expression (clause) tree in the midst of processing it. I believe it should be a separate phase for simplicity. Of course, the additional required resources should also be considered, but they are rather reduced in this case. I don't think the proposed change makes the
Re: [HACKERS] We need to support ForeignRecheck for late row locking, don't we?
KaiGai-san, On 2015/07/24 23:51, Kouhei Kaigai wrote: On 2015/07/22 19:10, Etsuro Fujita wrote: While working on the issue Foreign join pushdown vs EvalPlanQual, I happened to notice odd behaviors of late row locking in FDWs. I think the reason for that is that we don't check pushed-down quals inside EPQ testing, even if what was fetched by RefetchForeignRow was an updated version of the tuple rather than the same version previously obtained. So, to fix this, I'd like to propose that pushed-down quals be checked in ForeignRecheck. * I've modified ForeignRecheck so as to check pushed-down quals whether doing late locking or early locking. Isn't it an option to put a new callback in ForeignRecheck? The FDW driver knows whether its private data structure includes an expression node that was pushed down to the remote side. So, it seems to me the best way is to consult the FDW driver on whether the supplied tuple should be visible according to the pushed-down qualifier. More or less, this fix needs a new interface contract around the EvalPlanQual logic. It is better to give the FDW driver more flexibility in its private data structure and the way it processes the recheck logic, rather than a special-purpose variable. If an FDW driver managed the pushed-down expression in its own format, the requirement for pushedDownQual would make it keep the qualifier redundantly. The callback approach does not have that kind of concern. That might be an idea, but is there any performance disadvantage, as discussed in [1]? It looks like that needs to perform another remote query to see if the supplied tuple satisfies the pushed-down quals during EPQ testing. Best regards, Etsuro Fujita [1] http://www.postgresql.org/message-id/5590ed5c.2040...@lab.ntt.co.jp
Re: [HACKERS] Sharing aggregate states between different aggregate functions
On 07/27/2015 08:34 AM, David Rowley wrote: - * agg_input_types, agg_state_type, agg_result_type identify the input, - * transition, and result types of the aggregate. These should all be - * resolved to actual types (ie, none should ever be ANYELEMENT etc). + * agg_input_types identifies the input types of the aggregate. These should + * be resolved to actual types (ie, none should ever be ANYELEMENT etc). I'm not sure I understand why agg_state_type and agg_result_type have changed here. The function no longer has the agg_result_type argument, but the removal of agg_state_type from the comment was a mistake. + peraggstate->sortstates = (Tuplesortstate **) + palloc0(sizeof(Tuplesortstate *) * numGroupingSets); + for (currentsortno = 0; currentsortno < numGroupingSets; currentsortno++) + peraggstate->sortstates[currentsortno] = NULL; This was not you, but this NULL setting looks unneeded due to the palloc0(). Yeah, I noticed that too. Ok, let's take it out. In this function I also wasn't quite sure if it was worth comparing both non-NULL INITCONDs here. I believe my code comments may slightly contradict what the code actually does, as the comments talk about them having to match, but the code just bails if any are non-NULL. The reason I didn't check them was that it seems inevitable that some duplicate work needs to be done when setting up the INITCOND. Perhaps it's worth it? It would be nice to handle non-NULL initconds. I think you'll have to check that the input function isn't volatile. Or perhaps just call the input function, and check that the resulting Datum is byte-per-byte identical, although that might be awkward to do with the current code structure. BTW, the name of the AggStatePerAggStateData struct is pretty horrible. The repeated AggState feels awkward. Now that I've stared at the patch for some time, it doesn't bother me anymore, but it took me quite a while to get over that. I'm sure it will for others too.
And it's not just that struct; the comments talk about aggregate state, which could be confused to mean AggState, but actually means AggStatePerAggStateData. I don't have any great suggestions, but can you come up with a better naming scheme? I agree, they're horrible. The thing that's causing the biggest problem is the struct named AggState, which carries state for *all* aggregates and has nothing to do with transition state, so there are two different meanings of the word state, and I've used both meanings in the name AggStatePerAggStateData. Perhaps just renaming AggStatePerAggStateData to AggStateTransStateData would be good enough? Hmm. I think it should be AggStatePerTransData then, to keep the same pattern as AggStatePerAggData and AggStatePerGroupData. I've attached a delta patch based on your patch, in which I've: 1. Renamed AggStatePerAggStateData to AggStateTransStateData, and all variables using that are renamed to suit better. 2. Removed the surplus peraggstate->sortstates[currentsortno] = NULL; (not related to this patch, but since we're moving that part of the code, we'd better fix it) 3. Put back the missing aggfnoid in the error message. 4. Changed initialize_aggregates() to not pass the states. They're already in AggState and we're using aggstate->numstates to get the count of the items in that array, so it seems wrong to allow a different array to ever be passed in. 5. Changed the wording of a few comments to try and reduce confusion between 'state' and 'transition state'. 6. Renamed AggState.peraggstate to transstates. I pluralised this to try to reduce confusion with the single state pointers named 'transstate' in the functions in nodeAgg.c. I did think that peragg should also become peraggs and pergroup should become pergroups, but didn't change those. Anything else I changed is self-explanatory. What do you think? Looks good, thanks!
- Heikki
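The palloc0() point above - that the follow-up loop assigning NULL to every sortstates slot is redundant - can be illustrated with a small standalone sketch. Here palloc0 is modeled with calloc (both return zero-filled memory), and Tuplesortstate is left opaque as in nodeAgg.c; this is an illustration of the redundancy, not the actual backend code.

```c
#include <stdlib.h>

/* Opaque, as in PostgreSQL's tuplesort.h; we only ever store pointers. */
typedef struct Tuplesortstate Tuplesortstate;

/* Allocate an array of numGroupingSets sortstate pointers.  calloc
 * zero-fills the memory, and all-bits-zero reads back as NULL on the
 * platforms PostgreSQL runs on, so no explicit NULL-assignment loop is
 * needed afterwards - which is exactly why the patch hunk's for-loop
 * was surplus. */
static Tuplesortstate **
alloc_sortstates(int numGroupingSets)
{
    return (Tuplesortstate **) calloc((size_t) numGroupingSets,
                                      sizeof(Tuplesortstate *));
}
```

Strictly speaking the C standard does not guarantee that all-bits-zero is a null pointer, but PostgreSQL's own code relies on palloc0-then-read-as-NULL throughout, so dropping the loop is consistent with existing practice.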
Re: [HACKERS] pg_dump -Fd and compression level
As per attached patch. Comments? It seems that the first test on the compression in pg_backup_tar.c is now obsolete. It didn't make much sense anyway. 211 if (AH->compression < 0 || AH->compression > 9) 212 AH->compression = Z_DEFAULT_COMPRESSION; 213 214 /* Don't compress into tar files unless asked to do so */ 215 if (AH->compression == Z_DEFAULT_COMPRESSION) 216 AH->compression = 0; 217 218 /* 219 * We don't support compression because reading the files back is not 220 * possible since gzdopen uses buffered IO which totally screws file 221 * positioning. 222 */ 223 if (AH->compression != 0) 224 exit_horribly(modulename, 225 "compression is not supported by tar archive format\n"); 226 } In fact, the first two tests look unnecessary. Neither condition should be possible now. Hello, Isn't the second test still required if you call pg_dump -Ft without setting -Z0 explicitly? (= AH->compression == Z_DEFAULT_COMPRESSION) There still are a few suspicious places in pg_backup_tar.c that refer to compression although it is not supported (except for blobs?). (C programming is beyond my capabilities, I can roughly read simple code ...) regards, Marc Mamin
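Marc's question - whether the second test still matters for a plain `pg_dump -Ft` with no explicit `-Z` - can be checked by tracing the quoted logic as a small standalone function. Z_DEFAULT_COMPRESSION is -1 as defined by zlib; everything else here is a simplified model of the quoted pg_backup_tar.c checks, not the real archiver code.

```c
/* Z_DEFAULT_COMPRESSION is -1 in zlib's zlib.h. */
#define Z_DEFAULT_COMPRESSION (-1)

/* Model of the quoted checks: out-of-range levels (including the -1
 * default) collapse to Z_DEFAULT_COMPRESSION, and the default then
 * collapses to 0 for the tar format.  Only an explicit in-range 1..9
 * request survives, and that is what exit_horribly() rejects. */
static int
effective_tar_compression(int requested)
{
    if (requested < 0 || requested > 9)
        requested = Z_DEFAULT_COMPRESSION;
    /* Don't compress into tar files unless asked to do so */
    if (requested == Z_DEFAULT_COMPRESSION)
        requested = 0;
    return requested;   /* caller errors out if this is non-zero */
}
```

Tracing `pg_dump -Ft` with no `-Z`: the level arrives as Z_DEFAULT_COMPRESSION (-1), the first test maps it to itself, and the second test maps it to 0 - so yes, the second test is the one doing the work for the no-flag case, which supports Marc's reading.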
Re: [HACKERS] Sharing aggregate states between different aggregate functions
On 27 July 2015 at 20:11, Heikki Linnakangas hlinn...@iki.fi wrote: On 07/27/2015 08:34 AM, David Rowley wrote: In this function I also wasn't quite sure if it was worth comparing both non-NULL INITCONDs here. I believe my code comments may slightly contradict what the code actually does, as the comments talk about them having to match, but the code just bails if any are non-NULL. The reason I didn't check them was that it seems inevitable that some duplicate work needs to be done when setting up the INITCOND. Perhaps it's worth it? It would be nice to handle non-NULL initconds. I think you'll have to check that the input function isn't volatile. Or perhaps just call the input function, and check that the resulting Datum is byte-per-byte identical, although that might be awkward to do with the current code structure. Yeah, it's awkward, as I just managed to remind myself: the aggtranstype needs to be known before we can call GetAggInitVal() on the initval datum. That currently happens in build_transstate_for_aggref() only when we've decided to create a new state. transstate->initValue = GetAggInitVal(textInitVal, transstate->aggtranstype); And to get the aggtranstype: transstate->aggtranstype = resolve_aggregate_transtype(aggref->aggfnoid, aggform->aggtranstype, inputTypes, numArguments); Of course, not impossible, but lots of code gets duplicated. Regards David Rowley -- David Rowley http://www.2ndquadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
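Heikki's "byte-per-byte identical" suggestion above amounts to: run the input function on both INITCONDs and compare the resulting values bytewise. A hedged standalone sketch of that comparison is below; it models only by-reference, fixed-length values, whereas PostgreSQL's actual helper for this job is datumIsEqual(), which also handles by-value and varlena cases.

```c
#include <string.h>
#include <stddef.h>
#include <stdbool.h>

/* Sketch: decide whether two materialized init values match bytewise.
 * Two NULL initconds trivially match (the case the current code already
 * shares); a NULL/non-NULL pair does not; otherwise compare the bytes.
 * Real code would obtain `len` from the resolved aggtranstype. */
static bool
init_values_match(const void *a, const void *b, size_t len)
{
    if (a == NULL || b == NULL)
        return a == b;
    return memcmp(a, b, len) == 0;
}
```

As the thread notes, the awkward part is not this comparison but that producing the values to compare requires resolving aggtranstype first, duplicating work from build_transstate_for_aggref().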
Re: [HACKERS] We need to support ForeignRecheck for late row locking, don't we?
On 2015/07/24 23:51, Kouhei Kaigai wrote: On 2015/07/22 19:10, Etsuro Fujita wrote: While working on the issue Foreign join pushdown vs EvalPlanQual, I happened to notice odd behaviors of late row locking in FDWs. I think the reason for that is that we don't check pushed-down quals during an EPQ testing even if what was fetched by RefetchForeignRow was an updated version of the tuple rather than the same version previously obtained. So, to fix this, I'd like to propose that pushed-down quals be checked in ForeignRecheck. * I've modified ForeignRecheck so as to check pushed-down quals whether doing late locking or early locking. Isn't it an option to put a new callback in ForeignRecheck? The FDW driver knows that its private data structure includes the expression node that was pushed down to the remote side. So, it seems to me the best way is to consult the FDW driver as to whether the supplied tuple should be visible according to the pushed-down qualifier. More or less, this fix needs a new interface contract around the EvalPlanQual logic. It is better to give the FDW driver more flexibility in its private data structure and in the way it processes the recheck logic, rather than a special-purpose variable. If an FDW driver managed the pushed-down expression in its own format, a required pushedDownQual field would make it hold the qualifier redundantly. The callback approach does not have that kind of concern. That might be an idea, but is there any performance disadvantage as discussed in [1]? It looks like that approach needs to perform another remote query to see if the supplied tuple satisfies the pushed-down quals during EPQ testing. I expect the callback of ForeignRecheck to run ExecQual() on the qualifier expression that was pushed down but saved in the private data of the ForeignScanState. It does not need to kick another remote query (unless the FDW driver is designed to do so), so the performance disadvantage is nonexistent or quite limited. Also, let's assume the case when scanrelid == 0 (join pushdown).
It is easy to put a special code path if scanrelid == 0, which implies the ScanState is either a ForeignScan or a CustomScan. If ForeignRecheck (= recheckMtd) is called instead of the if-block below the Assert() in ExecScanFetch, the FDW driver will be able to put its own special code path to run an alternative sub-plan. How does this alternative sub-plan work? It walks down the sub-plan tree, which typically consists of NestLoop + ForeignScan for example, then ExecScanFetch() is called again towards a ScanState with scanrelid > 0 at that time. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com
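The callback contract KaiGai proposes above - ForeignRecheck consulting an FDW-supplied function that re-evaluates its privately saved pushed-down quals locally, with no extra remote query - can be sketched in miniature as follows. All names and types here are hypothetical stand-ins, not the actual executor or FDW API; the saved qual tree is modeled by a single threshold field.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for a refetched tuple during EPQ testing. */
typedef struct TupleStub { int key; } TupleStub;

/* Stand-in for the FDW's private state: the "pushed-down qual" it kept
 * in its own format, plus the recheck callback it registers. */
typedef struct FdwPrivate
{
    int   pushed_down_min_key;
    bool (*recheck) (struct FdwPrivate *state, TupleStub *tuple);
} FdwPrivate;

/* The FDW's callback: evaluate the saved qual locally (modeling a call
 * to ExecQual over the saved expression) - no remote round trip. */
static bool
fdw_recheck(FdwPrivate *state, TupleStub *tuple)
{
    return tuple->key >= state->pushed_down_min_key;
}

/* What a callback-aware ForeignRecheck would do: delegate to the FDW
 * when a callback is registered, otherwise accept the tuple. */
static bool
foreign_recheck(FdwPrivate *state, TupleStub *tuple)
{
    if (state->recheck != NULL)
        return state->recheck(state, tuple);
    return true;    /* no pushed-down quals to re-test */
}
```

This is why the performance concern in [1] need not apply: the callback's cost is a local qual evaluation over data the FDW already holds, addressing Fujita-san's question about an extra remote query.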