Re: [HACKERS] Allowing empty target list in SELECT (1b4f7f93b4693858cb983af3cd557f6097dab67b)
On 02 May 2014 10:00, Amit Langote wrote:

> Is the following behavior perceived fix-worthy?
>
> -- note the '1's in the outputs
> postgres=# CREATE TABLE test AS SELECT;
> SELECT 1
> postgres=# insert into test select;
> INSERT 0 1
>
> Or maybe, it just means 1 'null' row/record and not no row at all?

It just creates an item pointer, and the corresponding heap tuple header (without any data or NULL bitmap) gets stored as part of this insertion. So although it does not insert anything (not even a NULL), it still reserves one row position. A SELECT will therefore not display anything, but it will report the actual number of rows. Even the following syntax is allowed:

CREATE TABLE no_column_table();

IMO, this might be useful for dynamic use of a table (columns can be added later using ALTER TABLE) or for use as an abstract ancestor in a class hierarchy.

Thanks and Regards,
Kumar Rajeev Rastogi

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pgaudit - an auditing extension for PostgreSQL
Hi,

Here is an initial version of an auditing extension for Postgres to generate log output suitable for compiling a comprehensive audit trail of database operations.

Why auditing?

Various laws and regulations (HIPAA, PCI DSS, the EU Data Protection Directive, etc.) as well as internal business requirements mandate auditing at the database level. While many proprietary and some open source databases offer auditing facilities, Postgres does not currently provide any kind of auditing feature. Availability of such a feature will assist PostgreSQL's adoption in key sectors such as finance and health.

About pgaudit

pgaudit uses Event Triggers to log an unambiguous representation of DDL, as well as a combination of executor and utility hooks for other commands (DML, including SELECT, as well as other utility commands):

https://github.com/2ndQuadrant/pgaudit

To provide fully-featured auditing capability, pgaudit exploits the capabilities of the new Event Trigger code, which 2ndQuadrant will be submitting to core Postgres. Currently that means you'll have to build against an enhanced version of Postgres [1]. However, the intention is that pgaudit will be both a useful module now (it is designed to compile against 9.3 and 9.4) and a demonstration of features proposed for 9.5.
[1] deparse branch of git://git.postgresql.org/git/2ndquadrant_bdr.git

Here's some example log output:

LOG:  [AUDIT],2014-04-30 17:13:55.202854+09,auditdb,ianb,ianb,DEFINITION,CREATE TABLE,TABLE,public.x,CREATE TABLE public.x (a pg_catalog.int4 , b pg_catalog.int4 ) WITH (oids=OFF)
LOG:  [AUDIT],2014-04-30 17:14:06.548923+09,auditdb,ianb,ianb,WRITE,INSERT,TABLE,public.x,INSERT INTO x VALUES(1,1);
LOG:  [AUDIT],2014-04-30 17:14:21.221879+09,auditdb,ianb,ianb,READ,SELECT,TABLE,public.x,SELECT * FROM x;
LOG:  [AUDIT],2014-04-30 17:15:25.620213+09,auditdb,ianb,ianb,READ,SELECT,VIEW,public.v_x,SELECT * from v_x;
LOG:  [AUDIT],2014-04-30 17:15:25.620262+09,auditdb,ianb,ianb,READ,SELECT,TABLE,public.x,SELECT * from v_x;
LOG:  [AUDIT],2014-04-30 17:16:00.849868+09,auditdb,ianb,ianb,WRITE,UPDATE,TABLE,public.x,UPDATE x SET a=a+1;
LOG:  [AUDIT],2014-04-30 17:16:18.291452+09,auditdb,ianb,ianb,ADMIN,VACUUM,,,VACUUM x;
LOG:  [AUDIT],2014-04-30 17:18:01.08291+09,auditdb,ianb,ianb,DEFINITION,CREATE FUNCTION,FUNCTION,public.func_x(),CREATE FUNCTION public.func_x() RETURNS pg_catalog.int4 LANGUAGE sql VOLATILE CALLED ON NULL INPUT SECURITY INVOKER COST 100.00 AS $dprs_$SELECT a FROM x LIMIT 1;$dprs_$
LOG:  [AUDIT],2014-04-30 17:18:09.694755+09,auditdb,ianb,ianb,FUNCTION,EXECUTE,FUNCTION,public.func_x,SELECT * FROM func_x();
LOG:  [AUDIT],2014-04-30 17:18:09.694865+09,auditdb,ianb,ianb,READ,SELECT,TABLE,public.x,SELECT * FROM func_x();
LOG:  [AUDIT],2014-04-30 17:18:33.703007+09,auditdb,ianb,ianb,WRITE,DELETE,VIEW,public.v_x,DELETE FROM v_x;
LOG:  [AUDIT],2014-04-30 17:18:33.703051+09,auditdb,ianb,ianb,WRITE,DELETE,TABLE,public.x,DELETE FROM v_x;
LOG:  [AUDIT],2014-04-30 17:19:54.811244+09,auditdb,ianb,ianb,ADMIN,SET,,,set role ams;
LOG:  [AUDIT],2014-04-30 17:19:57.039979+09,auditdb,ianb,ams,WRITE,INSERT,VIEW,public.v_x,INSERT INTO v_x VALUES(1,2);
LOG:  [AUDIT],2014-04-30 17:19:57.040014+09,auditdb,ianb,ams,WRITE,INSERT,TABLE,public.x,INSERT INTO v_x VALUES(1,2);
LOG:  [AUDIT],2014-04-30 17:20:02.059415+09,auditdb,ianb,ams,ADMIN,SET,,,SET role ianb;
LOG:  [AUDIT],2014-04-30 17:20:09.840261+09,auditdb,ianb,ianb,DEFINITION,ALTER TABLE,TABLE,public.x,ALTER TABLE public.x ADD COLUMN c pg_catalog.int4
LOG:  [AUDIT],2014-04-30 17:23:58.920342+09,auditdb,ianb,ianb,ADMIN,ALTER ROLE,,,ALTER USER ams SET search_path = 'foo';

How is this different to log_statement='all'?

1. pgaudit logs fully-qualified relation names, so you don't have to wonder if SELECT * FROM x referred to public.x or other.x.

2. pgaudit creates a log entry for each affected object, so you don't have to wonder which tables SELECT * FROM someview accessed, and it's easy to identify all accesses to a particular table.

3. pgaudit allows finer-grained control over what is logged. Commands are classified into read, write, etc., and logging for these classes can be individually enabled and disabled (either via pgaudit.log in postgresql.conf, or as a per-database or per-user setting).

Here's a quick overview of how it works:

0. In 9.3 and 9.4, we build without USE_DEPARSE_FUNCTIONS. In the deparse branch (which I'll call 9.5 for convenience), we build with USE_DEPARSE_FUNCTIONS (set in the Makefile).

1. In 9.5, we create a ddl_command_end event trigger and use pg_event_trigger_{get_creation_commands,expand_command} to log a deparsed representation of any DDL commands supported by event triggers.

2. We always use an sql_drop event trigger to log DROP commands, but once 9.5 includes pg_event_trigger_get_deletion_commands() or some equivalent, we'll use that functionality as well.

3. We
Re: [HACKERS] Allowing empty target list in SELECT (1b4f7f93b4693858cb983af3cd557f6097dab67b)
On Fri, May 2, 2014 at 3:57 PM, Amit Langote amitlangot...@gmail.com wrote:

> Hi,
>
> Is the following behavior perceived fix-worthy?
>
> -- note the '1's in the outputs
> postgres=# CREATE TABLE test AS SELECT;
> SELECT 1
> postgres=# insert into test select;
> INSERT 0 1

My guess why this happens is that the changes made in the commit in $SUBJECT only pertain to fixing syntax errors and nothing else.

Are you proposing that this does not insert a 0-column row? I don't find the current behaviour wrong. If it didn't insert the row, then the query in the following would return 0 rows.

begin work;
create table nocols ();
insert into nocols select;
insert into nocols select;
create table test (value int);
insert into test values(1);
select * from nocols cross join test; -- gives 2 rows with the value 1
rollback;

Why should the above results be any different than if I created the nocols table with a column then dropped it? Certainly removing all of the records on the drop of the last column would be wrong.

Regards,
David Rowley
Re: [HACKERS] Allowing empty target list in SELECT (1b4f7f93b4693858cb983af3cd557f6097dab67b)
Amit Langote wrote:

> Is the following behavior perceived fix-worthy?
>
> -- note the '1's in the outputs
> postgres=# CREATE TABLE test AS SELECT;
> SELECT 1
> postgres=# insert into test select;
> INSERT 0 1
>
> Or maybe, it just means 1 'null' row/record and not no row at all?

Right, I'd say you end up with a table with two 0-tuples. Maybe odd, but it shouldn't be a problem.

Yours,
Laurenz Albe
Re: [HACKERS] Allowing empty target list in SELECT (1b4f7f93b4693858cb983af3cd557f6097dab67b)
On Fri, May 2, 2014 at 4:14 PM, David Rowley dgrowle...@gmail.com wrote:

> Why should the above results be any different than if I created the
> nocols table with a column then dropped it? Certainly removing all of
> the records on the drop of the last column would be wrong.

I see, dropping the only column in a table does exhibit a similar behavior.

--
Amit
Re: [HACKERS] Proposal for Merge Join for Non '=' Operators
On 29 April 2014 13:28, Hadi Moshayedi wrote:

> This looks like a great improvement. Repeating Nicolas's question, do
> you have a real-world example of such joins?

I can think of scenarios where the user needs to self-join and compare against other tuples. For example, list all the employees who have a lower salary compared to other employees, with the count of the employees earning more than that employee, like the query below:

select ta.emp_name, count(*) from t1 as ta, t1 as tb where ta.emp_salary < tb.emp_salary group by ta.emp_name;

> In my experience, I see more queries like self-join table A and table
> B where A.time BETWEEN B.time - '1 week' and B.time, similar to what
> Nicolas and Tom mentioned. As an example, count users who placed an
> order in the week following their registration.

Currently I have implemented a very basic POC which works only for a < b queries; I think the actual patch can be enhanced for these types of queries also.

> Can you send a patch so we can also try it?

The patch is attached to this mail, but for testing we need to take care of some points:

1. The patch is implemented only for a < b type queries (and only for a table with one integer field; this can be modified in create_nestloop_plan if needed -- I have written it for a basic test with integers).

2. What changes are done: there are no changes to the planner's cost calculation, so a hack is put in while generating the plan. If the planner has selected an NLJ plan and enable_material is set to off (this is the hint to select the special merge join), then sort nodes are added above the left and right trees of the NLJ.

3.
So if you want to test with a normal NLJ, no need to change anything; if you want to test this merge join, just run 'set enable_material=off;'.

postgres=# explain select count(*) from t1 as ta, t1 as tb where ta.a < tb.a;
                                 QUERY PLAN
-----------------------------------------------------------------------
 Aggregate  (cost=396625.51..396625.52 rows=1 width=0)
   ->  Nested Loop  (cost=0.00..375758.83 rows=8346672 width=0)
         Join Filter: (ta.a < tb.a)
         ->  Seq Scan on t1 ta  (cost=0.00..73.04 rows=5004 width=4)
         ->  Materialize  (cost=0.00..98.06 rows=5004 width=4)
               ->  Seq Scan on t1 tb  (cost=0.00..73.04 rows=5004 width=4)
 Planning time: 0.291 ms
(7 rows)

Now, for enabling this merge join:

postgres=# set enable_material=off;
SET
postgres=# explain select count(*) from t1 as ta, t1 as tb where ta.a < tb.a;
                                 QUERY PLAN
-----------------------------------------------------------------------
 Aggregate  (cost=699432.08..699432.09 rows=1 width=0)
   ->  Nested Loop  (cost=0.00..678565.40 rows=8346672 width=0)
         Join Filter: (ta.a < tb.a)
         ->  Sort  (cost=380.51..393.02 rows=5004 width=4)
               Sort Key: ta.a
               ->  Seq Scan on t1 ta  (cost=0.00..73.04 rows=5004 width=4)
         ->  Sort  (cost=380.51..393.02 rows=5004 width=4)
               Sort Key: tb.a
               ->  Seq Scan on t1 tb  (cost=0.00..73.04 rows=5004 width=4)
 Planning time: 0.286 ms
(10 rows)

Thanks & Regards,
Dilip Kumar

Attachment: merge_join_nonequal.patch
Re: [HACKERS] includedir_internal headers are not self-contained
Re: Tom Lane 2014-05-02 9995.1398994...@sss.pgh.pa.us

>>> The patch is certainly too invasive to consider back-patching into
>>> 9.3, though.

>> Understood. I feel unsure about this. I agree the patch is quite
>> invasive. Leaving 9.3 in a broken state seems problematic. In
>> particular I'm not sure what Debian would do about the whole issue;
>> would they have to carry the patch for their 9.3 packages?

> My recommendation to Christoph upthread was that they just look the
> other way for the time being, ie, ignore the fact that relpath.h is
> unusable by freestanding apps in 9.3. Backpatching what I did for 9.4
> would be an ABI break, so that seems to me to be out of the question
> in 9.3. And it's not clear that anybody outside core+contrib really
> needs relpath.h yet, anyway. (Of course, you could argue that if
> there are no external users then the ABI break isn't a problem; but
> if there are, then it is.)

We are certainly not going to replace the old mess by a custom new one ;)

The original problem -- that postgres_fe.h wasn't usable -- is already fixed for 9.3, so afaict the only remaining problem there is the installation {rule, location} of common/, which is either taken care of by the patch I've sent, or a trivial addition to the packaging files on our side.

As long as there are no complaints, we'll simply ignore the fact that the other headers in 9.3's common/ aren't self-contained; the workaround of simply installing the server headers seems easy enough. We should probably be able to move to 9.4 in time for the freeze of Debian Jessie in November, so backports won't matter that much. (As long as the 9.3-and-older server headers are self-contained and/or compatible with what 9.4 provides...)

Christoph
--
c...@df7cb.de | http://www.df7cb.de/
Re: [HACKERS] Obsolete coding in fork_process.c
On Thu, May 01, 2014 at 11:07:51PM -0400, Tom Lane wrote:
> Noah Misch n...@leadboat.com writes:
> > On Thu, May 01, 2014 at 08:44:46PM -0400, Tom Lane wrote:
> > > You're only considering one aspect of the problem. Yeah, you might
> > > not get duplicated output unless system() prints something before
> > > exec(), but we would also like to have causality: that is, whatever
> > > we sent to stdout before calling system() should appear there
> > > before anything the child process sends to stdout.
>
> > Good point. I suppose a couple of fflush() calls have negligible
> > cost next to a system() or popen(). Introduce
> > pg_popen()/pg_system(), and adopt a rule that they are [almost] our
> > only callers of raw popen()/system()?
>
> Meh. I'm not usually in favor of adopting nonstandard notation, and
> this doesn't seem like a place to start. In particular, if you don't
> want to use fflush(NULL) in these proposed wrappers, then call sites
> are still going to have an issue with needing to do manual fflushes;
> pg_regress.c's spawn_process is an example:
>
>     /*
>      * Must flush I/O buffers before fork. Ideally we'd use fflush(NULL) here
>      * ... does anyone still care about systems where that doesn't work?
>      */
>     fflush(stdout);
>     fflush(stderr);
>     if (logfile)
>         fflush(logfile);
>     pid = fork();
>
> I think that removing the need for fflush(stdout) and fflush(stderr)
> in this context would mostly result in people forgetting to fflush
> other output files. I'd rather have the two lines of boilerplate (and
> a comment about why we're refusing to depend on fflush(NULL)) than
> take that risk.

Works for me.

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] need xmllint on borka
Tom Lane wrote:

> (The subtext here is that borka is absolutely not an acceptable place
> to encounter documentation build failures. By the time we're at that
> stage of the release cycle, I don't really care what xmllint might
> have to say; there isn't going to be time to make it happy.)

Borka is what runs the guaibasaurus animal, so failures would show up in the buildfarm ... If the xmllint check could be made optional, I guess we could have the failures show up in the buildfarm but avoid having them cause a problem for make dist when releases are created.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] need xmllint on borka
Peter Eisentraut wrote:

> I have been working on making the DocBook XML output valid. The first
> part was bb4eefe7bf518e42c73797ea37b033a5d8a8e70a; I now have the rest
> ready, but I'll spare you the mostly mechanical 200kB patch for now.
> In addition, I'd like to add the attached patch with an xmllint call
> to make sure things stay valid. But we don't have xmllint installed
> on borka, where we build the releases. Could someone please install it?

xmllint installed on borka.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] Sending out a request for more buildfarm animals?
Hi,

There's pretty little coverage of non-mainstream platforms/compilers in the buildfarm atm. Maybe we should send an email on -announce asking for new ones?

OS-wise, there's no coverage for:
* AIX (at all)
* HP-UX (for master at least)
(* Tru64)
(* UnixWare)

Architecture-wise, there's no coverage for:
* some ARM architecture variants
* mips
* s390/x
* sparc 32bit
(* s390)
(* alpha)
(* mipsel)
(* M68K)

A couple of those aren't that important (my opinion indicated by parentheses), but the other ones really should be covered or desupported.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] elog a stack trace
Does anyone have any hints on how to get a stack trace programmatically, rather than trying to run ps to get the pid and then attach gdb to an ephemeral process, hoping the situation has not changed while you are doing that?

I'd like to set log_error_verbosity = stack, or call elog_stack(,...), but those don't exist.

Cheers,

Jeff
Re: [HACKERS] elog a stack trace
Hi,

On 2014-05-02 08:16:39 -0700, Jeff Janes wrote:
> Does anyone have any hints on how to get a stack trace
> programmatically, rather than trying to run ps to get the pid and then
> attach gdb to a ephemeral process and hoping the situation has not
> been changed while you are doing that? I'd like to set
> log_error_verbosity = stack or elog_stack(,...)

There's backtrace() on several platforms. The stacktraces are less detailed than what gdb gives you tho.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
On 05/01/2014 11:19 PM, Ian Barwick wrote:
> Here is an initial version of an auditing extension for Postgres to
> generate log output suitable for compiling a comprehensive audit trail
> of database operations.

Cool! Looking forward to seeing it around the 9.5 cycle.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
Ian,

* Ian Barwick (i...@2ndquadrant.com) wrote:
> Here is an initial version of an auditing extension for Postgres to
> generate log output suitable for compiling a comprehensive audit trail
> of database operations.

Neat stuff.

> Why auditing?

Yeah, we really need to improve here. I've been hoping to make progress on this and it looks like I'll finally have some time to.

> pgaudit uses Event Triggers to log unambiguous representation of DDL,
> as well as a combination of executor and utility hooks for other
> commands (DML, including SELECT, as well as other utility commands):

While certainly a good approach to minimize the changes needed to the backend, I'd really like to see us be able to, say, log to a table and have more fine-grained control over what is logged, without needing an extension.

> 1. pgaudit logs fully-qualified relation names, so you don't have to
> wonder if SELECT * FROM x referred to public.x or other.x.

Yeah, that's definitely an issue for any kind of real auditing.

> 2. pgaudit creates a log entry for each affected object, so you don't
> have to wonder which tables SELECT * FROM someview accessed, and it's
> easy to identify all accesses to a particular table.

Interesting- I'm a bit on the fence about this one. Perhaps you can elaborate on the use-case for this?

> 3. pgaudit allows finer-grained control over what is logged. Commands
> are classified into read, write, etc. and logging for these classes
> can be individually enabled and disabled (either via pgaudit.log in
> postgresql.conf, or as a per-database or per-user setting).

This is something I've been mulling over for a couple of years (you can see notes from the discussion at the 2011 hacker meeting on the wiki about how we might change our logging system to allow for better filtering).

> Planned future improvements include:
>
> 1. Additional logging facilities, including to a separate audit log
> file and to syslog, and potentially logging to a table (possibly via
> a bgworker process). Currently output is simply emitted to the server
> log via ereport().

Using the existing logging collector will almost certainly be a contention point- we've seen that before. I've had thoughts about an option to log to individual files from each backend (perhaps based on that backend's position in the proc table) or directly from each backend to a remote service (eg: rabbitMQ/AMQP or something).

Regarding background worker processes, a thought that's been kicked around a bit is to actually change our existing logging collector to be a background worker (or perhaps be able to have multiple?) which is fed from a DSM queue and then logs to a file (or maybe files), or a table or something else.

> 2. To implement per-object auditing configuration, it would be nice
> to use extensible reloptions (or an equivalent mechanism)

Yeah, that's another interesting challenge. This kind of auditing is often about specific information (and therefore specific objects) and it'd be ideal to have that set up and managed alongside the table definition. Having the auditing done in core instead of through an extension would make this easier to address though.

Thanks,

Stephen
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
On 05/02/2014 11:04 AM, Stephen Frost wrote:
> This is something I've been mulling over for a couple of years (you
> can see notes from the discussion at the 2011 hacker meeting on the
> wiki about how we might change our logging system to allow for better
> filtering).

Logging hooks. We really need some contrib/ modules which take advantage of these.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
[HACKERS] Supporting multiple column assignment in UPDATE (9.5 project)
I've been thinking about how we might implement the multiple column assignment UPDATE syntax that was introduced in SQL:2003. This feature allows you to do

    UPDATE table SET ..., (column, column, ...) = row-valued expression, ...

where the system arranges to evaluate the row-valued expression just once per row and then assign its fields into the specified target columns.

Back in commit 6e8596a146c9b16f2c053ea8c6e361a114c0b65c we introduced some limited support for this syntax, but it only handles a row-valued expression that is a ROW() constructor, and it just does a simple syntactic transformation of pulling apart the ROW() constructor and building an independent assignment to each target column. The actually interesting uses for this feature don't work with that implementation approach. The most common case I've seen asked for is where the expression is a sub-SELECT returning multiple columns (but at most one row).

As far as the parser is concerned, the main hurdle to supporting this feature is that the representation of an UPDATE's targetlist assumes that each list element is an independent TargetEntry representing a single assignment. Now, there is a heck of a lot of code that knows what targetlists look like, so I'm not eager to try to change that basic assumption. What seems like probably a better idea is to represent

    SET (target1, target2, target3) = foo

as though it were

    SET target1 = fooref.col1, target2 = fooref.col2, target3 = fooref.col3

where fooref is some Param-like reference to a separate list of expressions that have composite outputs. It would be understood that this separate targetlist would be evaluated just once before evaluating the main tlist. This approach would allow all the existing targetlist manipulation code to stay about the same. It would be a bit of a challenge for ruleutils.c to reconstruct the original syntax when printing an UPDATE in a rule, but I think that's just a small matter of programming.
(Possibly it would help if the elements of the separate composite-values targetlist contained markers as to which main-tlist elements they were for.)

Now, we could probably implement it straightforwardly just based on that idea, though it's not quite clear where to shoehorn evaluation of the separate targetlist into the constructed plan. One way would be to insert an additional evaluation level by adding a Result node on top of the normal plan, and then have the lower level compute the composite values as resjunk tlist elements, while the upper level does FieldSelects from the composite values to implement the fooref.colN references.

However, I'm mainly interested in the sub-SELECT case; indeed, anything else you might want to do could be transformed into a sub-SELECT, so I wouldn't feel bad if we just restricted the new feature to that. And this doesn't seem like quite the right way to do it for sub-SELECTs. In the case of sub-SELECTs, we have almost the right execution mechanism already, in that initPlans are capable of setting multiple PARAM_EXEC runtime Params, one for each output column of the sub-SELECT. So what I called fooref.col1 etc above could just be PARAM_EXEC Params referring to the subplan outputs --- except that initPlans are only for uncorrelated subqueries (those without any outer references to Vars of the parent query level). And the interesting cases for UPDATE generally involve correlated subqueries.

What I'm thinking about this is that we ought to make an effort to unify the currently separate implementation paths for correlated and uncorrelated subqueries. Instead of SubPlans in the expression tree for correlated subqueries, I think they should all be treated much like initPlans are now, ie, there are PARAM_EXEC Params referencing outputs from a list of subqueries that are attached to the expression tree's parent plan node, and we lazily evaluate the subqueries upon first use of one of their output parameters.

What would be different from the current handling of initPlans is that each time we advance to a new input row, we'd need to reset the evaluation state of the subqueries that are correlated. The reason for changing it like that is so that we can have multiple separate Params referencing different output columns of a single correlated subquery, and be sure that we evaluate the correlated subquery only once; the current SubPlan mechanism can't support separate references to the same subplan. Now, this would add a small amount of new bookkeeping overhead to use of correlated subqueries, but I find it hard to believe that that'd be noticeable compared to the startup/shutdown cost of the subquery.

So, if we were to revise the handling of correlated subqueries like that, then for the case of a row-valued expression that is a sub-SELECT we wouldn't need any explicit runtime evaluation of separate targetlist entries. Use of Params referencing the subplan's outputs would be enough to cause evaluation to happen at the right
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
* Josh Berkus (j...@agliodbs.com) wrote:
> Logging hooks. We really need some contrib/ modules which take
> advantage of these.

I'm aware, and I really am not convinced that pushing all of this to contrib modules using the hooks is the right approach- for one thing, it certainly doesn't seem to me that we've actually gotten a lot of traction from people to actually make use of them and keep them updated. We've had many of those hooks for quite a while. What 2Q has done here is great, but they also point out problems with building this as a contrib module using the hooks.

As we add more capabilities and improve the overall PG system (new objects, etc), I'm rather unconvinced that having to go, independently, update the contrib modules to understand each new object is going to be a terribly workable long-term solution.

Additionally, using triggers (either on the tables or the event triggers), while good for many use-cases, doesn't completely answer the auditing requirements (SELECT being the great example, but there are others), and having to combine event triggers with various hooks just doesn't strike me as a great design. (I don't intend to knock what 2Q has done here at all- they're using a minimal-backend-hacking approach, and under that constraint they've done exactly what makes sense.)

Thanks,

Stephen
[HACKERS] Faster array_length()
Hello,

The attached patch improves the performance of array_length() by detoasting only the overhead part of the datum. Here is a test case:

postgres=# create table array_length_test as select array_agg(a) a from generate_series(1, 1) a, generate_series(1, 1) b group by b;

Without the patch:

postgres=# select sum(array_length(a, 1)) from array_length_test;
 sum
-----
   1
(1 row)

Time: 199.002 ms

With the patch:

postgres=# select sum(array_length(a, 1)) from array_length_test;
 sum
-----
   1
(1 row)

Time: 34.599 ms

The motivation for the patch is that some of our customers use arrays to store a sequence of tens of thousands of events in each row. They often need to get the last 10 events for each row, for which we do A[array_length(A, 1) - 9: 100] (assuming 1M is an upper bound; we could use array_length() instead of this constant too, but that is unnecessary if we know the upper bound, and it only slows down the query). Without this optimization, the array gets detoasted twice. With this patch, array_length() becomes much faster, and the whole query saves a few seconds.

Of course this technique is applicable to some other functions too, but they have never become a bottleneck for me, so I decided to keep the changes only to this function.

Another alternative I could think of was introducing Python-style slicing, in which negative indexes start from the end of the array, so -10 means the 10th element from the end. I thought this would be a bigger change and is probably unnecessary, so I decided to improve array_length() instead.

Feedback is welcome.
Thanks,
-- Hadi

diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index 91df184..5e1d9c2 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -1719,11 +1719,15 @@ array_upper(PG_FUNCTION_ARGS)
 Datum
 array_length(PG_FUNCTION_ARGS)
 {
-	ArrayType  *v = PG_GETARG_ARRAYTYPE_P(0);
+	Datum		arrdatum = PG_GETARG_DATUM(0);
 	int			reqdim = PG_GETARG_INT32(1);
+	ArrayType  *v;
 	int		   *dimv;
 	int			result;
 
+	v = (ArrayType *) PG_DETOAST_DATUM_SLICE(arrdatum, 0,
+											 ARR_OVERHEAD_NONULLS(MAXDIM));
+
 	/* Sanity check: does it look like an array at all? */
 	if (ARR_NDIM(v) <= 0 || ARR_NDIM(v) > MAXDIM)
 		PG_RETURN_NULL();
Re: [HACKERS] [COMMITTERS] pgsql: Fix quiet inline configure test for newer clang compilers.
Andres Freund and...@2ndquadrant.com writes: On 2014-05-01 20:16:48 +, Tom Lane wrote: Fix quiet inline configure test for newer clang compilers. Since it doesn't seem to have caused any problems I think this should be backpatched. Hearing no objections, done. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Supporting multiple column assignment in UPDATE (9.5 project)
On Fri, May 2, 2014 at 1:15 PM, Tom Lane t...@sss.pgh.pa.us wrote: I've been thinking about how we might implement the multiple column assignment UPDATE syntax that was introduced in SQL:2003. This feature allows you to do UPDATE table SET ..., (column, column, ...) = row-valued expression, ... where the system arranges to evaluate the row-valued expression just once per row and then assign its fields into the specified target columns. Back in commit 6e8596a146c9b16f2c053ea8c6e361a114c0b65c we introduced some limited support for this syntax, but it only handles a row-valued expression that is a ROW() constructor, and it just does a simple syntactic transformation of pulling apart the ROW() constructor and building an independent assignment to each target column. The actually interesting uses for this feature don't work with that implementation approach. The most common case I've seen asked for is where the expression is a sub-SELECT returning multiple columns (but at most one row). As far as the parser is concerned, the main hurdle to supporting this feature is that the representation of an UPDATE's targetlist assumes that each list element is an independent TargetEntry representing a single assignment. Now, there is a heck of a lot of code that knows what targetlists look like, so I'm not eager to try to change that basic assumption. What seems like probably a better idea is to represent SET (target1, target2, target3) = foo as though it were SET target1 = fooref.col1, target2 = fooref.col2, target3 = fooref.col3 where fooref is some Param-like reference to a separate list of expressions that have composite outputs. It would be understood that this separate targetlist would be evaluated just once before evaluating the main tlist. This approach would allow all the existing targetlist manipulation code to stay about the same. 
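For concreteness, the sub-SELECT form under discussion would look something like this (table and column names are hypothetical; this is the SQL:2003 syntax being proposed, not something current releases accept):

```sql
-- The correlated sub-SELECT is evaluated once per updated row, and its
-- single result row is assigned field-by-field to the target columns.
UPDATE accounts a
   SET (contact_first_name, contact_last_name) =
       (SELECT s.first_name, s.last_name
          FROM salesmen s
         WHERE s.id = a.sales_id);
```

Under the representation sketched above, the parser would turn this into independent assignments contact_first_name = fooref.col1, contact_last_name = fooref.col2, with the sub-SELECT evaluated once beforehand.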
It would be a bit of a challenge for ruleutils.c to reconstruct the original syntax when printing an UPDATE in a rule, but I think that's just a small matter of programming. (Possibly it would help if the elements of the separate composite-values targetlist contained markers as to which main-tlist elements they were for.) Now, we could probably implement it straightforwardly just based on that idea, though it's not quite clear where to shoehorn evaluation of the separate targetlist into the constructed plan. One way would be to insert an additional evaluation level by adding a Result node on top of the normal plan, and then have the lower level compute the composite values as resjunk tlist elements, while the upper level does FieldSelects from the composite values to implement the fooref.colN references. However, I'm mainly interested in the sub-SELECT case; indeed, anything else you might want to do could be transformed into a sub-SELECT, so I wouldn't feel bad if we just restricted the new feature to that. And this doesn't seem like quite the right way to do it for sub-SELECTs. In the case of sub-SELECTs, we have almost the right execution mechanism already, in that initPlans are capable of setting multiple PARAM_EXEC runtime Params, one for each output column of the sub-SELECT. So what I called fooref.col1 etc above could just be PARAM_EXEC Params referring to the subplan outputs --- except that initPlans are only for uncorrelated subqueries (those without any outer references to Vars of the parent query level). And the interesting cases for UPDATE generally involve correlated subqueries. What I'm thinking about this is that we ought to make an effort to unify the currently separate implementation paths for correlated and uncorrelated subqueries. 
Instead of SubPlans in the expression tree for correlated subqueries, I think they should all be treated much like initPlans are now, ie, there are PARAM_EXEC Params referencing outputs from a list of subqueries that are attached to the expression tree's parent plan node, and we lazily evaluate the subqueries upon first use of one of their output parameters. What would be different from the current handling of initPlans is that each time we advance to a new input row, we'd need to reset the evaluation state of the subqueries that are correlated. The reason for changing it like that is so that we can have multiple separate Params referencing different output columns of a single correlated subquery, and be sure that we evaluate the correlated subquery only once; the current SubPlan mechanism can't support separate references to the same subplan. Now, this would add a small amount of new bookkeeping overhead to use of correlated subqueries, but I find it hard to believe that that'd be noticeable compared to the startup/shutdown cost of the subquery. So, if we were to revise the handling of correlated subqueries like that, then for the case of a row-valued expression that is a sub-SELECT we wouldn't need any explicit runtime evaluation of
Re: [HACKERS] Supporting multiple column assignment in UPDATE (9.5 project)
Merlin Moncure mmonc...@gmail.com writes: On Fri, May 2, 2014 at 1:15 PM, Tom Lane t...@sss.pgh.pa.us wrote: I've been thinking about how we might implement the multiple column assignment UPDATE syntax that was introduced in SQL:2003. This feature allows you to do UPDATE table SET ..., (column, column, ...) = row-valued expression, ... Couple quick questions: 1) how does this interplay with RETURNING? I guess it probably doesn't change, but I imagine there'd be no way to reference the composite result in the RETURNING statement? Not as such; obviously you could reference the assigned-to columns in RETURNING and thereby reconstruct the composite value. 2) I often wish that you could reference the table (or its alias) directly as the field list. UPDATE foo f set f = (...)::foo; or even UPDATE foo SET foo = foo; Hm. You could get there with this syntax as long as you didn't mind writing out the field list explicitly. Arguments why you should want to do that are the same as for avoiding SELECT *, with maybe a bit more urgency since at least SELECT * won't trash your data if you get it wrong. However, assuming that that argument doesn't impress you ... My draft copy of SQL99 mentions a syntax UPDATE table SET ROW = row-valued expression [ WHERE ... ] which does not appear in later editions of the spec, and probably wasn't in SQL99 final either (since SQL:2003 does not mention it as a removed feature). I'm not sure we'd want to implement that; it would require making ROW into a fully-reserved word, which it is not today, and that seems rather a high price for implementing a not-per-spec feature. But I don't think your suggestions of the table name or alias work; they could conflict with an actual column name. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Supporting multiple column assignment in UPDATE (9.5 project)
On Fri, May 2, 2014 at 2:47 PM, Tom Lane t...@sss.pgh.pa.us wrote: Merlin Moncure mmonc...@gmail.com writes: 2) I often wish that you could reference the table (or its alias) directly as the field list. UPDATE foo f set f = (...)::foo; or even UPDATE foo SET foo = foo; Hm. You could get there with this syntax as long as you didn't mind writing out the field list explicitly. Arguments why you should want to do that are the same as for avoiding SELECT *, with maybe a bit more urgency since at least SELECT * won't trash your data if you get it wrong. However, assuming that that argument doesn't impress you ... My draft copy of SQL99 mentions a syntax UPDATE table SET ROW = row-valued expression [ WHERE ... ] which does not appear in later editions of the spec, and probably wasn't in SQL99 final either (since SQL:2003 does not mention it as a removed feature). I'm not sure we'd want to implement that; it would require making ROW into a fully-reserved word, which it is not today, and that seems rather a high price for implementing a not-per-spec feature. But I don't think your suggestions of the table name or alias work; they could conflict with an actual column name. Presumably it'd follow similar rules to SELECT -- resolve the column name in the face of ambiguity. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Supporting multiple column assignment in UPDATE (9.5 project)
Merlin Moncure mmonc...@gmail.com writes: On Fri, May 2, 2014 at 2:47 PM, Tom Lane t...@sss.pgh.pa.us wrote: But I don't think your suggestions of the table name or alias work; they could conflict with an actual column name. Presumably it'd follow similar rules to SELECT -- resolve the column name in the face of ambiguity. Meh. Then you could have a query that works fine until you add a column to the table, and it stops working. If nobody ever used column names identical to table names it'd be all right, but unfortunately people seem to do that a lot... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Supporting multiple column assignment in UPDATE (9.5 project)
On Fri, May 2, 2014 at 3:03 PM, Tom Lane t...@sss.pgh.pa.us wrote: Merlin Moncure mmonc...@gmail.com writes: On Fri, May 2, 2014 at 2:47 PM, Tom Lane t...@sss.pgh.pa.us wrote: But I don't think your suggestions of the table name or alias work; they could conflict with an actual column name. Presumably it'd follow similar rules to SELECT -- resolve the column name in the face of ambiguity. Meh. Then you could have a query that works fine until you add a column to the table, and it stops working. If nobody ever used column names identical to table names it'd be all right, but unfortunately people seem to do that a lot... That's already the case with select statements and, if a user were concerned about that, they always have the option of aliasing the table, as nearly 100% of professional developers do: SELECT f FROM foo f; etc. Now, I need this feature a lot less than I used to (although I do like the symmetry with SELECT); hstore and jsonb have matured to the point that they can handle most trigger function operations that you'd want to abstract over multiple tables without expensive calls to information_schema. The main advantages for a native approach would be type safety (although even that situation is improving at long last), performance, and code complexity. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
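The table-agnostic trigger approach Merlin alludes to might look something like this (a sketch only; the table, column, and function names are made up, and row_to_json() is used because it accepts any row type in 9.3):

```sql
-- Hypothetical generic audit trigger: no per-table column references,
-- so the same function can be attached to any table.
CREATE FUNCTION log_row_change() RETURNS trigger AS $$
BEGIN
    INSERT INTO change_log (table_name, changed_at, row_data)
    VALUES (TG_TABLE_NAME, now(), row_to_json(NEW));
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER foo_audit
    AFTER INSERT OR UPDATE ON foo
    FOR EACH ROW EXECUTE PROCEDURE log_row_change();
```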
Re: [HACKERS] Sending out a request for more buildfarm animals?
On Fri, May 02, 2014 at 05:04:01PM +0200, Andres Freund wrote: There's pretty little coverage of non-mainstream platforms/compilers in the buildfarm atm. Maybe we should send an email on -announce asking for new ones? OS-wise there's no coverage for:
* AIX (at all)
* HP-UX (for master at least)
(* Tru64)
(* UnixWare)
Architecture-wise there's no coverage for:
* some ARM architecture variants
* mips
* s390/x
* sparc 32bit
(* s390)
(* alpha)
(* mipsel)
(* M68K)
A couple of those aren't that important (my opinion indicated by ()), but the other ones really should be covered or desupported. More coverage of non-gcc compilers would be an asset to the buildfarm. +1 for sending a call for help to -announce. I agree with your importance estimates, particularly on the OS side. -1 for making code-level changes to desupport a platform based on the lack of a buildfarm member, though I don't mind documentation/advocacy changes on that basis. -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] regexp_replace( , , , NULL ) returns null?
Is there any particular reason for this: decibel@decina.local=# SELECT regexp_replace( 'a', 'a', 'b', null ) IS NULL; ?column? -- t (1 row) ISTM it’d be a lot better if it treated NULL flags the same as ‘’... -- Jim Nasby, Lead Data Architect (512) 569-9461 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] regexp_replace( , , , NULL ) returns null?
Jim Nasby jna...@enova.com writes: Is there any particular reason for this: decibel@decina.local=# SELECT regexp_replace( 'a', 'a', 'b', null ) IS NULL; ?column? -- t (1 row) Yeah: regexp_replace is strict. ISTM it'd be a lot better if it treated NULL flags the same as ''... In Oracle's universe that probably makes sense, but to me it's not sensible. Why should unknown flags produce a non-unknown result? I find it hard to envision many use-cases where you wouldn't actually have the flags as a constant, anyway; they're too fundamental to the behavior of the function. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
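For callers that genuinely have a nullable flags value, the strict behavior is easy to work around at the call site without changing the function (a sketch; the nullable flags column here is hypothetical):

```sql
-- Explicitly treat a NULL flags value as the empty string:
SELECT regexp_replace('a', 'a', 'b', coalesce(flags, ''))
  FROM (VALUES (NULL::text)) AS t(flags);
-- yields 'b' rather than NULL
```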
[HACKERS] New and interesting replication issues with 9.2.8 sync rep
Just got a report of a replication issue with 9.2.8 from a community member: Here's the sequence: 1) A -- B (sync rep) 2) Shut down B 3) Shut down A 4) Start up B as a master 5) Start up A as sync replica of B 6) A successfully joins B as a sync replica, even though its transaction log is 1016 bytes *ahead* of B. 7) Transactions written to B all hang 8) Xlog on A is now corrupt, although the database itself is OK Now, the above sequence happened because of the user misunderstanding what sync rep really means. However, A should not have been able to connect with B in replication mode, especially in sync rep mode; that should have failed. Any thoughts on why it didn't? I'm trying to produce a test case ... -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Faster array_length()
Hadi Moshayedi h...@moshayedi.net writes: The attached patch improves the performance of array_length() by detoasting only the overhead part of the datum. It looks to me like this would actually make things worse for cases where the input array wasn't toasted-out-of-line (because it would uselessly make a copy of the header part, costing a palloc cycle). I'm not averse to improving the case you're worried about, but you have to pay attention to not having bad side-effects on other cases. Another thought is that this can only win for arrays that are external without being compressed; when they are compressed, then heap_tuple_untoast_attr_slice will fetch and decompress the entire array anyway (and then, just to add insult to injury, make another copy :-(). With that in mind, I was surprised that your test case showed any improvement at all --- it looks like the arrays aren't getting compressed for some reason. There are going to be a lot of other cases where this patch doesn't help unless the user turns off compression, which will hurt his performance in other ways. Now, the slice detoast support was only designed to work with data stored in external mode (that is, with compression manually disabled via the appropriate ALTER TABLE option), and that's not unreasonable for its originally-intended application of being able to fetch any part of an external text string. But it strikes me that for what you want here, namely fetching just a few bytes from the start, it ought to be possible to do better. Could we teach the toast code to fetch and decompress just an initial subset of the data? (This might be useful even for the original use-case of slice fetches, as long as the desired slice isn't too close to the end of the datum.) Bottom line for me is that you've shown that there can be a win from improving this code, but we need to do some basic work on the slice-fetching logic to get full value out of the idea. 
Of course this technique is applicable to some other functions too, but they have never become a bottleneck for me, so I decided to keep the changes only to this function. I would expect a committable version of this patch to cover all the array-dimension-related functions. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
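As Tom notes, slice detoasting only wins when the array is stored out-of-line without compression, which with current releases means disabling compression for the column explicitly (using the thread's test table as an example):

```sql
-- EXTERNAL = out-of-line storage with compression disabled, so a
-- slice fetch can read just the array header instead of the whole datum.
ALTER TABLE array_length_test ALTER COLUMN a SET STORAGE EXTERNAL;
```

The trade-off is larger on-disk size for compressible data, which is why Tom suggests teaching the toast code to decompress only an initial subset instead.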
Re: [HACKERS] Faster array_length()
Thanks for looking into this. With that in mind, I was surprised that your test case showed any improvement at all --- it looks like the arrays aren't getting compressed for some reason. You are right, it seems that they were not getting compressed, probably because the arrays were seq 1 which seems to not get compressed by pglz. When I changed the test data to an array containing 1 ones, there was no speed improvement anymore. I'll look into how to improve the compressed case and other issues you raised. Thanks, -- Hadi
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
At 2014-05-02 14:04:27 -0400, sfr...@snowman.net wrote: I'd really like to see us be able to, say, log to a table and have more fine-grained control over what is logged, without needing an extension.

There were several factors we considered in our work:

1. We did the minimum possible to produce something that gives us demonstrably more than «log_statement=all» in 9.3/9.4/9.5.

2. We wanted to produce something that could be used *now*, i.e. with 9.3 and soon 9.4, to get wider feedback based on actual usage. I'm hoping that by the time we make a submission for 9.5, we'll have a clearer picture of what Postgres auditing should look like.

3. We steered clear of implementing different log targets. We know that ereport() doesn't cut it, but decided that doing anything else would be better after some feedback and wider discussion. Any suggestions in this regard are very welcome.

(Stephen, I can see from your mail that you've already inferred at least some of the above, so it's more a general statement of our approach than a response to what you said.)

2. pgaudit creates a log entry for each affected object […] Interesting- I'm a bit on the fence about this one. Perhaps you can elaborate on the use-case for this?

Who accessed public.x last month? Answering that question would become much more difficult if one had to account for every view that might refer to public.x. And did the view refer to public.x before the schema change on the first Wednesday of last month? We don't have a deparsed representation of DML, so select * from x is logged differently from select * from other.x. Same with potential complications like how exactly a join is written. The way pgaudit does it, you can just grep public.x in your audit log and be sure (modulo bugs, of course) you're seeing everything relevant.

This kind of auditing is often about specific information (and therefore specific objects) and it'd be ideal to have that set up and managed alongside the table definition. Yes, exactly. 
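The grep-the-audit-log workflow described above can be illustrated like this (the log format shown is hypothetical, invented for the example; it is not pgaudit's actual output):

```shell
# Two made-up audit-log lines: one query named public.x directly, the
# other reached it through a view -- per-object logging records both.
cat > audit.log <<'EOF'
2014-04-03 10:21:15 UTC LOG: AUDIT,READ,alice,SELECT,TABLE,public.x
2014-04-07 09:02:44 UTC LOG: AUDIT,READ,bob,SELECT,TABLE,public.x
EOF

# Every access to public.x, however the query was written:
grep 'public\.x' audit.log
```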
-- Abhijit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
At 2014-05-02 14:22:23 -0400, sfr...@snowman.net wrote: I'm aware and I really am not convinced that pushing all of this to contrib modules using the hooks is the right approach- for one thing, it certainly doesn't seem to me that we've actually gotten a lot of traction from people to actually make use of them and keep them updated. For what it's worth, I greatly appreciate *having* the hooks. Without them, it would have been much more difficult to prototype pgaudit, and it would have been impossible to do so in a way that could be used with 9.3/9.4. As for whether auditing as a feature *should* be an extension, I do not have a strong opinion yet. If a consensus formed around a better design in-core, I certainly wouldn't object. I'm rather unconvinced that having to go, independently, update the contrib modules to understand each new object is going to be a terribly workable long-term solution. (I am not expressing any opinion at this time on this larger question.) having to combine event triggers with various hooks just doesn't strike me as a great design. Suggestions are welcome, but I have to say that I'm not a big fan of reinventing what event triggers give us in the way of deparsing either. -- Abhijit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] tab completion for setting search_path
I've been working with an app that uses a schema name whose spelling is hard to type, and the lack of tab completion for SET search_path TO was bugging me. So see attached. I filter out the system schemata, but not public. For commit fest next. Cheers, Jeff