Re: [HACKERS] [v9.2] Fix Leaky View Problem
Hi Robert, I'm a bit confusing about this sentence. If you can make this work, I think it could be a pretty sweet plannner optimization even apart from the implications for security views. Consider a query of this form: A LEFT JOIN B LEFT JOIN C where B is a view defined as: B1 JOIN B2 JOIN B3 LEFT JOIN B4 LEFT JOIN B5 Now let's suppose that from_collapse_limit/join_collapse_limit are set low enough that we decline to fold these subproblems together. If there happens to be a qual B.x = 1, where B.x is really B1.x, then the generated plan sucks, because it will basically lose the ability to filter B1 early, very possibly on, say, a unique index. Or at least a highly selective index. I tried to reproduce the scenario with enough small from/join_collapse_limit (typically 1), but it allows to push down qualifiers into the least scan plan. E.g) mytest=# SET from_collapse_limit = 1; mytest=# SET join_collapse_limit = 1; mytest=# CREATE VIEW B AS SELECT B1.* FROM B1,B2,B3 WHERE B1.x = B2.x AND B2.x = B3.x; mytest=# EXPLAIN SELECT * FROM A,B,C WHERE A.x=B.x AND B.x=C.x AND f_leak(B.y); QUERY PLAN Merge Join (cost=381.80..9597.97 rows=586624 width=108) Merge Cond: (a.x = b1.x) - Merge Join (cost=170.85..290.46 rows=7564 width=72) Merge Cond: (a.x = c.x) - Sort (cost=85.43..88.50 rows=1230 width=36) Sort Key: a.x - Seq Scan on a (cost=0.00..22.30 rows=1230 width=36) - Sort (cost=85.43..88.50 rows=1230 width=36) Sort Key: c.x - Seq Scan on c (cost=0.00..22.30 rows=1230 width=36) - Materialize (cost=210.95..528.56 rows=15510 width=44) - Merge Join (cost=210.95..489.78 rows=15510 width=44) Merge Cond: (b1.x = b3.x) - Merge Join (cost=125.52..165.40 rows=2522 width=40) Merge Cond: (b1.x = b2.x) - Sort (cost=40.09..41.12 rows=410 width=36) Sort Key: b1.x - Seq Scan on b1 (cost=0.00..22.30 rows=410 width=36) Filter: f_leak(y) - Sort (cost=85.43..88.50 rows=1230 width=4) Sort Key: b2.x - Seq Scan on b2 (cost=0.00..22.30 rows=1230 width=4) - Sort (cost=85.43..88.50 rows=1230 width=4) Sort Key: b3.x - Seq Scan on b3 (cost=0.00..22.30 rows=1230 width=4) (25 rows) In this example, f_leak() takes an argument come from B1 table within B view, and it was correctly distributed to SeqScan on B1. From perspective of the code, the *_collapse_limit affects the contents of joinlist being returned from deconstruct_jointree() whether its sub-portion is flatten, or not. However, the qualifiers are distributed on distribute_restrictinfo_to_rels() to RelOptInfo based on its dependency of relations being referenced by arguments. Thus, the above f_leak() was distributed to B1, not B, because its arguments come from only B1. I agree with the following approach to tackle this problem in 100%. However, I'm unclear how from/join_collapse_limit affects to keep sub-queries unflatten. It seems to me it is determined based on the result of is_simple_subquery(). 1. Let quals percolate down into subqueries. 2. Add the notion of a security view, which prevents flattening and disables the optimization of patch #1 3. Add the notion of a leakproof function, which can benefit from the optimization of #1 even when the view involved is a security view as introduced in #2 Thanks, 2011/10/11 Robert Haas robertmh...@gmail.com: On Mon, Oct 10, 2011 at 4:28 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote: I agreed. We have been on the standpoint that tries to prevent leakable functions to reference a portion of join-tree being already flatten, however, it has been a tough work. It seems to me it is much simple approach that enables to push down only non-leaky functions into inside of sub-queries. An idea is to add a hack on distribute_qual_to_rels() to relocate a qualifier into inside of the sub-query, when it references only a particular sub-query being come from a security view, and when the sub-query satisfies is_simple_subquery(), for example. If you can make this work, I think it could be a pretty sweet plannner optimization even apart from the implications for security views. Consider a query of this form: A LEFT JOIN B LEFT JOIN C where B is a view defined as: B1 JOIN B2 JOIN B3 LEFT JOIN B4 LEFT JOIN B5 Now let's suppose that from_collapse_limit/join_collapse_limit are set low enough that we decline to fold these subproblems together. If there happens to be a qual B.x = 1, where B.x is really B1.x, then the generated plan sucks, because it will basically lose
Re: [HACKERS] Pushing ScalarArrayOpExpr support into the btree index AM
Noah Misch n...@leadboat.com writes: On Sat, Oct 15, 2011 at 02:58:45PM -0400, Tom Lane wrote: [algorithm for a regular index scan satisfying key IN (...)] Sounds sensible. The algorithm applies to more than ScalarArrayOpExpr; is it not the ability to handle an OR'ed list of ScanKey instead of an AND'ed one? No, because it's restricted to all the elements having the same operator and same index column; it's not a generic OR. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] proposal: set GUC variables for single query
Hi, this idea has cropped up last PGCon - the ability to set GUC variables for the duration of a single query. It would work by setting the GUCs for the duration of the query and setting them back to what they were after it has terminated. By setting them back I mean respecting the previously set values, regardless of their source (set in run-time, per-role settings, postgresql.conf settings). An example of where this would be useful: an application maintains a persistent connection to the database and answers requests for data from a bunch of clients. Each connected client has a preferred timezone and would like to get results in that timezone. Currently the application has to either sprinkle each query with AT TIME ZONE or wrap the queries in BEGIN; SET LOCAL TIMEZONE ..; query; COMMIT. It gets more complex when things like pgbouncer come into play. Another example is a one-off query that should use a different statement_timeout than the server has configured or a REINDEX command that would like to use more maintenance_work_mem. It mostly falls into the realm of syntax sugar, but as more than one person felt it's a good idea, I thought I'd float it around here. I poked a little bit at the grammar to see where could it fit and didn't have much success of doing it without a new reserved keyword. Supposing the idea gets some traction, any suggestions for the syntax? Cheers, Jan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: set GUC variables for single query
=?UTF-8?B?SmFuIFVyYmHFhHNraQ==?= wulc...@wulczer.org writes: this idea has cropped up last PGCon - the ability to set GUC variables for the duration of a single query. It would work by setting the GUCs for the duration of the query and setting them back to what they were after it has terminated. Doesn't SET LOCAL cover this use-case pretty well already? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LIMITing number of results in a VIEW with global variables
On Oct15, 2011, at 14:52 , Thomas Girault wrote: Alternatively, we could also set the alpha value before the query : SELECT set_alpha(0.1); SELECT age, young(age) FROM employees WHERE young(age); That's certainly much safer. I would be very interested to know if there is smarter way to set global variables. The closest thing to global variables are GUC settings. These can be set globally in postgres.conf, per user and/or per-database via ALTER ROLE/DATABSE ... [IN DATABASE ...], per session with SET, per function via CREATE FUNCTION ... SET and finally per subtransaction with SET LOCAL. Modules can add their own GUC settings - you should be able to find an example in one of the modules in contrib/ But your getter/setter-based solution isn't bad either - in fact, the trigram module contains something very similar. Just don't try to modify the value mid-query. I can sort the results in the view 'sorted_employees' according to value MU of a fuzzy predicate thanks to fuzzy2bool cast function. CREATE OR REPLACE VIEW sorted_employees AS SELECT *, get_mu() as mu FROM employees ORDER BY mu DESC; Are you aware that an ORDER BY clause in a VIEW is only going to work if you do SELECT .. .FROM view. I It really the first time I am using views, I didn't know that ! Hm, I think I didn't explain that to well, so to avoid giving you a false impression here's another try. A SELECT without an ORDER BY clause attached to the *outmost* level never guarantees any particular ordering of the result, nor any particular relationship between the ordering of the SELECT's data sources and the ordering of the SELECT's result. The only exception are SELECT's of the form select ... from view where view has an ORDER BY attached to the outmost level. From that, it follows that an ORDER BY in views used in SELECTs more complex than the above is usually useless. Things are different for views that combine ORDER BY and LIMIT, of course. Then, not only the order of the view's results changes, but also it's output set. Which of course affects every statement which uses the view. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: set GUC variables for single query
On 16 October 2011 16:44, Jan Urbański wulc...@wulczer.org wrote: Hi, this idea has cropped up last PGCon - the ability to set GUC variables for the duration of a single query. It would work by setting the GUCs for the duration of the query and setting them back to what they were after it has terminated. By setting them back I mean respecting the previously set values, regardless of their source (set in run-time, per-role settings, postgresql.conf settings). An example of where this would be useful: an application maintains a persistent connection to the database and answers requests for data from a bunch of clients. Each connected client has a preferred timezone and would like to get results in that timezone. Currently the application has to either sprinkle each query with AT TIME ZONE or wrap the queries in BEGIN; SET LOCAL TIMEZONE ..; query; COMMIT. It gets more complex when things like pgbouncer come into play. Another example is a one-off query that should use a different statement_timeout than the server has configured or a REINDEX command that would like to use more maintenance_work_mem. It mostly falls into the realm of syntax sugar, but as more than one person felt it's a good idea, I thought I'd float it around here. I poked a little bit at the grammar to see where could it fit and didn't have much success of doing it without a new reserved keyword. Supposing the idea gets some traction, any suggestions for the syntax? What about SET LOCAL? -- Thom Brown Twitter: @darkixion IRC (freenode): dark_ixion Registered Linux user: #516935 EnterpriseDB UK: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: set GUC variables for single query
On 16/10/11 17:49, Tom Lane wrote: =?UTF-8?B?SmFuIFVyYmHFhHNraQ==?= wulc...@wulczer.org writes: this idea has cropped up last PGCon - the ability to set GUC variables for the duration of a single query. It would work by setting the GUCs for the duration of the query and setting them back to what they were after it has terminated. Doesn't SET LOCAL cover this use-case pretty well already? It does to a certain degree. If you have a bunch of statements in a transaction and want to execute one of them with a different timezone setting, you have to do the SET/RESET dance. In theory you should also first grab the current value to set it back afterwards, in case someone else did SET LOCAL before you, but I'll admin that's far-fetched. The main use case would be apps running behing pgbouncer and using statement pooling, and plain convenience. I'd find it useful myself, but for now I'm making do with SET LOCAL and it works fine. I'm bringing it up because it appears in the TODO created at the PL Summit: * Further discussion of per-statement config parameters for things like timezone - Jan Urbanski Tryin' to do my bit and all ;) Jan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pushing ScalarArrayOpExpr support into the btree index AM
Florian Pflug f...@phlo.org writes: On Oct15, 2011, at 20:58 , Tom Lane wrote: So, at least as far as btrees are concerned, it seems like I implemented the ScalarArrayOpExpr logic at the wrong level and it ought to be pushed down into the index AM. The above rules seem pretty btree-specific, so I don't think that we ought to have the main executor doing any of this. I envision doing this by marking btree as supporting ScalarArrayOpExpr scankeys directly, so that the executor does nothing special with them, and the planner treats them the same as regular scalar indexquals. Hm, would this make it possible to teach the array GIN ops to also handle ScalarArrayOpExpr? Hmm, maybe. In principle the index AM can always do this at least as efficiently as the executor can, and maybe there's actual wins to be had in GIST and GIN. So another route to getting rid of the executor-level support would be to implement ScalarArrayOpExpr in all the AMs. I'm not personally volunteering to do that though. I've recently had to put ARRAY[$1] @ $2 AND $1 = ANY($2) into an (inlineable) SQL function to make it use a (btree) index if $1 is a scalar-values field (and $1 constant array) and a GIN index if $2 is a GIN-indexed array-values field (and $2 a constant array). Which of course sucks from an efficiency POV. That doesn't seem like the same thing at all, because the indexed column is on different sides of the expression in the two cases. The situation that I'm worried about is indexedcolumn op ANY(arrayconstant), and what you're bringing up is constant op ANY(indexedarraycolumn). To fit the latter into the existing opclass infrastructure, we'd have to somehow teach the planner that constant op ANY(indexedarraycolumn) is interchangeable with indexedarraycolumn @ constant, for pairs of operators where that's indeed the case. Seems like that'd be a lot messier than the use-case warrants. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] (patch) regression diffs on collate.linux.utf8 test
On master, I see a minor test error (at least on my machine) as well as a diff. Patch attached. Regards, Jeff Davis *** a/src/test/regress/expected/collate.linux.utf8.out --- b/src/test/regress/expected/collate.linux.utf8.out *** *** 395,401 SELECT relname FROM pg_class WHERE relname ~* '^abc'; (0 rows) -- to_char ! SET lc_time TO 'tr_TR'; SELECT to_char(date '2010-04-01', 'DD TMMON '); to_char - --- 395,401 (0 rows) -- to_char ! SET lc_time TO 'tr_TR.UTF-8'; SELECT to_char(date '2010-04-01', 'DD TMMON '); to_char - *** *** 967,972 CREATE COLLATION test3 (lc_collate = 'en_US.utf8'); -- fail, need lc_ctype --- 967,973 ERROR: parameter lc_ctype must be specified CREATE COLLATION testx (locale = 'nonsense'); -- fail ERROR: could not create locale nonsense: No such file or directory + DETAIL: The operating system could not find any locale data for the locale name nonsense. CREATE COLLATION test4 FROM nonsense; ERROR: collation nonsense for encoding UTF8 does not exist CREATE COLLATION test5 FROM test0; *** a/src/test/regress/sql/collate.linux.utf8.sql --- b/src/test/regress/sql/collate.linux.utf8.sql *** *** 146,152 SELECT relname FROM pg_class WHERE relname ~* '^abc'; -- to_char ! SET lc_time TO 'tr_TR'; SELECT to_char(date '2010-04-01', 'DD TMMON '); SELECT to_char(date '2010-04-01', 'DD TMMON ' COLLATE tr_TR); --- 146,152 -- to_char ! SET lc_time TO 'tr_TR.UTF-8'; SELECT to_char(date '2010-04-01', 'DD TMMON '); SELECT to_char(date '2010-04-01', 'DD TMMON ' COLLATE tr_TR); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch for new feature: Buffer Cache Hibernation
On Fri, Oct 14, 2011 at 4:29 PM, Tom Lane t...@sss.pgh.pa.us wrote: Right. I think this one falls into my class #2, ie, we have no idea how to implement it usefully. Doesn't (necessarily) mean that the core concept is without merit. Hm. given that we have an implementation I wouldn't say we have *no* clue. But there are certainly some parts we don't have consensus yet on. But then working code sometimes trumps a lack of absolute consensus. But just for the sake of argument I'm not sure that the implementation of dumping the current contents of the buffer cache is actually optimal. It doesn't handle resizing the buffer cache after a restart for example which I think would be a significant case. There could be other buffer cache algorithm parameters users might change -- though I don't think we really have any currently. If we had --to take it to an extreme-- a record of every buffer request prior to the shutdown then we could replay that log virtually with the new buffer cache size and know what buffers the new buffer cache size would have had in it. I'm not sure if there's any way to gather that data efficiently, and if we could if there's any way to bound the amount of data we would have to retain to anything less than nigh-infinite volumes, and if we could if there's any way to limit that has to be replayed on restart. But my point is that there may be other more general options than snapshotting the actual buffer cache of the system shutting down. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Underspecified window queries in regression tests
On Fri, Oct 14, 2011 at 9:32 PM, Tom Lane t...@sss.pgh.pa.us wrote: We could hack around this by adding more columns to the result so that an index-only scan doesn't work. But I wonder whether it wouldn't be smarter to add ORDER BY clauses to the window function calls. I've been known to argue against adding just-in-case ORDER BYs to the regression tests in the past; but these cases bother me more because a plan change will not just rearrange the result rows but change their contents, making it really difficult to verify that nothing's seriously wrong. I'm not sure if it applies to this case but I recall I was recently running queries on Oracle that included window functions and it wouldn't even let me run them without ORDER BY clauses in the window definition. I don't know if it cleverly determines that the ORDER BY will change the results or if Oracle just requires ORDER BY on all window definitions or what. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pushing ScalarArrayOpExpr support into the btree index AM
On Oct16, 2011, at 19:09 , Tom Lane wrote: Florian Pflug f...@phlo.org writes: On Oct15, 2011, at 20:58 , Tom Lane wrote: So, at least as far as btrees are concerned, it seems like I implemented the ScalarArrayOpExpr logic at the wrong level and it ought to be pushed down into the index AM. The above rules seem pretty btree-specific, so I don't think that we ought to have the main executor doing any of this. I envision doing this by marking btree as supporting ScalarArrayOpExpr scankeys directly, so that the executor does nothing special with them, and the planner treats them the same as regular scalar indexquals. Hm, would this make it possible to teach the array GIN ops to also handle ScalarArrayOpExpr? Hmm, maybe. In principle the index AM can always do this at least as efficiently as the executor can, and maybe there's actual wins to be had in GIST and GIN. So another route to getting rid of the executor-level support would be to implement ScalarArrayOpExpr in all the AMs. I'm not personally volunteering to do that though. Hm, that sounds like we ought to leave the existing infrastructure in the main executor in place until we have GIN and GIST support. I've recently had to put ARRAY[$1] @ $2 AND $1 = ANY($2) into an (inlineable) SQL function to make it use a (btree) index if $1 is a scalar-values field (and $1 constant array) and a GIN index if $2 is a GIN-indexed array-values field (and $2 a constant array). Which of course sucks from an efficiency POV. That doesn't seem like the same thing at all, because the indexed column is on different sides of the expression in the two cases. The situation that I'm worried about is indexedcolumn op ANY(arrayconstant), and what you're bringing up is constant op ANY(indexedarraycolumn). Hm, true To fit the latter into the existing opclass infrastructure, we'd have to somehow teach the planner that constant op ANY(indexedarraycolumn) is interchangeable with indexedarraycolumn @ constant, for pairs of operators where that's indeed the case. Seems like that'd be a lot messier than the use-case warrants. That exactly was what convinced me previously that there's no easy way to do this. I had hoped that with your patch only the index AMs, instead of the planner, need to know about these equivalences. Couldn't we teach the main executor to push a ScalarArrayOpExpr down into the index AM if the operator belongs to the index's opclass, one side is indexed, and the other is constant? best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch for new feature: Buffer Cache Hibernation
Greg Stark st...@mit.edu writes: On Fri, Oct 14, 2011 at 4:29 PM, Tom Lane t...@sss.pgh.pa.us wrote: Right. I think this one falls into my class #2, ie, we have no idea how to implement it usefully. Doesn't (necessarily) mean that the core concept is without merit. Hm. given that we have an implementation I wouldn't say we have *no* clue. But there are certainly some parts we don't have consensus yet on. But then working code sometimes trumps a lack of absolute consensus. In this context working means shows a significant performance benefit, and IIRC we don't have a demonstration of that. Anyway this was all discussed back in May. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pushing ScalarArrayOpExpr support into the btree index AM
Florian Pflug f...@phlo.org writes: On Oct16, 2011, at 19:09 , Tom Lane wrote: That doesn't seem like the same thing at all, because the indexed column is on different sides of the expression in the two cases. The situation that I'm worried about is indexedcolumn op ANY(arrayconstant), and what you're bringing up is constant op ANY(indexedarraycolumn). Couldn't we teach the main executor to push a ScalarArrayOpExpr down into the index AM if the operator belongs to the index's opclass, one side is indexed, and the other is constant? Well, no, unless you're proposing to somehow implement the constant op ANY(indexedarraycolumn) case in all the AMs. I don't see any practical way to do it in btree, for one. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Underspecified window queries in regression tests
On Oct16, 2011, at 20:04 , Greg Stark wrote: On Fri, Oct 14, 2011 at 9:32 PM, Tom Lane t...@sss.pgh.pa.us wrote: We could hack around this by adding more columns to the result so that an index-only scan doesn't work. But I wonder whether it wouldn't be smarter to add ORDER BY clauses to the window function calls. I've been known to argue against adding just-in-case ORDER BYs to the regression tests in the past; but these cases bother me more because a plan change will not just rearrange the result rows but change their contents, making it really difficult to verify that nothing's seriously wrong. I'm not sure if it applies to this case but I recall I was recently running queries on Oracle that included window functions and it wouldn't even let me run them without ORDER BY clauses in the window definition. I don't know if it cleverly determines that the ORDER BY will change the results or if Oracle just requires ORDER BY on all window definitions or what. I was about to point out that whether or not ORDER BY is required probably ought to depend on whether the window function acts on the frame, or the whole partition. And that oracle quite likely knows that for the individual window functions while we don't. But, actually, our documentation states in 3.5, Window Functions, that By default, if ORDER BY is supplied then the frame consists of all rows from the start of the partition up through the current row, plus any following rows that are equal to the current row according to the ORDER BY clause. When ORDER BY is omitted the default frame consists of all rows in the partition. Either I'm confused, or that doesn't match the observed regression test failure. I did a few experiments. Assume that table d contains three rows with v=1, v=2 and v=3. This case seems to work (the frame is always the whole partition, even though the frame_clause says between unbounded preceding and current row): postgres=# select array_agg(v) over (range between unbounded preceding and current row) from d; array_agg --- {1,2,3} {1,2,3} {1,2,3} Once one adds an ORDER BY, the frame_clause is taken into account, so that works too: postgres=# select array_agg(v) over (order by v range between unbounded preceding and current row) from d; array_agg --- {1} {1,2} {1,2,3} But some frame clauses (row 1 preceding, for example) have an effect despite there being no ORDER BY, like here: postgres=# select array_agg(v) over (rows 1 preceding) from d; array_agg --- {1} {1,2} {2,3} ISTM that we probably should ignore frame clauses, unless there's an ORDER BY. Or maybe even throw an error? best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: set GUC variables for single query
Jan Urbański wulc...@wulczer.org writes: this idea has cropped up last PGCon - the ability to set GUC variables for the duration of a single query. It would work by setting the GUCs for the duration of the query and setting them back to what they were after it has terminated. By setting them back I mean respecting the previously set values, regardless of their source (set in run-time, per-role settings, postgresql.conf settings). +1 on the use case, allowing to do that in the statement itself would be a nice convenience. It mostly falls into the realm of syntax sugar, but as more than one person felt it's a good idea, I thought I'd float it around here. I poked a little bit at the grammar to see where could it fit and didn't have much success of doing it without a new reserved keyword. Supposing the idea gets some traction, any suggestions for the syntax? I think it would fit quite well within our extending of the WITH syntax. WITH work_mem = '1GB', timezone = 'Europe/Amsterdam', foo AS ( SELECT something ) SELECT toplevel FROM foo; Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] (patch) regression diffs on collate.linux.utf8 test
Jeff Davis pg...@j-davis.com writes: On master, I see a minor test error (at least on my machine) as well as a diff. Patch attached. Hmm, yeah, I forgot to fix this regression test when I added that DETAIL line. However, I don't see the need for fooling with the lc_time value? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: set GUC variables for single query
Dimitri Fontaine dimi...@2ndquadrant.fr writes: I think it would fit quite well within our extending of the WITH syntax. WITH work_mem = '1GB', timezone = 'Europe/Amsterdam', foo AS ( SELECT something ) SELECT toplevel FROM foo; That looks pretty non-future-proof to me. WITH is a SQL-standard syntax, it's not an extension that we control. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: set GUC variables for single query
Tom Lane t...@sss.pgh.pa.us writes: That looks pretty non-future-proof to me. WITH is a SQL-standard syntax, it's not an extension that we control. Now that you mention it, the following might actually already work: WITH settings AS ( SELECT set_config('timezone', 'Europe/Amsterdam', t), set_config('work_mem', '1 GB', t) ), foo AS ( SELECT … ) INSERT INTO bar SELECT * FROM foo; So maybe what we need is to only change the is_local parameter to the function set_config() so that we can have the setting last for only the current statement? Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: set GUC variables for single query
Dimitri Fontaine dimi...@2ndquadrant.fr writes: Now that you mention it, the following might actually already work: WITH settings AS ( SELECT set_config('timezone', 'Europe/Amsterdam', t), set_config('work_mem', '1 GB', t) ), foo AS ( SELECT ⦠) INSERT INTO bar SELECT * FROM foo; Only for small values of work ... you won't be able to affect planner settings that way, nor can you assume that that WITH item is executed before all else. See recent thread pointing out that setting values mid-query is unsafe. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)
On Fri, 2011-10-07 at 12:54 +0400, Alexander Korotkov wrote: The first thing caught my eye in existing GiST code is idea of subtype_float. float8 has limited precision and can't respresent, for example, varlena values good enough. Even if we have large int8 value we can loose lower bits, but data distribution can be so that these bits are valuable. Wouldn't it better to have function like subtype_diff_float which returns difference between two values of subtype as an float? Using of such function could make penalty more sensible to even small difference between values, and accordingly more relevant. I started implementing subtype_diff, and I noticed that it requires defining an extra function for each range type. Previously, the numeric types could just use a cast, which was convenient for user-defined range types. If you have any other ideas to make that cleaner, please let me know. Otherwise I'll just finish implementing subtype_diff. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Underspecified window queries in regression tests
Hitoshi Harada umi.tan...@gmail.com writes: 2011/10/15 Tom Lane t...@sss.pgh.pa.us: I can't recall whether there was some good reason for underspecifying these test queries. It looks like all the problematic ones were added in commit ec4be2ee6827b6bd85e0813c7a8993cfbb0e6fa7 Extend the set of frame options supported for window functions, which means it was either me or Hitoshi-san who wrote them that way, but memory is not serving this afternoon. I don't remember if I wrote that part or not, but I like to add explicit ORDER BY to the test cases. It doesn't appear that some reason stopped us to specify it. So +1 for adding the clauses. I looked at this more closely and realized that the reason for doing it like that was to test window frames defined using ROWS rather than RANGE. If we fully specify the window function's input ordering then there's no very interesting distinction between the two, because no rows will have any peers. So adding ORDER BY would in fact reduce the scope of the tests. At this point I'm inclined to leave it alone. Maybe we could think of some other test cases (perhaps using some other function than SUM) which would both exercise the difference between RANGE and ROWS mode, and not be sensitive to the detailed input ordering. But I doubt it's really worth the trouble. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Underspecified window queries in regression tests
Florian Pflug f...@phlo.org writes: But some frame clauses (row 1 preceding, for example) have an effect despite there being no ORDER BY, like here: Yeah, why did you expect differently? Without ORDER BY, all rows are peers in the frame ordering, so there's no way for a RANGE spec to select less than the whole partition. But with ROWS, you can select less than that. In general it's not that hard to create nondeterministic window-function queries, since the SQL standard doesn't require you to specify a unique ordering for the window function's input rows. Even in RANGE mode there are plenty of functions that are sensitive to the exact ordering, eg first_value/last_value. I guess the committee felt that locking this down would restrict the feature too much. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Underspecified window queries in regression tests
On Oct17, 2011, at 00:14 , Tom Lane wrote: Florian Pflug f...@phlo.org writes: But some frame clauses (row 1 preceding, for example) have an effect despite there being no ORDER BY, like here: Yeah, why did you expect differently? Without ORDER BY, all rows are peers in the frame ordering, so there's no way for a RANGE spec to select less than the whole partition. But with ROWS, you can select less than that. I was confused by When an aggregate function is used as a window function, it aggregates over the rows within the current row's window frame. [3.5, Window Functions]. combined with the part I quoted before, which was By default, if ORDER BY is supplied then the frame consists of all rows from the start of the partition up through the current row, plus any following rows that are equal to the current row according to the ORDER BY clause. When ORDER BY is omitted the default frame consists of all rows in the partition. [9.19, Window Functions, Last Paragraph] But, reading those parts again, I realize the it says When ORDER BY is omitted the *default* frame consists ... , and that the second quote is followed by a footnote which says There are options to define the window frame in other ways, but this tutorial does not cover them. See Section 4.2.8 for details. [3.5, Window Functions] So it was just me being thick. Sorry for the noise. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Underspecified window queries in regression tests
Florian Pflug f...@phlo.org writes: ... reading those parts again, I realize the it says When ORDER BY is omitted the *default* frame consists ... , and that the second quote is followed by a footnote which says There are options to define the window frame in other ways, but this tutorial does not cover them. See Section 4.2.8 for details. [3.5, Window Functions] So it was just me being thick. Sorry for the noise. Hmm. Maybe the use of a footnote there is too subtle, and we should instead have that text in-line (probably in parentheses)? Or we could use a note, but that's probably too much emphasis. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Underspecified window queries in regression tests
2011/10/17 Tom Lane t...@sss.pgh.pa.us: Hitoshi Harada umi.tan...@gmail.com writes: 2011/10/15 Tom Lane t...@sss.pgh.pa.us: I can't recall whether there was some good reason for underspecifying these test queries. It looks like all the problematic ones were added in commit ec4be2ee6827b6bd85e0813c7a8993cfbb0e6fa7 Extend the set of frame options supported for window functions, which means it was either me or Hitoshi-san who wrote them that way, but memory is not serving this afternoon. I don't remember if I wrote that part or not, but I like to add explicit ORDER BY to the test cases. It doesn't appear that some reason stopped us to specify it. So +1 for adding the clauses. I looked at this more closely and realized that the reason for doing it like that was to test window frames defined using ROWS rather than RANGE. If we fully specify the window function's input ordering then there's no very interesting distinction between the two, because no rows will have any peers. So adding ORDER BY would in fact reduce the scope of the tests. At this point I'm inclined to leave it alone. Maybe we could think of some other test cases (perhaps using some other function than SUM) which would both exercise the difference between RANGE and ROWS mode, and not be sensitive to the detailed input ordering. But I doubt it's really worth the trouble. Ah, you mentioned about ORDER BY in window specification (OVER clause). I thought it was query's ORDER BY. Yes, it affects in RANGE case, and we don't have rich frame support of RANGE (like n PRECEDING ...) so the case ORDER BY affects result is limited. Agree with leaving it alone. Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Adding CORRESPONDING to Set Operations
CORRESPONDING clause take 2 After realizing that modifying prepunion.c to include a custom subquery is not easy(incomprehensible to me) as it sounds and turning into a hassle after making several uninformed changes, I decided to go with modifying analyze.c. The incomprehensible part is constructing a custom subquery as a SubqueryScan. Anyway I managed to implement the clause as a Subquery in analyze.c. In the method transformSetOperationTree, if the node is a setoperation and contains a corresponding clause, i.e. CORRESPONDING, or CORRESPONDING BY(columns...), we determine the common column names. Column ordering in select statements are not important to the CORRESPONDING. With the common column names in hand, we create a RangeSubselect node accordingly and replace the original statement op-larg with the new RangeSubselect. RangeSubselect in turn has the original op-larg as a from clause. We do the same to op-rarg too. There were no changes done in prepunion.c There are documentation changes and one regression test in the patch. Best Regards, Kerem KAT *** a/doc/src/sgml/queries.sgml --- b/doc/src/sgml/queries.sgml *** *** 1225,1230 --- 1225,1233 primaryEXCEPT/primary /indexterm indexterm zone=queries-union +primaryCORRESPONDING/primary + /indexterm + indexterm zone=queries-union primaryset union/primary /indexterm indexterm zone=queries-union *** *** 1241,1249 The results of two queries can be combined using the set operations union, intersection, and difference. The syntax is synopsis ! replaceablequery1/replaceable UNION optionalALL/optional replaceablequery2/replaceable ! replaceablequery1/replaceable INTERSECT optionalALL/optional replaceablequery2/replaceable ! replaceablequery1/replaceable EXCEPT optionalALL/optional replaceablequery2/replaceable /synopsis replaceablequery1/replaceable and replaceablequery2/replaceable are queries that can use any of --- 1244,1252 The results of two queries can be combined using the set operations union, intersection, and difference. The syntax is synopsis ! replaceablequery1/replaceable UNION optionalALL/optional optionalCORRESPONDING optionalBY (replaceableselect_list/replaceable)/optional/optional replaceablequery2/replaceable ! replaceablequery1/replaceable INTERSECT optionalALL/optional optionalCORRESPONDING optionalBY (replaceableselect_list/replaceable)/optional/optional replaceablequery2/replaceable ! replaceablequery1/replaceable EXCEPT optionalALL/optional optionalCORRESPONDING optionalBY (replaceableselect_list/replaceable)/optional/optional replaceablequery2/replaceable /synopsis replaceablequery1/replaceable and replaceablequery2/replaceable are queries that can use any of *** *** 1283,1288 --- 1286,1299 /para para + literalCORRESPONDING/ returns all columns that are in both replaceablequery1/ and replaceablequery2/ with the same name. + /para + + para + literalCORRESPONDING BY/ returns all columns in the column list that are also in both replaceablequery1/ and replaceablequery2/ with the same name. + /para + + para In order to calculate the union, intersection, or difference of two queries, the two queries must be quoteunion compatible/quote, which means that they return the same number of columns and *** a/doc/src/sgml/sql.sgml --- b/doc/src/sgml/sql.sgml *** *** 859,865 [ WHERE replaceable class=PARAMETERcondition/replaceable ] [ GROUP BY replaceable class=PARAMETERexpression/replaceable [, ...] ] [ HAVING replaceable class=PARAMETERcondition/replaceable [, ...] ] ! [ { UNION | INTERSECT | EXCEPT } [ ALL ] replaceable class=PARAMETERselect/replaceable ] [ ORDER BY replaceable class=parameterexpression/replaceable [ ASC | DESC | USING replaceable class=parameteroperator/replaceable ] [ NULLS { FIRST | LAST } ] [, ...] ] [ LIMIT { replaceable class=PARAMETERcount/replaceable | ALL } ] [ OFFSET replaceable class=PARAMETERstart/replaceable ] --- 859,865 [ WHERE replaceable class=PARAMETERcondition/replaceable ] [ GROUP BY replaceable class=PARAMETERexpression/replaceable [, ...] ] [ HAVING replaceable class=PARAMETERcondition/replaceable [, ...] ] ! [ { UNION | INTERSECT | EXCEPT } [ ALL ] [ CORRESPONDING [ BY ( replaceable class=PARAMETERexpression/replaceable ) ] ] replaceable class=PARAMETERselect/replaceable ] [ ORDER BY replaceable class=parameterexpression/replaceable [ ASC | DESC | USING replaceable class=parameteroperator/replaceable ] [ NULLS { FIRST | LAST } ] [, ...] ] [ LIMIT { replaceable class=PARAMETERcount/replaceable | ALL } ] [ OFFSET replaceable class=PARAMETERstart/replaceable ] *** a/src/backend/nodes/copyfuncs.c --- b/src/backend/nodes/copyfuncs.c *** *** 2507,2512 --- 2507,2513
Re: [HACKERS] Underspecified window queries in regression tests
2011/10/17 Greg Stark st...@mit.edu: On Fri, Oct 14, 2011 at 9:32 PM, Tom Lane t...@sss.pgh.pa.us wrote: We could hack around this by adding more columns to the result so that an index-only scan doesn't work. But I wonder whether it wouldn't be smarter to add ORDER BY clauses to the window function calls. I've been known to argue against adding just-in-case ORDER BYs to the regression tests in the past; but these cases bother me more because a plan change will not just rearrange the result rows but change their contents, making it really difficult to verify that nothing's seriously wrong. I'm not sure if it applies to this case but I recall I was recently running queries on Oracle that included window functions and it wouldn't even let me run them without ORDER BY clauses in the window definition. I don't know if it cleverly determines that the ORDER BY will change the results or if Oracle just requires ORDER BY on all window definitions or what. AFAIK, the current standard doesn't tell clearly if all/some window functions require ORDER BY clause in window specifications. Some window functions like rank and row_number is meaningless if it is omitted, so some implementation doesn't allow it omitted. And I believe Oracle implemented it before the standard, so that'd be why details are different from spec. We designed it per spec and omitting the clause doesn't violate any part of the standard. Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: set GUC variables for single query
On Sun, Oct 16, 2011 at 4:58 PM, Tom Lane t...@sss.pgh.pa.us wrote: Dimitri Fontaine dimi...@2ndquadrant.fr writes: Now that you mention it, the following might actually already work: WITH settings AS ( SELECT set_config('timezone', 'Europe/Amsterdam', t), set_config('work_mem', '1 GB', t) ), foo AS ( SELECT … ) INSERT INTO bar SELECT * FROM foo; Only for small values of work ... you won't be able to affect planner settings that way, nor can you assume that that WITH item is executed before all else. See recent thread pointing out that setting values mid-query is unsafe. I previously floated the idea of using a new keyword, possibly LET, for this, like this: LET var = value [, ...] IN query I'm not sure if anyone bought it, but I'll run it up the flagpole again and see if anyone salutes. I tend to agree with the idea that SET LOCAL isn't always what you want; per-transaction is not the same as per-query, and multi-command query strings have funny semantics, and multiple server round-trips are frequently undesirable; and it just seems cleaner, at least IMHO. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: set GUC variables for single query
Robert Haas robertmh...@gmail.com writes: I previously floated the idea of using a new keyword, possibly LET, for this, like this: LET var = value [, ...] IN query I'm not sure if anyone bought it, but I'll run it up the flagpole again and see if anyone salutes. I tend to agree with the idea that SET LOCAL isn't always what you want; per-transaction is not the same as per-query, and multi-command query strings have funny semantics, and multiple server round-trips are frequently undesirable; and it just seems cleaner, at least IMHO. Well, syntax aside, the real issue here is that GUC doesn't have any notion of a statement-lifespan setting, and adding one would require adding per-statement overhead; not to mention possibly adding considerable logical complexity, depending on exactly what you wanted to define as a statement. I don't think an adequate case has been made that SET LOCAL is insufficient. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: set GUC variables for single query
On 10/16/2011 08:59 PM, Tom Lane wrote: Robert Haasrobertmh...@gmail.com writes: I previously floated the idea of using a new keyword, possibly LET, for this, like this: LET var = value [, ...] IN query I'm not sure if anyone bought it, but I'll run it up the flagpole again and see if anyone salutes. I tend to agree with the idea that SET LOCAL isn't always what you want; per-transaction is not the same as per-query, and multi-command query strings have funny semantics, and multiple server round-trips are frequently undesirable; and it just seems cleaner, at least IMHO. Well, syntax aside, the real issue here is that GUC doesn't have any notion of a statement-lifespan setting, and adding one would require adding per-statement overhead; not to mention possibly adding considerable logical complexity, depending on exactly what you wanted to define as a statement. I don't think an adequate case has been made that SET LOCAL is insufficient. I agree. But if we are going to go there I vastly prefer Robert's suggestion of a separate syntactical structure. Mixing this up with WITH would just be an awful mess, and cause endless confusion. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: set GUC variables for single query
On Sun, Oct 16, 2011 at 8:59 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: I previously floated the idea of using a new keyword, possibly LET, for this, like this: LET var = value [, ...] IN query I'm not sure if anyone bought it, but I'll run it up the flagpole again and see if anyone salutes. I tend to agree with the idea that SET LOCAL isn't always what you want; per-transaction is not the same as per-query, and multi-command query strings have funny semantics, and multiple server round-trips are frequently undesirable; and it just seems cleaner, at least IMHO. Well, syntax aside, the real issue here is that GUC doesn't have any notion of a statement-lifespan setting, and adding one would require adding per-statement overhead; not to mention possibly adding considerable logical complexity, depending on exactly what you wanted to define as a statement. I don't think an adequate case has been made that SET LOCAL is insufficient. Would it require adding per-statement overhead in every case, or just when the feature gets used? I suspect the latter, which is no different from anything else we have. We do already have at least one other case that seems similar to me: you can apply a setting using ALTER FUNCTION .. SET; the new value is applied when you enter the function and restored on exit. I'd imagine that this would have similar semantics. In terms of what qualifies as a statement, I would rather imagine that it would only be worthwhile to apply this to queries rather than fooling around with utility statements. I mean, it would be nice if it Just Worked Anywhere, but that's likely to be a lot more work (and grammar conflicts) than we want to deal with. Anyway, the judgement of whether or not SET LOCAL is sufficient is in the end a value judgement, and I'm not going to pretend that my opinion is superior to all others. My personal experience, however, is that I've never used or wanted SET LOCAL, but I've wanted a single-statement equivalent a few times. So for me personally, having this in lieu of SET LOCAL would be an improvement from a usability perspective. YMMV, of course. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_comments (was: Allow \dd to show constraint comments)
On Fri, Oct 14, 2011 at 11:12 AM, Robert Haas robertmh...@gmail.com wrote: On Wed, Oct 12, 2011 at 10:20 PM, Josh Kupershmidt schmi...@gmail.com wrote: On the third hand, Josh's previous batch of changes to clean up psql's behavior in this area are clearly a huge improvement: you can now display the comment for nearly anything by running the appropriate \dfoo command for whatever the object type is. So ... is this still a good idea, or should we just forget about it? I think this question is a part of a broader concern, namely do we want to create and support system views for easier access to information which is already available in different ways through psql commands, or by manually digging around in the catalogs? I believe there are at least several examples of existing views we maintain which are very similar to pg_comments: pg_seclabel seems quite similar, for instance. That's one's a direct analogue, but I don't want to overbroaden the issue. I guess it just seems to me that if no one's going to champion adding this, maybe we shouldn't. Hearing no cries of oh, yes, please, I'm marking this Returned with Feedback for now. We can always revisit it if we hear that more people want it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers