[SQL] Utility of recursive queries?
Would recursive queries be the trick for doing things like unwinding a linked list to either its head or tail? With:

    create table list (
        id int primary key,
        parent int references list(id)
    );

    insert into list values (1, null); -- head of chain
    insert into list values (2, 1);    -- first child
    insert into list values (3, 2);    -- second child

Given a reference to id = 3, would a recursive query be the trick to unrolling the list and discovering id = 1 as the head in a SQL one-liner? Is that discovery possible in straight SQL, without resorting to stored procedures (or modifying the table schema to point directly at the head)?

And, finally, would any potential recursive-query implementation be noticeably more efficient than a straightforward implementation in plpgsql, such as:

    create or replace function find_head(int) returns int as '
    DECLARE
        cur_par INT;
        prev_par INT;
    BEGIN
        prev_par := $1;
        SELECT parent INTO cur_par FROM list WHERE id = $1;
        WHILE cur_par IS NOT NULL LOOP
            prev_par := cur_par;
            SELECT parent INTO cur_par FROM list WHERE id = prev_par;
        END LOOP;
        RETURN prev_par;
    END;
    ' language 'plpgsql';

James Robinson
Socialserve.com
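For what it's worth, a sketch of the recursive-query approach against the table above (assuming PostgreSQL 8.4 or later, where WITH RECURSIVE is available):

```sql
-- Walk parent pointers from a starting id; the row whose parent is
-- null is the head of the chain.
WITH RECURSIVE chain AS (
    SELECT id, parent FROM list WHERE id = 3
  UNION ALL
    SELECT l.id, l.parent
    FROM list l
    JOIN chain c ON l.id = c.parent
)
SELECT id FROM chain WHERE parent IS NULL;  -- returns 1 with the sample data
```

Swapping the join condition to `l.parent = c.id` walks the other direction, toward the tail.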
[SQL] Prepared Statements and large where-id-in constant blocks?
Howdy:

Java middlewares like JBossCMP issue many queries of the general form:

    SELECT t1.att1, t1.att2 ... t1.attN FROM t1 WHERE (t1.id = X) OR (t1.id = Y) ...

where there may be anywhere between one and thousands of "(id = N)" blocks ORed together. These could be transformed to the "WHERE t1.id IN (X, Y, ...)" form for possibly a little performance gain (possibly -- I've not yet checked whether this plans better than the other, but I could imagine this form being parsed into a hash-join form as opposed to a huge index-filter form).

Real performance gains, however, could be attained by ultimately using either the v2 PREPARE / EXECUTE statements or, better, the v3 protocol's bind / execute commands -- but only if the SQL-level form of the query could represent the fact that there are not really N params but rather one single param of type set (or, heck, array?). This would let all such queries map onto one single backend prepared statement, regardless of the size of the id set being passed in. I guess that separate preparation for each distinct cardinality would be okay performance-wise, but if there were some way to factor all such queries down into one single planned statement, it could:

1) Make far better use of JBoss's connection-wide LRU cache of PreparedStatements, since only one entry (with a much higher cache hit rate) would exist for the entire family of queries.

2) Make better use of backend memory, since the backend would only need to prepare one generic form, as opposed to one for each cardinality.

Problems in implementation:

1) The JBoss query engine would need to be educated about the potential to use this form as opposed to the "OR (t1.id = X)" form. Likewise, JBoss could and should be educated about using the "WHERE t1.id IN (X, Y, ...)" form for databases which support "WHERE .. IN ( .. )" -- probably an easier sell, since that form is most likely supported by more databases than just PG.
2) Does the JDBC spec allow any hooks for passing in a set of ids as one single param? We'd need the SQL template being prepared to look something like:

    SELECT t1.attr1 FROM t1 WHERE t1.id IN ( ? )

From memory, perhaps setArray() could be hacked for the job. I know JBossCMP uses the setObject() call, so perhaps JDBC could be tricked out to handle a java.util.Collection -- an arguably cleaner way to do it, with no backward-compatibility issues since it would be all-new functionality. The JDBC driver could just iterate through the collection contents, calling setObject() accordingly. The harder part would be educating JBoss to do this. The hardest part would be convincing someone to commit it into JBoss.

3) Can a cardinality-free plan even be made? I bet I'm assuming a little too much in asserting that all such plans are equal; I suspect that Tom is going to tell me that the query for just one id value would and should be planned differently from the 800-value form, since the 800-value form might well prefer a full sequential scan -- the table might only have 900 live rows in it.

Anyone have any insight or opinions?

[ Crossposted to pgsql-sql for anyone's insight into the pure SQL / planning matters. Apologies in advance. ]

James Robinson
Socialserve.com
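One way to sketch the single-parameter idea at the SQL level, using PostgreSQL's array syntax (table and column names are the illustrative ones from above):

```sql
-- One prepared statement serves any number of ids: the single
-- parameter is an integer array, not N scalar parameters.
PREPARE fetch_rows(int[]) AS
    SELECT t1.att1, t1.att2 FROM t1 WHERE t1.id = ANY ($1);

EXECUTE fetch_rows(ARRAY[1, 2, 3]);
EXECUTE fetch_rows(ARRAY[42]);   -- same statement, different cardinality
```

Whether the planner does anything smart with the array form is, of course, the open question.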
Re: [SQL] [JDBC] Prepared Statements and large where-id-in constant blocks?
On Apr 19, 2004, at 10:57 PM, Oliver Jowett wrote:

> Unfortunately a bit of experimentation indicates that the planner
> doesn't do anything clever with ANY + constant array values (at least
> in 7.4.1 which is what I have to hand):

Not only that, but it seems it never gets planned as an index scan. Preparing the statement using "SELECT ... WHERE id = ANY (?)" plus a call to a really hacked-up version of setObject() would solve the issue of getting better use out of fewer cached prepared statements, but the sequential-scan-only planning would be a downer. And since "OR (id=N)" plans exactly like "id IN (N)", there seems to be nothing really worth doing.

I examined plans comparing a 6-way join used in our production code with 4 ids in the tail "OR (id=N)" nodes, then with the full 723 ids, and the plans were markedly different, preferring sequential scans for many of the intermediate table joins in the 723-id case. The runtimes for the sequential scans were faster than forcing index scans, so the planner's voodoo definitely benefits from full knowledge of how many rows to expect (which appears crystal clear in hindsight). So I doubt that any single server-side preparation using a single parameter representing an entire collection of ids could perform as well as it does currently with full information.

Oliver, I tested your proposal of providing more id params than necessary, passing in a dummy value (-1) which will never be found as a pk in that table, and the planner handled it efficiently. The repeated instances of "or u.id=-1::int8", when not pre-planned using PREPARE, were nicely crunched down to a single index-condition clause of "OR (id = -1::BIGINT)".
But when transforming this to PREPARE and EXECUTE pairs, the planner cannot crunch the plan down, since it has no idea that, say, 500 of the 700 params will all be the same value, so it cannot factor them out at planning time (hence, I guess, the warning about constant values in the notes section of the PREPARE manual page).

All roads seem to lead to "don't attempt to change a thing" -- there is no easy- or medium-difficulty way to solve this better.

In writing this, I went so far as to think about shunting the list of ids into a temporary table to join against. Doing this at the SQL level would be far too costly in round-trip times, but could the "WHERE id IN (A, B, C, ...)" form somehow be transformed into a hash-join operation on the backend when the size of the static set is high? Could that not perform (theoretically) better than what appears to be an O(N) index-condition evaluation? I ask only for personal edification -- I have no live query where the index-condition solution performs too slowly. In Java-land, if we suppose that the size of a set could potentially be large, we quickly defer to holding the values in a HashSet if we're going to test for membership, as opposed to performing linear searches on a list. Probably a dead horse beaten elsewhere.

Many thanks.

James Robinson
Socialserve.com
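The temporary-table idea, for the record, might be sketched like this (table names are hypothetical; the extra round trips are the whole objection to doing it from the client):

```sql
BEGIN;

-- Load the id set once; the planner can then pick a hash join against
-- it rather than evaluating a long index condition per row.
CREATE TEMPORARY TABLE wanted_ids (id int8 PRIMARY KEY) ON COMMIT DROP;
INSERT INTO wanted_ids VALUES (1), (2), (3);  -- ... hundreds more

SELECT u.*
FROM u
JOIN wanted_ids w ON w.id = u.id;

COMMIT;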
Re: [SQL] Design question: Scalability and tens of thousands of tables?
On Nov 4, 2005, at 9:47 AM, [EMAIL PROTECTED] wrote:

> The problem is I am very concerned about scalability with having a
> different table created for each custom object. I want to design the
> site to handle tens of thousands of users. If each user has 3-5 custom
> objects, the database would have to handle tens of thousands of tables.
> So it would appear that this is a broken solution and will not scale.
> Has anyone designed a similar system or have ideas to share?

Well, before you discount it, did you try out your design?

You could do an initial segregation of users' tables into separate schemas (say, schemas 'a' through 'z' according to username, or some better hashing routine such as brute-force round-robin assignment at user-creation time). The object-to-schema assignment would be one additional column in your centralized user-directory table(s).

James Robinson
Socialserve.com
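A minimal sketch of that layout (all names hypothetical; the hashing rule here is just the first letter of the username):

```sql
-- One schema per bucket, created up front.
CREATE SCHEMA a;
CREATE SCHEMA b;
-- ... through 'z'

-- The central user directory records which schema holds each
-- user's custom-object tables.
CREATE TABLE users (
    id         int  PRIMARY KEY,
    username   text NOT NULL,
    obj_schema name NOT NULL   -- e.g. 'a' for usernames starting with 'a'
);

-- A given user's custom object then lives at, e.g., a.user42_contacts.
```

Lookups stay one indirection away, and no single schema ever holds more than a fraction of the total table count.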
[SQL] 8.4 versus 8.2 against nonexistent column "name" ...
Can anyone shine a light on why 8.4 behaves surprisingly when queried for a bogus column "name", but only via qualified table/column references? Here's a sample script:

    create table foo ( id int, x int );
    insert into foo(id, x) values (1, 23), (2, 43), (4, 45);

    -- Fails on both -- no column named 'name'
    select name from foo;

    -- Fails on 8.2; still no 'name' column, but 8.4 succeeds, returning whole rows.
    select f.name from foo f;

On 8.2.11, both selects fail:

    CREATE TABLE
    INSERT 0 3
    ERROR:  column "name" does not exist
    LINE 1: select name from foo;
                   ^
    ERROR:  column f.name does not exist
    LINE 1: select f.name from foo f;
                   ^

On 8.4.2, the first select fails, but the second succeeds, returning whole rows wrapped up like tuples:

    CREATE TABLE
    INSERT 0 3
    ERROR:  column "name" does not exist
    LINE 1: select name from foo;
                   ^
      name
    --------
     (1,23)
     (2,43)
     (4,45)
    (3 rows)

A quick skim through the 8.3 and 8.4 release notes found nothing interesting related to 'name'.

Thanks!

James Robinson
Socialserve.com
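My reading (not verified against the 8.4 source, so take it as a guess): PostgreSQL treats attribute notation and functional notation as interchangeable, and "name" happens to be a built-in type, so the qualified reference can resolve as a cast of the whole row:

```sql
-- Equivalent spellings; with no column or function "name" available,
-- 8.4 appears to fall through to interpreting this as a cast of the
-- composite row f to the type "name".
select f.name  from foo f;
select name(f) from foo f;
```

The unqualified `select name from foo` has no row variable to cast, which would explain why only the qualified form "succeeds".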
[SQL] 9.0+ way of determining if a LISTEN channel has a backend listening?
Is there a way in post-9.0 Postgres for one session to determine whether a LISTEN channel currently has a backend registered to receive notifications? Prior to 9.0, one could select from the system catalog table pg_listener. The function pg_listening_channels() returns the list of channel names *this* session is listening on; I'd like to find out whether a given listen channel has *any* (i.e., other-session) listeners.

Thanks!

James Robinson
Socialserve.com
[SQL] SELECT FOR UPDATE/SHARE cannot be applied to the nullable side of an outer join
Given something like:

    create table foo (id int primary key not null);
    create table bar (id int primary key not null, a_id int references foo(id));

    select a.id, b.id
    from foo a left outer join bar b on (b.a_id = a.id)
    for update;

PG 8.1.4 balks, saying:

    SELECT FOR UPDATE/SHARE cannot be applied to the nullable side of an outer join

Is this behavior spec-mandated, or could just the non-null rows of B be locked?

James Robinson
Socialserve.com
Re: [SQL] SELECT FOR UPDATE/SHARE cannot be applied to the nullable side of an outer join
Oh, that's sweet and all I needed anyway -- thanks.

On Oct 6, 2006, at 12:25 PM, Tom Lane wrote:

> Note that you can select "FOR UPDATE OF a" in this situation, it's
> just the B side that is problematic.

James Robinson
Socialserve.com
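Applied to the query from the original post, Tom's suggestion reads:

```sql
-- Lock only rows of foo (alias a); the nullable bar side is
-- left unlocked, so the outer join is acceptable.
select a.id, b.id
from foo a left outer join bar b on (b.a_id = a.id)
for update of a;
```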
[SQL] Subselects in CHECK clause ...
I see that subselects are not directly supported in CHECK clauses, but one can work around that by writing a stored function which returns boolean and performs the subselect. Are there any known gotchas in doing this?

The CREATE TABLE docs say of CHECK clauses:

    "Currently, CHECK expressions cannot contain subqueries nor refer to
    variables other than columns of the current row."

Is this simply because no one has yet written the code to lift the subquery restriction, or is there a more fundamental reason?

James Robinson
Socialserve.com
Re: [SQL] Subselects in CHECK clause ...
Gotcha. This is indeed to ensure a specialized referential-integrity bit, against a column which "ought not ever change". Once some more old code goes away, we can fully normalize this area, making this CHECK-subselect bit go away, replaced by a regular FK. Hmm -- we could probably even now make it an FK against a column which is not the primary key of the foreign table, and it'd work better, ensuring consistency against changes on the foreign-table side, eh?

Thanks!

On Nov 29, 2006, at 12:53 PM, Stephan Szabo wrote:

> To completely get the constraint, you have to also apply constraints on
> the tables referenced in the function that prevent modifications on
> those tables from causing the constraint to be violated. For example,
> if you were to do an exists test on another table for a row that
> matches up with this row in some fashion (for a specialized referential
> integrity constraint), modifications on that other table could also
> cause the constraint to be violated; that isn't caught by the CHECK
> function(...) case, and you'll probably need triggers or other
> constraints on that table.

James Robinson
Socialserve.com
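A sketch of the FK-against-a-non-key-column idea (hypothetical tables; the one catch is that the referenced column must itself carry a UNIQUE constraint for the FK to be allowed):

```sql
CREATE TABLE account (
    id   int  PRIMARY KEY,
    code text NOT NULL UNIQUE   -- not the pk, but unique, so referenceable
);

CREATE TABLE entry (
    id           int  PRIMARY KEY,
    account_code text NOT NULL REFERENCES account(code)
);
```

Unlike the CHECK-function workaround, changes on the account side are then policed too: deleting or updating a referenced code is rejected (or cascaded, if so declared).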
[SQL] Index to enforce non-overlapping ranges?
Academic question here: given a table with a pair of any sort of line-segment-esque range-delimiter columns, is it possible to build a unique index to enforce non-overlapping ranges? Such as:

    create table test (
        id int not null primary key,
        low_value int not null,
        high_value int not null
    );

Can one build an index enforcing a rule such that no (low_value, high_value) range is identical to, or overlaps with, another (low_value, high_value) range described by the table? And, more interestingly, what about ranges of dates / timestamps as opposed to simple integers?

I can see how a trigger on insert or update could enforce such a constraint [ probe the table for an existing overlapping row, and raise an exception if one exists ], but can such a constraint be enforced in fewer lines using some sort of r-tree index?

James Robinson
Socialserve.com
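For the record, a sketch of how this can be done declaratively with an exclusion constraint (requires PostgreSQL 9.0+ for EXCLUDE, and 9.2+ for the built-in range types used here):

```sql
create table test (
    id         int not null primary key,
    low_value  int not null,
    high_value int not null,
    -- GiST-backed exclusion constraint: reject any row whose range
    -- overlaps (&&) an existing row's range.
    exclude using gist (int4range(low_value, high_value) with &&)
);
```

Date and timestamp ranges work the same way via daterange / tsrange in place of int4range; no trigger needed.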
