Re: CYCLE and SEARCH [was Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch]
David Fetter [EMAIL PROTECTED] writes: How hard would it be to add the infrastructure for CYCLE? Personally I'm more interested in getting the recursive UNION (without ALL) case to work. I think that this might just be a matter of drawing on support that's already there: have RecursiveUnion maintain a hash table of rows it's already seen, and discard anything coming from either the non-recursive term or the recursive term that is found to be already present in the hash table. Two objections to this are (a) it doesn't work for datatypes without hash equality support, and (b) it would fail for recursion output exceeding available memory. However, the alternative of using sort-and-uniq to detect duplicates seems pretty horrid in this situation --- you'd need a fresh sort and mergejoin-like scan for every iteration of the recursive term. And it would become impractical to return rows immediately after they're emitted by the subqueries. So I'd be willing to live with only supporting the hash implementation. If no objections, I'll look at that next week ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
CYCLE and SEARCH [was Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch]
On Fri, Oct 03, 2008 at 03:55:56PM -0400, Tom Lane wrote: I've successfully taught the WITH patch to do single evaluation of WITH queries. I've also been through all of the planner and executor code for it and am now feeling pretty happy with the whole thing. There are still a small number of loose ends (see XXX in the patch), but I don't believe any of them represent significant work --- I just left them undone because they weren't in the way of testing anything else. Great! How hard would it be to add the infrastructure for CYCLE? Here's the kind of awful hack I've been doing lately in order to detect and prevent cycles: CREATE TABLE adjacency( id INTEGER NOT NULL, parent_id INTEGER ); INSERT INTO adjacency VALUES(1,NULL), (2,1),(3,1),(4,1), (5,2),(6,2),(7,2),(8,3),(9,3),(10,4), (11,5),(12,5),(13,6),(14,7),(15,8), (9,1); /* Cycle! */ WITH RECURSIVE t(node, path) AS ( SELECT id, ARRAY[id] FROM adjacency WHERE parent_id IS NULL UNION ALL SELECT a1.id, t.path || a1.id FROM adjacency a1 JOIN t ON (a1.parent_id = t.node) WHERE a1.id ANY(t.path) /* Remove cycle using awful hack :P */ ) SELECT CASE WHEN array_upper(path,1)1 THEN '+-' ELSE '' END || REPEAT('--', array_upper(path,1)-2) || node AS Branch FROM t ORDER BY path; I suspect that some kind of hash structure, instantiated only when a CYCLE clause is specified, could help out with a much, much more efficient implementation of cycle prevention. Adding SEARCH will be a lot more complicated, as DEPTH FIRST is completely different from how the implementation works now. Any ideas on how this might be approached? Cheers, David. -- David Fetter [EMAIL PROTECTED] http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: [EMAIL PROTECTED] Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
On 2 Oct 2008, at 05:44 AM, Tom Lane [EMAIL PROTECTED] wrote: Hitoshi Harada [EMAIL PROTECTED] writes: Hmm, I've looked over the patch. Logically window functions can access arbitrary rows that have been stored in a frame. Thus I had thought tuplestore should hold all the positions and allow arbitrary random access indicated by integer. Maybe those functionalities can be abstracted by the window function API itself. For this matter it seems that you'd better to look at my future patch. Well, the problem with defining it as arbitrary random access is that there's no way for the tuplestore to throw away old data. And that there's no way to make it work if the tuplestore has spilled to disk. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
Hitoshi Harada [EMAIL PROTECTED] writes: It seems to me to share some ideas with the MemoryContext concept: what about a TupstoreContext associated with tuplestore, you get a common default one if you don't register your own, and use tuplestore_gettuple(MyTupstoreContext, ...); I'm just working on tuplestore recording multiple positions for my window function project. Attached patch is still in progress but seems it works in a situation. From my work, the setting/getting read position and delegate savig positions to the caller will probably have problems, because of memory control for saving positions and tuplestore status changing (memory - BufFile). Instead, I decided it'd better that we can indicate the row number by integer. That seems unreasonably inefficient --- it takes lots more file I/O, and there's really no need for random access, at least not in what I'm doing. I looked at the tuplestore code some more, and my idea of allowing callers to store their current position externally to tuplestore indeed doesn't work. The problem is that if dumptuples() happens there's no way to convert an index-based position to a file-based one. The current code can handle it because it has access to all the positions (the read, write, and mark pointers) but any positions stored externally wouldn't get converted. So it seems like the appropriate generalization is to have an array of read positions inside the tuplestore and allow callers to say read using position N, plus some API to allow positions to be allocated to different requestors. We could get rid of the separate mark pointer by implementing an API that allows position X to be copied to position Y. But the actual value of a position (a tuple number or file position info) would never be exposed to callers. Comments? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
Greg Stark [EMAIL PROTECTED] writes: On Wed, Oct 1, 2008 at 2:54 PM, Tom Lane [EMAIL PROTECTED] wrote: So it seems like the appropriate generalization is to have an array of read positions inside the tuplestore and allow callers to say read using position N, plus some API to allow positions to be allocated to different requestors. One other reason the tuplestore should know the position of all the readers is that ideally it would want to be able to discard any tuples older than the oldest read position. That also means it needs to know when all the call sites have allocated their position and don't need to reset it. Good point. So we'd need per-position capability flags, not per-tuplestore. I hadn't realized that this would be relevant to window functions. Now that I know that, I propose fixing tuplestore for multiple positions and committing it separately, before I go back to the CTE patch. Then Hitoshi-san will have something he can work with too. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
On Wed, Oct 1, 2008 at 2:54 PM, Tom Lane [EMAIL PROTECTED] wrote: So it seems like the appropriate generalization is to have an array of read positions inside the tuplestore and allow callers to say read using position N, plus some API to allow positions to be allocated to different requestors. We could get rid of the separate mark pointer by implementing an API that allows position X to be copied to position Y. But the actual value of a position (a tuple number or file position info) would never be exposed to callers. That's basicaly what had done (though i had n readers which encapsulated the current pointer and mark). One other reason the tuplestore should know the position of all the readers is that ideally it would want to be able to discard any tuples older than the oldest read position. That also means it needs to know when all the call sites have allocated their position and don't need to reset it. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
One other reason the tuplestore should know the position of all the readers is that ideally it would want to be able to discard any tuples older than the oldest read position. That also means it needs to know when all the call sites have allocated their position and don't need to reset it. Good point. So we'd need per-position capability flags, not per-tuplestore. I hadn't realized that this would be relevant to window functions. Now that I know that, I propose fixing tuplestore for multiple positions and committing it separately, before I go back to the CTE patch. Then Hitoshi-san will have something he can work with too. Yes, tuplestore multiple positioning will give the greate help to the window function. Ideally, it is better that tuplestore'd have all the positions and have some kind of capability to discard old rows so that it can stay in TSS_MEM, which helps window function's sliding frame. But it isn't really critical, just performance matter. Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
Hitoshi Harada [EMAIL PROTECTED] writes: I hadn't realized that this would be relevant to window functions. Now that I know that, I propose fixing tuplestore for multiple positions and committing it separately, before I go back to the CTE patch. Then Hitoshi-san will have something he can work with too. Yes, tuplestore multiple positioning will give the greate help to the window function. Ideally, it is better that tuplestore'd have all the positions and have some kind of capability to discard old rows so that it can stay in TSS_MEM, which helps window function's sliding frame. Okay, there's a patch in CVS HEAD that works this way. Let me know if it needs further tweaking for your purposes. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
2008/10/2 Tom Lane [EMAIL PROTECTED]: Hitoshi Harada [EMAIL PROTECTED] writes: I hadn't realized that this would be relevant to window functions. Now that I know that, I propose fixing tuplestore for multiple positions and committing it separately, before I go back to the CTE patch. Then Hitoshi-san will have something he can work with too. Yes, tuplestore multiple positioning will give the greate help to the window function. Ideally, it is better that tuplestore'd have all the positions and have some kind of capability to discard old rows so that it can stay in TSS_MEM, which helps window function's sliding frame. Okay, there's a patch in CVS HEAD that works this way. Let me know if it needs further tweaking for your purposes. regards, tom lane Hmm, I've looked over the patch. Logically window functions can access arbitrary rows that have been stored in a frame. Thus I had thought tuplestore should hold all the positions and allow arbitrary random access indicated by integer. Maybe those functionalities can be abstracted by the window function API itself. For this matter it seems that you'd better to look at my future patch. Everything else is great deal. By improving tuplestore_trim(), sliding frame will be performed better than I'd thought. Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
Hitoshi Harada [EMAIL PROTECTED] writes: 2008/10/2 Tom Lane [EMAIL PROTECTED]: Okay, there's a patch in CVS HEAD that works this way. Let me know if it needs further tweaking for your purposes. Hmm, I've looked over the patch. Logically window functions can access arbitrary rows that have been stored in a frame. Thus I had thought tuplestore should hold all the positions and allow arbitrary random access indicated by integer. Maybe those functionalities can be abstracted by the window function API itself. For this matter it seems that you'd better to look at my future patch. Well, the problem with defining it as arbitrary random access is that there's no way for the tuplestore to throw away old data. The scheme that I have in mind here is that you keep (at least) two read pointers, one that is where you're actually reading the data and one that is nailing down the oldest point you might need to return to. This is a generalization of the previous mark/restore logic to allow any number of pairs of mark and restore points. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
Greg Stark [EMAIL PROTECTED] writes: On 24 Sep 2008, at 02:45, Tom Lane [EMAIL PROTECTED] wrote: The next big thing seems to be to figure out exactly how to do multiple references to CTE outputs, so that we can de-bogotify the planner. I've looked and don't seem to still have the source tree where I worked on this. I remember how I made the changes to tuplestore which was mostly mechanical. The trick I think will be in adding a special purpose executor method which passes the call site to the node below. This depends on the observation that if we always memoize results then each call site can only have one active call. That is, we don't need to maintain a full stack tree. I looked at the tuplestore code a bit and decided that this actually doesn't need to be hard at all. Tuplestore already has a notion of a write position and an independent read position, and we don't need more than one write position. So what I think we should do is add get read position and set read position functions to tuplestore.c, and have each of the reader nodes remember its own read position. That is, each reader has to do set_read_position(tupstore, local_read_position); tuple = tuplestore_gettuple(tupstore, ...); get_read_position(tupstore, local_read_position); rather than just tuplestore_gettuple. The set/get functions will be cheap enough that this is no big deal. (Or maybe we should just provide a wrapper function that does this sequence?) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hi, Le 30 sept. 08 à 20:03, Tom Lane a écrit : set_read_position(tupstore, local_read_position); tuple = tuplestore_gettuple(tupstore, ...); get_read_position(tupstore, local_read_position); rather than just tuplestore_gettuple. The set/get functions will be cheap enough that this is no big deal. (Or maybe we should just provide a wrapper function that does this sequence?) It seems to me to share some ideas with the MemoryContext concept: what about a TupstoreContext associated with tuplestore, you get a common default one if you don't register your own, and use tuplestore_gettuple(MyTupstoreContext, ...); Maybe some other API would benefit from the idea? Regards, - -- dim -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.9 (Darwin) iEYEARECAAYFAkjigF4ACgkQlBXRlnbh1bkycQCgqs/+JBOd0SiN4xvKwLgEgi9F BOYAoLm0Se6zs8cEAnoTlH6de7pLLh/l =kzm1 -END PGP SIGNATURE- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
Hi, 2008/10/1 Dimitri Fontaine [EMAIL PROTECTED]: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hi, Le 30 sept. 08 à 20:03, Tom Lane a écrit : set_read_position(tupstore, local_read_position); tuple = tuplestore_gettuple(tupstore, ...); get_read_position(tupstore, local_read_position); rather than just tuplestore_gettuple. The set/get functions will be cheap enough that this is no big deal. (Or maybe we should just provide a wrapper function that does this sequence?) It seems to me to share some ideas with the MemoryContext concept: what about a TupstoreContext associated with tuplestore, you get a common default one if you don't register your own, and use tuplestore_gettuple(MyTupstoreContext, ...); Maybe some other API would benefit from the idea? I'm just working on tuplestore recording multiple positions for my window function project. Attached patch is still in progress but seems it works in a situation. From my work, the setting/getting read position and delegate savig positions to the caller will probably have problems, because of memory control for saving positions and tuplestore status changing (memory - BufFile). Instead, I decided it'd better that we can indicate the row number by integer. Regards, -- Hitoshi Harada *** a/src/backend/utils/sort/tuplestore.c --- b/src/backend/utils/sort/tuplestore.c *** *** 141,148 struct Tuplestorestate --- 141,159 int markpos_current; /* saved current */ int markpos_file; /* saved readpos_file */ off_t markpos_offset; /* saved readpos_offset */ + + /* + * When asked EXEC_FLAG_SETPOS, we need record multiple state-myfile pointer. + * This pointer is used to optimize offset calculation closer to O(1). + * npos_file is fixed layout file since only int + off_t will be needed to restore + * tuple positions. int + off_t may get better by struct{int, off_t} to reduce call + * BufFileWrite and BufFileRead? + * This feature is off unless eflags is set. + */ + BufFile *npos_file; }; + #define COPYTUP(state,tup) ((*(state)-copytup) (state, tup)) #define WRITETUP(state,tup) ((*(state)-writetup) (state, tup)) #define READTUP(state,len) ((*(state)-readtup) (state, len)) *** *** 150,155 struct Tuplestorestate --- 161,175 #define USEMEM(state,amt) ((state)-availMem -= (amt)) #define FREEMEM(state,amt) ((state)-availMem += (amt)) + #define RecordFilePointer(state,file,offset) do{\ + BufFileTell((state)-myfile, \ + (file), (offset)); \ + BufFileWrite((state)-npos_file, \ + (file), sizeof(int)); \ + BufFileWrite((state)-npos_file, \ + (offset), sizeof(off_t)); \ + }while(0) + /* * * NOTES about on-tape representation of tuples: *** *** 200,205 static Tuplestorestate *tuplestore_begin_common(int eflags, --- 220,226 int maxKBytes); static void tuplestore_puttuple_common(Tuplestorestate *state, void *tuple); static void dumptuples(Tuplestorestate *state); + static void tuplestore_recordpos(Tuplestorestate *state); static void tuplestore_trim(Tuplestorestate *state, int ntuples); static unsigned int getlen(Tuplestorestate *state, bool eofOK); static void *copytup_heap(Tuplestorestate *state, void *tup); *** *** 234,239 tuplestore_begin_common(int eflags, bool interXact, int maxKBytes) --- 255,262 state-eof_reached = false; state-current = 0; + state-npos_file = NULL; + return state; } *** *** 291,296 tuplestore_begin_heap(bool randomAccess, bool interXact, int maxKBytes) --- 314,320 * EXEC_FLAG_REWIND need rewind to start * EXEC_FLAG_BACKWARD need backward fetch * EXEC_FLAG_MARK need mark/restore + * EXEC_FLAG_SETPOS need setpos * If tuplestore_set_eflags is not called, REWIND and MARK are allowed, * and BACKWARD is set per randomAccess in the tuplestore_begin_xxx call. */ *** *** 315,320 tuplestore_end(Tuplestorestate *state) --- 339,346 if (state-myfile) BufFileClose(state-myfile); + if (state-npos_file) + BufFileClose(state-npos_file); if (state-memtuples) { for (i = 0; i state-memtupcount; i++) *** *** 443,452 tuplestore_puttuple_common(Tuplestorestate *state, void *tuple) --- 469,481 */ PrepareTempTablespaces(); state-myfile = BufFileCreateTemp(state-interXact); + if (state-eflags EXEC_FLAG_SETPOS) + state-npos_file = BufFileCreateTemp(state-interXact); state-status = TSS_WRITEFILE; dumptuples(state); break; case TSS_WRITEFILE: + tuplestore_recordpos(state); WRITETUP(state, tuple); break; case TSS_READFILE: *** *** 461,466 tuplestore_puttuple_common(Tuplestorestate *state, void *tuple) --- 490,496 state-writepos_file, state-writepos_offset, SEEK_SET) != 0) elog(ERROR, seek to EOF
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
greg On 24 Sep 2008, at 02:45, Tom Lane [EMAIL PROTECTED] wrote: The next big thing seems to be to figure out exactly how to do multiple references to CTE outputs, so that we can de-bogotify the planner. I've looked and don't seem to still have the source tree where I worked on this. I remember how I made the changes to tuplestore which was mostly mechanical. The trick I think will be in adding a special purpose executor method which passes the call site to the node below. This depends on the observation that if we always memoize results then each call site can only have one active call. That is, we don't need to maintain a full stack tree. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
Tom, WithClause node may need a location field, and almost certainly has to be handled somehow in exprLocation(). The error reports in parse_cte.c *desperately* need error locations. Included is a patch for this against your cte-0923.patch.gz. Most errors now have error locations, but some do not. I'm going to think more to enhance this. -- Tatsuo Ishii SRA OSS, Inc. Japan *** pgsql/src/backend/parser/parse_cte.c2008-09-25 11:06:12.0 +0900 --- pgsql.patched/src/backend/parser/parse_cte.c2008-09-25 10:46:41.0 +0900 *** *** 239,245 if (query-intoClause) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), !errmsg(subquery in WITH cannot have SELECT INTO))); /* Compute the derived fields if not done yet */ if (!cte-cterecursive) --- 239,247 if (query-intoClause) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), !errmsg(subquery in WITH cannot have SELECT INTO), !parser_errposition(pstate, ! exprLocation((Node *) query-intoClause; /* Compute the derived fields if not done yet */ if (!cte-cterecursive) *** *** 561,625 lresult != NON_RECURSIVE) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), !errmsg(Left hand side of UNION ALL must be a non-recursive term in a recursive query))); else if (stmt-op == SETOP_INTERSECT) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), !errmsg(non-recursive term and recursive term must not be combined with INTERSECT))); else if (stmt-op == SETOP_EXCEPT) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), !errmsg(non-recursive term and recursive term must not be combined with EXCEPT))); else if (stmt-op == SETOP_UNION stmt-all != true) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), !errmsg(non-recursive term and recursive term must be combined with UNION ALL))); else if (stmt-op == SETOP_UNION stmt-all == true rarg-op == SETOP_UNION) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), !errmsg(Right hand side of UNION ALL must not contain UNION operation))); else if (stmt-sortClause) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), !errmsg(ORDER BY in a recursive query not allowed))); else if (stmt-limitOffset || stmt-limitCount) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), !errmsg(LIMIT OFFSET in a recursive query not allowed))); else if (stmt-lockingClause) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), !errmsg(FOR UPDATE in a recursive query not allowed))); else if (lresult == NON_RECURSIVE rresult == RECURSIVE_SELF) { if (larg-distinctClause) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), !errmsg(DISTINCT in a non recursive term not allowed))); if (rarg-distinctClause) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), !errmsg(DISTINCT in a recursive term not allowed))); if (rarg-groupClause) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), !
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
Jeff Davis [EMAIL PROTECTED] writes: Here is a patch that is an initial attempt to reorganize the parse tree representation. Oh dear, we seem to have spent yesterday doing the same work :-( I'll go over this and try to merge it with my own WIP. * There are a couple of other rough points in places where it's hard to traverse up the parse tree or query tree. Yeah, I'd been running into that issue too. Adding an explicit pointer to the CTE into the RTE doesn't work because it renders the parse tree un-copiable (at least without something a lot more sophisticated than copyObject; and saving/loading rule parsetrees would be tough too). What I've got at the moment is that creation of an RTE_CTE RTE copies the CTE's lists of output column types/typmods into the RTE. This eliminates the need for expandRTE and a couple of other places to be able to find the CTE; everything they need is in the RTE. So far as I can see, everyplace else that might need to find the CTE from the RTE is in places that either have a ParseState available, or have some comparable structure that could provide a way to search upwards for CTEs (eg, in ruleutils the deparse context will need to track uplevel CTE lists as well as rtables). It is a bit tedious though. Can anyone think of another way that would still preserve the notion of multiple RTEs being links to the same CTE rather than independent subqueries? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
Tom Lane [EMAIL PROTECTED] writes: Yeah, I'd been running into that issue too. Adding an explicit pointer to the CTE into the RTE doesn't work because it renders the parse tree un-copiable (at least without something a lot more sophisticated than copyObject; and saving/loading rule parsetrees would be tough too). Well the alternative to direct pointers is as you did with subqueries, turning the set into a flat array and storing indexes into the array. I'm not sure if that applies here or not. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
Gregory Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: Yeah, I'd been running into that issue too. Adding an explicit pointer to the CTE into the RTE doesn't work because it renders the parse tree un-copiable (at least without something a lot more sophisticated than copyObject; and saving/loading rule parsetrees would be tough too). Well the alternative to direct pointers is as you did with subqueries, turning the set into a flat array and storing indexes into the array. I'm not sure if that applies here or not. I think that just changes the problem into where can I find the array? ... The real issue here is that simplicity of copying etc requires that child nodes in a parse tree never have back-links leading up to their parent. If we were willing to drop that requirement for Query then we'd not need any auxiliary data structures to chase up to upper-level rtables or CTEs. I'm not sure this cure is better than the disease though. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
Attached is the result of a couple of days hacking on the WITH RECURSIVE patch. This moves us a lot closer to having sanity in the parsing phase, though I'm still quite unhappy with the second half of the processing in parse_cte.c. I added some infrastructure to make the first half's search for CTE references reasonably bulletproof, but the second half needs to be converted to use the same infrastructure, and I didn't do that yet because I didn't understand what it was doing. In particular, what the heck is the exception in findCteName that allows some other CTE's non_recursive_term to be stored into the non_recursive_term for the current one? That seems mighty broken. There are a number of unfinished corner cases (look for XXX in the patch) but they aren't in the way of further progress. The next big thing seems to be to figure out exactly how to do multiple references to CTE outputs, so that we can de-bogotify the planner. regards, tom lane bintORC4R6tx5.bin Description: cte-0923.patch.gz -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
I am re-sending this message to -hackers from yesterday, because the first time it didn't appear to make it through. This time I gzipped the patch. This is just for the archives (and to give context to the replies), and this message is superseded by Tom's patch here: http://archives.postgresql.org/pgsql-hackers/2008-09/msg01521.php On Thu, 2008-09-18 at 12:55 +0900, Tatsuo Ishii wrote: Tom, thanks for the review. Here is an in-progress report. Patches against CVS HEAD attached. (uncommented items are work-in-progress). Here is a patch that is an initial attempt to reorganize the parse tree representation. The goal of this patch is to separate the RTEs from the CTEs, so that we can, for example, have multiple RTEs refer to the same CTE. This will hopefully allow us to materialize a volatile query once, and have several RTEs refer to that same value, which will meet the SQL standard. Notes: * It makes a p_cte_table in the ParseState, which is a list of CteTblEntries. This replaces p_ctenamespace, which was a list of RangeSubselects. * It copies the p_cte_table into Query.cte_table * I introduced a new type of RTE, RTE_CTE, which holds a cte_index and cte_levelsup. This is used to find the CTE that the RTE references. Weak points: * It does not change the behavior of recursive queries. That is a little more complicated, so I wanted to wait for feedback on my patch so far. * I don't understand set_subquery_pathlist, or that general area of the code. I made a new set_cte_pathlist, that is basically the same thing, except I used a hack dummy_subquery variable in the RTE to pass along a pointer to the subquery of the CTE. I think this dummy variable can be removed, but I just don't understand that part of the code well enough to know what should happen. And if it does require a subquery at that point, I'll need to find a way of locating the right cte_table from inside that function. Any advice here would be appreciated. * There are a couple of other rough points in places where it's hard to traverse up the parse tree or query tree. I can probably work around these weak points, but I wanted to send the patch to avoid a lot of conflicts or problems later. Tell me whether you think this is moving in the right direction. Regards, Jeff Davis cte_fixparse.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
PlanState.has_recursivescan seems like a complete kluge. Can't it just be removed? It looks to me like it is working around bugs that hopefully aren't there anymore. There is certainly no reason why a recursive CTE should be more in need of rescanning than any other kind of plan. I don't think so. Recursion plan needs the hash table used by sublan be re-created at each recursion loop stage. Remember that in each evaluation of recursive plan, the recursive name is replaced by a working table which is holding previous evalution result of recursion stage. Thus the hash table corresponding to the work table needs to be re-created. If it is needed then the current implementation is completely broken anyway, since it would only detect a RecursiveScan node that is directly underneath an agg or hash node. Yeah, that's right. What I have in my mind is to implement something similar to UpdateChangedParamSet family like mechanism which will inherit working table change event to child node. -- Tatsuo Ishii SRA OSS, Inc. Japan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
Tatsuo Ishii [EMAIL PROTECTED] writes: PlanState.has_recursivescan seems like a complete kluge. Can't it just be removed? It looks to me like it is working around bugs that hopefully aren't there anymore. There is certainly no reason why a recursive CTE should be more in need of rescanning than any other kind of plan. I don't think so. Recursion plan needs the hash table used by sublan be re-created at each recursion loop stage. Remember that in each evaluation of recursive plan, the recursive name is replaced by a working table which is holding previous evalution result of recursion stage. Thus the hash table corresponding to the work table needs to be re-created. Oh, I see. I keep getting confused about whether RecursiveScan is at the top or the bottom of the recursion plan tree :-(. Maybe it would help to use a different name for it? RecursionInjector or something like that? If it is needed then the current implementation is completely broken anyway, since it would only detect a RecursiveScan node that is directly underneath an agg or hash node. Yeah, that's right. What I have in my mind is to implement something similar to UpdateChangedParamSet family like mechanism which will inherit working table change event to child node. I think it could actually *be* UpdateChangedParamSet, if you just associate some otherwise-unused Param with each RecursiveScan node, and have the Recursion node signal a change of that Param when it revises the work table. In fact, why not combine that with getting rid of the klugy addition to ExecutorState? Make the param be actually useful: it could contain some internal datastructure that passes the work table down to the RecursiveScan node. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
Why does set_recursion_pathlist think that the subquery might have useful pathkeys? We know it must always be a UNION ALL, no? Right. But someday we might implement UNION (without ALL) then we have useful pathkeys... Or shall I completely remove the step to generate patheys and do not pass pathkeys to create_recursion_path? pathkeys = convert_subquery_pathkeys(root, rel, subroot-query_pathkeys); -- Tatsuo Ishii SRA OSS, Inc. Japan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
Tatsuo Ishii [EMAIL PROTECTED] writes: Why does set_recursion_pathlist think that the subquery might have useful pathkeys? We know it must always be a UNION ALL, no? Right. But someday we might implement UNION (without ALL) then we have useful pathkeys... Good point. Might as well leave it in. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
2008/9/17 Tom Lane [EMAIL PROTECTED]: Tatsuo Ishii [EMAIL PROTECTED] writes: Do we really have to make RECURSIVE a fully reserved keyword? According to the standard, RECURSIVE is a reserved keyword, I believe. Sure, but our general rule is to make keywords no more reserved than is absolutely necessary to make the bison grammar unambiguous. I haven't tested, but I'm thinking that if WITH is fully reserved then RECURSIVE shouldn't have to be. I am not sure, if these rule is good. Somebody who develop on postgresql should have a problems when they will be port to other databases in future. Reserved words in standards should be respected. regards Pavel Stehule regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
I am not sure, if these rule is good. Somebody who develop on postgresql should have a problems when they will be port to other databases in future. Reserved words in standards should be respected. I disagree. I have never ported an app written for PostgreSQL to another database system, and have no plans to start. The fact that some other database system might barf on a particular bit of SQL is insufficient reason for PostgreSQL to do the same thing. If people want to write code that will work on multiple databases, they should of course avoid using any SQL reserved words for anything other than their reserved purposes. But there is no reason for the database system to unilaterally shove that down everyone's throat. It is very easy to overdo the idea of protecting users from themselves. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
Robert Haas [EMAIL PROTECTED] writes: I am not sure, if these rule is good. Somebody who develop on postgresql should have a problems when they will be port to other databases in future. Reserved words in standards should be respected. I disagree. I have never ported an app written for PostgreSQL to another database system, and have no plans to start. The fact that some other database system might barf on a particular bit of SQL is insufficient reason for PostgreSQL to do the same thing. If people want to write code that will work on multiple databases, they should of course avoid using any SQL reserved words for anything other than their reserved purposes. More than that, they have to actually test their SQL on each target DB. Every DB (including us) is going to have some reserved words that are not in the standard; so imagining that Postgres can all by itself protect you from this type of problem is doomed to failure anyway. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
Tom Lane [EMAIL PROTECTED] wrote: Robert Haas [EMAIL PROTECTED] writes: I am not sure, if these rule is good. Somebody who develop on postgresql should have a problems when they will be port to other databases in future. Reserved words in standards should be respected. If people want to write code that will work on multiple databases, they should of course avoid using any SQL reserved words for anything other than their reserved purposes. More than that, they have to actually test their SQL on each target DB. Every DB (including us) is going to have some reserved words that are not in the standard; so imagining that Postgres can all by itself protect you from this type of problem is doomed to failure anyway. If someone wants portable code, they can use a development tool which wraps ALL identifiers in quotes, every time. That's what we do. The important thing is that, to the extent practicable, standard SQL code is accepted and behaves in compliance with the standard. I don't see that it does anything to compromise that if you support additional, non-standard syntax for extensions. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
Is physical_tlist optimization sensible for RecursiveScan? We seem to use it for every other Scan node type. To enable physical_tlist optimization, it seems build_physical_tlist, use_physical_tlist and disuse_physical_tlist need to be changed. build_physical_tlist and use_physical_tlist have been already patched and only disuse_physical_tlist needs to be patched. Any other place I miss to enable the optimization? -- Tatsuo Ishii SRA OSS, Inc. Japan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
Tatsuo Ishii [EMAIL PROTECTED] writes: Is physical_tlist optimization sensible for RecursiveScan? We seem to use it for every other Scan node type. To enable physical_tlist optimization, it seems build_physical_tlist, use_physical_tlist and disuse_physical_tlist need to be changed. build_physical_tlist and use_physical_tlist have been already patched and only disuse_physical_tlist needs to be patched. Any other place I miss to enable the optimization? IIRC, the comment for build_physical_tlist hadn't been patched, but yeah that seems like about it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
To enable physical_tlist optimization, it seems build_physical_tlist, use_physical_tlist and disuse_physical_tlist need to be changed. build_physical_tlist and use_physical_tlist have been already patched and only disuse_physical_tlist needs to be patched. Any other place I miss to enable the optimization? IIRC, the comment for build_physical_tlist hadn't been patched, but yeah that seems like about it. Yeah, I need to fix sloppy comments in the existing patches all over the places:-) -- Tatsuo Ishii SRA OSS, Inc. Japan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
Tom, thanks for the review. Here is an in-progress report. Patches against CVS HEAD attached. (uncommented items are work-in-progress). -- Tatsuo Ishii SRA OSS, Inc. Japan Tatsuo Ishii [EMAIL PROTECTED] writes: Included is the latest patches against CVS HEAD. I spent some time reading this patch. Here are a few comments in no particular order: RangeRecursive node lacks copyfuncs/equalfuncs support. Functions added. Query.recursive is missed in equalfuncs.c. But rather than fix that, get rid of it entirely. AFAICS the only use is in qual_is_pushdown_safe, and what is the value of that test? The callers know perfectly well whether they are operating on a recursive RTE or not. You might as well just delete all the useless qual-pushdown-attempt code from set_recursion_pathlist, and not need to touch qual_is_pushdown_safe at all. Query.recursive removed and qual_is_pushdown_safe is untouched. Is physical_tlist optimization sensible for RecursiveScan? We seem to use it for every other Scan node type. Fixed and physical_tlist optimization is enabled for RecursiveScan I believe. I dislike putting state into ExecutorState; that makes it impossible to have multiple recursion nodes in one plan tree. It would probably be better for the Recursion and RecursiveScan nodes to talk to each other directly (compare HashJoin/Hash); although since they are not adjacent in the plan tree I admit I'm not sure how to do that. es_disallow_tuplestore doesn't seem to need to be in ExecutorState at all, it could as well be in RecursionState. It was for a workaround to avoid an infinit recursion in some cases. Discussion came to the conclusion that it's user's responsibilty to avoid such that case (otherwise the semantics of our recursive query becomes to be different from the one defined in the standard) I believe. es_disallow_tuplestore removed. I don't really like the way that Append nodes are being abused here. It would be better to allow nodeRecursion.c to duplicate a little code from nodeAppend.c, and have the child plans be direct children of the Recursion node. BTW, is it actually possible to have more than two children? I didn't spend enough time analyzing the restrictions in parse_cte to be sure. If there are always just two then you could simplify the representation by treating it like a join node instead of an append. (The RTE_RECURSIVE representation sure makes it look like there can be only two...) Mark/restore support seems useless ... note the comment on ExecSupportsMarkRestore (which should be updated if this code isn't removed). RecursiveScan claims to support backwards fetch, but does not in fact contain code to do so. (Given that it will always be underneath Recursion, which can't do backwards fetch, I see little point in adding such code; fix execAmi.c instead.) ExecInitRecursion doesn't seem to be on the same page about whether it supports backward scan as execAmi.c, either. This comment in nodeRecursivescan.c seems bogus: /* * Do not initialize scan tuple type, result tuple type and * projection info in ExecInitRecursivescan. These types are * initialized after initializing Recursion node. */ because the code seems to be doing exactly what the comment says it doesn't. Numerous comments appear to have been copied-and-pasted and not modified from the original. Somebody will have to go over all that text. ruleutils.c fails completely for non-recursive WITH. It *must* regenerate such a query with a WITH clause, not as a flattened subquery which is what you seem to be doing. This isn't negotiable because the semantics are different. This will mean at least some change in the parsetree representation. Perhaps we could add a bool to subquery RTEs to mark them as coming from a nonrecursive WITH? The tests added for RTE_RECURSIVE seem a bit ugly too. If it thinks that can't happen it should Assert so, not just fall through silently. commentary for ParseState.p_ctenamespace is gratuitously unlike the comment style for the other fields, and p_recursive_namespace isn't documented at all. ParseState.p_in_with_clause is unused, should be removed. Done. The WithClause struct definition is poorly commented. It should be stated that it is used only pre-parse-analysis (assuming that continues to be true after you get done fixing ruleutils.c...), and it doesn't say what the elements of the subquery list are (specifically, what node type). A lot of the other added structs and fields could use better commenting too. For that matter subquery is a poor name for WithClause's list of CTEs, especially so since it's hard to search for. It should be a plural name and I'd be inclined to use something like ctes not subqueries. The term subquery is too overloaded already, so any place you can refer to a WITH-list member as a CTE you should do so. WithClause node may need a
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
On Mon, Sep 15, 2008 at 06:46:16PM +0900, Tatsuo Ishii wrote: * Single Evaluation: with foo(i) as (select random() as i) select * from foo union all select * from foo; i --- 0.233165248762816 0.62126633618027 (2 rows) The standard specifies that non-recursive WITH should be evaluated once. What shall we do? I don't think there's an easy way to fix this as Tom suggested. Maybe we should not allow WITH clause without RECURISVE for 8.4? This is a still remaing issue... I don't think that the spec explicitly forbids this. Cheers, David. -- David Fetter [EMAIL PROTECTED] http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: [EMAIL PROTECTED] Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
On Mon, 2008-09-15 at 22:41 -0700, David Fetter wrote: On Mon, Sep 15, 2008 at 06:46:16PM +0900, Tatsuo Ishii wrote: * Single Evaluation: with foo(i) as (select random() as i) select * from foo union all select * from foo; i --- 0.233165248762816 0.62126633618027 (2 rows) The standard specifies that non-recursive WITH should be evaluated once. What shall we do? I don't think there's an easy way to fix this as Tom suggested. Maybe we should not allow WITH clause without RECURISVE for 8.4? This is a still remaing issue... I don't think that the spec explicitly forbids this. This has been discussed before. Regardless of the nuances of the spec (and whether it says so explicitly or implicitly), there are people in the community that see single evaluation as important, and important enough to be a showstopper. Tom Lane has suggested that this is a reasonable amount of work to complete, and minor in comparison to the rest of the patch. I think the right approach is to try to complete it so that everyone is happy. I will work on this, but unfortunately I don't have a lot of time right now, so I can't make any promises. The rest of the patch looks good so far (there are still a few things I want to look at), so I think this is the biggest open issue. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
Tatsuo Ishii [EMAIL PROTECTED] writes: Included is the latest patches against CVS HEAD. I spent some time reading this patch. Here are a few comments in no particular order: RangeRecursive node lacks copyfuncs/equalfuncs support. Query.recursive is missed in equalfuncs.c. But rather than fix that, get rid of it entirely. AFAICS the only use is in qual_is_pushdown_safe, and what is the value of that test? The callers know perfectly well whether they are operating on a recursive RTE or not. You might as well just delete all the useless qual-pushdown-attempt code from set_recursion_pathlist, and not need to touch qual_is_pushdown_safe at all. Is physical_tlist optimization sensible for RecursiveScan? We seem to use it for every other Scan node type. I dislike putting state into ExecutorState; that makes it impossible to have multiple recursion nodes in one plan tree. It would probably be better for the Recursion and RecursiveScan nodes to talk to each other directly (compare HashJoin/Hash); although since they are not adjacent in the plan tree I admit I'm not sure how to do that. es_disallow_tuplestore doesn't seem to need to be in ExecutorState at all, it could as well be in RecursionState. I don't really like the way that Append nodes are being abused here. It would be better to allow nodeRecursion.c to duplicate a little code from nodeAppend.c, and have the child plans be direct children of the Recursion node. BTW, is it actually possible to have more than two children? I didn't spend enough time analyzing the restrictions in parse_cte to be sure. If there are always just two then you could simplify the representation by treating it like a join node instead of an append. (The RTE_RECURSIVE representation sure makes it look like there can be only two...) Mark/restore support seems useless ... note the comment on ExecSupportsMarkRestore (which should be updated if this code isn't removed). RecursiveScan claims to support backwards fetch, but does not in fact contain code to do so. (Given that it will always be underneath Recursion, which can't do backwards fetch, I see little point in adding such code; fix execAmi.c instead.) ExecInitRecursion doesn't seem to be on the same page about whether it supports backward scan as execAmi.c, either. This comment in nodeRecursivescan.c seems bogus: /* * Do not initialize scan tuple type, result tuple type and * projection info in ExecInitRecursivescan. These types are * initialized after initializing Recursion node. */ because the code seems to be doing exactly what the comment says it doesn't. Numerous comments appear to have been copied-and-pasted and not modified from the original. Somebody will have to go over all that text. ruleutils.c fails completely for non-recursive WITH. It *must* regenerate such a query with a WITH clause, not as a flattened subquery which is what you seem to be doing. This isn't negotiable because the semantics are different. This will mean at least some change in the parsetree representation. Perhaps we could add a bool to subquery RTEs to mark them as coming from a nonrecursive WITH? The tests added for RTE_RECURSIVE seem a bit ugly too. If it thinks that can't happen it should Assert so, not just fall through silently. commentary for ParseState.p_ctenamespace is gratuitously unlike the comment style for the other fields, and p_recursive_namespace isn't documented at all. ParseState.p_in_with_clause is unused, should be removed. The WithClause struct definition is poorly commented. It should be stated that it is used only pre-parse-analysis (assuming that continues to be true after you get done fixing ruleutils.c...), and it doesn't say what the elements of the subquery list are (specifically, what node type). A lot of the other added structs and fields could use better commenting too. For that matter subquery is a poor name for WithClause's list of CTEs, especially so since it's hard to search for. It should be a plural name and I'd be inclined to use something like ctes not subqueries. The term subquery is too overloaded already, so any place you can refer to a WITH-list member as a CTE you should do so. WithClause node may need a location field, and almost certainly has to be handled somehow in exprLocation(). The error reports in parse_cte.c *desperately* need error locations. Why does transformWithClause do parse_sub_analyze twice? I'm not sure that's even safe, and it's surely unnecessary. Also, what happens if a subquery isn't a SelectStmt? Silently doing nothing doesn't seem like a good plan there. Why are we putting essentially the same information into both p_recursive_namespace and p_ctenamespace? Is there really a need for both lists? The code added to transformFromClauseItem seems quite wrong since it searches both lists even if it found a match in the first one. This whole area looks like it needs refactoring. Costing is all bogus,
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
Jeff Davis [EMAIL PROTECTED] writes: I think the right approach is to try to complete it so that everyone is happy. I will work on this, but unfortunately I don't have a lot of time right now, so I can't make any promises. I think there are two significant bits there: * Fixing the parsetree representation so that the distinction between a CTE and an RTE that references the CTE is preserved. * Implementing some kind of multiple readout tuplestore to permit multiple RTEs to scan a CTE plan without causing any row to be evaluated more than once. The first of these seems relatively straightforward: the WITH clause has to be preserved explicitly in the Query representation, and RTEs for CTEs should just carry indexes into the WITH list, not copies of the subqueries. (Hm, we might need both an index and a levelsup counter, to deal with nested queries...) This would solve ruleutils' problem, too. I haven't thought much about the multiple readout tuplestore though. Anyone have a clear idea how to do that? In particular, can the existing tuplestore code be enhanced to do it (without sacrificing performance in the simple case), or do we need something new? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
I had either on that a while back. I think the existing tuplestore can be made to work without bad performance penaltiesbin the usual case. The trick was to do it without making the code unreadable. I can send the code I had but I doubt you want to use it as-is. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
Greg Stark [EMAIL PROTECTED] writes: I had either on that a while back. I think the existing tuplestore can be made to work without bad performance penaltiesbin the usual case. The trick was to do it without making the code unreadable. I can send the code I had but I doubt you want to use it as-is. Sure. Maybe someone else will see a way to make it more readable. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
Do we really have to make RECURSIVE a fully reserved keyword? According to the standard, RECURSIVE is a reserved keyword, I believe. (Actually, the patch makes it worse than reserved, by failing to add it to the reserved_keywords list.) Yes, it's a bug. Will fix... -- Tatsuo Ishii SRA OSS, Inc. Japan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
Tatsuo Ishii [EMAIL PROTECTED] writes: Do we really have to make RECURSIVE a fully reserved keyword? According to the standard, RECURSIVE is a reserved keyword, I believe. Sure, but our general rule is to make keywords no more reserved than is absolutely necessary to make the bison grammar unambiguous. I haven't tested, but I'm thinking that if WITH is fully reserved then RECURSIVE shouldn't have to be. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
* Single Evaluation: with foo(i) as (select random() as i) select * from foo union all select * from foo; i --- 0.233165248762816 0.62126633618027 (2 rows) The standard specifies that non-recursive WITH should be evaluated once. What shall we do? I don't think there's an easy way to fix this as Tom suggested. Maybe we should not allow WITH clause without RECURISVE for 8.4? This is a still remaing issue... * Binary recursion and subselect strangeness: with recursive foo(i) as (values (1) union all select * from (select i+1 from foo where i 10 union all select i+1 from foo where i X) t) select * from foo; Produces 10 rows of output regardless of what X is. This should be fixed for 8.4. Also, this is non-linear recursion, which the standard seems to disallow. I will try to fix this. However detecting the query being not a non-linear one is not so easy. I have implemented rejection of non-linear recursion and now this type of query will not be executed anyway. * Multiple recursive references: with recursive foo(i) as (values (1) union all select i+1 from foo where i 10 union all select i+1 from foo where i 20) select * from foo; ERROR: Left hand side of UNION ALL must be a non-recursive term in a recursive query If we're going to allow non-linear recursion (which the standard does not), this seems like it should be a valid case. I will try to disallow this. Non-linear recursion is not allowed now. * Strange result with except: with recursive foo(i) as (values (1) union all select * from (select i+1 from foo where i 10 except select i+1 from foo where i 5) t) select * from foo; ERROR: table foo has 0 columns available but 1 columns specified This query works if you replace except with union. This should be fixed for 8.4. I will try to fix this. This is a non-linear recursion too and will not be executed anyway. * Aggregates allowed: with recursive foo(i) as (values(1) union all select max(i)+1 from foo where i 10) select * from foo; Aggregates should be blocked according to the standard. Also, causes an infinite loop. This should be fixed for 8.4. I will try to fix this. Fixed. * DISTINCT should supress duplicates: with recursive foo(i) as (select distinct * from (values(1),(2)) t union all select distinct i+1 from foo where i 10) select * from foo; This outputs a lot of duplicates, but they should be supressed according to the standard. This query is essentially the same as supporting UNION for recursive queries, so we should either fix both for 8.4 or block both for consistency. I'm not sure if it's possible to fix this. Will look into. Ok, now this type of DISTINCT is not allowed. Included is the latest patches against CVS HEAD. -- Tatsuo Ishii SRA OSS, Inc. Japan recursive_query.patch.gz Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
On Tue, 2008-09-09 at 09:47 -0400, Robert Haas wrote: 3. I think this is a must fix because of the point about volatile functions --- changing it later will result in user-visible semantics changes, so we have to get it right the first time. I don't entirely agree with #3. It is user-visible, but only in the sense that someone is depending on undocumented multiple-evaluation behavior. What makes you think it's going to be undocumented? Single versus multiple evaluation is a keep aspect of this feature and certainly needs to be documented one way or the other. I can't understand why we would introduce a standard syntax with non-standard behavior, but if we do, it certainly had better be mentioned in the documentation. I meant that -- hypothetically if we did accept the feature as-is -- the number of evaluations would be documented to be undefined, not N. That would avoid the backwards-compatibility problem. This one point is probably not worth discussing now, because argument #1 and #2 stand on their own. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
I meant that -- hypothetically if we did accept the feature as-is -- the number of evaluations would be documented to be undefined, not N. That would avoid the backwards-compatibility problem. This one point is probably not worth discussing now, because argument #1 and #2 stand on their own. Agreed. Plus, both Tom and Pavel seem to think this is a relatively solvable problem. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
On Tue, 2008-09-09 at 13:45 +0900, Tatsuo Ishii wrote: Thanks for the review. The standard specifies that non-recursive WITH should be evaluated once. What shall we do? I don't think there's a easy way to fix this. Maybe we should not allow WITH clause without RECURISVE? My interpretation of 7.13: General Rules: 2.b is that it should be single evaluation, even if RECURSIVE is present. The previous discussion was here: http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php The important arguments in the thread seemed to be: 1. People will generally expect single evaluation, so might be disappointed if they can't use this feature for that purpose. 2. It's a spec violation in the case of volatile functions. 3. I think this is a must fix because of the point about volatile functions --- changing it later will result in user-visible semantics changes, so we have to get it right the first time. I don't entirely agree with #3. It is user-visible, but only in the sense that someone is depending on undocumented multiple-evaluation behavior. Tom Lane said that multiple evaluation is grounds for rejection: http://archives.postgresql.org/pgsql-hackers/2008-07/msg01318.php Is there hope of correcting this before November? I will try to fix this. However detecting the query being not a non-linear one is not so easy. If we don't allow mutual recursion, the only kind of non-linear recursion that might exist would be multiple references to the same recursive query name in a recursive query, is that correct? * DISTINCT should supress duplicates: with recursive foo(i) as (select distinct * from (values(1),(2)) t union all select distinct i+1 from foo where i 10) select * from foo; This outputs a lot of duplicates, but they should be supressed according to the standard. This query is essentially the same as supporting UNION for recursive queries, so we should either fix both for 8.4 or block both for consistency. I'm not sure if it's possible to fix this. Will look into. Can't we just reject queries with top-level DISTINCT, similar to how UNION is rejected? * outer joins on a recursive reference should be blocked: with recursive foo(i) as (values(1) union all select i+1 from foo left join (values(1)) t on (i=column1)) select * from foo; Causes an infinite loop, but the standard says using an outer join in this situation should be prohibited. This should be fixed for 8.4. Not an issue, I think. Agreed, Andrew Gierth corrected me here. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
3. I think this is a must fix because of the point about volatile functions --- changing it later will result in user-visible semantics changes, so we have to get it right the first time. I don't entirely agree with #3. It is user-visible, but only in the sense that someone is depending on undocumented multiple-evaluation behavior. What makes you think it's going to be undocumented? Single versus multiple evaluation is a keep aspect of this feature and certainly needs to be documented one way or the other. I can't understand why we would introduce a standard syntax with non-standard behavior, but if we do, it certainly had better be mentioned in the documentation. I think that the most likely result of a CTE implementation that doesn't guarantee single evaluation is that people simply won't use it. But anyone who does will expect that their queries will return the same results in release N and release N+1, for all values of N. The only way that an incompatible change of this type won't break people's applications is if they're not using the feature in the first place, in which case there is no point in committing it anyway. I wonder if the whole approach to this patch is backward. Instead of worrying about how to implement WITH RECURSIVE, maybe it would be better to implement a really solid, spec-compliant version of WITH, and add the RECURSIVE functionality in a later patch/release. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
On Tue, 2008-09-09 at 13:45 +0900, Tatsuo Ishii wrote: Thanks for the review. The standard specifies that non-recursive WITH should be evaluated once. What shall we do? I don't think there's a easy way to fix this. Maybe we should not allow WITH clause without RECURISVE? My interpretation of 7.13: General Rules: 2.b is that it should be single evaluation, even if RECURSIVE is present. The previous discussion was here: http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php The important arguments in the thread seemed to be: 1. People will generally expect single evaluation, so might be disappointed if they can't use this feature for that purpose. 2. It's a spec violation in the case of volatile functions. 3. I think this is a must fix because of the point about volatile functions --- changing it later will result in user-visible semantics changes, so we have to get it right the first time. I don't entirely agree with #3. It is user-visible, but only in the sense that someone is depending on undocumented multiple-evaluation behavior. Tom Lane said that multiple evaluation is grounds for rejection: http://archives.postgresql.org/pgsql-hackers/2008-07/msg01318.php Is there hope of correcting this before November? According to Tom, to implement single evaluation we need to make big infrastructure enhancement which is likely slip the schedule for 8.4 release which Tom does not want. So as long as Tom and other people think that is a must fix, there seems no hope probably. Anyway I will continue to work on existing patches... -- Tatsuo Ishii SRA OSS, Inc. Japan I will try to fix this. However detecting the query being not a non-linear one is not so easy. If we don't allow mutual recursion, the only kind of non-linear recursion that might exist would be multiple references to the same recursive query name in a recursive query, is that correct? * DISTINCT should supress duplicates: with recursive foo(i) as (select distinct * from (values(1),(2)) t union all select distinct i+1 from foo where i 10) select * from foo; This outputs a lot of duplicates, but they should be supressed according to the standard. This query is essentially the same as supporting UNION for recursive queries, so we should either fix both for 8.4 or block both for consistency. I'm not sure if it's possible to fix this. Will look into. Can't we just reject queries with top-level DISTINCT, similar to how UNION is rejected? * outer joins on a recursive reference should be blocked: with recursive foo(i) as (values(1) union all select i+1 from foo left join (values(1)) t on (i=column1)) select * from foo; Causes an infinite loop, but the standard says using an outer join in this situation should be prohibited. This should be fixed for 8.4. Not an issue, I think. Agreed, Andrew Gierth corrected me here. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
Hello 2008/9/9 Tatsuo Ishii [EMAIL PROTECTED]: On Tue, 2008-09-09 at 13:45 +0900, Tatsuo Ishii wrote: Thanks for the review. The standard specifies that non-recursive WITH should be evaluated once. What shall we do? I don't think there's a easy way to fix this. Maybe we should not allow WITH clause without RECURISVE? My interpretation of 7.13: General Rules: 2.b is that it should be single evaluation, even if RECURSIVE is present. The previous discussion was here: http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php The important arguments in the thread seemed to be: 1. People will generally expect single evaluation, so might be disappointed if they can't use this feature for that purpose. 2. It's a spec violation in the case of volatile functions. 3. I think this is a must fix because of the point about volatile functions --- changing it later will result in user-visible semantics changes, so we have to get it right the first time. I don't entirely agree with #3. It is user-visible, but only in the sense that someone is depending on undocumented multiple-evaluation behavior. Tom Lane said that multiple evaluation is grounds for rejection: http://archives.postgresql.org/pgsql-hackers/2008-07/msg01318.php Is there hope of correcting this before November? According to Tom, to implement single evaluation we need to make big infrastructure enhancement which is likely slip the schedule for 8.4 release which Tom does not want. why? why don't use a materialisation? So as long as Tom and other people think that is a must fix, there seems no hope probably. Anyway I will continue to work on existing patches... -- I would to see your patch in core early. I am working on grouping sets and I cannot finish my patch before your patch will be commited. Regards Pavel Stehule Tatsuo Ishii SRA OSS, Inc. Japan I will try to fix this. However detecting the query being not a non-linear one is not so easy. If we don't allow mutual recursion, the only kind of non-linear recursion that might exist would be multiple references to the same recursive query name in a recursive query, is that correct? * DISTINCT should supress duplicates: with recursive foo(i) as (select distinct * from (values(1),(2)) t union all select distinct i+1 from foo where i 10) select * from foo; This outputs a lot of duplicates, but they should be supressed according to the standard. This query is essentially the same as supporting UNION for recursive queries, so we should either fix both for 8.4 or block both for consistency. I'm not sure if it's possible to fix this. Will look into. Can't we just reject queries with top-level DISTINCT, similar to how UNION is rejected? * outer joins on a recursive reference should be blocked: with recursive foo(i) as (values(1) union all select i+1 from foo left join (values(1)) t on (i=column1)) select * from foo; Causes an infinite loop, but the standard says using an outer join in this situation should be prohibited. This should be fixed for 8.4. Not an issue, I think. Agreed, Andrew Gierth corrected me here. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
Hello 2008/9/9 Tatsuo Ishii [EMAIL PROTECTED]: On Tue, 2008-09-09 at 13:45 +0900, Tatsuo Ishii wrote: Thanks for the review. The standard specifies that non-recursive WITH should be evaluated once. What shall we do? I don't think there's a easy way to fix this. Maybe we should not allow WITH clause without RECURISVE? My interpretation of 7.13: General Rules: 2.b is that it should be single evaluation, even if RECURSIVE is present. The previous discussion was here: http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php The important arguments in the thread seemed to be: 1. People will generally expect single evaluation, so might be disappointed if they can't use this feature for that purpose. 2. It's a spec violation in the case of volatile functions. 3. I think this is a must fix because of the point about volatile functions --- changing it later will result in user-visible semantics changes, so we have to get it right the first time. I don't entirely agree with #3. It is user-visible, but only in the sense that someone is depending on undocumented multiple-evaluation behavior. Tom Lane said that multiple evaluation is grounds for rejection: http://archives.postgresql.org/pgsql-hackers/2008-07/msg01318.php Is there hope of correcting this before November? According to Tom, to implement single evaluation we need to make big infrastructure enhancement which is likely slip the schedule for 8.4 release which Tom does not want. why? why don't use a materialisation? See: http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php So as long as Tom and other people think that is a must fix, there seems no hope probably. Anyway I will continue to work on existing patches... -- I would to see your patch in core early. I am working on grouping sets and I cannot finish my patch before your patch will be commited. Regards Pavel Stehule Tatsuo Ishii SRA OSS, Inc. Japan I will try to fix this. However detecting the query being not a non-linear one is not so easy. If we don't allow mutual recursion, the only kind of non-linear recursion that might exist would be multiple references to the same recursive query name in a recursive query, is that correct? * DISTINCT should supress duplicates: with recursive foo(i) as (select distinct * from (values(1),(2)) t union all select distinct i+1 from foo where i 10) select * from foo; This outputs a lot of duplicates, but they should be supressed according to the standard. This query is essentially the same as supporting UNION for recursive queries, so we should either fix both for 8.4 or block both for consistency. I'm not sure if it's possible to fix this. Will look into. Can't we just reject queries with top-level DISTINCT, similar to how UNION is rejected? * outer joins on a recursive reference should be blocked: with recursive foo(i) as (values(1) union all select i+1 from foo left join (values(1)) t on (i=column1)) select * from foo; Causes an infinite loop, but the standard says using an outer join in this situation should be prohibited. This should be fixed for 8.4. Not an issue, I think. Agreed, Andrew Gierth corrected me here. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
Hello 2008/9/9 Tatsuo Ishii [EMAIL PROTECTED]: On Tue, 2008-09-09 at 13:45 +0900, Tatsuo Ishii wrote: Thanks for the review. The standard specifies that non-recursive WITH should be evaluated once. What shall we do? I don't think there's a easy way to fix this. Maybe we should not allow WITH clause without RECURISVE? My interpretation of 7.13: General Rules: 2.b is that it should be single evaluation, even if RECURSIVE is present. The previous discussion was here: http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php The important arguments in the thread seemed to be: 1. People will generally expect single evaluation, so might be disappointed if they can't use this feature for that purpose. 2. It's a spec violation in the case of volatile functions. 3. I think this is a must fix because of the point about volatile functions --- changing it later will result in user-visible semantics changes, so we have to get it right the first time. I don't entirely agree with #3. It is user-visible, but only in the sense that someone is depending on undocumented multiple-evaluation behavior. Tom Lane said that multiple evaluation is grounds for rejection: http://archives.postgresql.org/pgsql-hackers/2008-07/msg01318.php Is there hope of correcting this before November? According to Tom, to implement single evaluation we need to make big infrastructure enhancement which is likely slip the schedule for 8.4 release which Tom does not want. why? why don't use a materialisation? See: http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php So as long as Tom and other people think that is a must fix, there seems no hope probably. Anyway I will continue to work on existing patches... -- I would to see your patch in core early. I am working on grouping sets and I cannot finish my patch before your patch will be commited. Regards Pavel Stehule Tatsuo Ishii SRA OSS, Inc. Japan I will try to fix this. However detecting the query being not a non-linear one is not so easy. If we don't allow mutual recursion, the only kind of non-linear recursion that might exist would be multiple references to the same recursive query name in a recursive query, is that correct? * DISTINCT should supress duplicates: with recursive foo(i) as (select distinct * from (values(1),(2)) t union all select distinct i+1 from foo where i 10) select * from foo; This outputs a lot of duplicates, but they should be supressed according to the standard. This query is essentially the same as supporting UNION for recursive queries, so we should either fix both for 8.4 or block both for consistency. I'm not sure if it's possible to fix this. Will look into. Can't we just reject queries with top-level DISTINCT, similar to how UNION is rejected? * outer joins on a recursive reference should be blocked: with recursive foo(i) as (values(1) union all select i+1 from foo left join (values(1)) t on (i=column1)) select * from foo; Causes an infinite loop, but the standard says using an outer join in this situation should be prohibited. This should be fixed for 8.4. Not an issue, I think. Agreed, Andrew Gierth corrected me here. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
* Aggregates allowed: with recursive foo(i) as (values(1) union all select max(i)+1 from foo where i 10) select * from foo; Aggregates should be blocked according to the standard. Also, causes an infinite loop. This should be fixed for 8.4. I will try to fix this. We already reject: select max(i) from foo where i 10) But max(i)+1 seems to slip the check. I looked into this I found the patch tried to detect the case before analyzing(see parser/parse_cte.c) which is not a right thing I think. I think we could detect the case by adding more checking in parseCheckAggregates(): /* * Check if there's aggregate function in a recursive term. */ foreach(l, qry-rtable) { RangeTblEntry *rte = (RangeTblEntry *) lfirst(l); if (qry-hasAggs rte-rtekind == RTE_RECURSIVE rte-self_reference) { ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), errmsg(aggregate functions in a recursive term not allowed))); } } What do you think? -- Tatsuo Ishii SRA OSS, Inc. Japan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
2008/9/9 Tatsuo Ishii [EMAIL PROTECTED]: Hello 2008/9/9 Tatsuo Ishii [EMAIL PROTECTED]: On Tue, 2008-09-09 at 13:45 +0900, Tatsuo Ishii wrote: Thanks for the review. The standard specifies that non-recursive WITH should be evaluated once. What shall we do? I don't think there's a easy way to fix this. Maybe we should not allow WITH clause without RECURISVE? My interpretation of 7.13: General Rules: 2.b is that it should be single evaluation, even if RECURSIVE is present. The previous discussion was here: http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php I am blind, I didn't find any reason, why materialisation isn't useable. Regards Pavel The important arguments in the thread seemed to be: 1. People will generally expect single evaluation, so might be disappointed if they can't use this feature for that purpose. 2. It's a spec violation in the case of volatile functions. 3. I think this is a must fix because of the point about volatile functions --- changing it later will result in user-visible semantics changes, so we have to get it right the first time. I don't entirely agree with #3. It is user-visible, but only in the sense that someone is depending on undocumented multiple-evaluation behavior. Tom Lane said that multiple evaluation is grounds for rejection: http://archives.postgresql.org/pgsql-hackers/2008-07/msg01318.php Is there hope of correcting this before November? According to Tom, to implement single evaluation we need to make big infrastructure enhancement which is likely slip the schedule for 8.4 release which Tom does not want. why? why don't use a materialisation? See: http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php So as long as Tom and other people think that is a must fix, there seems no hope probably. Anyway I will continue to work on existing patches... -- I would to see your patch in core early. I am working on grouping sets and I cannot finish my patch before your patch will be commited. Regards Pavel Stehule Tatsuo Ishii SRA OSS, Inc. Japan I will try to fix this. However detecting the query being not a non-linear one is not so easy. If we don't allow mutual recursion, the only kind of non-linear recursion that might exist would be multiple references to the same recursive query name in a recursive query, is that correct? * DISTINCT should supress duplicates: with recursive foo(i) as (select distinct * from (values(1),(2)) t union all select distinct i+1 from foo where i 10) select * from foo; This outputs a lot of duplicates, but they should be supressed according to the standard. This query is essentially the same as supporting UNION for recursive queries, so we should either fix both for 8.4 or block both for consistency. I'm not sure if it's possible to fix this. Will look into. Can't we just reject queries with top-level DISTINCT, similar to how UNION is rejected? * outer joins on a recursive reference should be blocked: with recursive foo(i) as (values(1) union all select i+1 from foo left join (values(1)) t on (i=column1)) select * from foo; Causes an infinite loop, but the standard says using an outer join in this situation should be prohibited. This should be fixed for 8.4. Not an issue, I think. Agreed, Andrew Gierth corrected me here. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
My interpretation of 7.13: General Rules: 2.b is that it should be single evaluation, even if RECURSIVE is present. The previous discussion was here: http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php I am blind, I didn't find any reason, why materialisation isn't useable. I believe it's because of these two (closely related) problems: # The basic # implementation clearly ought to be to dump the result of the subquery # into a tuplestore and then have the upper level read out from that. # However, we don't have any infrastructure for having multiple # upper-level RTEs reference the same tuplestore. (Perhaps the InitPlan # infrastructure could be enhanced in that direction, but it's not ready # for non-scalar outputs today.) Also, I think we'd have to teach # tuplestore how to support multiple readout cursors. For example, # consider # WITH foo AS (SELECT ...) SELECT ... FROM foo a, foo b WHERE ... # If the planner chooses to do the join as a nested loop then each # Scan node needs to keep track of its own place in the tuplestore, # concurrently with the other node having a different place. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
2008/9/9 Robert Haas [EMAIL PROTECTED]: My interpretation of 7.13: General Rules: 2.b is that it should be single evaluation, even if RECURSIVE is present. The previous discussion was here: http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php I am blind, I didn't find any reason, why materialisation isn't useable. I believe it's because of these two (closely related) problems: # The basic # implementation clearly ought to be to dump the result of the subquery # into a tuplestore and then have the upper level read out from that. # However, we don't have any infrastructure for having multiple # upper-level RTEs reference the same tuplestore. (Perhaps the InitPlan # infrastructure could be enhanced in that direction, but it's not ready # for non-scalar outputs today.) Also, I think we'd have to teach # tuplestore how to support multiple readout cursors. For example, # consider # WITH foo AS (SELECT ...) SELECT ... FROM foo a, foo b WHERE ... # If the planner chooses to do the join as a nested loop then each # Scan node needs to keep track of its own place in the tuplestore, # concurrently with the other node having a different place. hmm. I solve similar problem in grouping sets :( etc SELECT ... FROM ... GROUP BY GROUPING SETS (a,b) is almost same as With foo AS (SELECT ... FROM) SELECT ... FROM foo GROUP BY a UNION ALL SELECT ... FROM foo GROUP BY b; Regards Pavel ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
On Tue, 2008-09-09 at 18:51 +0200, Pavel Stehule wrote: hmm. I solve similar problem in grouping sets :( etc How did you solve it? Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
2008/9/9 Jeff Davis [EMAIL PROTECTED]: On Tue, 2008-09-09 at 18:51 +0200, Pavel Stehule wrote: hmm. I solve similar problem in grouping sets :( etc I have special executor node - feeder, it hold one tuple and others nodes read from this node. It's usable for hash aggregates. Pavel plan is like: grouping sets -- seq scan ... -- hash aggg -- feeder -- hash agg -- feeder How did you solve it? Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
Robert Haas [EMAIL PROTECTED] writes: I am blind, I didn't find any reason, why materialisation isn't useable. I believe it's because of these two (closely related) problems: # The basic # implementation clearly ought to be to dump the result of the subquery # into a tuplestore and then have the upper level read out from that. # However, we don't have any infrastructure for having multiple # upper-level RTEs reference the same tuplestore. (Perhaps the InitPlan # infrastructure could be enhanced in that direction, but it's not ready # for non-scalar outputs today.) Also, I think we'd have to teach # tuplestore how to support multiple readout cursors. For example, # consider # WITH foo AS (SELECT ...) SELECT ... FROM foo a, foo b WHERE ... # If the planner chooses to do the join as a nested loop then each # Scan node needs to keep track of its own place in the tuplestore, # concurrently with the other node having a different place. The amount of new code needed for that seems a pittance compared to the size of the patch already, so I'm not seeing why Tatsuo-san considers it infeasible. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Common Table Expressions (WITH RECURSIVE) patch
These are my initial comments about the Common Table Expressions (CTE) patch, also known as WITH [RECURSIVE]. These comments are based on the patch here: http://archives.postgresql.org/pgsql-patches/2008-08/msg00021.php This is a semantically complex feature, and the standard is fairly complex as well. So I'm approaching this by making my own interpretations from the standard first (I included my interpretations and section references at the end of this email) and comparing to the behavior of the patch. The following examples may be inconsistent with the standard. Some have already been mentioned, and I don't think they all need to be fixed for 8.4, but I mention them here for completeness. * Mutual Recursion: with recursive foo(i) as (values(1) union all select i+1 from bar where i 10), bar(i) as (values(1) union all select i+1 from foo where i 10) select * from foo; ERROR: mutual recursive call is not supported The standard allows mutual recursion. * Single Evaluation: with foo(i) as (select random() as i) select * from foo union all select * from foo; i --- 0.233165248762816 0.62126633618027 (2 rows) The standard specifies that non-recursive WITH should be evaluated once. * RHS only: with recursive foo(i) as (select i+1 from foo where i 10 union all values(1)) select * from foo; ERROR: Left hand side of UNION ALL must be a non-recursive term in a recursive query The standard does not require that the recursive term be on the RHS. * UNION ALL only: with recursive foo(i) as (values(1) union select i+1 from foo where i 10) select * from foo; ERROR: non-recursive term and recursive term must be combined with UNION ALL The standard seems to allow UNION ALL, UNION, INTERSECT, and EXCEPT (when the recursive term is not on the RHS of the EXCEPT). * Binary recursion and subselect strangeness: with recursive foo(i) as (values (1) union all select * from (select i+1 from foo where i 10 union all select i+1 from foo where i X) t) select * from foo; Produces 10 rows of output regardless of what X is. This should be fixed for 8.4. Also, this is non-linear recursion, which the standard seems to disallow. * Multiple recursive references: with recursive foo(i) as (values (1) union all select i+1 from foo where i 10 union all select i+1 from foo where i 20) select * from foo; ERROR: Left hand side of UNION ALL must be a non-recursive term in a recursive query If we're going to allow non-linear recursion (which the standard does not), this seems like it should be a valid case. * Strange result with except: with recursive foo(i) as (values (1) union all select * from (select i+1 from foo where i 10 except select i+1 from foo where i 5) t) select * from foo; ERROR: table foo has 0 columns available but 1 columns specified This query works if you replace except with union. This should be fixed for 8.4. * Aggregates allowed: with recursive foo(i) as (values(1) union all select max(i)+1 from foo where i 10) select * from foo; Aggregates should be blocked according to the standard. Also, causes an infinite loop. This should be fixed for 8.4. * DISTINCT should supress duplicates: with recursive foo(i) as (select distinct * from (values(1),(2)) t union all select distinct i+1 from foo where i 10) select * from foo; This outputs a lot of duplicates, but they should be supressed according to the standard. This query is essentially the same as supporting UNION for recursive queries, so we should either fix both for 8.4 or block both for consistency. * outer joins on a recursive reference should be blocked: with recursive foo(i) as (values(1) union all select i+1 from foo left join (values(1)) t on (i=column1)) select * from foo; Causes an infinite loop, but the standard says using an outer join in this situation should be prohibited. This should be fixed for 8.4. * ORDER BY, LIMIT, and OFFSET are rejected for recursive queries. The standard does not seem to say that these should be rejected. The following are my interpretations of relevant parts of the SQL standard (200n), and the associated sections. These are only my interpretation, so let me know if you interpret the standard differently. Non-linear recursion forbidden: 7.13: Syntax Rules: 2.g.ii My interpretation of 2.g.ii.2 is that WQN[k] and WQN[l] may be the same query name. 7.13: Syntax Rules: 2.g.iv EXCEPT can't be used for recursive queries if a recursive reference appears on the RHS: 7.13: Syntax Rules: 2.g.iii.1 INTERSECT ALL/EXCEPT ALL can't be used for recursive queries: 7.13: Syntax Rules: 2.g.iii.5 recursive references must appear in FROM clause: 7.13: Syntax Rules: 2.g.iii.3 My interpretation of this rule is that it does not allow a
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
Jeff == Jeff Davis [EMAIL PROTECTED] writes: Jeff * Mutual Recursion: This limitation isn't at all uncommon in other implementations; DB2 docs for example say: If more than one common table expression is defined in the same statement, cyclic references between the common table expressions are not permitted. A cyclic reference occurs when two common table expressions dt1 and dt2 are created such that dt1 refers to dt2 and dt2 refers to dt1. http://publib.boulder.ibm.com/infocenter/iadthelp/v7r0/index.jsp?topic=/com.ibm.etools.iseries.langref2.doc/rbafzmst295.htm MSSQL's documentation is less clear, but it also appears not to allow mutual recursion (not allowing forward references to CTEs). A CTE can reference itself and previously defined CTEs in the same WITH clause. Forward referencing is not allowed. http://msdn.microsoft.com/en-us/library/ms175972.aspx Jeff * RHS only: Jeff with recursive Jeff foo(i) as (select i+1 from foo where i 10 union all values(1)) Jeff select * from foo; Jeff ERROR: Left hand side of UNION ALL must be a non-recursive term in a Jeff recursive query Jeff The standard does not require that the recursive term be on Jeff the RHS. Again, the standard may not, but existing implementations do: MSSQL: The recursive CTE definition must contain at least two CTE query definitions, an anchor member and a recursive member. Multiple anchor members and recursive members can be defined; however, all anchor member query definitions must be put before the first recursive member definition. All CTE query definitions are anchor members unless they reference the CTE itself. DB2: The following restrictions apply to a recursive common-table-expression: * A list of column-names must be specified following the table-identifier. * The UNION ALL set operator must be specified. * The first fullselect of the first union (the initialization fullselect) must not include a reference to the common-table-expression itself in any FROM clause. Jeff * UNION ALL only: Jeff with recursive Jeff foo(i) as (values(1) union select i+1 from foo where i 10) Jeff select * from foo; Jeff ERROR: non-recursive term and recursive term must be combined with Jeff UNION ALL Jeff The standard seems to allow UNION ALL, UNION, INTERSECT, and Jeff EXCEPT (when the recursive term is not on the RHS of the Jeff EXCEPT). Again, existing implementations disagree. See above for DB2, and for MSSQL: Anchor members must be combined by one of these set operators: UNION ALL, UNION, INTERSECT, or EXCEPT. UNION ALL is the only set operator allowed between the last anchor member and first recursive member, and when combining multiple recursive members. Jeff * Binary recursion and subselect strangeness: Jeff with recursive foo(i) as Jeff (values (1) Jeff union all Jeff select * from Jeff (select i+1 from foo where i 10 Jeff union all Jeff select i+1 from foo where i X) t) Jeff select * from foo; Jeff Produces 10 rows of output regardless of what X is. This Jeff should be fixed for 8.4. Also, this is non-linear recursion, Jeff which the standard seems to disallow. That looks like it should be disallowed somehow. Jeff * Multiple recursive references: [snip] Jeff If we're going to allow non-linear recursion (which the Jeff standard does not), this seems like it should be a valid Jeff case. We probably shouldn't allow it (as above). [snip * Strange result with except: which looks like a bug] Jeff * Aggregates allowed: which Jeff with recursive foo(i) as Jeff (values(1) Jeff union all Jeff select max(i)+1 from foo where i 10) Jeff select * from foo; Jeff Aggregates should be blocked according to the standard. Jeff Also, causes an infinite loop. This should be fixed for 8.4. Does the standard require anywhere that non-conforming statements must be diagnosed? (seems impractical, since it would forbid extensions) Jeff * DISTINCT should supress duplicates: Jeff with recursive foo(i) as Jeff (select distinct * from (values(1),(2)) t Jeff union all Jeff select distinct i+1 from foo where i 10) Jeff select * from foo; Jeff This outputs a lot of duplicates, but they should be Jeff supressed according to the standard. This query is essentially Jeff the same as supporting UNION for recursive queries, so we Jeff should either fix both for 8.4 or block both for consistency. Yeah, though the standard's use of DISTINCT in this way is something of a violation of the POLA. Jeff * outer joins on a recursive reference should be blocked: Jeff with recursive foo(i) as Jeff (values(1) Jeff union all Jeff select i+1 from foo left join (values(1)) t on (i=column1)) Jeff select * from foo; Jeff Causes an infinite loop, but the standard says using an outer Jeff join in this situation should be prohibited. This
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
Jeff Davis [EMAIL PROTECTED] writes: * Mutual Recursion: with recursive foo(i) as (values(1) union all select i+1 from bar where i 10), bar(i) as (values(1) union all select i+1 from foo where i 10) select * from foo; ERROR: mutual recursive call is not supported The standard allows mutual recursion. This seems to be a point of confusion. I originally read the standard and concluded that mutual recursion was an optional feature. Itagaki-san showed me a copy of the spec where it seemed there was a clear blanket prohibition on mutually recursive queries and in fact anything but simple linearly expandable queries. I wonder if there are different versions of the spec floating around on this point. Take a second look at your spec and read on to where it defines linear and expandable. If it doesn't define those terms then it's definitely different from what I read. If it does, read on to see what it does with them. The main reason to define them appeared to be to use them to say that supporting mutual recursion is not required. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
On Mon, 2008-09-08 at 21:13 +0100, Gregory Stark wrote: Jeff Davis [EMAIL PROTECTED] writes: * Mutual Recursion: with recursive foo(i) as (values(1) union all select i+1 from bar where i 10), bar(i) as (values(1) union all select i+1 from foo where i 10) select * from foo; ERROR: mutual recursive call is not supported The standard allows mutual recursion. This seems to be a point of confusion. I originally read the standard and concluded that mutual recursion was an optional feature. Itagaki-san showed me a copy of the spec where it seemed there was a clear blanket prohibition on mutually recursive queries and in fact anything but simple linearly expandable queries. I wonder if there are different versions of the spec floating around on this point. Take a second look at your spec and read on to where it defines linear and expandable. If it doesn't define those terms then it's definitely different from what I read. If it does, read on to see what it does with them. The main reason to define them appeared to be to use them to say that supporting mutual recursion is not required. I think we're reading the same version of the spec. I'm reading 200n. My interpretation (Syntax Rule 2.h) is that expandable is only used to determine whether it can contain a search or cycle. That being said, I don't think it should be a requirement for 8.4. The CTE patch is an important feature, and we shouldn't hold it up over something comparatively obscure like mutual recursion. As Andrew Gierth cited, other systems don't support it anyway. It should be kept in mind for future work though. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
On Mon, 2008-09-08 at 18:08 +0100, Andrew Gierth wrote: Jeff * Mutual Recursion: This limitation isn't at all uncommon in other implementations; DB2 docs for example say: As with some other things in my list, this doesn't need to be supported in 8.4. I just wanted to lay out my interpretation of the standard, and places that we might (currently) fall short of it. The fact that other DBMSs don't support mutual recursion is a good indication that it's not important immediately. Jeff The standard does not require that the recursive term be on Jeff the RHS. Again, the standard may not, but existing implementations do: Again, I don't think we need this for 8.4. However, I think it's probably more important than mutual recursion. Jeff * UNION ALL only: Jeff with recursive Jeff foo(i) as (values(1) union select i+1 from foo where i 10) Jeff select * from foo; Jeff ERROR: non-recursive term and recursive term must be combined with Jeff UNION ALL Jeff The standard seems to allow UNION ALL, UNION, INTERSECT, and Jeff EXCEPT (when the recursive term is not on the RHS of the Jeff EXCEPT). Again, existing implementations disagree. See above for DB2, and for MSSQL: And again, I agree that it's not important for 8.4. At some point we need to determine what the goalposts are though. Are we copying existing implementations, or are we implementing the standard? Jeff Produces 10 rows of output regardless of what X is. This Jeff should be fixed for 8.4. Also, this is non-linear recursion, Jeff which the standard seems to disallow. That looks like it should be disallowed somehow. Agreed. I think it should just throw an error, probably. [snip * Strange result with except: which looks like a bug] Jeff * Aggregates allowed: which Jeff with recursive foo(i) as Jeff (values(1) Jeff union all Jeff select max(i)+1 from foo where i 10) Jeff select * from foo; Jeff Aggregates should be blocked according to the standard. Jeff Also, causes an infinite loop. This should be fixed for 8.4. Does the standard require anywhere that non-conforming statements must be diagnosed? (seems impractical, since it would forbid extensions) 2.g.iii.4.B explicitly says aggregates should be rejected, unless I have misinterpreted. Yeah, though the standard's use of DISTINCT in this way is something of a violation of the POLA. I agree that's kind of a funny requirement. But that's pretty typical of the SQL standard. If DB2 or SQL Server follow the standard here, we should, too. If not, it's open for discussion. No. This has already been discussed; it's neither possible nor desirable to diagnose all cases which can result in infinite loops, and there are important types of queries which would be unnecessarily forbidden. I didn't say we should forbid all infinite loops. But we should forbid ones that the standard tells us to forbid. Besides, you've misread the spec here: it prohibits the recursive reference ONLY on the nullable side of the join. You cite: Thank you for the correction. It does properly reject the outer joins that the standard says should be rejected. Jeff * ORDER BY, LIMIT, and OFFSET are rejected for recursive Jeff queries. The standard does not seem to say that these should be Jeff rejected. Note that supporting those in subqueries (including CTEs) is a separate optional feature of the standard. I don't feel strongly about this either way, but I prefer that we are consistent when possible. We do support these things in a subquery, so shouldn't we support them in all subqueries? Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
Jeff == Jeff Davis [EMAIL PROTECTED] writes: Jeff Aggregates should be blocked according to the standard. Jeff Also, causes an infinite loop. This should be fixed for 8.4. Does the standard require anywhere that non-conforming statements must be diagnosed? (seems impractical, since it would forbid extensions) Jeff 2.g.iii.4.B explicitly says aggregates should be rejected, Jeff unless I have misinterpreted. Yes, you've misinterpreted. When the spec says that a query shall not do such-and-such, it means that a conforming client isn't allowed to do that, it does NOT mean that the server is required to produce an error when it sees it. Chapter and verse on this is given in the Framework doc, at 6.3.3.2: In the Syntax Rules, the term shall defines conditions that are required to be true of syntactically conforming SQL language. When such conditions depend on the contents of one or more schemas, they are required to be true just before the actions specified by the General Rules are performed. The treatment of language that does not conform to the SQL Formats and Syntax Rules is implementation-dependent. If any condition required by Syntax Rules is not satisfied when the evaluation of Access or General Rules is attempted and the implementation is neither processing non-conforming SQL language nor processing conforming SQL language in a non-conforming manner, then an exception condition is raised: syntax error or access rule violation. Including an aggregate violates a shall in a syntax rule, therefore the query is non-conforming, therefore the server can either process it in an implementation-dependent manner or reject it with an exception. Yeah, though the standard's use of DISTINCT in this way is something of a violation of the POLA. Jeff I agree that's kind of a funny requirement. But that's pretty Jeff typical of the SQL standard. If DB2 or SQL Server follow the Jeff standard here, we should, too. If not, it's open for discussion. MSSQL does not: The following items are not allowed in the CTE_query_definition of a recursive member: * SELECT DISTINCT * GROUP BY * HAVING * Scalar aggregation * TOP * LEFT, RIGHT, OUTER JOIN (INNER JOIN is allowed) * Subqueries * A hint applied to a recursive reference to a CTE inside a CTE_query_definition. For DB2 the docs do not appear to specify either way; they don't seem to forbid the use of SELECT DISTINCT inside a recursive CTE, but neither do they seem to mention any unusual effect of including it. Jeff * ORDER BY, LIMIT, and OFFSET are rejected for recursive Jeff queries. The standard does not seem to say that these should be Jeff rejected. Note that supporting those in subqueries (including CTEs) is a separate optional feature of the standard. Jeff I don't feel strongly about this either way, but I prefer that we Jeff are consistent when possible. We do support these things in a Jeff subquery, so shouldn't we support them in all subqueries? Ideally we should. DB2 appears to (other than OFFSET which it doesn't seem to have at all). But it's not at all clear that it would be either useful or easy to do so. -- Andrew. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
On Mon, 2008-09-08 at 22:53 +0100, Andrew Gierth wrote: Yes, you've misinterpreted. When the spec says that a query shall not do such-and-such, it means that a conforming client isn't allowed to do that, it does NOT mean that the server is required to produce an error when it sees it. Interesting. Thanks for the clear reference. So, we either need to reject it or define some implementation-dependent behavior, right? The patch currently rejects HAVING, which means that it's a little difficult to use an aggregate effectively. I lean toward rejecting aggregates if we reject HAVING, for consistency. If we allow it, we should allow HAVING as well. Jeff I agree that's kind of a funny requirement. But that's pretty Jeff typical of the SQL standard. If DB2 or SQL Server follow the Jeff standard here, we should, too. If not, it's open for discussion. MSSQL does not: Thanks again for a reference. If MSSQL rejects SELECT DISTINCT for a recursive query expression, then I think we should reject it. Right now the patch allows it but provides a result that is inconsistent with the standard. If we reject SELECT DISTINCT for recursive queries now, we can always meet the standard later, or decide that the standard behavior is too likely to cause confusion and just continue to block it. Ideally we should. DB2 appears to (other than OFFSET which it doesn't seem to have at all). But it's not at all clear that it would be either useful or easy to do so. Ok. If it's easy to support it should probably be done. As an aside, you seem to be an expert with the standard, have you had a chance to look at the question I asked earlier?: http://archives.postgresql.org/pgsql-hackers/2008-09/msg00487.php Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
Thanks for the review. [I dropped [EMAIL PROTECTED] from the Cc list since he has left our company and the email address is being deleted.] I'm going to look into issues which are seem to be bug (of course if you know what to fix, patches are always welcome:-). These are my initial comments about the Common Table Expressions (CTE) patch, also known as WITH [RECURSIVE]. These comments are based on the patch here: http://archives.postgresql.org/pgsql-patches/2008-08/msg00021.php This is a semantically complex feature, and the standard is fairly complex as well. So I'm approaching this by making my own interpretations from the standard first (I included my interpretations and section references at the end of this email) and comparing to the behavior of the patch. The following examples may be inconsistent with the standard. Some have already been mentioned, and I don't think they all need to be fixed for 8.4, but I mention them here for completeness. * Mutual Recursion: with recursive foo(i) as (values(1) union all select i+1 from bar where i 10), bar(i) as (values(1) union all select i+1 from foo where i 10) select * from foo; ERROR: mutual recursive call is not supported The standard allows mutual recursion. The discussion seems to agree that let it leave for post 8.4. * Single Evaluation: with foo(i) as (select random() as i) select * from foo union all select * from foo; i --- 0.233165248762816 0.62126633618027 (2 rows) The standard specifies that non-recursive WITH should be evaluated once. What shall we do? I don't think there's a easy way to fix this. Maybe we should not allow WITH clause without RECURISVE? * RHS only: with recursive foo(i) as (select i+1 from foo where i 10 union all values(1)) select * from foo; ERROR: Left hand side of UNION ALL must be a non-recursive term in a recursive query The standard does not require that the recursive term be on the RHS. The discussion seems to agree that let it leave for post 8.4. * UNION ALL only: with recursive foo(i) as (values(1) union select i+1 from foo where i 10) select * from foo; ERROR: non-recursive term and recursive term must be combined with UNION ALL The standard seems to allow UNION ALL, UNION, INTERSECT, and EXCEPT (when the recursive term is not on the RHS of the EXCEPT). The discussion seems to agree that let it leave for post 8.4. * Binary recursion and subselect strangeness: with recursive foo(i) as (values (1) union all select * from (select i+1 from foo where i 10 union all select i+1 from foo where i X) t) select * from foo; Produces 10 rows of output regardless of what X is. This should be fixed for 8.4. Also, this is non-linear recursion, which the standard seems to disallow. I will try to fix this. However detecting the query being not a non-linear one is not so easy. * Multiple recursive references: with recursive foo(i) as (values (1) union all select i+1 from foo where i 10 union all select i+1 from foo where i 20) select * from foo; ERROR: Left hand side of UNION ALL must be a non-recursive term in a recursive query If we're going to allow non-linear recursion (which the standard does not), this seems like it should be a valid case. I will try to disallow this. * Strange result with except: with recursive foo(i) as (values (1) union all select * from (select i+1 from foo where i 10 except select i+1 from foo where i 5) t) select * from foo; ERROR: table foo has 0 columns available but 1 columns specified This query works if you replace except with union. This should be fixed for 8.4. I will try to fix this. * Aggregates allowed: with recursive foo(i) as (values(1) union all select max(i)+1 from foo where i 10) select * from foo; Aggregates should be blocked according to the standard. Also, causes an infinite loop. This should be fixed for 8.4. I will try to fix this. * DISTINCT should supress duplicates: with recursive foo(i) as (select distinct * from (values(1),(2)) t union all select distinct i+1 from foo where i 10) select * from foo; This outputs a lot of duplicates, but they should be supressed according to the standard. This query is essentially the same as supporting UNION for recursive queries, so we should either fix both for 8.4 or block both for consistency. I'm not sure if it's possible to fix this. Will look into. * outer joins on a recursive reference should be blocked: with recursive foo(i) as (values(1) union all select i+1 from foo left join (values(1)) t on (i=column1)) select * from foo; Causes an infinite loop, but the standard says using an outer join in this situation
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
Thanks for the review. [I dropped [EMAIL PROTECTED] from the Cc list since he has left our company and the email address is being deleted.] I'm going to look into issues which are seem to be bug (of course if you know what to fix, patches are always welcome:-). These are my initial comments about the Common Table Expressions (CTE) patch, also known as WITH [RECURSIVE]. These comments are based on the patch here: http://archives.postgresql.org/pgsql-patches/2008-08/msg00021.php This is a semantically complex feature, and the standard is fairly complex as well. So I'm approaching this by making my own interpretations from the standard first (I included my interpretations and section references at the end of this email) and comparing to the behavior of the patch. The following examples may be inconsistent with the standard. Some have already been mentioned, and I don't think they all need to be fixed for 8.4, but I mention them here for completeness. * Mutual Recursion: with recursive foo(i) as (values(1) union all select i+1 from bar where i 10), bar(i) as (values(1) union all select i+1 from foo where i 10) select * from foo; ERROR: mutual recursive call is not supported The standard allows mutual recursion. The discussion seems to agree that let it leave for post 8.4. * Single Evaluation: with foo(i) as (select random() as i) select * from foo union all select * from foo; i --- 0.233165248762816 0.62126633618027 (2 rows) The standard specifies that non-recursive WITH should be evaluated once. What shall we do? I don't think there's an easy way to fix this as Tom suggested. Maybe we should not allow WITH clause without RECURISVE for 8.4? * RHS only: with recursive foo(i) as (select i+1 from foo where i 10 union all values(1)) select * from foo; ERROR: Left hand side of UNION ALL must be a non-recursive term in a recursive query The standard does not require that the recursive term be on the RHS. The discussion seems to agree that let it leave for post 8.4. * UNION ALL only: with recursive foo(i) as (values(1) union select i+1 from foo where i 10) select * from foo; ERROR: non-recursive term and recursive term must be combined with UNION ALL The standard seems to allow UNION ALL, UNION, INTERSECT, and EXCEPT (when the recursive term is not on the RHS of the EXCEPT). The discussion seems to agree that let it leave for post 8.4. * Binary recursion and subselect strangeness: with recursive foo(i) as (values (1) union all select * from (select i+1 from foo where i 10 union all select i+1 from foo where i X) t) select * from foo; Produces 10 rows of output regardless of what X is. This should be fixed for 8.4. Also, this is non-linear recursion, which the standard seems to disallow. I will try to fix this. However detecting the query being not a non-linear one is not so easy. * Multiple recursive references: with recursive foo(i) as (values (1) union all select i+1 from foo where i 10 union all select i+1 from foo where i 20) select * from foo; ERROR: Left hand side of UNION ALL must be a non-recursive term in a recursive query If we're going to allow non-linear recursion (which the standard does not), this seems like it should be a valid case. I will try to disallow this. * Strange result with except: with recursive foo(i) as (values (1) union all select * from (select i+1 from foo where i 10 except select i+1 from foo where i 5) t) select * from foo; ERROR: table foo has 0 columns available but 1 columns specified This query works if you replace except with union. This should be fixed for 8.4. I will try to fix this. * Aggregates allowed: with recursive foo(i) as (values(1) union all select max(i)+1 from foo where i 10) select * from foo; Aggregates should be blocked according to the standard. Also, causes an infinite loop. This should be fixed for 8.4. I will try to fix this. * DISTINCT should supress duplicates: with recursive foo(i) as (select distinct * from (values(1),(2)) t union all select distinct i+1 from foo where i 10) select * from foo; This outputs a lot of duplicates, but they should be supressed according to the standard. This query is essentially the same as supporting UNION for recursive queries, so we should either fix both for 8.4 or block both for consistency. I'm not sure if it's possible to fix this. Will look into. * outer joins on a recursive reference should be blocked: with recursive foo(i) as (values(1) union all select i+1 from foo left join (values(1)) t on (i=column1)) select * from foo; Causes an infinite loop, but the standard says using an outer
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
* Single Evaluation: with foo(i) as (select random() as i) select * from foo union all select * from foo; i --- 0.233165248762816 0.62126633618027 (2 rows) The standard specifies that non-recursive WITH should be evaluated once. What shall we do? I don't think there's a easy way to fix this. Maybe we should not allow WITH clause without RECURISVE? ISTM that kind of misses the point. Even if it's WITH RECURSIVE rather than simply WITH, one wouldn't expect multiple evaluations of any non-recursive portion of the query. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers