Re: [sqlite] Limiting the result set size at each recursive step in a CTE
On 2019/05/07 7:57 PM, Thomas Zimmermann wrote: Hi! Sometimes it is desirable to limit the size of the queue¹ in a recursive CTE//... CREATE TABLE comment ( comment_id INTEGER PRIMARY KEY, parent_comment_id INTEGER REFERENCES comment (comment_id), created_at INTEGER NOT NULL -- timestamp, bigger means newer ); CREATE INDEX comment_hierarchy ON comment (parent_comment_id, created_at DESC); WITH RECURSIVE //... I would be very interested in a general solution that still allows for the adjacency list design, but I'm open to denormalization. :) It's tricky, but there is a solution now that Window-functions have joined the fray. Essentially what we need is to first extract from the table a list of all the sub comments by parent comment, but while the query order is not helpful, we can partition by the parent-comment-ID's and number the rows (which we CAN apply a sort-order to thanks to the Window function methodology), so the first CTE does just that. The next step is to list the origin rows (which have parent_id = NULL) and then amend to them every sub-comment belonging to them, but only where the created row-order ID is less than 100 (or whatever value you pick). Lastly, we only use full sorting order in the very final query (outside the CTE's) according to main comment dates and sub-comment id's, which would make things fastest. This solution will work universally for all similar types of tree-queries. While this compiles/runs, I have not been able to test this with data because you gave no data and I was too lazy to make up data, but I'm pretty sure it would work. If it doesn't, please send some data and expected outcome, then we can fix it. WITH RECURSIVE comments_by_parent(subid, comment_id, parent_comment_id, created_at) AS ( SELECT ROW_NUMBER() OVER (PARTITION BY c.parent_comment_id ORDER BY c.created_at DESC) AS subid, c.comment_id, c.parent_comment_id, c.created_at FROM comment AS c WHERE c.parent_comment_id IS NOT NULL ), sorted_comment(comment_id, created_at, sub_comment_id, sub_created_at, depth, idx) AS ( SELECT comment_id, created_at, NULL, NULL, 0, 0 FROM (SELECT comment_id, created_at FROM comment WHERE parent_comment_id IS NULL ORDER BY created_at DESC LIMIT 100 ) UNION ALL SELECT sc.comment_id, sc.created_at, cp.comment_id, cp.created_at, sc.depth + 1, cp.subid FROM sorted_comment AS sc JOIN comments_by_parent AS cp ON cp.parent_comment_id = sc.comment_id WHERE cp.subid < 100 ) SELECT comment_id, created_at, sub_comment_id, sub_created_at, depth, idx FROM sorted_comment ORDER BY created_at, comment_id, idx ; Good luck! Ryan ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Limiting the result set size at each recursive step in a CTE
Hi! Sometimes it is desirable to limit the size of the queue¹ in a recursive CTE when the domain allows for it. That puts an upper bound on processing time, even when the given table is huge. This can be achieved by adding a LIMIT clause at every step (setup and recursive). But there is my problem: The setup step allows subqueries, but the recursive step doesn't. Is there a general solution available? As a concrete example, consider the following table for a commenting system: CREATE TABLE comment ( comment_id INTEGER PRIMARY KEY, parent_comment_id INTEGER REFERENCES comment (comment_id), created_at INTEGER NOT NULL -- timestamp, bigger means newer ); CREATE INDEX comment_hierarchy ON comment (parent_comment_id, created_at DESC); The comments should be arranged by hierarchy and creation date, newest first. Only up to 100 comments should be shown. Example: Comment 2 (2019-05-03) - Comment 3 (2019-05-04) - Comment 4 (2019-05-05) -- Comment 5 (2019-05-05) Comment 1 (2019-05-02) The following query should accomplish this, but the optimization isn't allowed (see comments inline): WITH RECURSIVE sorted_comment(comment_id, parent_comment_id, created_at, depth) AS ( -- Start with all comments at the root level. -- Optimization: Only consider the latest 100 comments, since that's the maximum amount we could pick from this level. -- This guarantees good performance even when there are millions of comments at the root level. SELECT * FROM ( SELECT comment_id, parent_comment_id, created_at, 0 AS depth FROM comment WHERE parent_comment_id IS NULL ORDER BY created_at DESC LIMIT 100 ) UNION ALL -- Find all direct descendants of the current comment from the queue. -- Same optimization: There might be many comments at this level, but we could oly ever consider up to the latest 100. -- (Actually, we only need to consider (100 - COUNT(sorted_comment)), but let's ignore that for now.) -- NOTE: This doesn't actually work, Error: recursive reference in a subquery: sorted_comment SELECT * FROM ( SELECT c.comment_id, c.parent_comment_id, c.created_at, sc.depth + 1 AS depth FROM comment AS c JOIN sorted_comment AS sc ON c.parent_comment_id = sc.comment_id ORDER BY created_at DESC LIMIT 100 ) -- Do a depth-first search, picking the latest comment from the queue. ORDER BY depth DESC, created_at DESC LIMIT 100 ) SELECT comment_id, parent_comment_id, created_at, depth FROM sorted_comment; I would be very interested in a general solution that still allows for the adjacency list design, but I'm open to denormalization. :) Kind regards, Thomas ¹ As defined in the documentation here: https://sqlite.org/lang_with.html ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users