Hi!

Sometimes it is desirable to limit the size of the queue¹ in a recursive CTE when the domain allows for it. That puts an upper bound on processing time, even when the given table is huge. This can be achieved by adding a LIMIT clause at every step (setup and recursive). But there is my problem: The setup step allows subqueries, but the recursive step doesn't.
Is there a general solution available?

As a concrete example, consider the following table for a commenting system:

CREATE TABLE comment (
    comment_id INTEGER PRIMARY KEY,
    parent_comment_id INTEGER REFERENCES comment (comment_id),
    created_at INTEGER NOT NULL -- timestamp, bigger means newer
);
CREATE INDEX comment_hierarchy ON comment (parent_comment_id, created_at DESC);

The comments should be arranged by hierarchy and creation date, newest first.
Only up to 100 comments should be shown. Example:

Comment 2 (2019-05-03)
- Comment 3 (2019-05-04)
- Comment 4 (2019-05-05)
-- Comment 5 (2019-05-05)
Comment 1 (2019-05-02)

The following query should accomplish this, but the optimization isn't allowed (see comments inline):

WITH RECURSIVE
    sorted_comment(comment_id, parent_comment_id, created_at, depth) AS (
        -- Start with all comments at the root level.
        -- Optimization: Only consider the latest 100 comments, since that's the maximum amount we could pick from this level.         -- This guarantees good performance even when there are millions of comments at the root level.
        SELECT *
        FROM (
            SELECT comment_id, parent_comment_id, created_at, 0 AS depth
            FROM comment
            WHERE parent_comment_id IS NULL
            ORDER BY created_at DESC
            LIMIT 100
        )

        UNION ALL

        -- Find all direct descendants of the current comment from the queue.         -- Same optimization: There might be many comments at this level, but we could oly ever consider up to the latest 100.         -- (Actually, we only need to consider (100 - COUNT(sorted_comment)), but let's ignore that for now.)         -- NOTE: This doesn't actually work, Error: recursive reference in a subquery: sorted_comment
        SELECT *
        FROM (
            SELECT c.comment_id, c.parent_comment_id, c.created_at, sc.depth + 1 AS depth
            FROM comment AS c
            JOIN sorted_comment AS sc
            ON c.parent_comment_id = sc.comment_id
            ORDER BY created_at DESC
            LIMIT 100
        )
        -- Do a depth-first search, picking the latest comment from the queue.
        ORDER BY depth DESC, created_at DESC
        LIMIT 100
    )
SELECT comment_id, parent_comment_id, created_at, depth
FROM sorted_comment;

I would be very interested in a general solution that still allows for the adjacency list design,
but I'm open to denormalization. :)

Kind regards,
Thomas

¹ As defined in the documentation here: https://sqlite.org/lang_with.html
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to