Re: [sqlite] Limiting the result set size at each recursive step in a CTE

2019-05-08 Thread R Smith

On 2019/05/07 7:57 PM, Thomas Zimmermann wrote:

Hi!

Sometimes it is desirable to limit the size of the queue¹ in a 
recursive CTE//...



CREATE TABLE comment (
    comment_id INTEGER PRIMARY KEY,
    parent_comment_id INTEGER REFERENCES comment (comment_id),
    created_at INTEGER NOT NULL -- timestamp, bigger means newer
);
CREATE INDEX comment_hierarchy ON comment (parent_comment_id, 
created_at DESC);


WITH RECURSIVE //...


I would be very interested in a general solution that still allows for 
the adjacency list design,

but I'm open to denormalization. :)



It's tricky, but there is a solution now that Window-functions have 
joined the fray.


Essentially what we need is to first extract from the table a list of 
all the sub comments by parent comment, but while the query order is not 
helpful, we can partition by the parent-comment-ID's and number the rows 
(which we CAN apply a sort-order to thanks to the Window function 
methodology), so the first CTE does just that.


The next step is to list the origin rows (which have parent_id = NULL) 
and then amend to them every sub-comment belonging to them, but only 
where the created row-order ID is less than 100 (or whatever value you 
pick).


Lastly, we only use full sorting order in the very final query (outside 
the CTE's) according to main comment dates and sub-comment id's, which 
would make things fastest. This solution will work universally for all 
similar types of tree-queries.


While this compiles/runs, I have not been able to test this with data 
because you gave no data and I was too lazy to make up data, but I'm 
pretty sure it would work. If it doesn't, please send some data and 
expected outcome, then we can fix it.


WITH RECURSIVE
    comments_by_parent(subid, comment_id, parent_comment_id, 
created_at) AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY c.parent_comment_id 
ORDER BY c.created_at DESC) AS subid,

   c.comment_id, c.parent_comment_id, c.created_at
  FROM comment AS c
 WHERE c.parent_comment_id IS NOT NULL
),  sorted_comment(comment_id, created_at, sub_comment_id, 
sub_created_at, depth, idx) AS (

    SELECT comment_id, created_at, NULL, NULL, 0, 0
    FROM (SELECT comment_id, created_at
        FROM comment
       WHERE parent_comment_id IS NULL
       ORDER BY created_at DESC
       LIMIT 100
         )
    UNION ALL
    SELECT sc.comment_id, sc.created_at, cp.comment_id, 
cp.created_at, sc.depth + 1, cp.subid

  FROM sorted_comment AS sc
  JOIN comments_by_parent AS cp ON cp.parent_comment_id = 
sc.comment_id

 WHERE cp.subid < 100
)
SELECT comment_id, created_at, sub_comment_id, sub_created_at, depth, idx
  FROM sorted_comment
 ORDER BY created_at, comment_id, idx
;


Good luck!
Ryan




___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Limiting the result set size at each recursive step in a CTE

2019-05-07 Thread Thomas Zimmermann

Hi!

Sometimes it is desirable to limit the size of the queue¹ in a recursive 
CTE when the
domain allows for it. That puts an upper bound on processing time, even 
when the given table is huge.
This can be achieved by adding a LIMIT clause at every step (setup and 
recursive).
But there is my problem: The setup step allows subqueries, but the 
recursive step doesn't.

Is there a general solution available?

As a concrete example, consider the following table for a commenting system:

CREATE TABLE comment (
    comment_id INTEGER PRIMARY KEY,
    parent_comment_id INTEGER REFERENCES comment (comment_id),
    created_at INTEGER NOT NULL -- timestamp, bigger means newer
);
CREATE INDEX comment_hierarchy ON comment (parent_comment_id, created_at 
DESC);


The comments should be arranged by hierarchy and creation date, newest 
first.

Only up to 100 comments should be shown. Example:

Comment 2 (2019-05-03)
- Comment 3 (2019-05-04)
- Comment 4 (2019-05-05)
-- Comment 5 (2019-05-05)
Comment 1 (2019-05-02)

The following query should accomplish this, but the optimization isn't 
allowed (see comments inline):


WITH RECURSIVE
    sorted_comment(comment_id, parent_comment_id, created_at, depth) AS (
        -- Start with all comments at the root level.
        -- Optimization: Only consider the latest 100 comments, since 
that's the maximum amount we could pick from this level.
        -- This guarantees good performance even when there are 
millions of comments at the root level.

    SELECT *
    FROM (
        SELECT comment_id, parent_comment_id, created_at, 0 AS depth
        FROM comment
        WHERE parent_comment_id IS NULL
        ORDER BY created_at DESC
        LIMIT 100
    )

    UNION ALL

    -- Find all direct descendants of the current comment from the 
queue.
    -- Same optimization: There might be many comments at this 
level, but we could oly ever consider up to the latest 100.
    -- (Actually, we only need to consider (100 - 
COUNT(sorted_comment)), but let's ignore that for now.)
    -- NOTE: This doesn't actually work, Error: recursive reference 
in a subquery: sorted_comment

    SELECT *
    FROM (
    SELECT c.comment_id, c.parent_comment_id, c.created_at, 
sc.depth + 1 AS depth

    FROM comment AS c
    JOIN sorted_comment AS sc
    ON c.parent_comment_id = sc.comment_id
    ORDER BY created_at DESC
    LIMIT 100
    )
    -- Do a depth-first search, picking the latest comment from the 
queue.

    ORDER BY depth DESC, created_at DESC
    LIMIT 100
    )
SELECT comment_id, parent_comment_id, created_at, depth
FROM sorted_comment;

I would be very interested in a general solution that still allows for 
the adjacency list design,

but I'm open to denormalization. :)

Kind regards,
Thomas

¹ As defined in the documentation here: https://sqlite.org/lang_with.html
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users