On 07/15/2016 05:14 PM, Martijn van Oosterhout wrote:

On 15 July 2016 at 18:46, Mike Bayer <[email protected]> wrote:


    Here's the problem that can't be solved:

    1. fetch rows 0-1000

    2. start loading Foo objects:

         Foo(1) ->  eagerload Foo.related with 400 members
         Foo(2) ->  eagerload Foo.related with 500 members
         Foo(3) -> eagerload Foo.related with the first 100 of 250 members

    3. yield.  That is, stop reading more rows.  Send out Foo(1),
    Foo(2), Foo(3).  The application now can go nuts with these.  It can
    read them, write them, iterate through .related.

    Right there, we're broken.  Foo(3) is out in the world with less
    than half of its ".related" collection - they are still sitting on
    the cursor!
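
    For concreteness, here is roughly the combination in question as a
    Python sketch (Foo / Foo.related as above; illustrative only, not
    application code):

        from sqlalchemy.orm import joinedload

        # One big JOINed result plus yield_per: rows belonging to a single
        # Foo's .related collection can straddle a batch boundary, so a Foo
        # could go out with only part of its collection loaded.
        query = (
            session.query(Foo)
            .options(joinedload(Foo.related))
            .yield_per(1000)
        )
        for foo in query:
            for item in foo.related:   # may be incomplete at this point
                ...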


Ok, I think I see what you're getting at, but I don't think that's a
problem here. It looks like you have a query which joins all the
relationships in one go. That's not what I'm doing; I'm relying on the
fact that I can interleave the queries. At the SQL level it would look
like this:

DECLARE my_cursor CURSOR FOR SELECT * FROM bigtable;
FETCH 1000 FROM my_cursor;
SELECT * FROM related_table WHERE bigtable_id IN (1,2,3,4,5,....)
SELECT * FROM related_table2 WHERE bigtable_id IN (1,2,3,4,5,....)
SELECT * FROM related_table JOIN second_order_relationship USING (pkey)
WHERE bigtable_id IN (1,2,3,4,5,....)

FETCH 1000 FROM my_cursor;
SELECT * FROM related_table WHERE bigtable_id IN (11,12,13,14,15,....)
SELECT * FROM related_table2 WHERE bigtable_id IN (11,12,13,14,15,....)
SELECT * FROM related_table JOIN second_order_relationship USING (pkey)
WHERE bigtable_id IN (11,12,13,14,15,....)

<... repeat ...>
(This may be a PostgreSQL-specific feature; I'm not sure.)

Yes, this may mean that some related objects are fetched multiple
times, but that is not a problem for me (the whole thing is read-only
anyway). What I need, however, is for the secondary queries to
populate the relationships in the original BigTable objects.
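
One way to express that interleaving at the ORM level would be roughly
the following sketch (BigTable, RelatedTable, BigTable.related and
bigtable_id are placeholder names matching the SQL above; an
illustration of the idea, not a tested recipe):

    from collections import defaultdict
    from sqlalchemy.orm import lazyload
    from sqlalchemy.orm.attributes import set_committed_value

    def stream_with_related(session, batch_size=1000):
        # Stream the parents without any collection eager load ...
        parents = (
            session.query(BigTable)
            .options(lazyload(BigTable.related))
            .yield_per(batch_size)
        )
        batch = []
        for parent in parents:
            batch.append(parent)
            if len(batch) >= batch_size:
                yield from attach_related(session, batch)
                batch = []
        if batch:
            yield from attach_related(session, batch)

    def attach_related(session, batch):
        # ... then issue one IN query per batch and hand the results
        # to the parent objects.
        children = defaultdict(list)
        ids = [p.id for p in batch]
        q = session.query(RelatedTable).filter(RelatedTable.bigtable_id.in_(ids))
        for child in q:
            children[child.bigtable_id].append(child)
        for parent in batch:
            # Mark the collection as loaded so it won't lazy-load later.
            set_committed_value(parent, "related", children[parent.id])
            yield parent

Because each batch's IN query is consumed in full before its parents are
handed out, every parent comes back with a complete .related collection.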

    Vastly simpler is to use the recommended window functions to do
    pagination of any scale with no chance of complex failures.


A bit of googling suggests you are referring to this:
https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/WindowedRangeQuery
which indeed looks very neat and much easier than what I was thinking.
It doesn't say explicitly, but it looks like it will work transparently
with eager loading. It basically does the above, but skips the cursor
and replaces it with queries on ranges of the primary key (which is
shorter and probably more efficient to boot).
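
As I read it, the core of the recipe is: compute primary-key boundaries
with row_number(), then run each range as an ordinary, fully-consumed
query, inside which eager loading behaves normally. A condensed sketch
(names simplified; see the recipe for the real version):

    import sqlalchemy as sa

    def windowed_query(session, entity, column, windowsize):
        # Pick every windowsize-th key value as a window boundary.
        rownum = sa.func.row_number().over(order_by=column).label("rownum")
        subq = session.query(column.label("pk"), rownum).subquery()
        boundaries = [
            pk for (pk,) in
            session.query(subq.c.pk).filter(subq.c.rownum % windowsize == 1)
        ]
        for i, start in enumerate(boundaries):
            end = boundaries[i + 1] if i + 1 < len(boundaries) else None
            q = session.query(entity).filter(column >= start)
            if end is not None:
                q = q.filter(column < end)
            for obj in q.order_by(column):
                yield obj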

OK, but the IN loading you're doing, that's good too. I'd like to add a
loader which uses that someday, and you're right: if we did, we could
make it work with yield_per too, due to the nature of it. If you want to
use that approach, take a look at
https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/DisjointEagerLoading.


Thanks for the tip!
--
Martijn van Oosterhout <[email protected] <mailto:[email protected]>>
http://svana.org/kleptog/

