Re: [sqlalchemy] [Q][SA 0.7.9] yield_per() vs. joinedload() and subqueryload()

Ladislav Lenart Tue, 04 Jun 2013 09:29:51 -0700

Hello.

> OK, but with yield_per() you want to use eagerloading also, so yield_per()
> not fast enough either, I guess....


No. I use yield_per() on complex queries with join(), filter() and both
joinedload() and subqueryload(). It is possible that they sometimes returns
wrong results because of yield_per(). I am not sure about that, but it is
definitely much faster than the original WindowedRangeQuery recipe. I can only
speculate that postgres caches subqueryload results...


> Absolutely, you should do whatever you have to in order to get the range
> you want, in fact the recipe even says this...

Ok. What I want to do is basically the following:
* Suppose a complex query q with join(...), filter(...) and options(...).
* I need to create q2 from q such that:
  * It has no options.
    * Can I reset the options with q = q.options(None)?
  * It has select with the over by magic taken from WindowedRangeQuery recipe.
    * I know I can use with_entities() for this.
* I will use q2 to obtain the window ranges.
* I will iterate over the window ranges and apply each to the original q and
execute it in a loop.

Can this strategy work?


Thank you,

Ladislav Lenart


On 4.6.2013 17:57, Michael Bayer wrote:
> 
> On Jun 4, 2013, at 11:41 AM, Ladislav Lenart <[email protected]> wrote:
> 
>> Hello.
>>
>>> You will then get the wrong results.   The docstring tries to explain this -
>>> a joinedload uses a JOIN.  For each "cls" instance, there are many rows, one
>>> for each "bar".  If you cut off the results in the middle of populating that
>>> collection, the collection is incomplete, you'll see the wrong collection on
>>> your cls.bars.  On the next load, cls.bars will be wiped out and populated
>>> with the remaining "bar" objects.
>>
>> Ok, I think I understand this too.
>>
>> I've tried WindowedRangeQuery. It looked promising at first but it is (much)
>> slower than yield_per() with all its quirks, at least for my usecase.
> 
> OK, but with yield_per() you want to use eagerloading also, so yield_per() 
> not fast enough either, I guess....
> 
> 
>> If I
>> understand the WindowedRangeQuery recipe, it does a full scan of the target
>> table first to read all the ids and calculate the bounds of all the windows. 
>> I
>> don't want to it like this. I am working with relatively large datasets but 
>> it
>> is still far less than all rows in the table. Something like 10-50000 rows 
>> from
>> a table with 1-2 million rows. The windowed query iterates over many 
>> completely
>> empty windows.
>>
>> Can I modify the recipe so it preserves the filtering and creates windows 
>> only
>> for the interesting subset of the table?
> 
> Absolutely, you should do whatever you have to in order to get the range you 
> want, in fact the recipe even says this:
> 
>  Enhance this yourself !  Add a "where" argument
>     so that windows of just a subset of rows can
>     be computed.
> 
> if your situation is even simpler than that, such as just querying from PKs 
> 50-1000, you could just make up your own integer ranges within those two 
> endpoints.   

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.

Re: [sqlalchemy] [Q][SA 0.7.9] yield_per() vs. joinedload() and subqueryload()

Reply via email to