subject:"\"Automatic prefetching in querysets\""

Re: Automatic prefetching in querysets

2020-03-25 Thread Gordon Wrigley

My existing code for this is now available as a pypi package
https://github.com/tolomea/django-auto-prefetch 

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/aaf633e3-a099-4cb4-aad7-10cfe6b11596%40googlegroups.com.

Re: Automatic prefetching in querysets

2017-09-12 Thread Gordon Wrigley

This received some positive responses, so to help move the conversation
along I have created a ticket and pull request.

https://code.djangoproject.com/ticket/28586
https://github.com/django/django/pull/9064

Regards G

On Tue, Aug 15, 2017 at 10:44 AM, Gordon Wrigley 
wrote:

> I'd like to discuss automatic prefetching in querysets. Specifically
> automatically doing prefetch_related where needed without the user having
> to request it.
>
> For context consider these three snippets using the Question & Choice
> models from the tutorial
> <https://docs.djangoproject.com/en/1.11/intro/tutorial02/#creating-models> 
> when
> there are 100 questions each with 5 choices for a total of 500 choices.
>
> Default
> for choice in Choice.objects.all():
> print(choice.question.question_text, ':', choice.choice_text)
> 501 db queries, fetches 500 choice rows and 500 question rows from the DB
>
> Prefetch_related
> for choice in Choice.objects.prefetch_related('question'):
> print(choice.question.question_text, ':', choice.choice_text)
> 2 db queries, fetches 500 choice rows and 100 question rows from the DB
>
> Select_related
> for choice in Choice.objects.select_related('question'):
> print(choice.question.question_text, ':', choice.choice_text)
> 1 db query, fetches 500 choice rows and 500 question rows from the DB
>
> I've included select_related for completeness, I'm not going to propose
> changing anything about it's use. There are places where it is the best
> choice and in those places it will still be up to the user to request it. I
> will note that anywhere select_related is optimal prefetch_related is still
> better than the default and leave it at that.
>
> The 'Default' example above is a classic example of the N+1 query problem,
> a problem that is widespread in Django apps.
> This pattern of queries is what new users produce because they don't know
> enough about the database and / or ORM to do otherwise.
> Experieced users will also often produce this because it's not always
> obvious what fields will and won't be used and subsequently what should be
> prefetched.
> Additionally that list will change over time. A small change to a template
> to display an extra field can result in a denial of service on your DB due
> to a missing prefetch.
> Identifying missing prefetches is fiddly, time consuming and error prone.
> Tools like django-perf-rec <https://github.com/YPlan/django-perf-rec>
> (which I was involved in creating) and nplusone
> <https://github.com/jmcarp/nplusone> exist in part to flag missing
> prefetches introduced by changed code.
> Finally libraries like Django Rest Framework and the Admin will also
> produce queries like this because it's very difficult for them to know what
> needs prefetching without being explicitly told by an experienced user.
>
> As hinted at the top I'd like to propose changing Django so the default
> code behaves like the prefetch_related code.
> Longer term I think this should be the default behaviour but obviously it
> needs to be proved first so for now I'd suggest a new queryset function
> that enables this behaviour.
>
> I have a proof of concept of this mechanism that I've used successfully in
> production. I'm not posting it yet because I'd like to focus on desired
> behavior rather than implementation details. But in summary, what it does
> is when accessing a missing field on a model, rather than fetching it just
> for that instance, it runs a prefetch_related query to fetch it for all
> peer instances that were fetched in the same queryset. So in the example
> above it prefetches all Questions in one query.
>
> This might seem like a risky thing to do but I'd argue that it really
> isn't.
> The only time this isn't superior to the default case is when you are post
> filtering the queryset results in Python.
> Even in that case it's only inferior if you started with a large number of
> results, filtered basically all of them and the code is structured so that
> the filtered ones aren't garbage collected.
> To cover this rare case the automatic prefetching can easily be disabled
> on a per queryset or per object basis. Leaving us with a rare downside that
> can easily be manually resolved in exchange for a significant general
> improvement.
>
> In practice this thing is almost magical to work with. Unless you already
> have extensive and tightly maintained prefetches everywhere you get an
> immediate boost to virtually everything that touches the database, often
> knocking orders of magnitude off page load times.
>
> If an agreement ca

Re: Automatic prefetching in querysets

2017-08-30 Thread Patryk Zawadzki

W dniu środa, 16 sierpnia 2017 14:48:54 UTC+2 użytkownik Luke Plant napisał:
>
> I completely agree that visibility of this problem is a major issue, and 
> would really welcome work on improving this, especially in DEBUG mode. One 
> option might be a method that replaces lazy loads with exceptions. This 
> would force you to add the required `prefetch_related` or `select_related` 
> calls. You could have:
>
> .lazy_load(None)   # exception on accessing any non-loaded FK objects
>
> .lazy_load('my_fk1', 'my_fk2')  # exceptions on accessing any unloaded FK 
> objects apart from the named ones
>
> .lazy_load('__any__')   # cancel the above, restore default behaviour
>
> This (or something like it) would be a different way to tackle the problem 
> - just throwing it out. You could have a Meta option to do 
> `.lazy_load(None)` by default for a Model.  I've no idea how practical it 
> would be to wire this all up so that is works correctly, and with nested 
> objects etc.
>

Would love if Django could be told to require explicit data fetches. This 
and replacing QuerySet.__iter__ with an explicit fetch() (and lazy_fetch()) 
would make it much easier to reason about large codebases.

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/1fa90611-66cc-4d06-992c-55164913679d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Automatic prefetching in querysets

2017-08-21 Thread Shai Berger

On Monday 21 August 2017 18:44:35 Tobias McNulty wrote:
> On Sat, Aug 19, 2017 at 1:10 PM, Luke Plant  wrote:
> > This could work something like the way that ForeignKey `on_delete` works
> > - you have options that are enumerated as constants, but in reality they
> > are classes that embody the strategy to be used. We could have something
> > similar - `on_missing_relation=FETCH | WARN | ERROR | IGNORE ... `
> 
> I like this a lot. It (1) caters (I think) to many if not all of the
> behavior preferences expressed in this thread, (2) mimics an existing API
> we support, (3) allows projects to gradually add/try the feature, and (4)
> permits but does not require a deprecation path for changing the default
> behavior in the future. I'm assuming `FETCH` represents the current
> behavior and not auto-detection of a missing relation and modification of
> the associated QuerySet.
> 

I agree that the idea sounds nice, with one exception: It seems like this 
would make sense, mostly, at the foreign-key-definition level, which means the 
feature is applied at model scope -- altough it isn't quite clear. However, 
the main argument so far was exactly about the right scope to use, with four 
different suggestions as far as I could see (queryset, model, context-manager 
and a global setting), each with its own pros and cons. So, it feels a little 
like going off the main thread of discussion.

> One alternative thought: Could we define two `ForeignKey` options (again
> using the `on_delete` analogy) which support adding the relation to
> select_related()/prefetch_related() all the time (e.g.,
> `ForeignKey(prefetch_related=ALWAYS | MANUALLY | WARN | ERROR)` and/or
> `ForeignKey(select_related=ALWAYS | MANUALLY | WARN | ERROR)`, where
> MANUALLY represents the current behavior)? One could then use `.only()` or
> `.values*()` to avoid fetching the related model, if needed.
> 

This is truly, clearly off-topic here; I just want to point out that the effect 
can already be achieved by overriding the model's default manager's 
get_queryset(). The idea might have merits in terms of maintainability and 
readability, but I'd much rather it be discussed separately.

Shai

Re: Automatic prefetching in querysets

2017-08-21 Thread Tobias McNulty

On Sat, Aug 19, 2017 at 1:10 PM, Luke Plant  wrote:

> This could work something like the way that ForeignKey `on_delete` works -
> you have options that are enumerated as constants, but in reality they are
> classes that embody the strategy to be used. We could have something
> similar - `on_missing_relation=FETCH | WARN | ERROR | IGNORE ... `
>
I like this a lot. It (1) caters (I think) to many if not all of the
behavior preferences expressed in this thread, (2) mimics an existing API
we support, (3) allows projects to gradually add/try the feature, and (4)
permits but does not require a deprecation path for changing the default
behavior in the future. I'm assuming `FETCH` represents the current
behavior and not auto-detection of a missing relation and modification of
the associated QuerySet.

One alternative thought: Could we define two `ForeignKey` options (again
using the `on_delete` analogy) which support adding the relation to
select_related()/prefetch_related() all the time (e.g.,
`ForeignKey(prefetch_related=ALWAYS | MANUALLY | WARN | ERROR)` and/or
`ForeignKey(select_related=ALWAYS | MANUALLY | WARN | ERROR)`, where
MANUALLY represents the current behavior)? One could then use `.only()` or
`.values*()` to avoid fetching the related model, if needed.


*Tobias McNulty*Chief Executive Officer

tob...@caktusgroup.com
www.caktusgroup.com

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CAMGFDKQTEO%3DZug%3DCbrugX-mYV2wFAa3t7YHEKSUgtkAsJ_yX9Q%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: Automatic prefetching in querysets

2017-08-19 Thread Luke Plant

This could work something like the way that ForeignKey `on_delete` works
- you have options that are enumerated as constants, but in reality they
are classes that embody the strategy to be used. We could have something
similar - `on_missing_relation=FETCH | WARN | ERROR | IGNORE ... `

Luke

On 18/08/17 14:01, Shai Berger wrote:

On Friday 18 August 2017 09:08:11 Anssi Kääriäinen wrote:

Maybe we should just add the queryset method. This is the smallest atomic
task that can be done. Even if there's only the queryset method available,
it's possible to enable prefetches per model by using a Manager.

I disagree on both counts: I don't think it's the smallest atomic task, and
I'm not so sure it's the right thing to do.

The smallest atomic task, the way I see it, is building the infrastructure
that would allow adding the queryset method -- but would also allow different
APIs to be set around it.

And since there is as yet no consensus on the correct API for "end" users, I
would rather not define one immediately.

Shai.

--
You received this message because you are subscribed to the Google Groups "Django
developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-developers/ba0ddd74-5240-757f-e6ea-3300f2ba3a61%40cantab.net.
For more options, visit https://groups.google.com/d/optout.

Re: Automatic prefetching in querysets

2017-08-19 Thread Todor Velichkov

+1 to just add the queryset method it gives a fine granular level of 
control. Moreover when not used we could display warnings which can help 
people detect these O(n) query cases as Tom Forbes initially suggested.

On Friday, August 18, 2017 at 9:08:11 AM UTC+3, Anssi Kääriäinen wrote:
>
> On Thursday, August 17, 2017 at 11:30:45 PM UTC+3, Aymeric Augustin wrote:
>>
>> Hello, 
>>
>> > On 17 Aug 2017, at 14:48, Luke Plant  wrote: 
>> > 
>> > On 16/08/17 23:17, Aymeric Augustin wrote: 
>> >> It should kick in only when no select_related or prefetch_related is 
>> in effect, to avoid interfering with pre-existing optimizations. It's still 
>> easy to construct an example where it would degrade performance but I don't 
>> think such situations will be common in practice. 
>> > 
>> > I think Ansii's example is a non-contrived and potentially very common 
>> one where we will make things worse if this is default behaviour, 
>> especially for the non-expert programmer this behaviour is supposed to 
>> help: 
>> > 
>> > 
>> > if qs: 
>> > qs[0].do_something() 
>> > 
>> > (where do_something() access relations, or multiple relations). We will 
>> end up with *multiple* large unnecessary queries instead of just one. The 
>> difference is that the first one is easy to explain, the remainder are much 
>> harder, and will contribute even more to the "Django is slow/ORMs are bad" 
>> feeling. 
>>
>>
>> I wasn't convinced by this example because it's already sub-optimal 
>> currently. `if qs` will fetch the whole queryset just to check if it has at 
>> least one element. 
>>
>> Assuming there's 1000 elements in qs, if you don't care about a 1000x 
>> inefficiency, I don't expect you to care much about two or three 1000x 
>> inefficiencies...
>>
>
> I agree that this example isn't particularly worrying. It's something an 
> experienced developer wouldn't do. On the other hand, we are aiming at 
> making things simpler for non-experienced developers.
>
> To me the worrying part here is that we really don't have any data or 
> experience about if the cure will be worse than the disease. Likely not, 
> but what do we gain by taking risks here?
>
> Maybe we should just add the queryset method. This is the smallest atomic 
> task that can be done. Even if there's only the queryset method available, 
> it's possible to enable prefetches per model by using a Manager.
>
>  - Anssi
>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/d956a26d-adcf-443f-80c8-7b7ba290d2b1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Automatic prefetching in querysets

2017-08-18 Thread Collin Anderson

I like that idea - keep it a private API for now and make it a public API
once people have used it a little bit.

On Fri, Aug 18, 2017 at 4:01 AM, Shai Berger  wrote:

> On Friday 18 August 2017 09:08:11 Anssi Kääriäinen wrote:
> > Maybe we should just add the queryset method. This is the smallest atomic
> > task that can be done. Even if there's only the queryset method
> available,
> > it's possible to enable prefetches per model by using a Manager.
> >
> I disagree on both counts: I don't think it's the smallest atomic task, and
> I'm not so sure it's the right thing to do.
>
> The smallest atomic task, the way I see it, is building the infrastructure
> that would allow adding the queryset method -- but would also allow
> different
> APIs to be set around it.
>
> And since there is as yet no consensus on the correct API for "end" users,
> I
> would rather not define one immediately.
>
> Shai.
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CAFO84S4hduThLVTwN-Y0uqVy48aRRndvMgzmGOb24zTD9g0wmQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: Automatic prefetching in querysets

2017-08-18 Thread Shai Berger

On Friday 18 August 2017 09:08:11 Anssi Kääriäinen wrote:
> Maybe we should just add the queryset method. This is the smallest atomic
> task that can be done. Even if there's only the queryset method available,
> it's possible to enable prefetches per model by using a Manager.
> 
I disagree on both counts: I don't think it's the smallest atomic task, and 
I'm not so sure it's the right thing to do.

The smallest atomic task, the way I see it, is building the infrastructure 
that would allow adding the queryset method -- but would also allow different 
APIs to be set around it.

And since there is as yet no consensus on the correct API for "end" users, I 
would rather not define one immediately.

Shai.

Re: Automatic prefetching in querysets

2017-08-17 Thread Anssi Kääriäinen

On Thursday, August 17, 2017 at 11:30:45 PM UTC+3, Aymeric Augustin wrote:
>
> Hello, 
>
> > On 17 Aug 2017, at 14:48, Luke Plant > 
> wrote: 
> > 
> > On 16/08/17 23:17, Aymeric Augustin wrote: 
> >> It should kick in only when no select_related or prefetch_related is in 
> effect, to avoid interfering with pre-existing optimizations. It's still 
> easy to construct an example where it would degrade performance but I don't 
> think such situations will be common in practice. 
> > 
> > I think Ansii's example is a non-contrived and potentially very common 
> one where we will make things worse if this is default behaviour, 
> especially for the non-expert programmer this behaviour is supposed to 
> help: 
> > 
> > 
> > if qs: 
> > qs[0].do_something() 
> > 
> > (where do_something() access relations, or multiple relations). We will 
> end up with *multiple* large unnecessary queries instead of just one. The 
> difference is that the first one is easy to explain, the remainder are much 
> harder, and will contribute even more to the "Django is slow/ORMs are bad" 
> feeling. 
>
>
> I wasn't convinced by this example because it's already sub-optimal 
> currently. `if qs` will fetch the whole queryset just to check if it has at 
> least one element. 
>
> Assuming there's 1000 elements in qs, if you don't care about a 1000x 
> inefficiency, I don't expect you to care much about two or three 1000x 
> inefficiencies...
>

I agree that this example isn't particularly worrying. It's something an 
experienced developer wouldn't do. On the other hand, we are aiming at 
making things simpler for non-experienced developers.

To me the worrying part here is that we really don't have any data or 
experience about if the cure will be worse than the disease. Likely not, 
but what do we gain by taking risks here?

Maybe we should just add the queryset method. This is the smallest atomic 
task that can be done. Even if there's only the queryset method available, 
it's possible to enable prefetches per model by using a Manager.

 - Anssi


-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/c9d9dca0-f71d-48dd-9985-17ef8b8ec95d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Automatic prefetching in querysets

2017-08-17 Thread Aymeric Augustin

Hello,

> On 17 Aug 2017, at 14:48, Luke Plant  wrote:
> 
> On 16/08/17 23:17, Aymeric Augustin wrote:
>> It should kick in only when no select_related or prefetch_related is in 
>> effect, to avoid interfering with pre-existing optimizations. It's still 
>> easy to construct an example where it would degrade performance but I don't 
>> think such situations will be common in practice.
> 
> I think Ansii's example is a non-contrived and potentially very common one 
> where we will make things worse if this is default behaviour, especially for 
> the non-expert programmer this behaviour is supposed to help:
> 
> 
> if qs:
> qs[0].do_something()
> 
> (where do_something() access relations, or multiple relations). We will end 
> up with *multiple* large unnecessary queries instead of just one. The 
> difference is that the first one is easy to explain, the remainder are much 
> harder, and will contribute even more to the "Django is slow/ORMs are bad" 
> feeling.


I wasn't convinced by this example because it's already sub-optimal currently. 
`if qs` will fetch the whole queryset just to check if it has at least one 
element.

Assuming there's 1000 elements in qs, if you don't care about a 1000x 
inefficiency, I don't expect you to care much about two or three 1000x 
inefficiencies...

-- 
Aymeric.


-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/E99B1838-FBE9-4E0F-91EC-8A5DD78B8106%40polytechnique.org.
For more options, visit https://groups.google.com/d/optout.

Re: Automatic prefetching in querysets

2017-08-17 Thread Collin Anderson

"turn it on/off per model" - I wonder if we just have a custom
manager/mixin for the per model case. objects =
AutoPrefetchRelatedManager(). The manager can return a queryset with
prefetch_related(auto=True) or whatever already applied.

On Thu, Aug 17, 2017 at 1:33 PM, Andrew Godwin  wrote:

> Just some quick thoughts, as most of the points I would bring up have been
> discussed:
>
> - This should definitely be opt-in for the first release, it's too big a
> change to make opt-out straight away.
>
> - You should be able to turn it on/off per model, probably in Meta, not in
> a setting, and obviously be able to override it on the queryset level
>
> - I am concerned for the nasty performance edge cases this will cause with
> single-object access patterns (like Anssi's example), but I think the
> overall gain would likely be worth it.
>
> In general, very large sites won't be able to use this at all as JOINs or
> cross-table queries in one piece of code aren't allowed, but they probably
> already have plenty of expertise in this area. We also need to give
> consideration to how it will interact with multiple database support, and
> third-party solutions to things like sharding.
>
> Andrew
>
> On Thu, Aug 17, 2017 at 2:02 AM, Malcolm Box  wrote:
>
>> Hi,
>>
>> I think there's a potential to make this opt-in, and improve the
>> out-of-box experience.
>>
>> Summarising the discussion, it seems that the rough consensus is that if
>> we were building the ORM from scratch, then this would be entirely sensible
>> behaviour (with necessary per-QS ways to disable) - it would remove a
>> common performance problem (N+1 queries), would improve areas where adding
>> prefetch_related to queries is awkward, and in rare cases where it
>> decreased performance there would be documented ways to fix.
>>
>> So the main disagreement is about how to get there from here, and there's
>> concern about three types of users:
>>
>> 1 - Existing, non-expert users whose sites work now (because otherwise
>> they would have already fixed the problem) but who have lurking N+1 issues
>> which will break them later
>> 2 - Existing, non-expert users whose sites work now, but would have
>> performance issues if this was enabled by an upgrade
>> 3 - Existing, expert users who have already found and fixed the issues
>> and who could therefore get no benefit but might suffer a performance
>> degradation.
>>
>> I'll assert that the size of these populations above is listed in roughly
>> size order, with #1 being the biggest. This is a hunch based on most sites
>> not having huge tables where N+1 becomes a problem - at least not until
>> they've been running for a few years and accumulated lots of data...
>>
>> There is another population that hasn't been considered - users starting
>> django projects (ie people running django-admin startproject). Over time,
>> this is by far the largest population.
>>
>> So would a sensible approach be:
>>
>> - Feature is globally opt-in
>> - startproject opts in for new projects
>> - Release notes mention the new flag loudly, and encourage people to try
>> switching it on
>> - We add the debug tracing to help people find places where this setting
>> would help - and encourage them to enable it globally before trying
>> individual queryset.prefetch_related
>>
>> Then over time, all new projects will have the new behaviour. Old
>> projects will gradually upgrade - everyone in category 1 will hit the "make
>> it work" switch the first time they see the warning / see a problem.
>> Experts can choose how they migrate - as Adam points out, even experts can
>> miss things.
>>
>> Finally after a suitable warning period, this can become an opt-out
>> feature and we arrive in the sunny world of an ORM that works better for
>> all users.
>>
>> Cheers,
>>
>> Malcolm
>>
>> On Wednesday, 16 August 2017 21:17:49 UTC+1, Aymeric Augustin wrote:
>>>
>>> On 15 Aug 2017, at 11:44, Gordon Wrigley  wrote:
>>>
>>> I'd like to discuss automatic prefetching in querysets. Specifically
>>> automatically doing prefetch_related where needed without the user having
>>> to request it.
>>>
>>>
>>>
>>> Hello,
>>>
>>> I'm rather sympathetic to this proposal. Figuring out N + 1 problems in
>>> the admin or elsewhere gets old.
>>>
>>>
>>> In addition to everything that was said already, I'd like to point out
>>

Re: Automatic prefetching in querysets

2017-08-17 Thread Andrew Godwin

Just some quick thoughts, as most of the points I would bring up have been
discussed:

- This should definitely be opt-in for the first release, it's too big a
change to make opt-out straight away.

- You should be able to turn it on/off per model, probably in Meta, not in
a setting, and obviously be able to override it on the queryset level

- I am concerned for the nasty performance edge cases this will cause with
single-object access patterns (like Anssi's example), but I think the
overall gain would likely be worth it.

In general, very large sites won't be able to use this at all as JOINs or
cross-table queries in one piece of code aren't allowed, but they probably
already have plenty of expertise in this area. We also need to give
consideration to how it will interact with multiple database support, and
third-party solutions to things like sharding.

Andrew

On Thu, Aug 17, 2017 at 2:02 AM, Malcolm Box  wrote:

> Hi,
>
> I think there's a potential to make this opt-in, and improve the
> out-of-box experience.
>
> Summarising the discussion, it seems that the rough consensus is that if
> we were building the ORM from scratch, then this would be entirely sensible
> behaviour (with necessary per-QS ways to disable) - it would remove a
> common performance problem (N+1 queries), would improve areas where adding
> prefetch_related to queries is awkward, and in rare cases where it
> decreased performance there would be documented ways to fix.
>
> So the main disagreement is about how to get there from here, and there's
> concern about three types of users:
>
> 1 - Existing, non-expert users whose sites work now (because otherwise
> they would have already fixed the problem) but who have lurking N+1 issues
> which will break them later
> 2 - Existing, non-expert users whose sites work now, but would have
> performance issues if this was enabled by an upgrade
> 3 - Existing, expert users who have already found and fixed the issues and
> who could therefore get no benefit but might suffer a performance
> degradation.
>
> I'll assert that the size of these populations above is listed in roughly
> size order, with #1 being the biggest. This is a hunch based on most sites
> not having huge tables where N+1 becomes a problem - at least not until
> they've been running for a few years and accumulated lots of data...
>
> There is another population that hasn't been considered - users starting
> django projects (ie people running django-admin startproject). Over time,
> this is by far the largest population.
>
> So would a sensible approach be:
>
> - Feature is globally opt-in
> - startproject opts in for new projects
> - Release notes mention the new flag loudly, and encourage people to try
> switching it on
> - We add the debug tracing to help people find places where this setting
> would help - and encourage them to enable it globally before trying
> individual queryset.prefetch_related
>
> Then over time, all new projects will have the new behaviour. Old projects
> will gradually upgrade - everyone in category 1 will hit the "make it work"
> switch the first time they see the warning / see a problem. Experts can
> choose how they migrate - as Adam points out, even experts can miss things.
>
> Finally after a suitable warning period, this can become an opt-out
> feature and we arrive in the sunny world of an ORM that works better for
> all users.
>
> Cheers,
>
> Malcolm
>
> On Wednesday, 16 August 2017 21:17:49 UTC+1, Aymeric Augustin wrote:
>>
>> On 15 Aug 2017, at 11:44, Gordon Wrigley  wrote:
>>
>> I'd like to discuss automatic prefetching in querysets. Specifically
>> automatically doing prefetch_related where needed without the user having
>> to request it.
>>
>>
>>
>> Hello,
>>
>> I'm rather sympathetic to this proposal. Figuring out N + 1 problems in
>> the admin or elsewhere gets old.
>>
>>
>> In addition to everything that was said already, I'd like to point out
>> that Django already has a very similar "magic auto prefetching" behavior in
>> some cases :-)
>>
>> I'm referring to the admin which calls select_related() on non-nullable
>> foreign keys in the changelist view. The "non-nullable" condition makes
>> that behavior hard to predict — I'd go as far as to call it non
>> deterministic. For details, see slide 54 of https://myks.org/data/20161
>> 103-Django_Under_the_Hood-Debugging_Performance.pdf and the audio
>> commentary at https://youtu.be/5fheDDj3oHY?t=2024.
>>
>>
>> The feature proposed here is most useful if it's opt-out because it
>> targets p

Re: Automatic prefetching in querysets

2017-08-17 Thread Malcolm Box

Hi,

I think there's a potential to make this opt-in, and improve the out-of-box 
experience.

Summarising the discussion, it seems that the rough consensus is that if we 
were building the ORM from scratch, then this would be entirely sensible 
behaviour (with necessary per-QS ways to disable) - it would remove a 
common performance problem (N+1 queries), would improve areas where adding 
prefetch_related to queries is awkward, and in rare cases where it 
decreased performance there would be documented ways to fix.

So the main disagreement is about how to get there from here, and there's 
concern about three types of users:

1 - Existing, non-expert users whose sites work now (because otherwise they 
would have already fixed the problem) but who have lurking N+1 issues which 
will break them later
2 - Existing, non-expert users whose sites work now, but would have 
performance issues if this was enabled by an upgrade
3 - Existing, expert users who have already found and fixed the issues and 
who could therefore get no benefit but might suffer a performance 
degradation.

I'll assert that the size of these populations above is listed in roughly 
size order, with #1 being the biggest. This is a hunch based on most sites 
not having huge tables where N+1 becomes a problem - at least not until 
they've been running for a few years and accumulated lots of data...

There is another population that hasn't been considered - users starting 
django projects (ie people running django-admin startproject). Over time, 
this is by far the largest population. 

So would a sensible approach be:

- Feature is globally opt-in 
- startproject opts in for new projects
- Release notes mention the new flag loudly, and encourage people to try 
switching it on
- We add the debug tracing to help people find places where this setting 
would help - and encourage them to enable it globally before trying 
individual queryset.prefetch_related

Then over time, all new projects will have the new behaviour. Old projects 
will gradually upgrade - everyone in category 1 will hit the "make it work" 
switch the first time they see the warning / see a problem. Experts can 
choose how they migrate - as Adam points out, even experts can miss things.

Finally after a suitable warning period, this can become an opt-out feature 
and we arrive in the sunny world of an ORM that works better for all users.

Cheers,

Malcolm

On Wednesday, 16 August 2017 21:17:49 UTC+1, Aymeric Augustin wrote:
>
> On 15 Aug 2017, at 11:44, Gordon Wrigley  > wrote:
>
> I'd like to discuss automatic prefetching in querysets. Specifically 
> automatically doing prefetch_related where needed without the user having 
> to request it.
>
>
>
> Hello,
>
> I'm rather sympathetic to this proposal. Figuring out N + 1 problems in 
> the admin or elsewhere gets old.
>
>
> In addition to everything that was said already, I'd like to point out 
> that Django already has a very similar "magic auto prefetching" behavior in 
> some cases :-)
>
> I'm referring to the admin which calls select_related() on non-nullable 
> foreign keys in the changelist view. The "non-nullable" condition makes 
> that behavior hard to predict — I'd go as far as to call it non 
> deterministic. For details, see slide 54 of 
> https://myks.org/data/20161103-Django_Under_the_Hood-Debugging_Performance.pdf
>  and 
> the audio commentary at https://youtu.be/5fheDDj3oHY?t=2024.
>
>
> The feature proposed here is most useful if it's opt-out because it 
> targets people who aren't aware that the problem even exists — at best they 
> notice that Django is slow and that reminds them vaguely of a rant that 
> explains why ORMs are the worst thing since object oriented programming.
>
> It should kick in only when no select_related or prefetch_related is in 
> effect, to avoid interfering with pre-existing optimizations. It's still 
> easy to construct an example where it would degrade performance but I don't 
> think such situations will be common in practice. Still, there should be a 
> per-queryset opt-out for these cases.
>
> We may want to introduce it with a deprecation path, that is, make it 
> opt-in at first and log a deprecation warning where the behavior would 
> kick-in, so developers who want to disable it can add the per-queryset 
> opt-out.
>
>
> At this point, my main concerns are:
>
> 1) The difficulty of identifying where the queryset originates, given that 
> querysets are lazy. Passing objects around is common; sometimes it can be 
> hard to figure out where an object comes from. It isn't visible in the 
> stack trace. In my opinion this is the strongest argument against the 
> feature.
>
> 2) The lack of this feature for reverse

Re: Automatic prefetching in querysets

2017-08-16 Thread Gordon Wrigley

I'm not advocating either way on this, but it's probably worth noting that
the context manager proposal is compatible with a queryset level optin/out
and an object level optout.

So you could for example have prefetch_related(auto=None) which can be
explicitly set to True / False and takes it's value from the context
manager when it's None.


On Thu, Aug 17, 2017 at 12:42 AM, Josh Smeaton 
wrote:

> I think there's a lot right with your suggestions here Shai.
>
> It delivers better default behaviour for new projects, does not affect
> existing deployments, and seems pretty easy to enable/disable selectively
> at any level of the stack.
>
> My only concern would be libraries leaning on this behaviour by enabling
> it locally and users being unable to change it. That's only a small concern
> though, and wouldn't prevent me from recommending the proposal.
>
>
> On Thursday, 17 August 2017 09:34:03 UTC+10, Shai Berger wrote:
>>
>> First of all, I think making the auto-prefetch feature available in some
>> form
>> is a great idea. As an opt-in, I would have made use of it on several
>> occasions, and cut days of optimization work to minutes. It's true that
>> this
>> wouldn't achieve the best possible optimization in many cases, but it
>> would be
>> close enough, for a small fraction of the effort.
>>
>> But of the choices that have been suggested here, I find none quite
>> satisfactory. I agree, basically, with almost all of the objections
>> raised on
>> all sides. And I think I have a better suggestion.
>>
>> Rather than a setting (global and essentially constant) or all-local
>> opt-ins,
>> suppose the feature was controlled -- for all querysets -- by a context
>> manager.
>>
>> So, rather than
>>
>> > "qs.prefetch_related(auto=True)", or "with qs.auto_prefetch():",
>>
>> suppose we had
>>
>> with QuerySet.auto_prefetch(active=True):
>>
>> and under that, all querysets would auto-prefetch. Then:
>>
>> - We could put that in a middleware; using that middleware would then,
>> effectively, be a global opt in. We could add the middleware -- perhaps
>> commented-out -- to the new project template, for beginner's sake. Call
>> it
>> PerformanceTrainingWheelsMiddleware, to make people aware that it's
>> something
>> they should grow out of, but make it easily accessible.
>>
>> - We could easily build a decorator for views
>>
>> - We could have local opt-out, at the correct level -- not the objects,
>> but
>> the control flow.
>>
>> For this to work, the flag controlling the feature would need to be
>> thread-
>> local and managed as a stack. But these aren't problems.
>>
>> If (as likely) this suggestion still does not generate a concensus, I
>> would
>> prefer that we follow the path Anssi suggested --
>>
>> > Note that it should be possible to implement this fully as 3rd party
>> > project if we add an easy way to customise how related object fetching
>> is
>> > done. I'm not sure if we can add an easy way for that.
>>
>> Except I don't think "easy" is a requirement here. If we can add a
>> sensible
>> way, that would be enough -- I don't expect many Django users to develop
>> implementations of the feature, only to use them.
>>
>> My 2 cents,
>> Shai.
>>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this topic, visit https://groups.google.com/d/
> topic/django-developers/EplZGj-ejvg/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> django-developers+unsubscr...@googlegroups.com.
> To post to this group, send email to django-developers@googlegroups.com.
> Visit this group at https://groups.google.com/group/django-developers.
> To view this discussion on the web visit https://groups.google.com/d/
> msgid/django-developers/bf3e22af-9a41-4106-b986-
> fc136d54c4c6%40googlegroups.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CAD-wiX3RkYYo9twG%2BMn4OL%3DBZvHMO1_6_LMvn%2BQrpM7J0FkiKA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: Automatic prefetching in querysets

2017-08-16 Thread Josh Smeaton

I think there's a lot right with your suggestions here Shai. 

It delivers better default behaviour for new projects, does not affect 
existing deployments, and seems pretty easy to enable/disable selectively 
at any level of the stack.

My only concern would be libraries leaning on this behaviour by enabling it 
locally and users being unable to change it. That's only a small concern 
though, and wouldn't prevent me from recommending the proposal.

On Thursday, 17 August 2017 09:34:03 UTC+10, Shai Berger wrote:
>
> First of all, I think making the auto-prefetch feature available in some 
> form 
> is a great idea. As an opt-in, I would have made use of it on several 
> occasions, and cut days of optimization work to minutes. It's true that 
> this 
> wouldn't achieve the best possible optimization in many cases, but it 
> would be 
> close enough, for a small fraction of the effort. 
>
> But of the choices that have been suggested here, I find none quite 
> satisfactory. I agree, basically, with almost all of the objections raised 
> on 
> all sides. And I think I have a better suggestion. 
>
> Rather than a setting (global and essentially constant) or all-local 
> opt-ins, 
> suppose the feature was controlled -- for all querysets -- by a context 
> manager. 
>
> So, rather than 
>
> > "qs.prefetch_related(auto=True)", or "with qs.auto_prefetch():", 
>
> suppose we had 
>
> with QuerySet.auto_prefetch(active=True): 
>
> and under that, all querysets would auto-prefetch. Then: 
>
> - We could put that in a middleware; using that middleware would then, 
> effectively, be a global opt in. We could add the middleware -- perhaps 
> commented-out -- to the new project template, for beginner's sake. Call it 
> PerformanceTrainingWheelsMiddleware, to make people aware that it's 
> something 
> they should grow out of, but make it easily accessible. 
>
> - We could easily build a decorator for views 
>
> - We could have local opt-out, at the correct level -- not the objects, 
> but 
> the control flow. 
>
> For this to work, the flag controlling the feature would need to be 
> thread- 
> local and managed as a stack. But these aren't problems. 
>
> If (as likely) this suggestion still does not generate a concensus, I 
> would 
> prefer that we follow the path Anssi suggested -- 
>
> > Note that it should be possible to implement this fully as 3rd party 
> > project if we add an easy way to customise how related object fetching 
> is 
> > done. I'm not sure if we can add an easy way for that. 
>
> Except I don't think "easy" is a requirement here. If we can add a 
> sensible 
> way, that would be enough -- I don't expect many Django users to develop 
> implementations of the feature, only to use them. 
>
> My 2 cents, 
> Shai. 
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/bf3e22af-9a41-4106-b986-fc136d54c4c6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Automatic prefetching in querysets

2017-08-16 Thread Shai Berger

First of all, I think making the auto-prefetch feature available in some form 
is a great idea. As an opt-in, I would have made use of it on several 
occasions, and cut days of optimization work to minutes. It's true that this 
wouldn't achieve the best possible optimization in many cases, but it would be 
close enough, for a small fraction of the effort.

But of the choices that have been suggested here, I find none quite 
satisfactory. I agree, basically, with almost all of the objections raised on 
all sides. And I think I have a better suggestion.

Rather than a setting (global and essentially constant) or all-local opt-ins, 
suppose the feature was controlled -- for all querysets -- by a context 
manager.

So, rather than

> "qs.prefetch_related(auto=True)", or "with qs.auto_prefetch():", 

suppose we had

with QuerySet.auto_prefetch(active=True):

and under that, all querysets would auto-prefetch. Then:

- We could put that in a middleware; using that middleware would then, 
effectively, be a global opt in. We could add the middleware -- perhaps 
commented-out -- to the new project template, for beginner's sake. Call it 
PerformanceTrainingWheelsMiddleware, to make people aware that it's something 
they should grow out of, but make it easily accessible.

- We could easily build a decorator for views

- We could have local opt-out, at the correct level -- not the objects, but 
the control flow.

For this to work, the flag controlling the feature would need to be thread-
local and managed as a stack. But these aren't problems.

If (as likely) this suggestion still does not generate a concensus, I would 
prefer that we follow the path Anssi suggested -- 

> Note that it should be possible to implement this fully as 3rd party
> project if we add an easy way to customise how related object fetching is
> done. I'm not sure if we can add an easy way for that.

Except I don't think "easy" is a requirement here. If we can add a sensible 
way, that would be enough -- I don't expect many Django users to develop 
implementations of the feature, only to use them.

My 2 cents,
Shai.

Re: Automatic prefetching in querysets

2017-08-16 Thread Gordon Wrigley

Regarding 2, it does work for reverse one-to-one relations.

On Wed, Aug 16, 2017 at 9:17 PM, Aymeric Augustin <
aymeric.augus...@polytechnique.org> wrote:

> On 15 Aug 2017, at 11:44, Gordon Wrigley  wrote:
>
> I'd like to discuss automatic prefetching in querysets. Specifically
> automatically doing prefetch_related where needed without the user having
> to request it.
>
>
>
> Hello,
>
> I'm rather sympathetic to this proposal. Figuring out N + 1 problems in
> the admin or elsewhere gets old.
>
>
> In addition to everything that was said already, I'd like to point out
> that Django already has a very similar "magic auto prefetching" behavior in
> some cases :-)
>
> I'm referring to the admin which calls select_related() on non-nullable
> foreign keys in the changelist view. The "non-nullable" condition makes
> that behavior hard to predict — I'd go as far as to call it non
> deterministic. For details, see slide 54 of https://myks.org/data/
> 20161103-Django_Under_the_Hood-Debugging_Performance.pdf and the audio
> commentary at https://youtu.be/5fheDDj3oHY?t=2024.
>
>
> The feature proposed here is most useful if it's opt-out because it
> targets people who aren't aware that the problem even exists — at best they
> notice that Django is slow and that reminds them vaguely of a rant that
> explains why ORMs are the worst thing since object oriented programming.
>
> It should kick in only when no select_related or prefetch_related is in
> effect, to avoid interfering with pre-existing optimizations. It's still
> easy to construct an example where it would degrade performance but I don't
> think such situations will be common in practice. Still, there should be a
> per-queryset opt-out for these cases.
>
> We may want to introduce it with a deprecation path, that is, make it
> opt-in at first and log a deprecation warning where the behavior would
> kick-in, so developers who want to disable it can add the per-queryset
> opt-out.
>
>
> At this point, my main concerns are:
>
> 1) The difficulty of identifying where the queryset originates, given that
> querysets are lazy. Passing objects around is common; sometimes it can be
> hard to figure out where an object comes from. It isn't visible in the
> stack trace. In my opinion this is the strongest argument against the
> feature.
>
> 2) The lack of this feature for reverse one-to-one relations; it's only
> implemented for foreign keys. It's hard to tell them apart in Python code.
> The subtle differences, like return None vs. raise ObjectDoesNotExist when
> there's no related object, degrade the developer experience.
>
> 3) The strong opinions expressed against the feature. I'm not sure that
> consensus is within reach. If we can't agree that this is an adequate
> amount of magic, we're likely to stick with the status quo. I'd rather not
> have this question decided by a vote of the technical board.
>
>
> In the grand scheme of things, going from "prefetching a related instance
> for an object" to "prefetching related instances for all objects in the
> queryset" isn't that much of a stretch... But I admit it's rather scary to
> make this change for all existing Django projects!
>
>
> Best regards,
>
> --
> Aymeric.
>
>
>
>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this topic, visit https://groups.google.com/d/
> topic/django-developers/EplZGj-ejvg/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> django-developers+unsubscr...@googlegroups.com.
> To post to this group, send email to django-developers@googlegroups.com.
> Visit this group at https://groups.google.com/group/django-developers.
> To view this discussion on the web visit https://groups.google.com/d/
> msgid/django-developers/2D9DCC0F-EAB0-4512-8AEC-
> 08A694DF9074%40polytechnique.org
> <https://groups.google.com/d/msgid/django-developers/2D9DCC0F-EAB0-4512-8AEC-08A694DF9074%40polytechnique.org?utm_medium=email&utm_source=footer>
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CAD-wiX0ScioW28wNYmY76Ma03kbSN9dWjRvKWs60%2B0f1VwvkFQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: Automatic prefetching in querysets

2017-08-16 Thread Aymeric Augustin

> On 15 Aug 2017, at 11:44, Gordon Wrigley wrote:
>
> I'd like to discuss automatic prefetching in querysets. Specifically
> automatically doing prefetch_related where needed without the user having to
> request it.

Hello,

I'm rather sympathetic to this proposal. Figuring out N + 1 problems in the
admin or elsewhere gets old.

In addition to everything that was said already, I'd like to point out that
Django already has a very similar "magic auto prefetching" behavior in some
cases :-)

I'm referring to the admin which calls select_related() on non-nullable foreign
keys in the changelist view. The "non-nullable" condition makes that behavior
hard to predict — I'd go as far as to call it non deterministic. For details,
see slide 54 of
https://myks.org/data/20161103-Django_Under_the_Hood-Debugging_Performance.pdf
<https://myks.org/data/20161103-Django_Under_the_Hood-Debugging_Performance.pdf>
and the audio commentary at https://youtu.be/5fheDDj3oHY?t=2024
<https://youtu.be/5fheDDj3oHY?t=2024>.

The feature proposed here is most useful if it's opt-out because it targets
people who aren't aware that the problem even exists — at best they notice that
Django is slow and that reminds them vaguely of a rant that explains why ORMs
are the worst thing since object oriented programming.

It should kick in only when no select_related or prefetch_related is in effect,
to avoid interfering with pre-existing optimizations. It's still easy to
construct an example where it would degrade performance but I don't think such
situations will be common in practice. Still, there should be a per-queryset
opt-out for these cases.

We may want to introduce it with a deprecation path, that is, make it opt-in at
first and log a deprecation warning where the behavior would kick-in, so
developers who want to disable it can add the per-queryset opt-out.

At this point, my main concerns are:

1) The difficulty of identifying where the queryset originates, given that
querysets are lazy. Passing objects around is common; sometimes it can be hard
to figure out where an object comes from. It isn't visible in the stack trace.
In my opinion this is the strongest argument against the feature.

2) The lack of this feature for reverse one-to-one relations; it's only
implemented for foreign keys. It's hard to tell them apart in Python code. The
subtle differences, like return None vs. raise ObjectDoesNotExist when there's
no related object, degrade the developer experience.

3) The strong opinions expressed against the feature. I'm not sure that
consensus is within reach. If we can't agree that this is an adequate amount of
magic, we're likely to stick with the status quo. I'd rather not have this
question decided by a vote of the technical board.

In the grand scheme of things, going from "prefetching a related instance for
an object" to "prefetching related instances for all objects in the queryset"
isn't that much of a stretch... But I admit it's rather scary to make this
change for all existing Django projects!

Best regards,

--
Aymeric.

--
You received this message because you are subscribed to the Google Groups
"Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-developers/2D9DCC0F-EAB0-4512-8AEC-08A694DF9074%40polytechnique.org.
For more options, visit https://groups.google.com/d/optout.

Re: Automatic prefetching in querysets

2017-08-16 Thread 'Tom Evans' via Django developers (Contributions to Django itself)

Is this opt-{in,out} considered to be a global flag, meant to be
toggled on or off depending on whether it is an "expert" working on
the project or not?

I don't think that would be a good idea, almost all of our projects
have a mix of skill levels, and people move from team to team on a
regular basis. I don't think we have a project that has only been
worked on by senior developers or only worked on by junior developers.

What I think would be exceptionally useful would be to emit loud
warnings when developing (DEBUG=True or runserver) whenever this code
can detect that a prefetch MAY be applicable. A lot of our junior
developers are not as aware about DB structures and SQL performance
(particularly now that they don't design schemas and write DB
queries), so warnings as part of their normal development help trigger
teaching moments. Magically (sometimes) making things faster doesn't
teach them anything.

Turning this feature on/off per app is scary. If we're dealing with
models from AppA, we will get auto pre-fetching, but if we work with
models from AppB, we do not? If we're dealing with those models in the
same module, we will have different behaviour depending on which model
is being used? Please no.

I also think that it might be handy to specify
"qs.prefetch_related(auto=True)", or "with qs.auto_prefetch():", which
would then trigger the newly proposed behaviour. It's like "I want you
to prefetch all the things I use, but I don't know what just yet".
Having said that, I'd also like that to spit out a warning (again, dev
only) that specified what was actually prefetched, because why waste a
DB query in production to determine what we know whilst developing?

Cheers

Tom

On Wed, Aug 16, 2017 at 5:28 PM, Brice Parent  wrote:
> Le 16/08/17 à 14:07, Adam Johnson a écrit :
>>
>> I wouldn't want is to give free optimisations to non-ORM pros
>
>
> Something like 99% of django projects are by non-ORM pros, it can be easy to
> forget that on a mailing list of experts.
>
> I don't count myself as an ORM or SQL expert at all, but I've worked on at
> least 2 projects for which they had contracted such experts to optimise
> their projects once they reached some limits. Those experts only worked for
> them for a limited period of time, and if we changed this behaviour by
> default, I'm 100% sure they wouldn't understanding why their website was
> getting slower with the new release. And they would probably only see the
> performance issue on staging servers if they have some, or in production
> with a full database.
> I admit this can be a great thing for new comers and even small sites. But
> remember that, with or without this functionality, their website will need
> to be optimised in many ways when they start to have more users, and
> removing this 1+N common pitfall (or hiding the problem, as it is not
> completely solved as it's not the more efficient fix) will not be the only
> point they should look at.
> I think the solution for this would simply be to have it as opt-out, not to
> harm any already-working and optimised websites and allow to maintain a
> stable behaviour for 3rd party apps, but make it opt-in to new users just by
> pre-setting some setting in the settings.py file generated by "django-admin
> startproject". It could be a simple boolean setting or a list of apps for
> which we want the auto-fetch feature to be activated on. With this, new
> projects would be optimised unless they want to optimise manually, existing
> projects would still work without decreasing performance or creating memory
> problems in any case. Best of both worlds.
> And for existing non-optimised but existing websites, I prefer the
> status-quo than risking to create problems that might occur on production
> website.
> We could also, as proposed elsewhere in this thread, warn the developers (so
> only when settings.debug is set) when this kind of 1+N queries are executed
> that it could be way more efficient, either by activating the auto-fetch
> functionality or by manually prefetch related data.
>
> - Brice
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to django-developers+unsubscr...@googlegroups.com.
> To post to this group, send email to django-developers@googlegroups.com.
> Visit this group at https://groups.google.com/group/django-developers.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-developers/dc483976-f7d0-3aee-4f9e-76f515e5c761%40brice.xyz.
>
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to th

Re: Automatic prefetching in querysets

2017-08-16 Thread Gordon Wrigley

 going
>> to know about a global setting and can opt in to the old behaviour rather
>> easily. Newer users that do not know about select/prefetch_related or these
>> settings will fall into the new behaviour by default.
>>
>> It's unreasonable to expect every user of django learn the ins and outs
>> of all queryset methods. I'm probably considered a django orm expert, and I
>> still sometimes write queries that are non-optimal or *become* non-optimal
>> after changes in unrelated areas. At an absolute minimum we should be
>> screaming and shouting when this happens. But we can also fix the issue
>> while complaining, and help guide users into correct behaviour.
>>
>>
>> On Wednesday, 16 August 2017 08:41:31 UTC+10, Anthony King wrote:
>>>
>>> Automatically prefetching is something I feel should be avoided.
>>>
>>> A common gripe I have with ORMs is they hide what's actually happening
>>> with the database, resulting in beginners-going-on-intermediates
>>> building libraries/systems that don't scale well.
>>>
>>> We have several views in a dashboard, where a relation may be accessed
>>> once or twice while iterating over a large python filtered queryset.
>>> Prefetching this relation based on the original queryset has the
>>> potential to add around 5 seconds to the response time (probably more, that
>>> table has doubled in size since I last measured it).
>>>
>>> I feel it would be better to optimise for your usecase, as apposed to
>>> try to prevent uncalled-for behaviour.
>>>
>>>
>>>
>>> On Aug 15, 2017 23:15, "Luke Plant"  wrote:
>>>
>>>> I agree with Marc here that the proposed optimizations are 'magical'. I
>>>> think when it comes to optimizations like these you simply cannot know in
>>>> advance whether doing extra queries is going to a be an optimization or a
>>>> pessimization. If I can come up with a single example where it would
>>>> significantly decrease performance (either memory usage or speed) compared
>>>> to the default (and I'm sure I can), then I would be strongly opposed to it
>>>> ever being default behaviour.
>>>>
>>>> Concerning implementing it as an additional  QuerySet method like
>>>> `auto_prefetch()` - I'm not sure what I think, I feel like it could get
>>>> icky (i.e. increase our technical debt), due to the way it couples things
>>>> together. I can't imagine ever wanting to use it, though, I would always
>>>> prefer the manual option.
>>>>
>>>> Luke
>>>>
>>>>
>>>>
>>>> On 15/08/17 21:02, Marc Tamlyn wrote:
>>>>
>>>> Hi Gordon,
>>>>
>>>> Thanks for the suggestion.
>>>>
>>>> I'm not a fan of adding a layer that tries to be this clever. How would
>>>> possible prefetches be identified? What happens when an initial loop in a
>>>> view requires one prefetch, but a subsequent loop in a template requires
>>>> some other prefetch? What about nested loops resulting in nested
>>>> prefetches? Code like this is almost guaranteed to break unexpectedly in
>>>> multiple ways. Personally, I would argue that correctly setting up and
>>>> maintaining appropriate prefetches and selects is a necessary part of
>>>> working with an ORM.
>>>>
>>>> Do you know of any other ORMs which attempt similar magical
>>>> optimisations? How do they go about identifying the cases where it is
>>>> necessary?
>>>>
>>>> On 15 August 2017 at 10:44, Gordon Wrigley 
>>>> wrote:
>>>>
>>>>> I'd like to discuss automatic prefetching in querysets. Specifically
>>>>> automatically doing prefetch_related where needed without the user having
>>>>> to request it.
>>>>>
>>>>> For context consider these three snippets using the Question & Choice
>>>>> models from the tutorial
>>>>> <https://docs.djangoproject.com/en/1.11/intro/tutorial02/#creating-models>
>>>>>  when
>>>>> there are 100 questions each with 5 choices for a total of 500 choices.
>>>>>
>>>>> Default
>>>>> for choice in Choice.objects.all():
>>>>> print(choice.question.question_text, ':', choice.choice_text)
>>>>> 501 db queries, fetches 500

Re: Automatic prefetching in querysets

2017-08-16 Thread Tom Forbes

 behaviour.
>
>
> On Wednesday, 16 August 2017 08:41:31 UTC+10, Anthony King wrote:
>>
>> Automatically prefetching is something I feel should be avoided.
>>
>> A common gripe I have with ORMs is they hide what's actually happening
>> with the database, resulting in beginners-going-on-intermediates
>> building libraries/systems that don't scale well.
>>
>> We have several views in a dashboard, where a relation may be accessed
>> once or twice while iterating over a large python filtered queryset.
>> Prefetching this relation based on the original queryset has the
>> potential to add around 5 seconds to the response time (probably more, that
>> table has doubled in size since I last measured it).
>>
>> I feel it would be better to optimise for your usecase, as apposed to try
>> to prevent uncalled-for behaviour.
>>
>>
>>
>> On Aug 15, 2017 23:15, "Luke Plant"  wrote:
>>
>>> I agree with Marc here that the proposed optimizations are 'magical'. I
>>> think when it comes to optimizations like these you simply cannot know in
>>> advance whether doing extra queries is going to a be an optimization or a
>>> pessimization. If I can come up with a single example where it would
>>> significantly decrease performance (either memory usage or speed) compared
>>> to the default (and I'm sure I can), then I would be strongly opposed to it
>>> ever being default behaviour.
>>>
>>> Concerning implementing it as an additional  QuerySet method like
>>> `auto_prefetch()` - I'm not sure what I think, I feel like it could get
>>> icky (i.e. increase our technical debt), due to the way it couples things
>>> together. I can't imagine ever wanting to use it, though, I would always
>>> prefer the manual option.
>>>
>>> Luke
>>>
>>>
>>>
>>> On 15/08/17 21:02, Marc Tamlyn wrote:
>>>
>>> Hi Gordon,
>>>
>>> Thanks for the suggestion.
>>>
>>> I'm not a fan of adding a layer that tries to be this clever. How would
>>> possible prefetches be identified? What happens when an initial loop in a
>>> view requires one prefetch, but a subsequent loop in a template requires
>>> some other prefetch? What about nested loops resulting in nested
>>> prefetches? Code like this is almost guaranteed to break unexpectedly in
>>> multiple ways. Personally, I would argue that correctly setting up and
>>> maintaining appropriate prefetches and selects is a necessary part of
>>> working with an ORM.
>>>
>>> Do you know of any other ORMs which attempt similar magical
>>> optimisations? How do they go about identifying the cases where it is
>>> necessary?
>>>
>>> On 15 August 2017 at 10:44, Gordon Wrigley  wrote:
>>>
>>>> I'd like to discuss automatic prefetching in querysets. Specifically
>>>> automatically doing prefetch_related where needed without the user having
>>>> to request it.
>>>>
>>>> For context consider these three snippets using the Question & Choice
>>>> models from the tutorial
>>>> <https://docs.djangoproject.com/en/1.11/intro/tutorial02/#creating-models> 
>>>> when
>>>> there are 100 questions each with 5 choices for a total of 500 choices.
>>>>
>>>> Default
>>>> for choice in Choice.objects.all():
>>>> print(choice.question.question_text, ':', choice.choice_text)
>>>> 501 db queries, fetches 500 choice rows and 500 question rows from the
>>>> DB
>>>>
>>>> Prefetch_related
>>>> for choice in Choice.objects.prefetch_related('question'):
>>>> print(choice.question.question_text, ':', choice.choice_text)
>>>> 2 db queries, fetches 500 choice rows and 100 question rows from the DB
>>>>
>>>> Select_related
>>>> for choice in Choice.objects.select_related('question'):
>>>> print(choice.question.question_text, ':', choice.choice_text)
>>>> 1 db query, fetches 500 choice rows and 500 question rows from the DB
>>>>
>>>> I've included select_related for completeness, I'm not going to propose
>>>> changing anything about it's use. There are places where it is the best
>>>> choice and in those places it will still be up to the user to request it. I
>>>> will note that anywhere select_related

Re: Automatic prefetching in querysets

2017-08-16 Thread Marc Tamlyn

 on the original queryset has the
>> potential to add around 5 seconds to the response time (probably more, that
>> table has doubled in size since I last measured it).
>>
>> I feel it would be better to optimise for your usecase, as apposed to try
>> to prevent uncalled-for behaviour.
>>
>>
>>
>> On Aug 15, 2017 23:15, "Luke Plant"  wrote:
>>
>>> I agree with Marc here that the proposed optimizations are 'magical'. I
>>> think when it comes to optimizations like these you simply cannot know in
>>> advance whether doing extra queries is going to a be an optimization or a
>>> pessimization. If I can come up with a single example where it would
>>> significantly decrease performance (either memory usage or speed) compared
>>> to the default (and I'm sure I can), then I would be strongly opposed to it
>>> ever being default behaviour.
>>>
>>> Concerning implementing it as an additional  QuerySet method like
>>> `auto_prefetch()` - I'm not sure what I think, I feel like it could get
>>> icky (i.e. increase our technical debt), due to the way it couples things
>>> together. I can't imagine ever wanting to use it, though, I would always
>>> prefer the manual option.
>>>
>>> Luke
>>>
>>>
>>>
>>> On 15/08/17 21:02, Marc Tamlyn wrote:
>>>
>>> Hi Gordon,
>>>
>>> Thanks for the suggestion.
>>>
>>> I'm not a fan of adding a layer that tries to be this clever. How would
>>> possible prefetches be identified? What happens when an initial loop in a
>>> view requires one prefetch, but a subsequent loop in a template requires
>>> some other prefetch? What about nested loops resulting in nested
>>> prefetches? Code like this is almost guaranteed to break unexpectedly in
>>> multiple ways. Personally, I would argue that correctly setting up and
>>> maintaining appropriate prefetches and selects is a necessary part of
>>> working with an ORM.
>>>
>>> Do you know of any other ORMs which attempt similar magical
>>> optimisations? How do they go about identifying the cases where it is
>>> necessary?
>>>
>>> On 15 August 2017 at 10:44, Gordon Wrigley  wrote:
>>>
>>>> I'd like to discuss automatic prefetching in querysets. Specifically
>>>> automatically doing prefetch_related where needed without the user having
>>>> to request it.
>>>>
>>>> For context consider these three snippets using the Question & Choice
>>>> models from the tutorial
>>>> <https://docs.djangoproject.com/en/1.11/intro/tutorial02/#creating-models> 
>>>> when
>>>> there are 100 questions each with 5 choices for a total of 500 choices.
>>>>
>>>> Default
>>>> for choice in Choice.objects.all():
>>>> print(choice.question.question_text, ':', choice.choice_text)
>>>> 501 db queries, fetches 500 choice rows and 500 question rows from the
>>>> DB
>>>>
>>>> Prefetch_related
>>>> for choice in Choice.objects.prefetch_related('question'):
>>>> print(choice.question.question_text, ':', choice.choice_text)
>>>> 2 db queries, fetches 500 choice rows and 100 question rows from the DB
>>>>
>>>> Select_related
>>>> for choice in Choice.objects.select_related('question'):
>>>> print(choice.question.question_text, ':', choice.choice_text)
>>>> 1 db query, fetches 500 choice rows and 500 question rows from the DB
>>>>
>>>> I've included select_related for completeness, I'm not going to propose
>>>> changing anything about it's use. There are places where it is the best
>>>> choice and in those places it will still be up to the user to request it. I
>>>> will note that anywhere select_related is optimal prefetch_related is still
>>>> better than the default and leave it at that.
>>>>
>>>> The 'Default' example above is a classic example of the N+1 query
>>>> problem, a problem that is widespread in Django apps.
>>>> This pattern of queries is what new users produce because they don't
>>>> know enough about the database and / or ORM to do otherwise.
>>>> Experieced users will also often produce this because it's not always
>>>> obvious what fields will and won't be used and subseque

Re: Automatic prefetching in querysets

2017-08-16 Thread Luke Plant

t similar magical
    optimisations? How do they go about identifying the cases
where it is necessary?

On 15 August 2017 at 10:44, Gordon Wrigley
> wrote:

I'd like to discuss automatic prefetching in querysets.
Specifically automatically doing prefetch_related where
needed without the user having to request it.

For context consider these three snippets using the
Question & Choice models from the tutorial

<https://docs.djangoproject.com/en/1.11/intro/tutorial02/#creating-models> when
there are 100 questions each with 5 choices for a total
of 500 choices.

Default
|
forchoice inChoice.objects.all():
print(choice.question.question_text,':',choice.choice_text)
|
501 db queries, fetches 500 choice rows and 500 question
rows from the DB

Prefetch_related
|
forchoice inChoice.objects.prefetch_related('question'):
print(choice.question.question_text,':',choice.choice_text)
|
2 db queries, fetches 500 choice rows and 100 question
rows from the DB

Select_related
|
forchoice inChoice.objects.select_related('question'):
print(choice.question.question_text,':',choice.choice_text)
|
1 db query, fetches 500 choice rows and 500 question rows
from the DB

I've included select_related for completeness, I'm not
going to propose changing anything about it's use. There
are places where it is the best choice and in those
places it will still be up to the user to request it. I
will note that anywhere select_related is optimal
prefetch_related is still better than the default and
leave it at that.

The 'Default' example above is a classic example of the
N+1 query problem, a problem that is widespread in Django
apps.
This pattern of queries is what new users produce because
they don't know enough about the database and / or ORM to
do otherwise.
Experieced users will also often produce this because
it's not always obvious what fields will and won't be
used and subsequently what should be prefetched.
Additionally that list will change over time. A small
change to a template to display an extra field can result
in a denial of service on your DB due to a missing prefetch.
Identifying missing prefetches is fiddly, time consuming
and error prone. Tools like django-perf-rec
<https://github.com/YPlan/django-perf-rec> (which I was
involved in creating) and nplusone
<https://github.com/jmcarp/nplusone> exist in part to
flag missing prefetches introduced by changed code.
Finally libraries like Django Rest Framework and the
Admin will also produce queries like this because it's
very difficult for them to know what needs prefetching
without being explicitly told by an experienced user.

As hinted at the top I'd like to propose changing Django
so the default code behaves like the prefetch_related code.
Longer term I think this should be the default behaviour
but obviously it needs to be proved first so for now I'd
suggest a new queryset function that enables this behaviour.

I have a proof of concept of this mechanism that I've
used successfully in production. I'm not posting it yet
because I'd like to focus on desired behavior rather than
implementation details. But in summary, what it does is
when accessing a missing field on a model, rather than
fetching it just for that instance, it runs a
prefetch_related query to fetch it for all peer instances
that were fetched in the same queryset. So in the example
above it prefetches all Questions in one query.

This might seem like a risky thing to do but I'd argue
that it really isn't.
The only time this isn't superior to the default case is
when you are post filtering the queryset results in Python.
Even in that case it's only inferior if you started with
a large number of results, filtered basically all of them
and the code is structured so that the filtered ones
aren't garbage collected.
To cover this rare case the automatic prefetching can

Re: Automatic prefetching in querysets

2017-08-16 Thread Anssi Kääriäinen

On Wednesday, August 16, 2017 at 2:27:11 PM UTC+3, Josh Smeaton wrote:
>
> It won't affect experienced users. They'll read the release notes, see 
>> that this change has been implemented, and either go and delete a bunch of 
>> prefetch_related() calls, grumble a bit and turn auto-prefetch off globally 
>> or just file it away as another fact they know about the Django ORM.
>>
>
> This is, for me, the biggest point in favour of having it opt out. It'll 
> barely affect anyone who is really experienced, but can provide such a 
> large upside for others.
>

There is huge potential for this feature going very bad for performance. 
For example, this common anti-pattern would immediately cause interesting 
issues:

if qs:
qs[0].do_something()

then if do_something() does access a couple of relations, you'll load data 
for the whole queryset where you only need it for the first one. So, at 
least make this opt-in instead of opt-out.

Doing this for "select_related" cases on opt-in basis seems like a 
plausible idea. Doing this for reverse foreign key or m2m cases where you 
need to use prefetch_related seems complex to implement, and just for that 
reason I'd avoid the prefetch related case at least initially.

Note that it should be possible to implement this fully as 3rd party 
project if we add an easy way to customise how related object fetching is 
done. I'm not sure if we can add an easy way for that.

By the way, a very efficient test for N+1 queries issue is to do something 
like this:

def test_nplusone_queries(self):
self.create_object()
with self.count_queries() as cnt:
self.client.get('object_listing/')
self.create_object()
with self.assertNumQueries(cnt):
self.client.get('object_listing/')

that is, you test the same list view with one and two objects. If there was 
an easy way to write tests like this, that would make n+1 queries problems 
a lot easier to detect. Maybe something to make this kind of test trivial 
to write could be in Django? Something like 
self.assertListViewScales(object_creator, 'view_name')?

 - Anssi

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/1dcd8559-5f99-4635-b1d1-959bea066192%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Automatic prefetching in querysets

2017-08-16 Thread Adam Johnson

>
> I wouldn't want is to give free optimisations to non-ORM pros
>

Something like 99% of django projects are by non-ORM pros, it can be easy
to forget that on a mailing list of experts.

On 16 August 2017 at 12:27, Josh Smeaton  wrote:

> It won't affect experienced users. They'll read the release notes, see
>> that this change has been implemented, and either go and delete a bunch of
>> prefetch_related() calls, grumble a bit and turn auto-prefetch off globally
>> or just file it away as another fact they know about the Django ORM.
>>
>
> This is, for me, the biggest point in favour of having it opt out. It'll
> barely affect anyone who is really experienced, but can provide such a
> large upside for others.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to django-developers+unsubscr...@googlegroups.com.
> To post to this group, send email to django-developers@googlegroups.com.
> Visit this group at https://groups.google.com/group/django-developers.
> To view this discussion on the web visit https://groups.google.com/d/
> msgid/django-developers/3bf2f56e-8534-404b-8a70-
> 12a657be0514%40googlegroups.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>



-- 
Adam

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CAMyDDM1FyuGNEkWQrrC6Owo1V5c2YDmBaK5NFRkRgdMJtzQjfg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: Automatic prefetching in querysets

2017-08-16 Thread Josh Smeaton


>
> It won't affect experienced users. They'll read the release notes, see 
> that this change has been implemented, and either go and delete a bunch of 
> prefetch_related() calls, grumble a bit and turn auto-prefetch off globally 
> or just file it away as another fact they know about the Django ORM.
>

This is, for me, the biggest point in favour of having it opt out. It'll 
barely affect anyone who is really experienced, but can provide such a 
large upside for others.

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/3bf2f56e-8534-404b-8a70-12a657be0514%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Automatic prefetching in querysets

2017-08-15 Thread Alexander Hill

I think this is an excellent suggestion.

It seems generally accepted in this thread that although there are cases
where this would hurt performance, it would on average solve more problems
than it creates. The debate seems to be more whether or not it's "right"
for the ORM to behave in this magical way. With that in mind...

It won't affect experienced users. They'll read the release notes, see that
this change has been implemented, and either go and delete a bunch of
prefetch_related() calls, grumble a bit and turn auto-prefetch off globally
or just file it away as another fact they know about the Django ORM.

For beginners, this can defer the pain of learning about N+1 problems,
potentially forever depending on the scale of the project. Ultimately
Django's job isn't to teach hard lessons about SQL gotchas, it's to make it
easy to make a nice website. This proposal would reduce both average load
time of Django pages and debugging time, at the cost of the average Django
developer being a little more ignorant about what queries the ORM
generates. I think that's a good trade and in line with the goals of the
project.

Django's ORM isn't SQLAlchemy - it's built on high-level concepts, designed
to be relatively beginner-friendly, and already pretty magical.
select_related() and prefetch_related() are somewhat awkward plugs for a
leaky abstraction, and IMO this is just a better plug.

Alex

On Wed, 16 Aug 2017 at 12:12 Cristiano Coelho 
wrote:

> I would rather have warnings as well, adding more magical behavior is bad
> and might even degrade performance on some cases, automatically selecting a
> bunch of data that "might" be used is bad, and specially considering how
> slow python is, accidentally loading/building 1k+ objects when maybe only
> one of them is used would be as bad as doing 1k+ queries.
>
> If the systems you are building are that large and complicated you can't
> have people with 0 SQL knowledge doing stuff neither! So many things to
> tweak, indexes, data denormalization, proper joins here and there, unique
> constraints, locks and race conditions, someone attempting to code
> something that's not a blog or hello world really needs to know a bit about
> all of that.
>
>
>
> El martes, 15 de agosto de 2017, 6:44:19 (UTC-3), Gordon Wrigley escribió:
>>
>> I'd like to discuss automatic prefetching in querysets. Specifically
>> automatically doing prefetch_related where needed without the user having
>> to request it.
>>
>> For context consider these three snippets using the Question & Choice
>> models from the tutorial
>> <https://docs.djangoproject.com/en/1.11/intro/tutorial02/#creating-models> 
>> when
>> there are 100 questions each with 5 choices for a total of 500 choices.
>>
>> Default
>> for choice in Choice.objects.all():
>> print(choice.question.question_text, ':', choice.choice_text)
>> 501 db queries, fetches 500 choice rows and 500 question rows from the DB
>>
>> Prefetch_related
>> for choice in Choice.objects.prefetch_related('question'):
>> print(choice.question.question_text, ':', choice.choice_text)
>> 2 db queries, fetches 500 choice rows and 100 question rows from the DB
>>
>> Select_related
>> for choice in Choice.objects.select_related('question'):
>> print(choice.question.question_text, ':', choice.choice_text)
>> 1 db query, fetches 500 choice rows and 500 question rows from the DB
>>
>> I've included select_related for completeness, I'm not going to propose
>> changing anything about it's use. There are places where it is the best
>> choice and in those places it will still be up to the user to request it. I
>> will note that anywhere select_related is optimal prefetch_related is still
>> better than the default and leave it at that.
>>
>> The 'Default' example above is a classic example of the N+1 query
>> problem, a problem that is widespread in Django apps.
>> This pattern of queries is what new users produce because they don't know
>> enough about the database and / or ORM to do otherwise.
>> Experieced users will also often produce this because it's not always
>> obvious what fields will and won't be used and subsequently what should be
>> prefetched.
>> Additionally that list will change over time. A small change to a
>> template to display an extra field can result in a denial of service on
>> your DB due to a missing prefetch.
>> Identifying missing prefetches is fiddly, time consuming and error prone.
>> Tools like django-perf-rec <https://github.com/YPlan

Re: Automatic prefetching in querysets

2017-08-15 Thread Cristiano Coelho

I would rather have warnings as well, adding more magical behavior is bad 
and might even degrade performance on some cases, automatically selecting a 
bunch of data that "might" be used is bad, and specially considering how 
slow python is, accidentally loading/building 1k+ objects when maybe only 
one of them is used would be as bad as doing 1k+ queries.

If the systems you are building are that large and complicated you can't 
have people with 0 SQL knowledge doing stuff neither! So many things to 
tweak, indexes, data denormalization, proper joins here and there, unique 
constraints, locks and race conditions, someone attempting to code 
something that's not a blog or hello world really needs to know a bit about 
all of that.


El martes, 15 de agosto de 2017, 6:44:19 (UTC-3), Gordon Wrigley escribió:
>
> I'd like to discuss automatic prefetching in querysets. Specifically 
> automatically doing prefetch_related where needed without the user having 
> to request it.
>
> For context consider these three snippets using the Question & Choice 
> models from the tutorial 
> <https://docs.djangoproject.com/en/1.11/intro/tutorial02/#creating-models> 
> when 
> there are 100 questions each with 5 choices for a total of 500 choices.
>
> Default
> for choice in Choice.objects.all():
> print(choice.question.question_text, ':', choice.choice_text)
> 501 db queries, fetches 500 choice rows and 500 question rows from the DB
>
> Prefetch_related
> for choice in Choice.objects.prefetch_related('question'):
> print(choice.question.question_text, ':', choice.choice_text)
> 2 db queries, fetches 500 choice rows and 100 question rows from the DB
>
> Select_related
> for choice in Choice.objects.select_related('question'):
> print(choice.question.question_text, ':', choice.choice_text)
> 1 db query, fetches 500 choice rows and 500 question rows from the DB
>
> I've included select_related for completeness, I'm not going to propose 
> changing anything about it's use. There are places where it is the best 
> choice and in those places it will still be up to the user to request it. I 
> will note that anywhere select_related is optimal prefetch_related is still 
> better than the default and leave it at that.
>
> The 'Default' example above is a classic example of the N+1 query problem, 
> a problem that is widespread in Django apps.
> This pattern of queries is what new users produce because they don't know 
> enough about the database and / or ORM to do otherwise.
> Experieced users will also often produce this because it's not always 
> obvious what fields will and won't be used and subsequently what should be 
> prefetched.
> Additionally that list will change over time. A small change to a template 
> to display an extra field can result in a denial of service on your DB due 
> to a missing prefetch.
> Identifying missing prefetches is fiddly, time consuming and error prone. 
> Tools like django-perf-rec <https://github.com/YPlan/django-perf-rec> 
> (which I was involved in creating) and nplusone 
> <https://github.com/jmcarp/nplusone> exist in part to flag missing 
> prefetches introduced by changed code.
> Finally libraries like Django Rest Framework and the Admin will also 
> produce queries like this because it's very difficult for them to know what 
> needs prefetching without being explicitly told by an experienced user.
>
> As hinted at the top I'd like to propose changing Django so the default 
> code behaves like the prefetch_related code.
> Longer term I think this should be the default behaviour but obviously it 
> needs to be proved first so for now I'd suggest a new queryset function 
> that enables this behaviour.
>
> I have a proof of concept of this mechanism that I've used successfully in 
> production. I'm not posting it yet because I'd like to focus on desired 
> behavior rather than implementation details. But in summary, what it does 
> is when accessing a missing field on a model, rather than fetching it just 
> for that instance, it runs a prefetch_related query to fetch it for all 
> peer instances that were fetched in the same queryset. So in the example 
> above it prefetches all Questions in one query.
>
> This might seem like a risky thing to do but I'd argue that it really 
> isn't.
> The only time this isn't superior to the default case is when you are post 
> filtering the queryset results in Python.
> Even in that case it's only inferior if you started with a large number of 
> results, filtered basically all of them and the code is structured so that 
> the filtered ones aren&#

Re: Automatic prefetching in querysets

2017-08-15 Thread Curtis Maloney

g to
use it, though, I would always prefer the manual option.

Luke



On 15/08/17 21:02, Marc Tamlyn wrote:

Hi Gordon,

Thanks for the suggestion.

I'm not a fan of adding a layer that tries to be this clever.
How would possible prefetches be identified? What happens when
an initial loop in a view requires one prefetch, but a
subsequent loop in a template requires some other prefetch?
What about nested loops resulting in nested prefetches? Code
like this is almost guaranteed to break unexpectedly in
multiple ways. Personally, I would argue that correctly
setting up and maintaining appropriate prefetches and selects
is a necessary part of working with an ORM.

Do you know of any other ORMs which attempt similar magical
optimisations? How do they go about identifying the cases
    where it is necessary?

On 15 August 2017 at 10:44, Gordon Wrigley
> wrote:

I'd like to discuss automatic prefetching in querysets.
Specifically automatically doing prefetch_related where
needed without the user having to request it.

For context consider these three snippets using the
Question & Choice models from the tutorial

<https://docs.djangoproject.com/en/1.11/intro/tutorial02/#creating-models> when
there are 100 questions each with 5 choices for a total of
500 choices.

Default
|
forchoice inChoice.objects.all():
print(choice.question.question_text,':',choice.choice_text)
|
501 db queries, fetches 500 choice rows and 500 question
rows from the DB

Prefetch_related
|
forchoice inChoice.objects.prefetch_related('question'):
print(choice.question.question_text,':',choice.choice_text)
|
2 db queries, fetches 500 choice rows and 100 question
rows from the DB

Select_related
|
forchoice inChoice.objects.select_related('question'):
print(choice.question.question_text,':',choice.choice_text)
|
1 db query, fetches 500 choice rows and 500 question rows
from the DB

I've included select_related for completeness, I'm not
going to propose changing anything about it's use. There
are places where it is the best choice and in those places
it will still be up to the user to request it. I will note
that anywhere select_related is optimal prefetch_related
is still better than the default and leave it at that.

The 'Default' example above is a classic example of the
N+1 query problem, a problem that is widespread in Django
apps.
This pattern of queries is what new users produce because
they don't know enough about the database and / or ORM to
do otherwise.
Experieced users will also often produce this because it's
not always obvious what fields will and won't be used and
subsequently what should be prefetched.
Additionally that list will change over time. A small
change to a template to display an extra field can result
in a denial of service on your DB due to a missing prefetch.
Identifying missing prefetches is fiddly, time consuming
and error prone. Tools like django-perf-rec
<https://github.com/YPlan/django-perf-rec> (which I was
involved in creating) and nplusone
<https://github.com/jmcarp/nplusone> exist in part to flag
missing prefetches introduced by changed code.
Finally libraries like Django Rest Framework and the Admin
will also produce queries like this because it's very
difficult for them to know what needs prefetching without
being explicitly told by an experienced user.

As hinted at the top I'd like to propose changing Django
so the default code behaves like the prefetch_related code.
Longer term I think this should be the default behaviour
but obviously it needs to be proved first so for now I'd
suggest a new queryset function that enables this behaviour.

I have a proof of concept of this mechanism that I've used
successfully in production. I'm not posting it yet because
I'd like to focus on desired behavior rather than
implementation details. But in summary, what it does is
when accessing a missing field on a model, rather than
f

Re: Automatic prefetching in querysets

2017-08-15 Thread Josh Smeaton

ed? What happens when an initial loop in a 
>> view requires one prefetch, but a subsequent loop in a template requires 
>> some other prefetch? What about nested loops resulting in nested 
>> prefetches? Code like this is almost guaranteed to break unexpectedly in 
>> multiple ways. Personally, I would argue that correctly setting up and 
>> maintaining appropriate prefetches and selects is a necessary part of 
>> working with an ORM. 
>>
>> Do you know of any other ORMs which attempt similar magical 
>> optimisations? How do they go about identifying the cases where it is 
>> necessary?
>>
>> On 15 August 2017 at 10:44, Gordon Wrigley > > wrote:
>>
>>> I'd like to discuss automatic prefetching in querysets. Specifically 
>>> automatically doing prefetch_related where needed without the user having 
>>> to request it.
>>>
>>> For context consider these three snippets using the Question & Choice 
>>> models from the tutorial 
>>> <https://docs.djangoproject.com/en/1.11/intro/tutorial02/#creating-models> 
>>> when 
>>> there are 100 questions each with 5 choices for a total of 500 choices.
>>>
>>> Default
>>> for choice in Choice.objects.all():
>>> print(choice.question.question_text, ':', choice.choice_text)
>>> 501 db queries, fetches 500 choice rows and 500 question rows from the DB
>>>
>>> Prefetch_related
>>> for choice in Choice.objects.prefetch_related('question'):
>>> print(choice.question.question_text, ':', choice.choice_text)
>>> 2 db queries, fetches 500 choice rows and 100 question rows from the DB
>>>
>>> Select_related
>>> for choice in Choice.objects.select_related('question'):
>>> print(choice.question.question_text, ':', choice.choice_text)
>>> 1 db query, fetches 500 choice rows and 500 question rows from the DB
>>>
>>> I've included select_related for completeness, I'm not going to propose 
>>> changing anything about it's use. There are places where it is the best 
>>> choice and in those places it will still be up to the user to request it. I 
>>> will note that anywhere select_related is optimal prefetch_related is still 
>>> better than the default and leave it at that.
>>>
>>> The 'Default' example above is a classic example of the N+1 query 
>>> problem, a problem that is widespread in Django apps.
>>> This pattern of queries is what new users produce because they don't 
>>> know enough about the database and / or ORM to do otherwise.
>>> Experieced users will also often produce this because it's not always 
>>> obvious what fields will and won't be used and subsequently what should be 
>>> prefetched.
>>> Additionally that list will change over time. A small change to a 
>>> template to display an extra field can result in a denial of service on 
>>> your DB due to a missing prefetch.
>>> Identifying missing prefetches is fiddly, time consuming and error 
>>> prone. Tools like django-perf-rec 
>>> <https://github.com/YPlan/django-perf-rec> (which I was involved in 
>>> creating) and nplusone <https://github.com/jmcarp/nplusone> exist in 
>>> part to flag missing prefetches introduced by changed code.
>>> Finally libraries like Django Rest Framework and the Admin will also 
>>> produce queries like this because it's very difficult for them to know what 
>>> needs prefetching without being explicitly told by an experienced user.
>>>
>>> As hinted at the top I'd like to propose changing Django so the default 
>>> code behaves like the prefetch_related code.
>>> Longer term I think this should be the default behaviour but obviously 
>>> it needs to be proved first so for now I'd suggest a new queryset function 
>>> that enables this behaviour.
>>>
>>> I have a proof of concept of this mechanism that I've used successfully 
>>> in production. I'm not posting it yet because I'd like to focus on desired 
>>> behavior rather than implementation details. But in summary, what it does 
>>> is when accessing a missing field on a model, rather than fetching it just 
>>> for that instance, it runs a prefetch_related query to fetch it for all 
>>> peer instances that were fetched in the same queryset. So in the example 
>>> above it prefetches all Questions in one query.
>>>
>>> This

Re: Automatic prefetching in querysets

2017-08-15 Thread Anthony King

Automatically prefetching is something I feel should be avoided.

A common gripe I have with ORMs is they hide what's actually happening with
the database, resulting in beginners-going-on-intermediates building
libraries/systems that don't scale well.

We have several views in a dashboard, where a relation may be accessed once
or twice while iterating over a large python filtered queryset.
Prefetching this relation based on the original queryset has the potential
to add around 5 seconds to the response time (probably more, that table has
doubled in size since I last measured it).

I feel it would be better to optimise for your usecase, as apposed to try
to prevent uncalled-for behaviour.



On Aug 15, 2017 23:15, "Luke Plant"  wrote:

> I agree with Marc here that the proposed optimizations are 'magical'. I
> think when it comes to optimizations like these you simply cannot know in
> advance whether doing extra queries is going to a be an optimization or a
> pessimization. If I can come up with a single example where it would
> significantly decrease performance (either memory usage or speed) compared
> to the default (and I'm sure I can), then I would be strongly opposed to it
> ever being default behaviour.
>
> Concerning implementing it as an additional  QuerySet method like
> `auto_prefetch()` - I'm not sure what I think, I feel like it could get
> icky (i.e. increase our technical debt), due to the way it couples things
> together. I can't imagine ever wanting to use it, though, I would always
> prefer the manual option.
>
> Luke
>
>
>
> On 15/08/17 21:02, Marc Tamlyn wrote:
>
> Hi Gordon,
>
> Thanks for the suggestion.
>
> I'm not a fan of adding a layer that tries to be this clever. How would
> possible prefetches be identified? What happens when an initial loop in a
> view requires one prefetch, but a subsequent loop in a template requires
> some other prefetch? What about nested loops resulting in nested
> prefetches? Code like this is almost guaranteed to break unexpectedly in
> multiple ways. Personally, I would argue that correctly setting up and
> maintaining appropriate prefetches and selects is a necessary part of
> working with an ORM.
>
> Do you know of any other ORMs which attempt similar magical optimisations?
> How do they go about identifying the cases where it is necessary?
>
> On 15 August 2017 at 10:44, Gordon Wrigley 
> wrote:
>
>> I'd like to discuss automatic prefetching in querysets. Specifically
>> automatically doing prefetch_related where needed without the user having
>> to request it.
>>
>> For context consider these three snippets using the Question & Choice
>> models from the tutorial
>> <https://docs.djangoproject.com/en/1.11/intro/tutorial02/#creating-models> 
>> when
>> there are 100 questions each with 5 choices for a total of 500 choices.
>>
>> Default
>> for choice in Choice.objects.all():
>> print(choice.question.question_text, ':', choice.choice_text)
>> 501 db queries, fetches 500 choice rows and 500 question rows from the DB
>>
>> Prefetch_related
>> for choice in Choice.objects.prefetch_related('question'):
>> print(choice.question.question_text, ':', choice.choice_text)
>> 2 db queries, fetches 500 choice rows and 100 question rows from the DB
>>
>> Select_related
>> for choice in Choice.objects.select_related('question'):
>> print(choice.question.question_text, ':', choice.choice_text)
>> 1 db query, fetches 500 choice rows and 500 question rows from the DB
>>
>> I've included select_related for completeness, I'm not going to propose
>> changing anything about it's use. There are places where it is the best
>> choice and in those places it will still be up to the user to request it. I
>> will note that anywhere select_related is optimal prefetch_related is still
>> better than the default and leave it at that.
>>
>> The 'Default' example above is a classic example of the N+1 query
>> problem, a problem that is widespread in Django apps.
>> This pattern of queries is what new users produce because they don't know
>> enough about the database and / or ORM to do otherwise.
>> Experieced users will also often produce this because it's not always
>> obvious what fields will and won't be used and subsequently what should be
>> prefetched.
>> Additionally that list will change over time. A small change to a
>> template to display an extra field can result in a denial of service on
>> your DB due to a missing prefetch.
>> Identifyin

Re: Automatic prefetching in querysets

2017-08-15 Thread Adam Johnson

>>>> status quo.
>>>> However for that to be a complete solution Django would also need to
>>>> detect places where there are redundant prefetch_relateds.
>>>>
>>>> Additionally tools like the Admin and DRF would need to provide
>>>> adequate hooks for inserting these calls.
>>>> For example ModelAdmin.get_queryset is not really granular enough as
>>>> it's used by both the list and detail views which might touch quite
>>>> different sets of fields. (Although in practice what you generally do is
>>>> optimize the list view as that's the one that tends to explode)
>>>>
>>>> That aside I sincerely believe that the proposed approach is superior
>>>> to the current default behavior in the majority of cases and further more
>>>> doesn't fail as badly as the current behavior when it's not appropriate. I
>>>> expect that if implemented as an option then in time that belief would
>>>> prove itself.
>>>>
>>>> On Tue, Aug 15, 2017 at 8:17 PM, Tom Forbes  wrote:
>>>>
>>>>> Exploding query counts are definitely a pain point in Django, anything
>>>>> to improve that is definitely worth considering. They have been a problem
>>>>> in all Django projects I have seen.
>>>>>
>>>>> However I think the correct solution is for developers to correctly
>>>>> add select/prefetch calls. There is no general solution for automatically
>>>>> applying them that works for enough cases, and i think adding such a 
>>>>> method
>>>>> to querysets would be used incorrectly and too often.
>>>>>
>>>>> Perhaps a better solution would be for Django to detect these O(n)
>>>>> query cases and display intelligent warnings, with suggestions as to the
>>>>> correct select/prefetch calls to add. When debug mode is enabled we could
>>>>> detect repeated foreign key referencing from the same source.
>>>>>
>>>>> On 15 Aug 2017 19:44, "Gordon Wrigley"  wrote:
>>>>>
>>>>> Sorry maybe I wasn't clear enough about the proposed mechanism.
>>>>>
>>>>> Currently when you dereference a foreign key field on an object (so
>>>>> 'choice.question' in the examples above) if it doesn't have the value
>>>>> cached from an earlier access, prefetch_related or select_related then
>>>>> Django will automatically perform a db query to fetch it. After that the
>>>>> value will then be cached on the object for any future dereferences.
>>>>>
>>>>> This automatic fetching is the source the N+1 query problems and in my
>>>>> experience most gross performance problems in Django apps.
>>>>>
>>>>> The proposal essentially is to add a new queryset function that says
>>>>> for the group of objects fetched by this queryset, whenever one of these
>>>>> automatic foreign key queries happens on one of them instead of fetching
>>>>> the foreign key for just that one use the prefetch mechanism to fetch it
>>>>> for all of them.
>>>>> The presumption being that the vast majority of the time when you
>>>>> access a field on one object from a queryset result, probably you are 
>>>>> going
>>>>> to access the same field on many of the others as well.
>>>>>
>>>>> The implementation I've used in production does nest across foreign
>>>>> keys so something (admittedly contrived) like:
>>>>> for choice in Choice.objects.all():
>>>>> print(choice.question.author)
>>>>> Will produce 3 queries, one for all choices, one for the questions of
>>>>> those choices and one for the authors of those questions.
>>>>>
>>>>> It's worth noting that because these are foreign keys in their "to
>>>>> one" direction each of those queryset results will be at most the same 
>>>>> size
>>>>> (in rows) as the proceeding one and often (due to nulls and duplicates)
>>>>> smaller.
>>>>>
>>>>> I do not propose touching reverse foreign key or many2many fields as
>>>>> the generated queries could request substantially more rows from the DB
>>>>> than the original query and it's not at all clear how this mechanism would

Re: Automatic prefetching in querysets

2017-08-15 Thread Adam Johnson

definitely worth considering. They have been a problem
>>>> in all Django projects I have seen.
>>>>
>>>> However I think the correct solution is for developers to correctly add
>>>> select/prefetch calls. There is no general solution for automatically
>>>> applying them that works for enough cases, and i think adding such a method
>>>> to querysets would be used incorrectly and too often.
>>>>
>>>> Perhaps a better solution would be for Django to detect these O(n)
>>>> query cases and display intelligent warnings, with suggestions as to the
>>>> correct select/prefetch calls to add. When debug mode is enabled we could
>>>> detect repeated foreign key referencing from the same source.
>>>>
>>>> On 15 Aug 2017 19:44, "Gordon Wrigley"  wrote:
>>>>
>>>> Sorry maybe I wasn't clear enough about the proposed mechanism.
>>>>
>>>> Currently when you dereference a foreign key field on an object (so
>>>> 'choice.question' in the examples above) if it doesn't have the value
>>>> cached from an earlier access, prefetch_related or select_related then
>>>> Django will automatically perform a db query to fetch it. After that the
>>>> value will then be cached on the object for any future dereferences.
>>>>
>>>> This automatic fetching is the source the N+1 query problems and in my
>>>> experience most gross performance problems in Django apps.
>>>>
>>>> The proposal essentially is to add a new queryset function that says
>>>> for the group of objects fetched by this queryset, whenever one of these
>>>> automatic foreign key queries happens on one of them instead of fetching
>>>> the foreign key for just that one use the prefetch mechanism to fetch it
>>>> for all of them.
>>>> The presumption being that the vast majority of the time when you
>>>> access a field on one object from a queryset result, probably you are going
>>>> to access the same field on many of the others as well.
>>>>
>>>> The implementation I've used in production does nest across foreign
>>>> keys so something (admittedly contrived) like:
>>>> for choice in Choice.objects.all():
>>>> print(choice.question.author)
>>>> Will produce 3 queries, one for all choices, one for the questions of
>>>> those choices and one for the authors of those questions.
>>>>
>>>> It's worth noting that because these are foreign keys in their "to one"
>>>> direction each of those queryset results will be at most the same size (in
>>>> rows) as the proceeding one and often (due to nulls and duplicates) 
>>>> smaller.
>>>>
>>>> I do not propose touching reverse foreign key or many2many fields as
>>>> the generated queries could request substantially more rows from the DB
>>>> than the original query and it's not at all clear how this mechanism would
>>>> sanely interact with filtering etc. So this is purely about the forward
>>>> direction of foreign keys.
>>>>
>>>> I hope that clarifies my thinking some.
>>>>
>>>> Regards
>>>> G
>>>>
>>>> On Tue, Aug 15, 2017 at 7:02 PM, Marc Tamlyn 
>>>> wrote:
>>>>
>>>>> Hi Gordon,
>>>>>
>>>>> Thanks for the suggestion.
>>>>>
>>>>> I'm not a fan of adding a layer that tries to be this clever. How
>>>>> would possible prefetches be identified? What happens when an initial loop
>>>>> in a view requires one prefetch, but a subsequent loop in a template
>>>>> requires some other prefetch? What about nested loops resulting in nested
>>>>> prefetches? Code like this is almost guaranteed to break unexpectedly in
>>>>> multiple ways. Personally, I would argue that correctly setting up and
>>>>> maintaining appropriate prefetches and selects is a necessary part of
>>>>> working with an ORM.
>>>>>
>>>>> Do you know of any other ORMs which attempt similar magical
>>>>> optimisations? How do they go about identifying the cases where it is
>>>>> necessary?
>>>>>
>>>>> On 15 August 2017 at 10:44, Gordon Wrigley 
>>>>> wrote:
>>>>>
>>>>>> I'd like to discuss automatic prefetching

Re: Automatic prefetching in querysets

2017-08-15 Thread Sean Brant

I wonder if a solution similar to [1] from the rails world would satisfy
this request. Rather then doing anything 'magical' we instead log when we
detect things like accessing a related model that has not been pre-fetched.

[1] https://github.com/flyerhzm/bullet

On Tue, Aug 15, 2017 at 5:14 PM, Luke Plant  wrote:

> I agree with Marc here that the proposed optimizations are 'magical'. I
> think when it comes to optimizations like these you simply cannot know in
> advance whether doing extra queries is going to a be an optimization or a
> pessimization. If I can come up with a single example where it would
> significantly decrease performance (either memory usage or speed) compared
> to the default (and I'm sure I can), then I would be strongly opposed to it
> ever being default behaviour.
>
> Concerning implementing it as an additional  QuerySet method like
> `auto_prefetch()` - I'm not sure what I think, I feel like it could get
> icky (i.e. increase our technical debt), due to the way it couples things
> together. I can't imagine ever wanting to use it, though, I would always
> prefer the manual option.
>
> Luke
>
>
>
> On 15/08/17 21:02, Marc Tamlyn wrote:
>
> Hi Gordon,
>
> Thanks for the suggestion.
>
> I'm not a fan of adding a layer that tries to be this clever. How would
> possible prefetches be identified? What happens when an initial loop in a
> view requires one prefetch, but a subsequent loop in a template requires
> some other prefetch? What about nested loops resulting in nested
> prefetches? Code like this is almost guaranteed to break unexpectedly in
> multiple ways. Personally, I would argue that correctly setting up and
> maintaining appropriate prefetches and selects is a necessary part of
> working with an ORM.
>
> Do you know of any other ORMs which attempt similar magical optimisations?
> How do they go about identifying the cases where it is necessary?
>
> On 15 August 2017 at 10:44, Gordon Wrigley 
> wrote:
>
>> I'd like to discuss automatic prefetching in querysets. Specifically
>> automatically doing prefetch_related where needed without the user having
>> to request it.
>>
>> For context consider these three snippets using the Question & Choice
>> models from the tutorial
>> <https://docs.djangoproject.com/en/1.11/intro/tutorial02/#creating-models> 
>> when
>> there are 100 questions each with 5 choices for a total of 500 choices.
>>
>> Default
>> for choice in Choice.objects.all():
>> print(choice.question.question_text, ':', choice.choice_text)
>> 501 db queries, fetches 500 choice rows and 500 question rows from the DB
>>
>> Prefetch_related
>> for choice in Choice.objects.prefetch_related('question'):
>> print(choice.question.question_text, ':', choice.choice_text)
>> 2 db queries, fetches 500 choice rows and 100 question rows from the DB
>>
>> Select_related
>> for choice in Choice.objects.select_related('question'):
>> print(choice.question.question_text, ':', choice.choice_text)
>> 1 db query, fetches 500 choice rows and 500 question rows from the DB
>>
>> I've included select_related for completeness, I'm not going to propose
>> changing anything about it's use. There are places where it is the best
>> choice and in those places it will still be up to the user to request it. I
>> will note that anywhere select_related is optimal prefetch_related is still
>> better than the default and leave it at that.
>>
>> The 'Default' example above is a classic example of the N+1 query
>> problem, a problem that is widespread in Django apps.
>> This pattern of queries is what new users produce because they don't know
>> enough about the database and / or ORM to do otherwise.
>> Experieced users will also often produce this because it's not always
>> obvious what fields will and won't be used and subsequently what should be
>> prefetched.
>> Additionally that list will change over time. A small change to a
>> template to display an extra field can result in a denial of service on
>> your DB due to a missing prefetch.
>> Identifying missing prefetches is fiddly, time consuming and error prone.
>> Tools like django-perf-rec <https://github.com/YPlan/django-perf-rec>
>> (which I was involved in creating) and nplusone
>> <https://github.com/jmcarp/nplusone> exist in part to flag missing
>> prefetches introduced by changed code.
>> Finally libraries like Django Rest Framework and the Admin will also
>> produce queries l

Re: Automatic prefetching in querysets

2017-08-15 Thread Luke Plant

I agree with Marc here that the proposed optimizations are 'magical'. I 
think when it comes to optimizations like these you simply cannot know 
in advance whether doing extra queries is going to a be an optimization 
or a pessimization. If I can come up with a single example where it 
would significantly decrease performance (either memory usage or speed) 
compared to the default (and I'm sure I can), then I would be strongly 
opposed to it ever being default behaviour.


Concerning implementing it as an additional  QuerySet method like 
`auto_prefetch()` - I'm not sure what I think, I feel like it could get 
icky (i.e. increase our technical debt), due to the way it couples 
things together. I can't imagine ever wanting to use it, though, I would 
always prefer the manual option.


Luke



On 15/08/17 21:02, Marc Tamlyn wrote:

Hi Gordon,

Thanks for the suggestion.

I'm not a fan of adding a layer that tries to be this clever. How 
would possible prefetches be identified? What happens when an initial 
loop in a view requires one prefetch, but a subsequent loop in a 
template requires some other prefetch? What about nested loops 
resulting in nested prefetches? Code like this is almost guaranteed to 
break unexpectedly in multiple ways. Personally, I would argue that 
correctly setting up and maintaining appropriate prefetches and 
selects is a necessary part of working with an ORM.


Do you know of any other ORMs which attempt similar magical 
optimisations? How do they go about identifying the cases where it is 
necessary?


On 15 August 2017 at 10:44, Gordon Wrigley <mailto:gordon.wrig...@gmail.com>> wrote:


I'd like to discuss automatic prefetching in querysets.
Specifically automatically doing prefetch_related where needed
without the user having to request it.

For context consider these three snippets using the Question &
Choice models from the tutorial
<https://docs.djangoproject.com/en/1.11/intro/tutorial02/#creating-models> 
when
there are 100 questions each with 5 choices for a total of 500
choices.

Default
|
forchoice inChoice.objects.all():
print(choice.question.question_text,':',choice.choice_text)
|
501 db queries, fetches 500 choice rows and 500 question rows from
the DB

Prefetch_related
|
forchoice inChoice.objects.prefetch_related('question'):
print(choice.question.question_text,':',choice.choice_text)
|
2 db queries, fetches 500 choice rows and 100 question rows from
the DB

Select_related
|
forchoice inChoice.objects.select_related('question'):
print(choice.question.question_text,':',choice.choice_text)
|
1 db query, fetches 500 choice rows and 500 question rows from the DB

I've included select_related for completeness, I'm not going to
propose changing anything about it's use. There are places where
it is the best choice and in those places it will still be up to
the user to request it. I will note that anywhere select_related
is optimal prefetch_related is still better than the default and
leave it at that.

The 'Default' example above is a classic example of the N+1 query
problem, a problem that is widespread in Django apps.
This pattern of queries is what new users produce because they
don't know enough about the database and / or ORM to do otherwise.
Experieced users will also often produce this because it's not
always obvious what fields will and won't be used and subsequently
what should be prefetched.
Additionally that list will change over time. A small change to a
template to display an extra field can result in a denial of
service on your DB due to a missing prefetch.
Identifying missing prefetches is fiddly, time consuming and error
prone. Tools like django-perf-rec
<https://github.com/YPlan/django-perf-rec> (which I was involved
in creating) and nplusone <https://github.com/jmcarp/nplusone>
exist in part to flag missing prefetches introduced by changed code.
Finally libraries like Django Rest Framework and the Admin will
also produce queries like this because it's very difficult for
them to know what needs prefetching without being explicitly told
by an experienced user.

As hinted at the top I'd like to propose changing Django so the
default code behaves like the prefetch_related code.
Longer term I think this should be the default behaviour but
obviously it needs to be proved first so for now I'd suggest a new
queryset function that enables this behaviour.

I have a proof of concept of this mechanism that I've used
successfully in production. I'm not posting it yet because I'd
like to focus on desired behavior rather than implementation
de

Re: Automatic prefetching in querysets

2017-08-15 Thread Josh Smeaton

;Gordon Wrigley" >> > wrote:
>>>
>>> Sorry maybe I wasn't clear enough about the proposed mechanism.
>>>
>>> Currently when you dereference a foreign key field on an object (so 
>>> 'choice.question' in the examples above) if it doesn't have the value 
>>> cached from an earlier access, prefetch_related or select_related then 
>>> Django will automatically perform a db query to fetch it. After that the 
>>> value will then be cached on the object for any future dereferences.
>>>
>>> This automatic fetching is the source the N+1 query problems and in my 
>>> experience most gross performance problems in Django apps.
>>>
>>> The proposal essentially is to add a new queryset function that says for 
>>> the group of objects fetched by this queryset, whenever one of these 
>>> automatic foreign key queries happens on one of them instead of fetching 
>>> the foreign key for just that one use the prefetch mechanism to fetch it 
>>> for all of them.
>>> The presumption being that the vast majority of the time when you access 
>>> a field on one object from a queryset result, probably you are going to 
>>> access the same field on many of the others as well.
>>>
>>> The implementation I've used in production does nest across foreign keys 
>>> so something (admittedly contrived) like:
>>> for choice in Choice.objects.all():
>>> print(choice.question.author)
>>> Will produce 3 queries, one for all choices, one for the questions of 
>>> those choices and one for the authors of those questions.
>>>
>>> It's worth noting that because these are foreign keys in their "to one" 
>>> direction each of those queryset results will be at most the same size (in 
>>> rows) as the proceeding one and often (due to nulls and duplicates) smaller.
>>>
>>> I do not propose touching reverse foreign key or many2many fields as the 
>>> generated queries could request substantially more rows from the DB than 
>>> the original query and it's not at all clear how this mechanism would 
>>> sanely interact with filtering etc. So this is purely about the forward 
>>> direction of foreign keys.
>>>
>>> I hope that clarifies my thinking some.
>>>
>>> Regards
>>> G
>>>
>>> On Tue, Aug 15, 2017 at 7:02 PM, Marc Tamlyn >> > wrote:
>>>
>>>> Hi Gordon,
>>>>
>>>> Thanks for the suggestion.
>>>>
>>>> I'm not a fan of adding a layer that tries to be this clever. How would 
>>>> possible prefetches be identified? What happens when an initial loop in a 
>>>> view requires one prefetch, but a subsequent loop in a template requires 
>>>> some other prefetch? What about nested loops resulting in nested 
>>>> prefetches? Code like this is almost guaranteed to break unexpectedly in 
>>>> multiple ways. Personally, I would argue that correctly setting up and 
>>>> maintaining appropriate prefetches and selects is a necessary part of 
>>>> working with an ORM.
>>>>
>>>> Do you know of any other ORMs which attempt similar magical 
>>>> optimisations? How do they go about identifying the cases where it is 
>>>> necessary?
>>>>
>>>> On 15 August 2017 at 10:44, Gordon Wrigley >>> > wrote:
>>>>
>>>>> I'd like to discuss automatic prefetching in querysets. Specifically 
>>>>> automatically doing prefetch_related where needed without the user having 
>>>>> to request it.
>>>>>
>>>>> For context consider these three snippets using the Question & Choice 
>>>>> models from the tutorial 
>>>>> <https://docs.djangoproject.com/en/1.11/intro/tutorial02/#creating-models>
>>>>>  when 
>>>>> there are 100 questions each with 5 choices for a total of 500 choices.
>>>>>
>>>>> Default
>>>>> for choice in Choice.objects.all():
>>>>> print(choice.question.question_text, ':', choice.choice_text)
>>>>> 501 db queries, fetches 500 choice rows and 500 question rows from the 
>>>>> DB
>>>>>
>>>>> Prefetch_related
>>>>> for choice in Choice.objects.prefetch_related('question'):
>>>>> print(choice.question.question_text, ':', choice.choice_text)
>>

Re: Automatic prefetching in querysets

2017-08-15 Thread Adam Johnson

oting that because these are foreign keys in their "to one"
>> direction each of those queryset results will be at most the same size (in
>> rows) as the proceeding one and often (due to nulls and duplicates) smaller.
>>
>> I do not propose touching reverse foreign key or many2many fields as the
>> generated queries could request substantially more rows from the DB than
>> the original query and it's not at all clear how this mechanism would
>> sanely interact with filtering etc. So this is purely about the forward
>> direction of foreign keys.
>>
>> I hope that clarifies my thinking some.
>>
>> Regards
>> G
>>
>> On Tue, Aug 15, 2017 at 7:02 PM, Marc Tamlyn 
>> wrote:
>>
>>> Hi Gordon,
>>>
>>> Thanks for the suggestion.
>>>
>>> I'm not a fan of adding a layer that tries to be this clever. How would
>>> possible prefetches be identified? What happens when an initial loop in a
>>> view requires one prefetch, but a subsequent loop in a template requires
>>> some other prefetch? What about nested loops resulting in nested
>>> prefetches? Code like this is almost guaranteed to break unexpectedly in
>>> multiple ways. Personally, I would argue that correctly setting up and
>>> maintaining appropriate prefetches and selects is a necessary part of
>>> working with an ORM.
>>>
>>> Do you know of any other ORMs which attempt similar magical
>>> optimisations? How do they go about identifying the cases where it is
>>> necessary?
>>>
>>> On 15 August 2017 at 10:44, Gordon Wrigley 
>>> wrote:
>>>
>>>> I'd like to discuss automatic prefetching in querysets. Specifically
>>>> automatically doing prefetch_related where needed without the user having
>>>> to request it.
>>>>
>>>> For context consider these three snippets using the Question & Choice
>>>> models from the tutorial
>>>> <https://docs.djangoproject.com/en/1.11/intro/tutorial02/#creating-models> 
>>>> when
>>>> there are 100 questions each with 5 choices for a total of 500 choices.
>>>>
>>>> Default
>>>> for choice in Choice.objects.all():
>>>> print(choice.question.question_text, ':', choice.choice_text)
>>>> 501 db queries, fetches 500 choice rows and 500 question rows from the
>>>> DB
>>>>
>>>> Prefetch_related
>>>> for choice in Choice.objects.prefetch_related('question'):
>>>> print(choice.question.question_text, ':', choice.choice_text)
>>>> 2 db queries, fetches 500 choice rows and 100 question rows from the DB
>>>>
>>>> Select_related
>>>> for choice in Choice.objects.select_related('question'):
>>>> print(choice.question.question_text, ':', choice.choice_text)
>>>> 1 db query, fetches 500 choice rows and 500 question rows from the DB
>>>>
>>>> I've included select_related for completeness, I'm not going to propose
>>>> changing anything about it's use. There are places where it is the best
>>>> choice and in those places it will still be up to the user to request it. I
>>>> will note that anywhere select_related is optimal prefetch_related is still
>>>> better than the default and leave it at that.
>>>>
>>>> The 'Default' example above is a classic example of the N+1 query
>>>> problem, a problem that is widespread in Django apps.
>>>> This pattern of queries is what new users produce because they don't
>>>> know enough about the database and / or ORM to do otherwise.
>>>> Experieced users will also often produce this because it's not always
>>>> obvious what fields will and won't be used and subsequently what should be
>>>> prefetched.
>>>> Additionally that list will change over time. A small change to a
>>>> template to display an extra field can result in a denial of service on
>>>> your DB due to a missing prefetch.
>>>> Identifying missing prefetches is fiddly, time consuming and error
>>>> prone. Tools like django-perf-rec
>>>> <https://github.com/YPlan/django-perf-rec> (which I was involved in
>>>> creating) and nplusone <https://github.com/jmcarp/nplusone> exist in
>>>> part to flag missing prefetches introduced by changed code.
>>>> Finally libraries like Djan

Re: Automatic prefetching in querysets

2017-08-15 Thread Gordon Wrigley

The warnings you propose would certainly be an improvement on the status
quo.
However for that to be a complete solution Django would also need to detect
places where there are redundant prefetch_relateds.

Additionally tools like the Admin and DRF would need to provide adequate
hooks for inserting these calls.
For example ModelAdmin.get_queryset is not really granular enough as it's
used by both the list and detail views which might touch quite different
sets of fields. (Although in practice what you generally do is optimize the
list view as that's the one that tends to explode)

That aside I sincerely believe that the proposed approach is superior to
the current default behavior in the majority of cases and further more
doesn't fail as badly as the current behavior when it's not appropriate. I
expect that if implemented as an option then in time that belief would
prove itself.

On Tue, Aug 15, 2017 at 8:17 PM, Tom Forbes  wrote:

> Exploding query counts are definitely a pain point in Django, anything to
> improve that is definitely worth considering. They have been a problem in
> all Django projects I have seen.
>
> However I think the correct solution is for developers to correctly add
> select/prefetch calls. There is no general solution for automatically
> applying them that works for enough cases, and i think adding such a method
> to querysets would be used incorrectly and too often.
>
> Perhaps a better solution would be for Django to detect these O(n) query
> cases and display intelligent warnings, with suggestions as to the correct
> select/prefetch calls to add. When debug mode is enabled we could detect
> repeated foreign key referencing from the same source.
>
> On 15 Aug 2017 19:44, "Gordon Wrigley"  wrote:
>
> Sorry maybe I wasn't clear enough about the proposed mechanism.
>
> Currently when you dereference a foreign key field on an object (so
> 'choice.question' in the examples above) if it doesn't have the value
> cached from an earlier access, prefetch_related or select_related then
> Django will automatically perform a db query to fetch it. After that the
> value will then be cached on the object for any future dereferences.
>
> This automatic fetching is the source the N+1 query problems and in my
> experience most gross performance problems in Django apps.
>
> The proposal essentially is to add a new queryset function that says for
> the group of objects fetched by this queryset, whenever one of these
> automatic foreign key queries happens on one of them instead of fetching
> the foreign key for just that one use the prefetch mechanism to fetch it
> for all of them.
> The presumption being that the vast majority of the time when you access a
> field on one object from a queryset result, probably you are going to
> access the same field on many of the others as well.
>
> The implementation I've used in production does nest across foreign keys
> so something (admittedly contrived) like:
> for choice in Choice.objects.all():
> print(choice.question.author)
> Will produce 3 queries, one for all choices, one for the questions of
> those choices and one for the authors of those questions.
>
> It's worth noting that because these are foreign keys in their "to one"
> direction each of those queryset results will be at most the same size (in
> rows) as the proceeding one and often (due to nulls and duplicates) smaller.
>
> I do not propose touching reverse foreign key or many2many fields as the
> generated queries could request substantially more rows from the DB than
> the original query and it's not at all clear how this mechanism would
> sanely interact with filtering etc. So this is purely about the forward
> direction of foreign keys.
>
> I hope that clarifies my thinking some.
>
> Regards
> G
>
> On Tue, Aug 15, 2017 at 7:02 PM, Marc Tamlyn 
> wrote:
>
>> Hi Gordon,
>>
>> Thanks for the suggestion.
>>
>> I'm not a fan of adding a layer that tries to be this clever. How would
>> possible prefetches be identified? What happens when an initial loop in a
>> view requires one prefetch, but a subsequent loop in a template requires
>> some other prefetch? What about nested loops resulting in nested
>> prefetches? Code like this is almost guaranteed to break unexpectedly in
>> multiple ways. Personally, I would argue that correctly setting up and
>> maintaining appropriate prefetches and selects is a necessary part of
>> working with an ORM.
>>
>> Do you know of any other ORMs which attempt similar magical
>> optimisations? How do they go about identifying the cases where it is
>> necessary?
>>
>> On 15 August 2017 at 10

Re: Automatic prefetching in querysets

2017-08-15 Thread Gordon Wrigley

I didn't answer your questions directly. Sorry for the quoting but it's the
easiest way to deal with a half dozen questions.

> How would possible prefetches be identified?

Wherever we currently automatically fetch a foreign key value.

> What happens when an initial loop in a view requires one prefetch, but a
subsequent loop in a template requires some other prefetch?

They each do whatever prefetch they need, just as a human optimizing this
would add two prefetch clauses.

> What about nested loops resulting in nested prefetches?

Nested loops only really come up when you are dealing with RelatedManagers
which are outside the scope of this. Or did you have some other nested loop
case in mind?

>  I would argue that correctly setting up and maintaining appropriate
prefetches and selects is a necessary part of working with an ORM.

Having been lead engineer on a code base of ~100,000 lines with over 100
calls to prefetch_related and a lot of tests specifically for finding
missing ones I'd argue it's one of the worst aspects of working with
Djangos ORM at non trivial scale.

> Do you know of any other ORMs which attempt similar magical optimisations?


I don't, but unlike Django where I have years of experience I have next to
no experience with other ORM's.

Regards G

On Tue, Aug 15, 2017 at 7:44 PM, Gordon Wrigley 
wrote:

> Sorry maybe I wasn't clear enough about the proposed mechanism.
>
> Currently when you dereference a foreign key field on an object (so
> 'choice.question' in the examples above) if it doesn't have the value
> cached from an earlier access, prefetch_related or select_related then
> Django will automatically perform a db query to fetch it. After that the
> value will then be cached on the object for any future dereferences.
>
> This automatic fetching is the source the N+1 query problems and in my
> experience most gross performance problems in Django apps.
>
> The proposal essentially is to add a new queryset function that says for
> the group of objects fetched by this queryset, whenever one of these
> automatic foreign key queries happens on one of them instead of fetching
> the foreign key for just that one use the prefetch mechanism to fetch it
> for all of them.
> The presumption being that the vast majority of the time when you access a
> field on one object from a queryset result, probably you are going to
> access the same field on many of the others as well.
>
> The implementation I've used in production does nest across foreign keys
> so something (admittedly contrived) like:
> for choice in Choice.objects.all():
> print(choice.question.author)
> Will produce 3 queries, one for all choices, one for the questions of
> those choices and one for the authors of those questions.
>
> It's worth noting that because these are foreign keys in their "to one"
> direction each of those queryset results will be at most the same size (in
> rows) as the proceeding one and often (due to nulls and duplicates) smaller.
>
> I do not propose touching reverse foreign key or many2many fields as the
> generated queries could request substantially more rows from the DB than
> the original query and it's not at all clear how this mechanism would
> sanely interact with filtering etc. So this is purely about the forward
> direction of foreign keys.
>
> I hope that clarifies my thinking some.
>
> Regards
> G
>
> On Tue, Aug 15, 2017 at 7:02 PM, Marc Tamlyn 
> wrote:
>
>> Hi Gordon,
>>
>> Thanks for the suggestion.
>>
>> I'm not a fan of adding a layer that tries to be this clever. How would
>> possible prefetches be identified? What happens when an initial loop in a
>> view requires one prefetch, but a subsequent loop in a template requires
>> some other prefetch? What about nested loops resulting in nested
>> prefetches? Code like this is almost guaranteed to break unexpectedly in
>> multiple ways. Personally, I would argue that correctly setting up and
>> maintaining appropriate prefetches and selects is a necessary part of
>> working with an ORM.
>>
>> Do you know of any other ORMs which attempt similar magical
>> optimisations? How do they go about identifying the cases where it is
>> necessary?
>>
>> On 15 August 2017 at 10:44, Gordon Wrigley 
>> wrote:
>>
>>> I'd like to discuss automatic prefetching in querysets. Specifically
>>> automatically doing prefetch_related where needed without the user having
>>> to request it.
>>>
>>> For context consider these three snippets using the Question & Choice
>>> models from the tutorial
>>> <https://docs.djangoproject.com/en/1.11/i

Re: Automatic prefetching in querysets

2017-08-15 Thread Tom Forbes

Exploding query counts are definitely a pain point in Django, anything to
improve that is definitely worth considering. They have been a problem in
all Django projects I have seen.

However I think the correct solution is for developers to correctly add
select/prefetch calls. There is no general solution for automatically
applying them that works for enough cases, and i think adding such a method
to querysets would be used incorrectly and too often.

Perhaps a better solution would be for Django to detect these O(n) query
cases and display intelligent warnings, with suggestions as to the correct
select/prefetch calls to add. When debug mode is enabled we could detect
repeated foreign key referencing from the same source.

On 15 Aug 2017 19:44, "Gordon Wrigley"  wrote:

Sorry maybe I wasn't clear enough about the proposed mechanism.

Currently when you dereference a foreign key field on an object (so
'choice.question' in the examples above) if it doesn't have the value
cached from an earlier access, prefetch_related or select_related then
Django will automatically perform a db query to fetch it. After that the
value will then be cached on the object for any future dereferences.

This automatic fetching is the source the N+1 query problems and in my
experience most gross performance problems in Django apps.

The proposal essentially is to add a new queryset function that says for
the group of objects fetched by this queryset, whenever one of these
automatic foreign key queries happens on one of them instead of fetching
the foreign key for just that one use the prefetch mechanism to fetch it
for all of them.
The presumption being that the vast majority of the time when you access a
field on one object from a queryset result, probably you are going to
access the same field on many of the others as well.

The implementation I've used in production does nest across foreign keys so
something (admittedly contrived) like:
for choice in Choice.objects.all():
print(choice.question.author)
Will produce 3 queries, one for all choices, one for the questions of those
choices and one for the authors of those questions.

It's worth noting that because these are foreign keys in their "to one"
direction each of those queryset results will be at most the same size (in
rows) as the proceeding one and often (due to nulls and duplicates) smaller.

I do not propose touching reverse foreign key or many2many fields as the
generated queries could request substantially more rows from the DB than
the original query and it's not at all clear how this mechanism would
sanely interact with filtering etc. So this is purely about the forward
direction of foreign keys.

I hope that clarifies my thinking some.

Regards
G

On Tue, Aug 15, 2017 at 7:02 PM, Marc Tamlyn  wrote:

> Hi Gordon,
>
> Thanks for the suggestion.
>
> I'm not a fan of adding a layer that tries to be this clever. How would
> possible prefetches be identified? What happens when an initial loop in a
> view requires one prefetch, but a subsequent loop in a template requires
> some other prefetch? What about nested loops resulting in nested
> prefetches? Code like this is almost guaranteed to break unexpectedly in
> multiple ways. Personally, I would argue that correctly setting up and
> maintaining appropriate prefetches and selects is a necessary part of
> working with an ORM.
>
> Do you know of any other ORMs which attempt similar magical optimisations?
> How do they go about identifying the cases where it is necessary?
>
> On 15 August 2017 at 10:44, Gordon Wrigley 
> wrote:
>
>> I'd like to discuss automatic prefetching in querysets. Specifically
>> automatically doing prefetch_related where needed without the user having
>> to request it.
>>
>> For context consider these three snippets using the Question & Choice
>> models from the tutorial
>> <https://docs.djangoproject.com/en/1.11/intro/tutorial02/#creating-models> 
>> when
>> there are 100 questions each with 5 choices for a total of 500 choices.
>>
>> Default
>> for choice in Choice.objects.all():
>> print(choice.question.question_text, ':', choice.choice_text)
>> 501 db queries, fetches 500 choice rows and 500 question rows from the DB
>>
>> Prefetch_related
>> for choice in Choice.objects.prefetch_related('question'):
>> print(choice.question.question_text, ':', choice.choice_text)
>> 2 db queries, fetches 500 choice rows and 100 question rows from the DB
>>
>> Select_related
>> for choice in Choice.objects.select_related('question'):
>> print(choice.question.question_text, ':', choice.choice_text)
>> 1 db query, fetches 500 choice rows and 500 question rows from the DB
>

Re: Automatic prefetching in querysets

2017-08-15 Thread Gordon Wrigley

In my current version each object keeps a reference to a WeakSet of the
results of the queryset it came from.
This is populated in _fetch_all and if it is populated then
ForwardManyToOneDescriptor does a prefetch across all the objects in the
WeakSet instead of it's regular fetching.

On Tue, Aug 15, 2017 at 8:03 PM, Collin Anderson 
wrote:

> Hi Gordon,
>
> How is it implemented? Does each object keep a reference to the queryset
> it came from?
>
> Collin
>
> On Tue, Aug 15, 2017 at 2:44 PM, Gordon Wrigley 
> wrote:
>
>> Sorry maybe I wasn't clear enough about the proposed mechanism.
>>
>> Currently when you dereference a foreign key field on an object (so
>> 'choice.question' in the examples above) if it doesn't have the value
>> cached from an earlier access, prefetch_related or select_related then
>> Django will automatically perform a db query to fetch it. After that the
>> value will then be cached on the object for any future dereferences.
>>
>> This automatic fetching is the source the N+1 query problems and in my
>> experience most gross performance problems in Django apps.
>>
>> The proposal essentially is to add a new queryset function that says for
>> the group of objects fetched by this queryset, whenever one of these
>> automatic foreign key queries happens on one of them instead of fetching
>> the foreign key for just that one use the prefetch mechanism to fetch it
>> for all of them.
>> The presumption being that the vast majority of the time when you access
>> a field on one object from a queryset result, probably you are going to
>> access the same field on many of the others as well.
>>
>> The implementation I've used in production does nest across foreign keys
>> so something (admittedly contrived) like:
>> for choice in Choice.objects.all():
>> print(choice.question.author)
>> Will produce 3 queries, one for all choices, one for the questions of
>> those choices and one for the authors of those questions.
>>
>> It's worth noting that because these are foreign keys in their "to one"
>> direction each of those queryset results will be at most the same size (in
>> rows) as the proceeding one and often (due to nulls and duplicates) smaller.
>>
>> I do not propose touching reverse foreign key or many2many fields as the
>> generated queries could request substantially more rows from the DB than
>> the original query and it's not at all clear how this mechanism would
>> sanely interact with filtering etc. So this is purely about the forward
>> direction of foreign keys.
>>
>> I hope that clarifies my thinking some.
>>
>> Regards
>> G
>>
>> On Tue, Aug 15, 2017 at 7:02 PM, Marc Tamlyn 
>> wrote:
>>
>>> Hi Gordon,
>>>
>>> Thanks for the suggestion.
>>>
>>> I'm not a fan of adding a layer that tries to be this clever. How would
>>> possible prefetches be identified? What happens when an initial loop in a
>>> view requires one prefetch, but a subsequent loop in a template requires
>>> some other prefetch? What about nested loops resulting in nested
>>> prefetches? Code like this is almost guaranteed to break unexpectedly in
>>> multiple ways. Personally, I would argue that correctly setting up and
>>> maintaining appropriate prefetches and selects is a necessary part of
>>> working with an ORM.
>>>
>>> Do you know of any other ORMs which attempt similar magical
>>> optimisations? How do they go about identifying the cases where it is
>>> necessary?
>>>
>>> On 15 August 2017 at 10:44, Gordon Wrigley 
>>> wrote:
>>>
>>>> I'd like to discuss automatic prefetching in querysets. Specifically
>>>> automatically doing prefetch_related where needed without the user having
>>>> to request it.
>>>>
>>>> For context consider these three snippets using the Question & Choice
>>>> models from the tutorial
>>>> <https://docs.djangoproject.com/en/1.11/intro/tutorial02/#creating-models> 
>>>> when
>>>> there are 100 questions each with 5 choices for a total of 500 choices.
>>>>
>>>> Default
>>>> for choice in Choice.objects.all():
>>>> print(choice.question.question_text, ':', choice.choice_text)
>>>> 501 db queries, fetches 500 choice rows and 500 question rows from the
>>>> DB
>>>>
>>>> Prefetch_related
>>>> for choice in Choice.objects.

Re: Automatic prefetching in querysets

2017-08-15 Thread Collin Anderson

Hi Gordon,

How is it implemented? Does each object keep a reference to the queryset it
came from?

Collin

On Tue, Aug 15, 2017 at 2:44 PM, Gordon Wrigley 
wrote:

> Sorry maybe I wasn't clear enough about the proposed mechanism.
>
> Currently when you dereference a foreign key field on an object (so
> 'choice.question' in the examples above) if it doesn't have the value
> cached from an earlier access, prefetch_related or select_related then
> Django will automatically perform a db query to fetch it. After that the
> value will then be cached on the object for any future dereferences.
>
> This automatic fetching is the source the N+1 query problems and in my
> experience most gross performance problems in Django apps.
>
> The proposal essentially is to add a new queryset function that says for
> the group of objects fetched by this queryset, whenever one of these
> automatic foreign key queries happens on one of them instead of fetching
> the foreign key for just that one use the prefetch mechanism to fetch it
> for all of them.
> The presumption being that the vast majority of the time when you access a
> field on one object from a queryset result, probably you are going to
> access the same field on many of the others as well.
>
> The implementation I've used in production does nest across foreign keys
> so something (admittedly contrived) like:
> for choice in Choice.objects.all():
> print(choice.question.author)
> Will produce 3 queries, one for all choices, one for the questions of
> those choices and one for the authors of those questions.
>
> It's worth noting that because these are foreign keys in their "to one"
> direction each of those queryset results will be at most the same size (in
> rows) as the proceeding one and often (due to nulls and duplicates) smaller.
>
> I do not propose touching reverse foreign key or many2many fields as the
> generated queries could request substantially more rows from the DB than
> the original query and it's not at all clear how this mechanism would
> sanely interact with filtering etc. So this is purely about the forward
> direction of foreign keys.
>
> I hope that clarifies my thinking some.
>
> Regards
> G
>
> On Tue, Aug 15, 2017 at 7:02 PM, Marc Tamlyn 
> wrote:
>
>> Hi Gordon,
>>
>> Thanks for the suggestion.
>>
>> I'm not a fan of adding a layer that tries to be this clever. How would
>> possible prefetches be identified? What happens when an initial loop in a
>> view requires one prefetch, but a subsequent loop in a template requires
>> some other prefetch? What about nested loops resulting in nested
>> prefetches? Code like this is almost guaranteed to break unexpectedly in
>> multiple ways. Personally, I would argue that correctly setting up and
>> maintaining appropriate prefetches and selects is a necessary part of
>> working with an ORM.
>>
>> Do you know of any other ORMs which attempt similar magical
>> optimisations? How do they go about identifying the cases where it is
>> necessary?
>>
>> On 15 August 2017 at 10:44, Gordon Wrigley 
>> wrote:
>>
>>> I'd like to discuss automatic prefetching in querysets. Specifically
>>> automatically doing prefetch_related where needed without the user having
>>> to request it.
>>>
>>> For context consider these three snippets using the Question & Choice
>>> models from the tutorial
>>> <https://docs.djangoproject.com/en/1.11/intro/tutorial02/#creating-models> 
>>> when
>>> there are 100 questions each with 5 choices for a total of 500 choices.
>>>
>>> Default
>>> for choice in Choice.objects.all():
>>> print(choice.question.question_text, ':', choice.choice_text)
>>> 501 db queries, fetches 500 choice rows and 500 question rows from the DB
>>>
>>> Prefetch_related
>>> for choice in Choice.objects.prefetch_related('question'):
>>> print(choice.question.question_text, ':', choice.choice_text)
>>> 2 db queries, fetches 500 choice rows and 100 question rows from the DB
>>>
>>> Select_related
>>> for choice in Choice.objects.select_related('question'):
>>> print(choice.question.question_text, ':', choice.choice_text)
>>> 1 db query, fetches 500 choice rows and 500 question rows from the DB
>>>
>>> I've included select_related for completeness, I'm not going to propose
>>> changing anything about it's use. There are places where it is the best
>>> choice and in those plac

Re: Automatic prefetching in querysets

2017-08-15 Thread Gordon Wrigley

Sorry maybe I wasn't clear enough about the proposed mechanism.

Currently when you dereference a foreign key field on an object (so
'choice.question' in the examples above) if it doesn't have the value
cached from an earlier access, prefetch_related or select_related then
Django will automatically perform a db query to fetch it. After that the
value will then be cached on the object for any future dereferences.

This automatic fetching is the source the N+1 query problems and in my
experience most gross performance problems in Django apps.

The proposal essentially is to add a new queryset function that says for
the group of objects fetched by this queryset, whenever one of these
automatic foreign key queries happens on one of them instead of fetching
the foreign key for just that one use the prefetch mechanism to fetch it
for all of them.
The presumption being that the vast majority of the time when you access a
field on one object from a queryset result, probably you are going to
access the same field on many of the others as well.

The implementation I've used in production does nest across foreign keys so
something (admittedly contrived) like:
for choice in Choice.objects.all():
print(choice.question.author)
Will produce 3 queries, one for all choices, one for the questions of those
choices and one for the authors of those questions.

It's worth noting that because these are foreign keys in their "to one"
direction each of those queryset results will be at most the same size (in
rows) as the proceeding one and often (due to nulls and duplicates) smaller.

I do not propose touching reverse foreign key or many2many fields as the
generated queries could request substantially more rows from the DB than
the original query and it's not at all clear how this mechanism would
sanely interact with filtering etc. So this is purely about the forward
direction of foreign keys.

I hope that clarifies my thinking some.

Regards
G

On Tue, Aug 15, 2017 at 7:02 PM, Marc Tamlyn  wrote:

> Hi Gordon,
>
> Thanks for the suggestion.
>
> I'm not a fan of adding a layer that tries to be this clever. How would
> possible prefetches be identified? What happens when an initial loop in a
> view requires one prefetch, but a subsequent loop in a template requires
> some other prefetch? What about nested loops resulting in nested
> prefetches? Code like this is almost guaranteed to break unexpectedly in
> multiple ways. Personally, I would argue that correctly setting up and
> maintaining appropriate prefetches and selects is a necessary part of
> working with an ORM.
>
> Do you know of any other ORMs which attempt similar magical optimisations?
> How do they go about identifying the cases where it is necessary?
>
> On 15 August 2017 at 10:44, Gordon Wrigley 
> wrote:
>
>> I'd like to discuss automatic prefetching in querysets. Specifically
>> automatically doing prefetch_related where needed without the user having
>> to request it.
>>
>> For context consider these three snippets using the Question & Choice
>> models from the tutorial
>> <https://docs.djangoproject.com/en/1.11/intro/tutorial02/#creating-models> 
>> when
>> there are 100 questions each with 5 choices for a total of 500 choices.
>>
>> Default
>> for choice in Choice.objects.all():
>> print(choice.question.question_text, ':', choice.choice_text)
>> 501 db queries, fetches 500 choice rows and 500 question rows from the DB
>>
>> Prefetch_related
>> for choice in Choice.objects.prefetch_related('question'):
>> print(choice.question.question_text, ':', choice.choice_text)
>> 2 db queries, fetches 500 choice rows and 100 question rows from the DB
>>
>> Select_related
>> for choice in Choice.objects.select_related('question'):
>> print(choice.question.question_text, ':', choice.choice_text)
>> 1 db query, fetches 500 choice rows and 500 question rows from the DB
>>
>> I've included select_related for completeness, I'm not going to propose
>> changing anything about it's use. There are places where it is the best
>> choice and in those places it will still be up to the user to request it. I
>> will note that anywhere select_related is optimal prefetch_related is still
>> better than the default and leave it at that.
>>
>> The 'Default' example above is a classic example of the N+1 query
>> problem, a problem that is widespread in Django apps.
>> This pattern of queries is what new users produce because they don't know
>> enough about the database and / or ORM to do otherwise.
>> Experieced users will also often produce this because it's no

Re: Automatic prefetching in querysets

2017-08-15 Thread Marc Tamlyn

Hi Gordon,

Thanks for the suggestion.

I'm not a fan of adding a layer that tries to be this clever. How would
possible prefetches be identified? What happens when an initial loop in a
view requires one prefetch, but a subsequent loop in a template requires
some other prefetch? What about nested loops resulting in nested
prefetches? Code like this is almost guaranteed to break unexpectedly in
multiple ways. Personally, I would argue that correctly setting up and
maintaining appropriate prefetches and selects is a necessary part of
working with an ORM.

Do you know of any other ORMs which attempt similar magical optimisations?
How do they go about identifying the cases where it is necessary?

On 15 August 2017 at 10:44, Gordon Wrigley  wrote:

> I'd like to discuss automatic prefetching in querysets. Specifically
> automatically doing prefetch_related where needed without the user having
> to request it.
>
> For context consider these three snippets using the Question & Choice
> models from the tutorial
> <https://docs.djangoproject.com/en/1.11/intro/tutorial02/#creating-models> 
> when
> there are 100 questions each with 5 choices for a total of 500 choices.
>
> Default
> for choice in Choice.objects.all():
> print(choice.question.question_text, ':', choice.choice_text)
> 501 db queries, fetches 500 choice rows and 500 question rows from the DB
>
> Prefetch_related
> for choice in Choice.objects.prefetch_related('question'):
> print(choice.question.question_text, ':', choice.choice_text)
> 2 db queries, fetches 500 choice rows and 100 question rows from the DB
>
> Select_related
> for choice in Choice.objects.select_related('question'):
> print(choice.question.question_text, ':', choice.choice_text)
> 1 db query, fetches 500 choice rows and 500 question rows from the DB
>
> I've included select_related for completeness, I'm not going to propose
> changing anything about it's use. There are places where it is the best
> choice and in those places it will still be up to the user to request it. I
> will note that anywhere select_related is optimal prefetch_related is still
> better than the default and leave it at that.
>
> The 'Default' example above is a classic example of the N+1 query problem,
> a problem that is widespread in Django apps.
> This pattern of queries is what new users produce because they don't know
> enough about the database and / or ORM to do otherwise.
> Experieced users will also often produce this because it's not always
> obvious what fields will and won't be used and subsequently what should be
> prefetched.
> Additionally that list will change over time. A small change to a template
> to display an extra field can result in a denial of service on your DB due
> to a missing prefetch.
> Identifying missing prefetches is fiddly, time consuming and error prone.
> Tools like django-perf-rec <https://github.com/YPlan/django-perf-rec>
> (which I was involved in creating) and nplusone
> <https://github.com/jmcarp/nplusone> exist in part to flag missing
> prefetches introduced by changed code.
> Finally libraries like Django Rest Framework and the Admin will also
> produce queries like this because it's very difficult for them to know what
> needs prefetching without being explicitly told by an experienced user.
>
> As hinted at the top I'd like to propose changing Django so the default
> code behaves like the prefetch_related code.
> Longer term I think this should be the default behaviour but obviously it
> needs to be proved first so for now I'd suggest a new queryset function
> that enables this behaviour.
>
> I have a proof of concept of this mechanism that I've used successfully in
> production. I'm not posting it yet because I'd like to focus on desired
> behavior rather than implementation details. But in summary, what it does
> is when accessing a missing field on a model, rather than fetching it just
> for that instance, it runs a prefetch_related query to fetch it for all
> peer instances that were fetched in the same queryset. So in the example
> above it prefetches all Questions in one query.
>
> This might seem like a risky thing to do but I'd argue that it really
> isn't.
> The only time this isn't superior to the default case is when you are post
> filtering the queryset results in Python.
> Even in that case it's only inferior if you started with a large number of
> results, filtered basically all of them and the code is structured so that
> the filtered ones aren't garbage collected.
> To cover this rare case the automatic prefetching can easily be disabled
> on a per q

Automatic prefetching in querysets

2017-08-15 Thread Gordon Wrigley

I'd like to discuss automatic prefetching in querysets. Specifically
automatically doing prefetch_related where needed without the user having
to request it.

For context consider these three snippets using the Question & Choice
models from the tutorial
<https://docs.djangoproject.com/en/1.11/intro/tutorial02/#creating-models> when
there are 100 questions each with 5 choices for a total of 500 choices.

Default
for choice in Choice.objects.all():
print(choice.question.question_text, ':', choice.choice_text)
501 db queries, fetches 500 choice rows and 500 question rows from the DB

Prefetch_related
for choice in Choice.objects.prefetch_related('question'):
print(choice.question.question_text, ':', choice.choice_text)
2 db queries, fetches 500 choice rows and 100 question rows from the DB

Select_related
for choice in Choice.objects.select_related('question'):
print(choice.question.question_text, ':', choice.choice_text)
1 db query, fetches 500 choice rows and 500 question rows from the DB

I've included select_related for completeness, I'm not going to propose
changing anything about it's use. There are places where it is the best
choice and in those places it will still be up to the user to request it. I
will note that anywhere select_related is optimal prefetch_related is still
better than the default and leave it at that.

The 'Default' example above is a classic example of the N+1 query problem,
a problem that is widespread in Django apps.
This pattern of queries is what new users produce because they don't know
enough about the database and / or ORM to do otherwise.
Experieced users will also often produce this because it's not always
obvious what fields will and won't be used and subsequently what should be
prefetched.
Additionally that list will change over time. A small change to a template
to display an extra field can result in a denial of service on your DB due
to a missing prefetch.
Identifying missing prefetches is fiddly, time consuming and error prone.
Tools like django-perf-rec <https://github.com/YPlan/django-perf-rec>
(which I was involved in creating) and nplusone
<https://github.com/jmcarp/nplusone> exist in part to flag missing
prefetches introduced by changed code.
Finally libraries like Django Rest Framework and the Admin will also
produce queries like this because it's very difficult for them to know what
needs prefetching without being explicitly told by an experienced user.

As hinted at the top I'd like to propose changing Django so the default
code behaves like the prefetch_related code.
Longer term I think this should be the default behaviour but obviously it
needs to be proved first so for now I'd suggest a new queryset function
that enables this behaviour.

I have a proof of concept of this mechanism that I've used successfully in
production. I'm not posting it yet because I'd like to focus on desired
behavior rather than implementation details. But in summary, what it does
is when accessing a missing field on a model, rather than fetching it just
for that instance, it runs a prefetch_related query to fetch it for all
peer instances that were fetched in the same queryset. So in the example
above it prefetches all Questions in one query.

This might seem like a risky thing to do but I'd argue that it really isn't.
The only time this isn't superior to the default case is when you are post
filtering the queryset results in Python.
Even in that case it's only inferior if you started with a large number of
results, filtered basically all of them and the code is structured so that
the filtered ones aren't garbage collected.
To cover this rare case the automatic prefetching can easily be disabled on
a per queryset or per object basis. Leaving us with a rare downside that
can easily be manually resolved in exchange for a significant general
improvement.

In practice this thing is almost magical to work with. Unless you already
have extensive and tightly maintained prefetches everywhere you get an
immediate boost to virtually everything that touches the database, often
knocking orders of magnitude off page load times.

If an agreement can be reached on pursuing this then I'm happy to put in
the work to productize the proof of concept.

--
You received this message because you are subscribed to the Google Groups
"Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-developers/d402bf30-a5af-4072-8b50-85e921f7f9af%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

46 matches

Mail list logo