Re: QuerySet.iterator together with prefetch_related because of chunk_size

2020-05-22 Thread Adam Johnson
After over a year, it's probably reasonable to create a new PR. You can use
Co-Authored-By to credit the original author:
https://github.blog/2018-01-29-commit-together-with-co-authors/
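For reference, the trailer goes on its own line at the end of the commit
message; the name and address below are placeholders:

    Co-authored-by: Original Author <original.author@example.com>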


Re: QuerySet.iterator together with prefetch_related because of chunk_size

2020-05-21 Thread 'Taylor Hakes' via Django developers (Contributions to Django itself)
That makes sense to me. It is a little weird, but it’s the best option.

I will post a comment on the PR. I am happy to update the code if he’s not
available.


Re: QuerySet.iterator together with prefetch_related because of chunk_size

2020-05-20 Thread charettes
Taylor, I think that 2. is the way forward.

It looks like the linked PR doesn't ensure a chunk_size is specified to
turn on the feature, though. I like Adam's idea to first warn and not do a
prefetch if chunk_size is not specified, and eventually turn that into an
exception. What do you think of it?

Simon


Re: QuerySet.iterator together with prefetch_related because of chunk_size

2020-05-20 Thread 'Taylor' via Django developers (Contributions to Django itself)
Checking in here. What are the next steps to make a decision?


Re: QuerySet.iterator together with prefetch_related because of chunk_size

2019-02-13 Thread Taylor
I agree that 2 makes sense and isn't a huge usability issue. It looks like
the PR is ready to go whenever this is decided:
https://github.com/django/django/pull/10707/files


Re: QuerySet.iterator together with prefetch_related because of chunk_size

2019-01-14 Thread Adam Johnson
I agree with your reasoning, and also favour option 2 now, especially since
the default can break on SQLite.

It would also be possible to go through a deprecation period to first raise
a warning and not prefetch, before a later version raises an exception,
which is probably kinder since it was previously documented as just being a
no-op.
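A rough sketch of what that transition could look like; the helper name,
the warning class, and the use of the internal _prefetch_related_lookups
attribute are my assumptions, not the actual patch:

    import warnings

    def check_iterator_prefetch(queryset, chunk_size):
        # Hypothetical helper: during the deprecation period, warn and skip
        # the prefetch when chunk_size is missing; a later release would
        # raise instead of warning.
        if queryset._prefetch_related_lookups and chunk_size is None:
            warnings.warn(
                "prefetch_related() lookups are ignored by iterator() unless "
                "chunk_size is given; this will become an error in a future "
                "release.",
                DeprecationWarning,
                stacklevel=2,
            )
            return False  # caller should not prefetch
        return True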


Re: QuerySet.iterator together with prefetch_related because of chunk_size

2019-01-14 Thread charettes
> This seems reasonable, but what would you do in the case chunk_size isn't
> explicitly defined - throw an exception?

That would be the plan.

> Currently it silently fails to prefetch which means N + 1 queries, so
> even prefetching for the default chunk_size of 2000 would be a huge gain
> in cases where chunk_size isn't defined.

That's assuming related relationships are actually accessed during
iteration over the objects. If that's not currently the case, turning this
on would cause P * C extra queries.

To summarize, here are the options we have:

1. Always turn on the prefetching feature.
2. Require chunk_size to be explicitly specified for a deprecation period
in order to enable the feature.

Pros/cons of 1: developers who didn't follow the documentation and/or
didn't check that prefetching wasn't actually happening, and left some
lingering prefetch_related(), will probably see their code perform fewer
queries but use slightly more memory (given they discard objects while
iterating). This is also likely to break code because of some backends'
(e.g. SQLite) query parameter limits (chunk_size=2000 > 1000)[0].

Pros/cons of 2: most prefetch_related().iterator() calls won't break for
developers who followed the documentation, and will fail for the others,
requiring them to acknowledge what their code is going to start doing.

As I've expressed in my previous messages, I'm slightly in favor of 2 even
if it is likely to cause more breakage, because of the exceptional
situation where this API should have raised an exception from the
beginning.

Cheers,
Simon

[0] https://code.djangoproject.com/ticket/27833
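For illustration, under option 2 calling code would have to spell out the
chunk size for the prefetching to take effect (model and lookup names below
are made up):

    # Prefetch lookups would only be honored because chunk_size is explicit.
    for book in Book.objects.prefetch_related("authors").iterator(chunk_size=500):
        print(book.authors.all())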


Re: QuerySet.iterator together with prefetch_related because of chunk_size

2019-01-14 Thread Adam Johnson
> ...what if we required chunk_size to be explicitly specified when the
> queryset has prefetch lookups?

This seems reasonable, but what would you do in the case chunk_size isn't
explicitly defined - throw an exception? Currently it silently fails to
prefetch, which means N + 1 queries, so even prefetching for the default
chunk_size of 2000 would be a huge gain in cases where chunk_size isn't
defined.


Re: QuerySet.iterator together with prefetch_related because of chunk_size

2019-01-12 Thread charettes
Replying to my concerns about the P * C queries.

Throwing that idea out there, but what if we required chunk_size to be
explicitly specified when the queryset has prefetch lookups?

That would at least force developers to consider how many prefetching
queries iterating through the results would require. Plus, since the
argument was only recently introduced, it is less likely to silently cause
changes under developers' feet.

Simon



Re: QuerySet.iterator together with prefetch_related because of chunk_size

2018-11-23 Thread Tim Graham
https://code.djangoproject.com/ticket/29984 suggests supporting
prefetch_related() with QuerySet.iterator(). I accepted the ticket to do
something and linked back to this discussion.



Re: QuerySet.iterator together with prefetch_related because of chunk_size

2018-10-26 Thread charettes
Josh, I agree that silently not working is problematic, but it has been
this way since prefetch_related() was introduced.

Something to keep in mind as well is that silently turning it on would also
perform P * C extra queries, where P is the number of prefetches requested
through prefetch_related() and C the number of chunks the results contain.
This is non-negligible IMO.

What I'd be in favor of is raising an exception on
prefetch_related(*prefetches).iterator() in the next release at least, to
allow users using this pattern to adjust their code, and then ship the
optimization with all the warnings related to the interactions between
prefetch_related(*prefetches) and iterator(chunk_size) in the next one.

I'm not completely against skipping the exception release phase entirely,
given there might be users out there accessing attributes expected to be
prefetched on iterated instances and inadvertently performing tons of
queries, but the exception phase just feels safer given iterator() is
usually used in memory-constrained contexts.

Simon



Re: QuerySet.iterator together with prefetch_related because of chunk_size

2018-10-26 Thread Josh Smeaton
I tend to agree with Tobi. Prefetching silently not working on iterator can 
be quite confusing, unless you have a good understanding of both APIs. It 
might be possible to do what you're asking, but it'd mean that Django is 
now actually caching the result when it explicitly says it isn't - even if 
the result is a much smaller moving cache. Prefetching chunk_size results 
per chunk is unlikely to make a material difference to memory usage. Users 
are usually concerned about the entire result set of the primary table.

I don't know if you can change the API to make these suggested changes 
without also impacting how we iterate over result sets - but I'd be 
interested in seeing a proof of concept at the very least.





Re: QuerySet.iterator together with prefetch_related because of chunk_size

2018-10-15 Thread tobias.kroenke
Thank you for your feedback. I would like to answer some statements to
either convince you or make it clearer where my idea stems from:

> The fundamental reason why iterator() cannot be used with
> prefetch_related() is that the latter requires a set of model instances
> to be materialized to work appropriately, which chunk_size doesn't
> control at all.
> In other words chunk_size only controls how many rows should be fetched
> from the database cursor and kept in memory at a time. Even when this
> parameter is used, iterator() will only materialize a single model
> instance per yield.

It should be easily possible to change the involved code of ModelIterable
to materialize the retrieved rows in batches. After materializing the batch
/ chunk, it could do the prefetching.

> Given that iterator() always ignored prefetch related lookups instead of
> erroring out when they were specified makes me feel like turning such a
> feature on by default could be problematic, as it could balloon the
> memory usage, which is the main reason why iterator is useful anyway.

I would argue that users who thoughtlessly applied prefetching together
with iterator() now actually get what they expected: fewer DB query round
trips traded against a little more memory usage.

Best,
Tobi



Re: QuerySet.iterator together with prefetch_related because of chunk_size

2018-10-11 Thread Curtis Maloney

On 10/12/18 10:51 AM, charettes wrote:

Hello Tobias,

From my understanding, the introduction of chunk_size doesn't help here.

The fundamental reason why iterator() cannot be used with 
prefetch_related() is that the latter requires a set of model instances 
to be materialized to work appropriately, which chunk_size doesn't 
control at all.


In other words chunk_size only controls how many rows should be fetched 
from the database cursor and kept in memory at a time. Even when this 
parameter is used, iterator() will only materialize a single model 
instance per yield.


I inferred from the original post that they were suggesting to do a set of 
prefetch_related "filling" queries _per_ _chunk_.


It wouldn't be as efficient as the 1 + P queries (where P is the number of 
prefetched relations) of a normal prefetch_related, but at 1 + (P x chunks) 
it would _probably_ be more efficient than the 1 + (P x N rows) it would 
otherwise be [or worse!]


--
Curtis



Re: QuerySet.iterator together with prefetch_related because of chunk_size

2018-10-11 Thread charettes
Hello Tobias,

From my understanding, the introduction of chunk_size doesn't help here.

The fundamental reason why iterator() cannot be used with 
prefetch_related() is that the latter requires a set of model instances to 
be materialized to work appropriately, which chunk_size doesn't control at 
all.

In other words chunk_size only controls how many rows should be fetched 
from the database cursor and kept in memory at a time. Even when this 
parameter is used, iterator() will only materialize a single model instance 
per yield.
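
In code terms (illustrative only, Entry and process are made-up names):

# chunk_size controls how many rows are pulled from the database cursor at
# a time; each loop iteration still materializes exactly one Entry instance.
for entry in Entry.objects.all().iterator(chunk_size=500):
    process(entry)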

Now, something you could do if you really want to use prefetch_related 
with iterator() is to materialize chunks yourself and use the now public 
prefetch_related_objects[0] helper to prefetch relationships.

A somewhat complete implementation would look like:

import itertools

from django.db.models import prefetch_related_objects

def prefetch_related_iterator(queryset, *related_lookups, chunk_size=100):
    # Fetch rows from the cursor chunk_size at a time.
    iterator = queryset.iterator(chunk_size=chunk_size)
    while True:
        # Materialize up to chunk_size instances, then prefetch for them only.
        chunk = list(itertools.islice(iterator, chunk_size))
        if not chunk:
            return
        prefetch_related_objects(chunk, *related_lookups)
        yield from chunk
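
For example, assuming hypothetical Author objects with a reverse "books" 
relation, it could be used like this:

for author in prefetch_related_iterator(Author.objects.all(), "books",
                                        chunk_size=500):
    # "books" is already prefetched for the current chunk, so .all() is
    # served from the prefetch cache instead of hitting the database.
    titles = [book.title for book in author.books.all()]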

The fact that iterator() has always ignored prefetch related lookups instead 
of erroring out when they were specified makes me feel like turning such a 
feature on by default could be problematic, as it could balloon the memory 
usage, which is the main reason iterator() is useful in the first place.

Cheers,
Simon

[0] 
https://docs.djangoproject.com/en/2.1/ref/models/querysets/#django.db.models.prefetch_related_objects

On Thursday 11 October 2018 at 15:44:06 UTC-4, tobias@truffls.com wrote:
>
> Hi everyone!
>
> The docs (
> https://docs.djangoproject.com/en/2.1/ref/models/querysets/#iterator) 
> state that the
>
>> use of iterator() causes previous prefetch_related() calls to be ignored 
>> since these two optimizations do not make sense together.
>
>
> I am wondering if this is still true with the introduction of the 
> *chunk_size* parameter. Having only one additional query per chunk could 
> heavily reduce the number of queries per batch.
>
> Best regards
> Tobi


QuerySet.iterator together with prefetch_related because of chunk_size

2018-10-11 Thread tobias.kroenke
Hi everyone!

The docs 
(https://docs.djangoproject.com/en/2.1/ref/models/querysets/#iterator) 
state that the

> use of iterator() causes previous prefetch_related() calls to be ignored 
> since these two optimizations do not make sense together.


I am wondering if this is still true with the introduction of the 
*chunk_size* parameter. Having only one additional query per chunk could 
heavily reduce the number of queries per batch.
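
For illustration, the usage I have in mind is roughly the following (Author 
and books are made-up names; today the prefetch_related() call is simply 
ignored when combined with iterator()):

# Proposed behaviour: one prefetch query for "books" per chunk of 2,000
# authors, instead of the prefetch being dropped entirely.
for author in Author.objects.prefetch_related("books").iterator(chunk_size=2000):
    titles = [book.title for book in author.books.all()]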

Best regards
Tobi
