Re: Implement QuerySet.__contains__?

2020-06-05 Thread Shai Berger
On Fri, 5 Jun 2020 13:58:47 +0200
Johan Schiff  wrote:

> I still think the wrong call was made here [...]
> *(Assuming queryset should be evaluated is probably correct in
> most cases, but sometimes adds a big performance hit when the code
> goes into production and the dataset grows - code that looks
> reasonable and works like a charm in dev will cause trouble in
> production. Assuming the opposite would have added a tiny performance
> hit in most cases, but avoided the big one.)*
> 

FWIW, when that call was made, the default behavior was to fetch only
100 records at a time, so the performance hit for the cases you
describe was quite limited. This was changed later, as Tim mentioned in
this thread:

> Django 1.6 when chunked reads were removed from QuerySet iteration: 
> 
> https://github.com/django/django/commit/70679243d1786e03557c28929f9762a119e3ac14

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/20200605224258.783815ec.shai%40platonix.com.


Re: Implement QuerySet.__contains__?

2020-06-05 Thread Tim Graham
(You can reopen https://code.djangoproject.com/ticket/24141 rather than 
creating a new ticket.)

On Friday, June 5, 2020 at 9:06:48 AM UTC-4, Aymeric Augustin wrote:
>
> If people agree QuerySet.contains() should be added, how do we move 
> forward?
>
>
> Yes, if there's no major new argument in a couple days, we can assume 
> consensus on adding .contains().
>
> * Create a ticket in Trac. Point to this discussion and to the previous 
> ticket where .contains() was suggested.
> * Submit a pull request in GitHub with code, tests, docs. This is worth 
> mentioning in the changelog.
>
> There's a lot of details in the docs, in the Contributing Guide, about how 
> to do allD this right.
>
> -- 
> Aymeric.
>
>
>
> On 5 Jun 2020, at 13:58, Johan Schiff > 
> wrote:
>
> Shai does make a good point about changing a well documented behaviour. 
> That argument wins me over.
>
> Adding .contains() and updating the documentation goes a long way to make 
> it easier for developers.
>
> Best case would be that Dajngo does not make assumptions about what the 
> developer wants, but to implement __len__(), __bool__() and __contains__(), 
> we have to assume one method or the other. I still think the wrong call was 
> made here, but changing it now is too much pain. *(Assuming queryset 
> should be evaluated is probably correct in most cases, but sometimes adds a 
> big performance hit when the code goes into production and the dataset 
> grows - code that looks reasonable and works like a charm in dev will cause 
> trouble in production. Assuming the opposite would have added a tiny 
> performance hit in most cases, but avoided the big one.)*
>
> Since I'm new here:
> If people agree QuerySet.contains() should be added, how do we move 
> forward?
> I'd be willing to write some code with tests and add a ticket, if that's 
> helpful.
>
> ‪Le ven. 5 juin 2020 à 11:42, ‫אורי‬‎ > a 
> écrit :‬
>
>> Hi,
>>
>> I'm thinking, maybe instead of:
>>
>> if obj in queryset:
>>
>> Users should write:
>>
>> if obj in list(queryset):
>>
>> Which I understand is already working? Why does the first line (without 
>> list()) should work anyway?
>>
>> And if users want performance, why not write:
>>
>> if queryset.filter(pk=obj.pk).exists():
>> אורי
>> u...@speedy.net 
>>
>>
>> On Tue, Jun 2, 2020 at 11:57 AM Johan Schiff > > wrote:
>>
>>> Is there a specific reason Djangos QuerySet does not implement 
>>> __contains__? It doesn't seem very complicated to me, but maybe I'm missing 
>>> something.
>>>
>>> When checking if an object is in e queryset I always use code lite this:
>>> if queryset.filter(pk=obj.pk).exists():
>>>
>>> The pythonic way would be:
>>> if obj in queryset:
>>>
>>> The way I understand it this latter version will fetch all objects from 
>>> queryset, while the first one is a much leaner db query.
>>>
>>> So, is there a reason QuerySet doesn't have a __contains__ method, or 
>>> would it be a good addition? My guess is that a lot of people use the "obj 
>>> in queryset" syntax, believing it does "the right thing".
>>>
>>> //Johan
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "Django developers (Contributions to Django itself)" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to django-developers+unsubscr...@googlegroups.com 
>>> .
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/django-developers/88c83b8d-658c-47cc-9162-fcfebebe9c4a%40googlegroups.com
>>>  
>>> 
>>> .
>>>
>>
>> -- 
>> You received this message because you are subscribed to a topic in the 
>> Google Groups "Django developers (Contributions to Django itself)" group.
>> To unsubscribe from this topic, visit 
>> https://groups.google.com/d/topic/django-developers/NZaMq9BALrs/unsubscribe
>> .
>> To unsubscribe from this group and all its topics, send an email to 
>> django-developers+unsubscr...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/django-developers/CABD5YeE7_ORUfJcnt6f4aB4J0j-%3DyDK_BowEt_fefcaFMGdB1g%40mail.gmail.com
>>  
>> 
>> .
>>
>
>
> -- 
> Vänligen, Johan Schiff
> Radkompaniet AB
> 072-229 61 19
>
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to django-developers+unsubscr...@googlegroups.com .
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/django-developers/CAD69rUBKOse%2Be2RedPT%2B-KWkcdC%3DiaG0zyt1Hakefp6gQ7vdLA%40mail.gmail.com
>  
> 

Re: Implement QuerySet.__contains__?

2020-06-05 Thread Aymeric Augustin
> If people agree QuerySet.contains() should be added, how do we move forward?

Yes, if there's no major new argument in a couple days, we can assume consensus 
on adding .contains().

* Create a ticket in Trac. Point to this discussion and to the previous ticket 
where .contains() was suggested.
* Submit a pull request in GitHub with code, tests, docs. This is worth 
mentioning in the changelog.

There's a lot of details in the docs, in the Contributing Guide, about how to 
do allD this right.

-- 
Aymeric.



> On 5 Jun 2020, at 13:58, Johan Schiff  wrote:
> 
> Shai does make a good point about changing a well documented behaviour. That 
> argument wins me over.
> 
> Adding .contains() and updating the documentation goes a long way to make it 
> easier for developers.
> 
> Best case would be that Dajngo does not make assumptions about what the 
> developer wants, but to implement __len__(), __bool__() and __contains__(), 
> we have to assume one method or the other. I still think the wrong call was 
> made here, but changing it now is too much pain. (Assuming queryset should be 
> evaluated is probably correct in most cases, but sometimes adds a big 
> performance hit when the code goes into production and the dataset grows - 
> code that looks reasonable and works like a charm in dev will cause trouble 
> in production. Assuming the opposite would have added a tiny performance hit 
> in most cases, but avoided the big one.)
> 
> Since I'm new here:
> If people agree QuerySet.contains() should be added, how do we move forward?
> I'd be willing to write some code with tests and add a ticket, if that's 
> helpful.
> 
> ‪Le ven. 5 juin 2020 à 11:42, ‫אורי‬‎  > a écrit :‬
> Hi,
> 
> I'm thinking, maybe instead of:
> 
> if obj in queryset:
> 
> Users should write:
> 
> if obj in list(queryset):
> 
> Which I understand is already working? Why does the first line (without 
> list()) should work anyway?
> 
> And if users want performance, why not write:
> 
> if queryset.filter(pk=obj.pk ).exists():
> אורי
> u...@speedy.net 
> 
> On Tue, Jun 2, 2020 at 11:57 AM Johan Schiff  > wrote:
> Is there a specific reason Djangos QuerySet does not implement __contains__? 
> It doesn't seem very complicated to me, but maybe I'm missing something.
> 
> When checking if an object is in e queryset I always use code lite this:
> if queryset.filter(pk=obj.pk).exists():
> 
> The pythonic way would be:
> if obj in queryset:
> 
> The way I understand it this latter version will fetch all objects from 
> queryset, while the first one is a much leaner db query.
> 
> So, is there a reason QuerySet doesn't have a __contains__ method, or would 
> it be a good addition? My guess is that a lot of people use the "obj in 
> queryset" syntax, believing it does "the right thing".
> 
> //Johan
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to django-developers+unsubscr...@googlegroups.com 
> .
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/django-developers/88c83b8d-658c-47cc-9162-fcfebebe9c4a%40googlegroups.com
>  
> .
> 
> -- 
> You received this message because you are subscribed to a topic in the Google 
> Groups "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this topic, visit 
> https://groups.google.com/d/topic/django-developers/NZaMq9BALrs/unsubscribe 
> .
> To unsubscribe from this group and all its topics, send an email to 
> django-developers+unsubscr...@googlegroups.com 
> .
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/django-developers/CABD5YeE7_ORUfJcnt6f4aB4J0j-%3DyDK_BowEt_fefcaFMGdB1g%40mail.gmail.com
>  
> .
> 
> 
> -- 
> Vänligen, Johan Schiff
> Radkompaniet AB
> 072-229 61 19
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to django-developers+unsubscr...@googlegroups.com 
> .
> To view this discussion on the web visit 
> 

Re: Implement QuerySet.__contains__?

2020-06-05 Thread Johan Schiff
Shai does make a good point about changing a well documented behaviour.
That argument wins me over.

Adding .contains() and updating the documentation goes a long way to make
it easier for developers.

Best case would be that Dajngo does not make assumptions about what the
developer wants, but to implement __len__(), __bool__() and __contains__(),
we have to assume one method or the other. I still think the wrong call was
made here, but changing it now is too much pain. *(Assuming queryset should
be evaluated is probably correct in most cases, but sometimes adds a big
performance hit when the code goes into production and the dataset grows -
code that looks reasonable and works like a charm in dev will cause trouble
in production. Assuming the opposite would have added a tiny performance
hit in most cases, but avoided the big one.)*

Since I'm new here:
If people agree QuerySet.contains() should be added, how do we move forward?
I'd be willing to write some code with tests and add a ticket, if that's
helpful.

‪Le ven. 5 juin 2020 à 11:42, ‫אורי‬‎  a écrit :‬

> Hi,
>
> I'm thinking, maybe instead of:
>
> if obj in queryset:
>
> Users should write:
>
> if obj in list(queryset):
>
> Which I understand is already working? Why does the first line (without
> list()) should work anyway?
>
> And if users want performance, why not write:
>
> if queryset.filter(pk=obj.pk).exists():
> אורי
> u...@speedy.net
>
>
> On Tue, Jun 2, 2020 at 11:57 AM Johan Schiff  wrote:
>
>> Is there a specific reason Djangos QuerySet does not implement
>> __contains__? It doesn't seem very complicated to me, but maybe I'm missing
>> something.
>>
>> When checking if an object is in e queryset I always use code lite this:
>> if queryset.filter(pk=obj.pk).exists():
>>
>> The pythonic way would be:
>> if obj in queryset:
>>
>> The way I understand it this latter version will fetch all objects from
>> queryset, while the first one is a much leaner db query.
>>
>> So, is there a reason QuerySet doesn't have a __contains__ method, or
>> would it be a good addition? My guess is that a lot of people use the "obj
>> in queryset" syntax, believing it does "the right thing".
>>
>> //Johan
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Django developers (Contributions to Django itself)" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to django-developers+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/django-developers/88c83b8d-658c-47cc-9162-fcfebebe9c4a%40googlegroups.com
>> 
>> .
>>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/django-developers/NZaMq9BALrs/unsubscribe
> .
> To unsubscribe from this group and all its topics, send an email to
> django-developers+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-developers/CABD5YeE7_ORUfJcnt6f4aB4J0j-%3DyDK_BowEt_fefcaFMGdB1g%40mail.gmail.com
> 
> .
>


-- 
Vänligen, Johan Schiff
Radkompaniet AB
072-229 61 19

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CAD69rUBKOse%2Be2RedPT%2B-KWkcdC%3DiaG0zyt1Hakefp6gQ7vdLA%40mail.gmail.com.


Re: Implement QuerySet.__contains__?

2020-06-05 Thread אורי
Hi,

I'm thinking, maybe instead of:

if obj in queryset:

Users should write:

if obj in list(queryset):

Which I understand is already working? Why does the first line (without
list()) should work anyway?

And if users want performance, why not write:

if queryset.filter(pk=obj.pk).exists():
אורי
u...@speedy.net


On Tue, Jun 2, 2020 at 11:57 AM Johan Schiff  wrote:

> Is there a specific reason Djangos QuerySet does not implement
> __contains__? It doesn't seem very complicated to me, but maybe I'm missing
> something.
>
> When checking if an object is in e queryset I always use code lite this:
> if queryset.filter(pk=obj.pk).exists():
>
> The pythonic way would be:
> if obj in queryset:
>
> The way I understand it this latter version will fetch all objects from
> queryset, while the first one is a much leaner db query.
>
> So, is there a reason QuerySet doesn't have a __contains__ method, or
> would it be a good addition? My guess is that a lot of people use the "obj
> in queryset" syntax, believing it does "the right thing".
>
> //Johan
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to django-developers+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-developers/88c83b8d-658c-47cc-9162-fcfebebe9c4a%40googlegroups.com
> 
> .
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CABD5YeE7_ORUfJcnt6f4aB4J0j-%3DyDK_BowEt_fefcaFMGdB1g%40mail.gmail.com.


Re: Implement QuerySet.__contains__?

2020-06-05 Thread Adam Johnson
I'm with Shai, against changing bool() and len() behaviour. I think the
"fetch the whole queryset" behaviour is normally helpful for beginners, who
write things like:

if qs:
   for x in qs:
   ...

# really common for anyone new to Python:
for i in len(qs):
do_something(qs[i])

We have highlighted this behaviour for a long time in the 'database access
optimization' page really well:
https://docs.djangoproject.com/en/3.0/topics/db/optimization/#use-queryset-count
. There are alternatives if you want different behaviour - exists() and
count() .

+1 to .contains() since it would follow that same pattern.

On Fri, 5 Jun 2020 at 07:03, Shai Berger  wrote:

> On Tue, 2 Jun 2020 20:57:54 +0200
> Aymeric Augustin  wrote:
>
> >
> > We're talking about a trade-off between preserving optimisations in
> > existing code bases and expertise of advanced users versus doing the
> > right thing by default for less experienced users.
>
> I disagree.
>
> The suggestion is to make
>
> if obj in qs:
> # do things which don't fetch qs
>
> more performant, at the expense of
>
> if obj in qs:
> # do things that fetch qs
>
> As well as
>
> for obj in container:
> if obj in qs:
> # whatever
>
> So, it is not a net optimization even in terms of its own.
>
> Also, changing the performance characteristics of thousands, if not
> millions of existing lines of code, and making a decade of online
> comments and advice wrong (not just obsolete "there's better ways now"
> but wrong "this points you away from truth and good"), is IMO a
> non-starter.
>
> Strong -1 on Aymeric's option 1. Strong +1 on adding ".contains()".
>
> My 2 cents,
> Shai.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django developers  (Contributions to Django itself)" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to django-developers+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-developers/20200605090349.559f3872.shai%40platonix.com
> .
>


-- 
Adam

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CAMyDDM1s8moPUY0dqfm3bcnFF9a7ATfz%3DDjSW2p-uXSMqvwcAg%40mail.gmail.com.


Re: Implement QuerySet.__contains__?

2020-06-05 Thread Shai Berger
On Tue, 2 Jun 2020 20:57:54 +0200
Aymeric Augustin  wrote:

> 
> We're talking about a trade-off between preserving optimisations in
> existing code bases and expertise of advanced users versus doing the
> right thing by default for less experienced users.

I disagree. 

The suggestion is to make 

if obj in qs:
# do things which don't fetch qs

more performant, at the expense of

if obj in qs:
# do things that fetch qs

As well as

for obj in container:
if obj in qs:
# whatever

So, it is not a net optimization even in terms of its own.

Also, changing the performance characteristics of thousands, if not
millions of existing lines of code, and making a decade of online
comments and advice wrong (not just obsolete "there's better ways now"
but wrong "this points you away from truth and good"), is IMO a
non-starter.

Strong -1 on Aymeric's option 1. Strong +1 on adding ".contains()".

My 2 cents,
Shai.

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/20200605090349.559f3872.shai%40platonix.com.


Re: Implement QuerySet.__contains__?

2020-06-03 Thread Johan Schiff
To answer my own question: No, I wasn't doing it correctly. I should have
done a sanity check before posting.

New timeit code and results at bottom, now using a smaller dataset (~900
objects).

Notable:

   - An .exists() query is about 1/100 the time of full fetch in this case.
   This difference would ofc be much bigger if I did it on the 100 000+
   objects.
   - If already prefetched, it should be worth it to search prefetched data
   if the dataset is <700 objects. It seems rare to prefetch that many
   objects. If so, a developer could easily be explicit and use .exists()
   method.

*Based on this exact scenario: 100 times slower for using obj in
queryset is quite a performance hit. On the other hand, an extra 1 % query
time for a queryset that is later evaluated is marginal.*

If a developer knows they want to evaluate the queryset later, it would be
nice to have a .evaluate() that does self._fetch_all() and returns self. I
think that's preferable over list(queryset)-method, because we don't need
to create an extra list.

# Old explicit fetch method
queryset = Stuff.objects.all()
if obj in list(queryset):
pass

# My suggestion
queryset = Stuff.objects.all().evaluate()
if obj in queryset:
pass



Updated timeit function and results:

import timeit

def time_fn(title, stmt, number=10_000):
setup = [
'from movies.models import Movie',
'qs=Movie.objects.all()',
'obj_first = qs.first()',
'obj_mid = qs[300]',
'obj_last = qs.last()',
'list(qs)',
]
result_time = timeit.timeit(stmt, setup='\n'.join(setup), number=number)
print(f'{title}: {result_time}')

time_fn('Database fetch', 'qs._result_cache=None\nlist(qs)')
time_fn('Prefetched obj_first in qs', 'obj_first in qs')
time_fn('Prefetched obj_mid in qs', 'obj_mid in qs')
time_fn('Prefetched obj_last in qs', 'obj_last in qs')
time_fn('Database obj_first exists()', 'qs.filter(pk=obj_first.pk
).exists()')
time_fn('Database obj_mid exists()', 'qs.filter(pk=obj_mid.pk).exists()')
time_fn('Database obj_last exists()', 'qs.filter(pk=obj_last.pk).exists()')

# Results

Database fetch: 616.097227364
Prefetched obj_first in qs: 0.030961711003328674
Prefetched obj_mid in qs: 6.6988333979970776
Prefetched obj_last in qs: 24.18991441695
Database obj_first exists(): 6.468764332996216
Database obj_mid exists(): 6.167532913001196
Database obj_last exists(): 6.0190791100030765


Le mer. 3 juin 2020 à 10:52, Johan Schiff  a écrit :

> Thanks for great info.
>
> First, I'm leaning towards Aymeric's proposition here. I do recognize that
> there is a lot to consider.
>
> This seems to be important:
>
>1. Developers must be able to explicitly choose methods to optimize
>for their environment. (Considering database latency, dataset size,
>prefetch preferred, etc.)
>2. Behaviour should preferably be consistent across methods. (len(qs),
>bool(qs), obj in qs)
>3. Syntax should be pythonic.
>
> I think what Aymeric is proposing would fulfill those demands.
>
> I have done some timeit-tests on a model with 100 000+ records, based on
> Rogers work, on a single host setup using postgresql. My finding is that a
> separate database query is generally faster than current *obj in qs* 
> behaviour,
> even on a prefetched queryset (unless your dataset is really small). This
> tells me that we should prioritize .exists() query unless explicitly stated
> (i.e. *obj in iter(qs)*), even when we have prefetched data. For len(qs)
> and bool(qs) that should not be the case.
>
> It would be interesting to get timings from other setups, so I'm including
> the code I used. (Also, am I doing this correctly?)
>
> import timeit
>
> def time_fn(title, stmt, number=1, prefetch=False):
> setup = [
> 'from tickets.models import Ticket',
> 'qs=Ticket.objects.all()',
> 'obj_first = qs.first()',
> 'obj_mid = qs[5]',
> 'obj_last = qs.last()',
> ]
> if prefetch:
> setup.append('list(qs)')
> result_time = timeit.timeit(stmt, setup='\n'.join(setup),
> number=number)
> print(f'{title}: {result_time}')
>
> time_fn('Database fetch', 'list(qs)')
> time_fn('Prefetched obj_first in qs', 'obj_first in qs', prefetch=True)
> time_fn('Prefetched obj_mid in qs', 'obj_mid in qs', prefetch=True)
> time_fn('Prefetched obj_last in qs', 'obj_last in qs', prefetch=True)
> time_fn('Database obj_first exists()', 'qs.filter(pk=obj_first.pk
> ).exists()')
> time_fn('Database obj_mid exists()', 'qs.filter(pk=obj_mid.pk).exists()')
> time_fn('Database obj_last exists()', 'qs.filter(pk=obj_last.pk
> ).exists()')
>
>
> My results:
>
> Database fetch: 25.667968138001015
> Prefetched obj_first in qs: 0.027538340998944477
> Prefetched obj_mid in qs: 1051.1691511649988
> Prefetched obj_last in qs: 2369.5217889660016
> Database obj_first exists(): 6.442390248001175
> Database obj_mid exists(): 6.213641551999899
> Database obj_last exists(): 5.83125246542
>
>
> Le 

Re: Implement QuerySet.__contains__?

2020-06-03 Thread Johan Schiff
Thanks for great info.

First, I'm leaning towards Aymeric's proposition here. I do recognize that
there is a lot to consider.

This seems to be important:

   1. Developers must be able to explicitly choose methods to optimize for
   their environment. (Considering database latency, dataset size, prefetch
   preferred, etc.)
   2. Behaviour should preferably be consistent across methods. (len(qs),
   bool(qs), obj in qs)
   3. Syntax should be pythonic.

I think what Aymeric is proposing would fulfill those demands.

I have done some timeit-tests on a model with 100 000+ records, based on
Rogers work, on a single host setup using postgresql. My finding is that a
separate database query is generally faster than current *obj in qs* behaviour,
even on a prefetched queryset (unless your dataset is really small). This
tells me that we should prioritize .exists() query unless explicitly stated
(i.e. *obj in iter(qs)*), even when we have prefetched data. For len(qs)
and bool(qs) that should not be the case.

It would be interesting to get timings from other setups, so I'm including
the code I used. (Also, am I doing this correctly?)

import timeit

def time_fn(title, stmt, number=1, prefetch=False):
setup = [
'from tickets.models import Ticket',
'qs=Ticket.objects.all()',
'obj_first = qs.first()',
'obj_mid = qs[5]',
'obj_last = qs.last()',
]
if prefetch:
setup.append('list(qs)')
result_time = timeit.timeit(stmt, setup='\n'.join(setup), number=number)
print(f'{title}: {result_time}')

time_fn('Database fetch', 'list(qs)')
time_fn('Prefetched obj_first in qs', 'obj_first in qs', prefetch=True)
time_fn('Prefetched obj_mid in qs', 'obj_mid in qs', prefetch=True)
time_fn('Prefetched obj_last in qs', 'obj_last in qs', prefetch=True)
time_fn('Database obj_first exists()', 'qs.filter(pk=obj_first.pk
).exists()')
time_fn('Database obj_mid exists()', 'qs.filter(pk=obj_mid.pk).exists()')
time_fn('Database obj_last exists()', 'qs.filter(pk=obj_last.pk).exists()')


My results:

Database fetch: 25.667968138001015
Prefetched obj_first in qs: 0.027538340998944477
Prefetched obj_mid in qs: 1051.1691511649988
Prefetched obj_last in qs: 2369.5217889660016
Database obj_first exists(): 6.442390248001175
Database obj_mid exists(): 6.213641551999899
Database obj_last exists(): 5.83125246542


Le mar. 2 juin 2020 à 23:15, Roger Gammans  a
écrit :

> On Tue, 2020-06-02 at 11:31 -0700, Javier Buzzi wrote:
>
>
>
> ps @roger
>
> >>> timeit.timeit('m.GL.objects.filter(pk=x.pk)', setup='import
> myapp.models as m;x = m.GL.objects.all()[324]', number=100)
> 0.05818330496549606
>
> is not doing anything, add a `.exists()` or `len(..)` or something to 
> evaluate the queryset. That number is WAY too low.
>
>
> Yes, of course. That makes me feel silly, on both counts - the evaluations
> and the number of iterations. I made number of iterations low;because I
> started with a test which a single iteration was taking multiple seconds
> and it that case the default number was way to high for the quick test I
> was offering. More critically the x in qs; depends on where in the qs the
> object is and it doesn't matter how big if you only look at the beginning.
>
> I reran it; but which shows different results.. See the following re run.
> Construction here is more complicated to show avg microsecs; per iteration,
> note the vary iteration lengths, actual runtime was aimed to be between
> 20secs and a minute or so.
>
> >>> # Searching for first in queryset
>
> >>> n=10_000_000 ;timeit.timeit('x in qs', setup='import myapp.models as m;qs 
> >>> = m.GL.objects.all(); qs2=qs;i=len(qs2);x=qs[0]', number=n)/(n/1_000_000)
>
> 0.4136358265765011
>
> >>> n = 100_000 ; timeit.timeit('m.GL.objects.filter(pk=x.pk)', setup='import 
> >>> myapp.models as m;x = list(m.GL.objects.all())[0]',  
> >>> number=n)/(n/1_000_000)
>
> 124.99445254914463
>
> >>> # Searching for last in queryset.
>
> >>> n=1000; timeit.timeit('x in qs', setup='import myapp.models as m;qs = 
> >>> m.GL.objects.all(); qs2=qs;i=len(qs2);x=qs[i-1]', number=n)/(n/1_000_000)
>
> 50741.098549216986
>
> >>> n = 100_000 ; timeit.timeit('m.GL.objects.filter(pk=x.pk)', setup='import 
> >>> myapp.models as m;x = list(m.GL.objects.all())[-1]',  
> >>> number=n)/(n/1_000_000)
>
> 118.99649207945913
>
>
>
>
> --
>
> Roger Gammans 
>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/django-developers/NZaMq9BALrs/unsubscribe
> .
> To unsubscribe from this group and all its topics, send an email to
> django-developers+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-developers/3928039038bac9b52279294f7efcac318dc80388.camel%40gammascience.co.uk
> 

Re: Implement QuerySet.__contains__?

2020-06-02 Thread Roger Gammans
On Tue, 2020-06-02 at 11:31 -0700, Javier Buzzi wrote:
> ps @roger
> 
> >>> timeit.timeit('m.GL.objects.filter(pk=x.pk)', setup='import
> myapp.models as m;x = m.GL.objects.all()[324]', number=100)
> 0.05818330496549606
> 
> is not doing anything, add a `.exists()` or `len(..)` or something to
> evaluate the queryset. That number is WAY too low.

Yes, of course. That makes me feel silly, on both counts - the
evaluations and the number of iterations.  I made number of iterations
low;because I started with a test which a single iteration was taking
multiple seconds and it that case the default number was way to high
for the quick test I was offering. More critically the x in qs; depends
on where in the qs the object is and it doesn't matter how big if you
only look at the beginning.
I reran it; but which shows different results.. See the following re
run. Construction here is more complicated to show avg microsecs; per
iteration, note the vary iteration lengths, actual runtime was aimed to
be between 20secs and a minute or so.
>>> # Searching for first in queryset>>> n=10_000_000 ;timeit.timeit('x
in qs', setup='import myapp.models as m;qs = m.GL.objects.all();
qs2=qs;i=len(qs2);x=qs[0]',
number=n)/(n/1_000_000)0.4136358265765011>>> n = 100_000 ;
timeit.timeit('m.GL.objects.filter(pk=x.pk)', setup='import
myapp.models as m;x =
list(m.GL.objects.all())[0]',  number=n)/(n/1_000_000)124.9944525491446
3>>> # Searching for last in queryset.>>> n=1000; timeit.timeit('x in
qs', setup='import myapp.models as m;qs = m.GL.objects.all();
qs2=qs;i=len(qs2);x=qs[i-1]',
number=n)/(n/1_000_000)50741.098549216986>>> n = 100_000 ;
timeit.timeit('m.GL.objects.filter(pk=x.pk)', setup='import
myapp.models as m;x = list(m.GL.objects.all())[-
1]',  number=n)/(n/1_000_000)118.99649207945913


-- 
Roger Gammans 

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/3928039038bac9b52279294f7efcac318dc80388.camel%40gammascience.co.uk.


Re: Implement QuerySet.__contains__?

2020-06-02 Thread Aymeric Augustin
> On 2 Jun 2020, at 19:42, Tim Graham  wrote:
> 
> And here's some past discussion:
> https://code.djangoproject.com/ticket/24141 - contains() method for QuerySets 
> (closed as needsinfo due to no mailing list discussion to find a consensus)
> https://github.com/django/django/pull/3906 - Efficient QuerySet.__contains__ 
> (closed as wontfix due to the behavior change)

Thanks. Your memory is amazing :-)

Since the issue keeps coming back, I think we should do one of the following:

1. make simultaneous and consistent changes to __nonzero__, __len__, and 
__contains__ to eliminate the "fetch the whole table" behavior when calling `if 
qs`,  `len(qs)` and `if obj in qs` — and check if any other special methods 
require similar treatment
2. failing that, add .contains(), as suggested in 
https://code.djangoproject.com/ticket/24141

We're talking about a trade-off between preserving optimisations in existing 
code bases and expertise of advanced users versus doing the right thing by 
default for less experienced users. I think we're increasingly willing to 
consider changes that make Django safer to use, so perhaps what Carl said in 
2015 could be open for discussion. If I forget what I know about querysets for 
a minute, the Principle of Least Astonishment says we should go for option 1.

if qs:  # test in qs isn't empty
if list(qs):  # fetch and cache qs, then test if it isn't empty
len(qs): # count the number of objects in qs
len(list(qs)):  # fetch and cache qs, then count the number of objects
if obj in qs:  # test if qs contains obj
if obj in list(qs):  # fetch and cache qs, then test if qs contains obj

objects = list(qs) is a well known pattern when you need to trigger the 
database query, perhaps because you'll do further manipulations in Python.

In https://code.djangoproject.com/ticket/18702#comment:8 
 Anssi says he was in 
favor of committing the optimized version (my option 1). Just before finalizing 
his patch, he changed his mind because:

- "This approach optimizes use cases that are rare": well, since they keep 
coming up, in fact they aren't so rare
- "Special casing some uses makes the code complex": sure, but the whole point 
of Django is to include _some_ complexity so that users have less to worry 
about.

I'm interested in what others think :-)

-- 
Aymeric.




-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/90C68EF9-7343-44D6-9BB0-33972198685A%40polytechnique.org.


Re: Implement QuerySet.__contains__?

2020-06-02 Thread Javier Buzzi
My 2 cents, I think @johan's suggestion makes sense.

if obj in queryset:

It's very pythonic. it should do what __len__ does and cache it, if you 
want the single quick db query you can always use exists().


ps @roger

>>> timeit.timeit('m.GL.objects.filter(pk=x.pk)', setup='import 
myapp.models as m;x = m.GL.objects.all()[324]', number=100)
0.05818330496549606

is not doing anything, add a `.exists()` or `len(..)` or something to evaluate 
the queryset. That number is WAY too low.

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/5195eb58-8600-43a3-ad46-1c72f691f04b%40googlegroups.com.


Re: Implement QuerySet.__contains__?

2020-06-02 Thread Tim Graham
And here's some past discussion:
https://code.djangoproject.com/ticket/24141 - contains() method for 
QuerySets (closed as needsinfo due to no mailing list discussion to find a 
consensus)
https://github.com/django/django/pull/3906 - Efficient 
QuerySet.__contains__ (closed as wontfix due to the behavior change)

On Tuesday, June 2, 2020 at 8:53:07 AM UTC-4, Tim Graham wrote:
>
> It may help to know that QuerySet.__contains__() was implemented until 
> Django 1.6 when chunked reads were removed from QuerySet iteration: 
>
>
> https://github.com/django/django/commit/70679243d1786e03557c28929f9762a119e3ac14
>
> On Tuesday, June 2, 2020 at 7:43:25 AM UTC-4, Aymeric Augustin wrote:
>>
>> On 2 Jun 2020, at 13:30, Florian Apolloner  wrote:
>>
>> On Tuesday, June 2, 2020 at 11:28:34 AM UTC+2, Adam Johnson wrote:
>>>
>>> If you already fetched the queryset, `if obj in queryset` will make a 
 new database query
>>>
>>>
>>> I don't see why we couldn't implement __contains__ to do a walk through 
>>> _result_cache if it has been fetched?
>>>
>>
>> we are doing the same for len() IIRC. I have to say it sometimes makes it 
>> a little bit hard to debug, but if it is documented as such it should be 
>> okay.
>>
>>
>> These are good points.
>>
>> As far as I can tell, the current implementation of __len__ is to fetch 
>> the whole queryset:
>>
>> def __len__(self):
>> self._fetch_all()
>> return len(self._result_cache)
>>
>> I'm almost sure doing an automatic .count() was discussed in the past. I 
>> don't remember why we rejected it — perhaps because in most cases, when you 
>> need the length of a queryset, you will also iterate it?
>>
>> The suggestion for __contains__ is similar: doing an automatic .filter(pk=
>> obj.pk).exists(). We could do it only if the queryset isn't fetched yet, 
>> but I'm afraid this may be too magical. Is it reasonable to have `total = 
>> len(queryset); found = obj in queryset` make one SQL query while `found = 
>> obj in queryset; total = len(queryset)` makes two queries?
>>
>> Now I'm worried that we may be making inconsistent decisions between 
>> __len__ and __contains__... Let's make sure we're consistent across dunder 
>> methods.
>>
>> -- 
>> Aymeric
>>
>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/5e16c857-d3c1-4d9d-b7cd-99fe77b7e13a%40googlegroups.com.


Re: Implement QuerySet.__contains__?

2020-06-02 Thread Roger Gammans
Hi,
Here are some performance numbers against a local SQLite in case any
one is interested. GL has about 40,000 records for reference.324 was
just a random number chosen, of different 'in's 

>>> sys.version'3.7.2rc1 (default, Dec 12 2018, 06:25:49) \n[GCC
8.2.0]'>>> django.__version__'3.0.6'>>> import myapp.models as m>>> x =
m.GL.objects.all()[324]>>> x in m.GL.objects.all()True
>>> import timeit>>> timeit.timeit('m.GL.objects.filter(pk=x.pk)',
setup='import myapp.models as m;x = m.GL.objects.all()[324]',
number=100)0.05818330496549606>>> timeit.timeit('x in qs',
setup='import myapp.models as m;qs = m.GL.objects.all(); x=qs[324]',
number=100)1.5688817161135375


On Tue, 2020-06-02 at 05:53 -0700, Tim Graham wrote:
> It may help to know that QuerySet.__contains__() was implemented
> until Django 1.6 when chunked reads were removed from QuerySet
> iteration:
> 
> https://github.com/django/django/commit/70679243d1786e03557c28929f9762a119e3ac14
> On Tuesday, June 2, 2020 at 7:43:25 AM UTC-4, Aymeric Augustin wrote:
> > > On 2 Jun 2020, at 13:30, Florian Apolloner 
> > > wrote:
> > > 
> > > On Tuesday, June 2, 2020 at 11:28:34 AM UTC+2, Adam Johnson
> > > wrote:
> > > > > If you already fetched the queryset, `if obj in queryset`
> > > > > will make a new database query
> > > > 
> > > > I don't see why we couldn't implement __contains__ to do a walk
> > > > through _result_cache if it has been fetched?
> > > > 
> > > 
> > > we are doing the same for len() IIRC. I have to say it sometimes
> > > makes it a little bit hard to debug, but if it is documented as
> > > such it should be okay.
> > 
> > These are good points.
> > 
> > As far as I can tell, the current implementation of __len__ is to
> > fetch the whole queryset:
> > 
> > def __len__(self):
> > self._fetch_all()
> > return len(self._result_cache)
> > 
> > I'm almost sure doing an automatic .count() was discussed in the
> > past. I don't remember why we rejected it — perhaps because in most
> > cases, when you need the length of a queryset, you will also
> > iterate it?
> > 
> > The suggestion for __contains__ is similar: doing an automatic
> > .filter(pk=obj.pk).exists(). We could do it only if the queryset
> > isn't fetched yet, but I'm afraid this may be too magical. Is it
> > reasonable to have `total = len(queryset); found = obj in queryset`
> > make one SQL query while `found = obj in queryset; total =
> > len(queryset)` makes two queries?
> > 
> > Now I'm worried that we may be making inconsistent decisions
> > between __len__ and __contains__... Let's make sure we're
> > consistent across dunder methods.
> > 
> > -- 
> > Aymeric
> > 
> > 
> 
> 
> 
> -- 
> 
> You received this message because you are subscribed to the Google
> Groups "Django developers  (Contributions to Django itself)" group.
> 
> To unsubscribe from this group and stop receiving emails from it,
> send an email to django-developers+unsubscr...@googlegroups.com.
> 
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/django-developers/55b4d7e0-ba30-48b2-8005-d27adacc7fd0%40googlegroups.com
> .
> 

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/69fdfe26be34d37a09cfe9dc7f5cc9a4c580d1d9.camel%40gammascience.co.uk.


Re: Implement QuerySet.__contains__?

2020-06-02 Thread Tim Graham
It may help to know that QuerySet.__contains__() was implemented until 
Django 1.6 when chunked reads were removed from QuerySet iteration: 

https://github.com/django/django/commit/70679243d1786e03557c28929f9762a119e3ac14

On Tuesday, June 2, 2020 at 7:43:25 AM UTC-4, Aymeric Augustin wrote:
>
> On 2 Jun 2020, at 13:30, Florian Apolloner  > wrote:
>
> On Tuesday, June 2, 2020 at 11:28:34 AM UTC+2, Adam Johnson wrote:
>>
>> If you already fetched the queryset, `if obj in queryset` will make a new 
>>> database query
>>
>>
>> I don't see why we couldn't implement __contains__ to do a walk through 
>> _result_cache if it has been fetched?
>>
>
> we are doing the same for len() IIRC. I have to say it sometimes makes it 
> a little bit hard to debug, but if it is documented as such it should be 
> okay.
>
>
> These are good points.
>
> As far as I can tell, the current implementation of __len__ is to fetch 
> the whole queryset:
>
> def __len__(self):
> self._fetch_all()
> return len(self._result_cache)
>
> I'm almost sure doing an automatic .count() was discussed in the past. I 
> don't remember why we rejected it — perhaps because in most cases, when you 
> need the length of a queryset, you will also iterate it?
>
> The suggestion for __contains__ is similar: doing an automatic .filter(pk=
> obj.pk).exists(). We could do it only if the queryset isn't fetched yet, 
> but I'm afraid this may be too magical. Is it reasonable to have `total = 
> len(queryset); found = obj in queryset` make one SQL query while `found = 
> obj in queryset; total = len(queryset)` makes two queries?
>
> Now I'm worried that we may be making inconsistent decisions between 
> __len__ and __contains__... Let's make sure we're consistent across dunder 
> methods.
>
> -- 
> Aymeric
>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/55b4d7e0-ba30-48b2-8005-d27adacc7fd0%40googlegroups.com.


Re: Implement QuerySet.__contains__?

2020-06-02 Thread Aymeric Augustin
> On 2 Jun 2020, at 13:30, Florian Apolloner  wrote:
> 
> On Tuesday, June 2, 2020 at 11:28:34 AM UTC+2, Adam Johnson wrote:
> If you already fetched the queryset, `if obj in queryset` will make a new 
> database query
> 
> I don't see why we couldn't implement __contains__ to do a walk through 
> _result_cache if it has been fetched?
> 
> we are doing the same for len() IIRC. I have to say it sometimes makes it a 
> little bit hard to debug, but if it is documented as such it should be okay.

These are good points.

As far as I can tell, the current implementation of __len__ is to fetch the 
whole queryset:

def __len__(self):
self._fetch_all()
return len(self._result_cache)

I'm almost sure doing an automatic .count() was discussed in the past. I don't 
remember why we rejected it — perhaps because in most cases, when you need the 
length of a queryset, you will also iterate it?

The suggestion for __contains__ is similar: doing an automatic 
.filter(pk=obj.pk).exists(). We could do it only if the queryset isn't fetched 
yet, but I'm afraid this may be too magical. Is it reasonable to have `total = 
len(queryset); found = obj in queryset` make one SQL query while `found = obj 
in queryset; total = len(queryset)` makes two queries?

Now I'm worried that we may be making inconsistent decisions between __len__ 
and __contains__... Let's make sure we're consistent across dunder methods.

-- 
Aymeric


-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/9B25C8F3-A282-4C7F-BB4C-252364514A03%40polytechnique.org.


Re: Implement QuerySet.__contains__?

2020-06-02 Thread Florian Apolloner
On Tuesday, June 2, 2020 at 11:28:34 AM UTC+2, Adam Johnson wrote:
>
> If you already fetched the queryset, `if obj in queryset` will make a new 
>> database query
>
>
> I don't see why we couldn't implement __contains__ to do a walk through 
> _result_cache if it has been fetched?
>

we are doing the same for len() IIRC. I have to say it sometimes makes it a 
little bit hard to debug, but if it is documented as such it should be okay.

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/fa0eae17-2c25-4ae2-befb-c20ebfb8d1da%40googlegroups.com.


Re: Implement QuerySet.__contains__?

2020-06-02 Thread Adam Johnson
>
> If you already fetched the queryset, `if obj in queryset` will make a new
> database query


I don't see why we couldn't implement __contains__ to do a walk through
_result_cache if it has been fetched?

On Tue, 2 Jun 2020 at 10:13, Aymeric Augustin <
aymeric.augus...@polytechnique.org> wrote:

> Hello Johan,
>
> You explained the upside. There's one downside to be aware of. If you
> already fetched the queryset, `if obj in queryset` will make a new database
> query, which will be slower than walking through the already fetched data
> (unless the queryset is really large and the database really close in terms
> of network latency). Making this change may throw off some assertNumQueries
> tests. However, people who consider this a regression can do `obj in
> list(queryset)` or perhaps `obj in iter(queryset)` to reuse the cached list.
>
> Judgment call: I think the upside is worth the downside and
> backwards-incompatibility, so I'm in favor of the change.
>
> Best regards,
>
> --
> Aymeric.
>
>
>
> On 2 Jun 2020, at 10:57, Johan Schiff  wrote:
>
> Is there a specific reason Djangos QuerySet does not implement
> __contains__? It doesn't seem very complicated to me, but maybe I'm missing
> something.
>
> When checking if an object is in e queryset I always use code lite this:
> if queryset.filter(pk=obj.pk).exists():
>
> The pythonic way would be:
> if obj in queryset:
>
> The way I understand it this latter version will fetch all objects from
> queryset, while the first one is a much leaner db query.
>
> So, is there a reason QuerySet doesn't have a __contains__ method, or
> would it be a good addition? My guess is that a lot of people use the "obj
> in queryset" syntax, believing it does "the right thing".
>
> //Johan
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to django-developers+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-developers/88c83b8d-658c-47cc-9162-fcfebebe9c4a%40googlegroups.com
> 
> .
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to django-developers+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-developers/570372E7-2949-426C-BAAE-DBEF1B17FBC0%40polytechnique.org
> 
> .
>


-- 
Adam

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CAMyDDM21UY5u_sDyuSeNGY%3D4wOXXX4rSDR6CE1yYR_ZBmOoRSw%40mail.gmail.com.


Re: Implement QuerySet.__contains__?

2020-06-02 Thread Aymeric Augustin
Hello Johan,

You explained the upside. There's one downside to be aware of. If you already 
fetched the queryset, `if obj in queryset` will make a new database query, 
which will be slower than walking through the already fetched data (unless the 
queryset is really large and the database really close in terms of network 
latency). Making this change may throw off some assertNumQueries tests. 
However, people who consider this a regression can do `obj in list(queryset)` 
or perhaps `obj in iter(queryset)` to reuse the cached list.

Judgment call: I think the upside is worth the downside and 
backwards-incompatibility, so I'm in favor of the change.

Best regards,

-- 
Aymeric.



> On 2 Jun 2020, at 10:57, Johan Schiff  wrote:
> 
> Is there a specific reason Djangos QuerySet does not implement __contains__? 
> It doesn't seem very complicated to me, but maybe I'm missing something.
> 
> When checking if an object is in e queryset I always use code lite this:
> if queryset.filter(pk=obj.pk).exists():
> 
> The pythonic way would be:
> if obj in queryset:
> 
> The way I understand it this latter version will fetch all objects from 
> queryset, while the first one is a much leaner db query.
> 
> So, is there a reason QuerySet doesn't have a __contains__ method, or would 
> it be a good addition? My guess is that a lot of people use the "obj in 
> queryset" syntax, believing it does "the right thing".
> 
> //Johan
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to django-developers+unsubscr...@googlegroups.com 
> .
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/django-developers/88c83b8d-658c-47cc-9162-fcfebebe9c4a%40googlegroups.com
>  
> .

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/570372E7-2949-426C-BAAE-DBEF1B17FBC0%40polytechnique.org.