Re: [Django] #28455: Create "inplace" QuerySets to speed up certain operations

2021-08-02 Thread Django
#28455: Create "inplace" QuerySets to speed up certain operations
-+-
 Reporter:  Anssi Kääriäinen |Owner:  Keryn
 Type:   |  Knight
  Cleanup/optimization   |   Status:  assigned
Component:  Database layer   |  Version:  dev
  (models, ORM)  |
 Severity:  Normal   |   Resolution:
 Keywords:   | Triage Stage:  Accepted
Has patch:  1|  Needs documentation:  0
  Needs tests:  0|  Patch needs improvement:  1
Easy pickings:  0|UI/UX:  0
-+-
Changes (by Keryn Knight):

 * needs_better_patch:  0 => 1
 * has_patch:  0 => 1


Comment:

 Updated slightly, and now I've sat down to get cProfile information using
 `%prun for _ in range(100):
 tuple(User.objects.prefetch_related('groups__permissions',
 'user_permissions'))`

 First, the baseline, showing only operations related to the change, for
 brevity (so no `Model.__init__` etc):
 {{{
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 312000.1650.0000.3030.000 query.py:290(clone)
   3000.0980.0003.1340.010
 query.py:1860(prefetch_one_level)
 312000.0660.0000.4440.000 query.py:1337(_clone)
 302000.0550.0000.6550.000
 related_descriptors.py:883(_apply_rel_filters)
 406000.0520.0001.2310.000 query.py:45(__iter__)
 302000.0510.0000.8970.000
 related_descriptors.py:899(get_queryset)
 312000.0460.0000.5000.000 query.py:1325(_chain)
 402000.0410.0000.3270.000 base.py:511(from_db)
 305000.0390.0000.9310.000
 query.py:982(_filter_or_exclude)
 306000.0300.0000.1940.000 manager.py:142(get_queryset)
 312000.0290.0000.3370.000 query.py:341(chain)
 312000.0280.0000.0700.000 where.py:142(clone)
 }}}

 Using the `@contextmanager` decorator. Lines are shown in the same order
 as the above, so they're technically ordered by the baseline's internal
 time:
 {{{
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  10000.0070.0000.0130.000 query.py:290(clone)
   3000.0820.0002.9320.010
 query.py:1881(prefetch_one_level)
  10000.0030.0000.0190.000 query.py:1358(_clone)
 302000.0970.0000.4440.000
 related_descriptors.py:884(_apply_rel_filters)
 406000.0520.0000.9470.000 query.py:46(__iter__)
 302000.0540.0001.0390.000
 related_descriptors.py:901(get_queryset)
 312000.0290.0000.0560.000 query.py:1343(_chain)
 402000.0410.0000.3760.000 base.py:511(from_db)
 305000.0410.0000.4960.000
 query.py:984(_filter_or_exclude)
 306000.0320.0000.5450.000 manager.py:142(get_queryset)
  10000.0010.0000.0140.000 query.py:341(chain)
  10000.0020.0000.0030.000 where.py:142(clone)
   ...
 306000.0810.0000.0860.000 contextlib.py:86(__init__)
 604000.0240.0000.0340.000
 query.py:1335(_avoid_cloning)
 306000.0170.0000.0670.000 contextlib.py:121(__exit__)
 306000.0160.0000.1020.000 contextlib.py:242(helper)
 306000.0160.0000.0420.000 contextlib.py:112(__enter__)
 }}}

 And then finally with a custom context manager class, again using the same
 ordering as the baseline:
 {{{
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  10000.0100.0000.0160.000 query.py:290(clone)
   3000.0950.0003.2930.011
 query.py:1888(prefetch_one_level)
  10000.0040.0000.0230.000 query.py:1365(_clone)
 302000.1020.0000.4120.000
 related_descriptors.py:884(_apply_rel_filters)
 406000.0620.0001.1330.000 query.py:46(__iter__)
 302000.0630.0001.0320.000
 related_descriptors.py:901(get_queryset)
 312000.0770.0000.1110.000 query.py:1350(_chain)
 402000.0490.0000.3910.000 base.py:511(from_db)
 305000.0700.0000.6500.000
 query.py:996(_filter_or_exclude)
 306000.0380.0000.5630.000 manager.py:142(get_queryset)
  10000.0020.0000.0180.000 query.py:341(chain)
  10000.0020.0000.0040.000 where.py:142(clone)
   ...
 302000.0060.0000.0060.000 query.py:178(__init__)
 302000.0160.0000.0220.000
 query.py:1347(_avoid_clon

Re: [Django] #28455: Create "inplace" QuerySets to speed up certain operations

2021-07-20 Thread Django
#28455: Create "inplace" QuerySets to speed up certain operations
-+-
 Reporter:  Anssi Kääriäinen |Owner:  Keryn
 Type:   |  Knight
  Cleanup/optimization   |   Status:  assigned
Component:  Database layer   |  Version:  dev
  (models, ORM)  |
 Severity:  Normal   |   Resolution:
 Keywords:   | Triage Stage:  Accepted
Has patch:  0|  Needs documentation:  0
  Needs tests:  0|  Patch needs improvement:  0
Easy pickings:  0|UI/UX:  0
-+-

Comment (by Keryn Knight):

 PR is https://github.com/django/django/pull/14675

 Based heavily on Anssi's original implementation idea. This introduces 2
 ways of opting out of cloning, both ostensibly private API. Firstly and
 primarily, a context manager which temporarily makes the current QuerySet
 (+ Query, implicitly) mutable:

 {{{
 with User.objects.all()._avoid_cloning() as queryset1:
 queryset1.filter(x=1).exclude(y=2).filter(z__in=[1,2,3])
 # OR
 queryset2 = User.objects.all()
 with queryset2._avoid_cloning():
 queryset2.filter(x=1).exclude(y=2).filter(z__in=[1,2,3])
 }}}

 And secondly, an imperative form, for use cases where there might be
 multiple filters across a long list of lines, where it's not desirable to
 muddy the VCS history and/or the indent depth by using the context
 manager:
 {{{
 queryset = User.objects.all()._disable_cloning()
 queryset.filter(x=1).exclude(y=2).filter(z__in=[1,2,3])
 queryset._enable_cloning()
 queryset2 = queryset.filter(abc='test')  # This should be a new clone, as
 usual.
 }}}

 There's currently no support for any sort of nesting or ensured balancing
 (as with the original implementation, as I understand it), so
 `._disable_cloning()._disable_cloning()._disable_cloning()` would be
 undone by a single `._enable_cloning()` call because it's a toggle rather
 than a stack of pushes/pops; hence the preference for the contextmanager.

 Given the following SQLite data:
 {{{
 In [1]: from django.contrib.auth.models import User, Group, Permission
 In [2]: Group.objects.count()
 Out[2]: 101
 In [3]: User.objects.count()
 Out[3]: 101
 In [4]: Permission.objects.count()
 Out[4]: 24
 In [5]: User.user_permissions.through.count()
 Out[5]: 0
 In [6]: for user in User.objects.all():
...: print(f'{user.groups.count()}, {user.groups.first().pk},
 {user.user_permissions.count()}')
 1, 1, 0
 1, 2, 0
 1, 3, 0
 1, 4, 0
 1, 5, 0
 ... they all have 1 group and zero permissions
 In[7]: from django.db import connection
 In[8]: connection.queries[-2:]
 [{'sql': 'SELECT "auth_user"."id", ... FROM "auth_user"',
   'time': '0.001'},
  {'sql': 'SELECT ("auth_user_groups"."user_id") AS
 "_prefetch_related_val_user_id", "auth_group"."id", "auth_group"."name"
 FROM "auth_group" INNER JOIN "auth_user_groups" ON ("auth_group"."id" =
 "auth_user_groups"."group_id") WHERE "auth_user_groups"."user_id" IN (1,
 2, 3, [...] 99, 100, 101)',
   'time': '0.000'}]
 }}}

 Before applying any of the changes:
 {{{
 In [2]: %timeit tuple(User.objects.all())
 1.26 ms ± 9.06 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
 (0 calls to ._clone)
 In [3]: %timeit tuple(User.objects.prefetch_related('groups'))
 6.52 ms ± 62.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
 (105 calls to ._clone)
 In [4]: %timeit tuple(User.objects.prefetch_related('groups',
 'user_permissions'))
 12.1 ms ± 226 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
 (213 calls to ._clone)
 }}}
 After enabling the avoidance cloning technique for prefetch_related only:
 {{{
 In [2]: %timeit tuple(User.objects.all())
 1.28 ms ± 16.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
 (0 calls to ._clone)
 In [3]: %timeit tuple(User.objects.prefetch_related('groups'))
 5.93 ms ± 53.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
 (4 calls to ._clone)
 In [4]: %timeit tuple(User.objects.prefetch_related('groups',
 'user_permissions'))
 10.2 ms ± 59.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
 (7 calls to ._clone)
 }}}

 So for the 1 prefetch it's a decrease of maybe ~9%, and for the 2
 prefetches it's a decrease of roughly ~15%. I make no pretense about being
 able to calculate improvement by factor vs percent and risk getting it
 wrong.

 The number of calls to clone was done by re-running the queries in fresh
 `ipython` sessions, having `_clone()` increment a global integer each time
 it's called. It isn't part of the actual timeit numbers.

 These improved prefetch timings are only found when hitting the
 `manager.get_queryset()` of `prefetch_one_level` because we can assume
 that returns a fresh QuerySet instanc

Re: [Django] #28455: Create "inplace" QuerySets to speed up certain operations

2021-07-20 Thread Django
#28455: Create "inplace" QuerySets to speed up certain operations
-+-
 Reporter:  Anssi Kääriäinen |Owner:  Keryn
 Type:   |  Knight
  Cleanup/optimization   |   Status:  assigned
Component:  Database layer   |  Version:  dev
  (models, ORM)  |
 Severity:  Normal   |   Resolution:
 Keywords:   | Triage Stage:  Accepted
Has patch:  0|  Needs documentation:  0
  Needs tests:  0|  Patch needs improvement:  0
Easy pickings:  0|UI/UX:  0
-+-
Changes (by Keryn Knight):

 * owner:  nobody => Keryn Knight
 * status:  new => assigned


Comment:

 I don't think I'm stepping on anyone's toes at this juncture, but do shout
 if so!

 I'd accidentally been wading into similar waters when I stumbled across
 this ticket, which has ended up at roughly the same place/conclusions, and
 I have a rough sketch based on the prior branches for an MVP to resume
 discussion on this. I'll tidy up what I have, try and gather some simple
 benchmarks, and throw it into the PR queue for sanity checking.

 I guarantee there's more nuance to this than I've anticipated so far, but
 we've got to restart somewhere, right? If it works out, all the better :)

-- 
Ticket URL: 
Django 
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-updates+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/066.2e561b84b5871b94ca82ef1a6068fc02%40djangoproject.com.


Re: [Django] #28455: Create "inplace" QuerySets to speed up certain operations

2019-10-14 Thread Django
#28455: Create "inplace" QuerySets to speed up certain operations
-+-
 Reporter:  Anssi Kääriäinen |Owner:  nobody
 Type:   |   Status:  new
  Cleanup/optimization   |
Component:  Database layer   |  Version:  master
  (models, ORM)  |
 Severity:  Normal   |   Resolution:
 Keywords:   | Triage Stage:  Accepted
Has patch:  0|  Needs documentation:  0
  Needs tests:  0|  Patch needs improvement:  0
Easy pickings:  0|UI/UX:  0
-+-
Changes (by Simon Charette):

 * cc: Simon Charette (added)


-- 
Ticket URL: 
Django 
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-updates+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/066.bd7b88f16585d94b611805e1145d7812%40djangoproject.com.


Re: [Django] #28455: Create "inplace" QuerySets to speed up certain operations

2017-08-10 Thread Django
#28455: Create "inplace" QuerySets to speed up certain operations
-+-
 Reporter:  Anssi Kääriäinen |Owner:  nobody
 Type:   |   Status:  new
  Cleanup/optimization   |
Component:  Database layer   |  Version:  master
  (models, ORM)  |
 Severity:  Normal   |   Resolution:
 Keywords:   | Triage Stage:  Accepted
Has patch:  0|  Needs documentation:  0
  Needs tests:  0|  Patch needs improvement:  0
Easy pickings:  0|UI/UX:  0
-+-

Comment (by Tim Graham):

 I
 
[https://github.com/django/django/compare/master...timgraham:28455-inplace?expand=1
 rebased work] from
 [https://github.com/akaariai/django/commits/splitted_clone Anssi's
 branch]. I guess the plan was to keep "inplace" as a private API initially
 (hence the underscore prefix) but I'm not sure. Someone else is welcome to
 continue this work. I'm not planning to continue it soon.

-- 
Ticket URL: 
Django 
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-updates+unsubscr...@googlegroups.com.
To post to this group, send email to django-updates@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/066.27809e604f8952c32f0d4e72aad372cc%40djangoproject.com.
For more options, visit https://groups.google.com/d/optout.