Re: [Django] #30685: Suboptimal QuerySet.distinct().count()

2019-08-07 Thread Django
#30685: Suboptimal QuerySet.distinct().count()
-+-
 Reporter:  Adam Sołtysik|Owner:  nobody
 Type:   |   Status:  new
  Cleanup/optimization   |
Component:  Database layer   |  Version:  2.2
  (models, ORM)  |
 Severity:  Normal   |   Resolution:
 Keywords:   | Triage Stage:  Accepted
Has patch:  0|  Needs documentation:  0
  Needs tests:  0|  Patch needs improvement:  0
Easy pickings:  0|UI/UX:  0
-+-

Comment (by Carlton Gibson):

 > Adding .values('pk') of course solves this as well.

 Yes… and for the same reason, right? 

 Fancy taking on a PR as per Simon’s suggestion? (We’re happy to advise…)

-- 
Ticket URL: 
Django 
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-updates+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/065.65461b15b2cb2e6e255f3f8a3f4e2575%40djangoproject.com.


Re: [Django] #30685: Suboptimal QuerySet.distinct().count()

2019-08-07 Thread Django
#30685: Suboptimal QuerySet.distinct().count()
-+-
 Reporter:  Adam Sołtysik|Owner:  nobody
 Type:   |   Status:  new
  Cleanup/optimization   |
Component:  Database layer   |  Version:  2.2
  (models, ORM)  |
 Severity:  Normal   |   Resolution:
 Keywords:   | Triage Stage:  Accepted
Has patch:  0|  Needs documentation:  0
  Needs tests:  0|  Patch needs improvement:  0
Easy pickings:  0|UI/UX:  0
-+-

Comment (by Adam Sołtysik):

 Thank you for accepting. I have added `.values('pk').order_by()` before my
 `.count()` queries, but I'll be glad to be able to get rid of it.

 Just to add a bit more information. The "problem" is not only with
 `.distinct()`, also any `.annotate()` makes `count()` produce a subquery
 with the redundant annotations inside. For example, a query like this
 results in SQL that is 4 times slower for my data than the bare count:

 {{{Model.objects.annotate(id2=F('id')).count()}}}
 {{{(0.057) SELECT COUNT(*) FROM (SELECT "table"."id" AS Col1, "table"."id"
 AS "id2" FROM "table" GROUP BY "table"."id") subquery;}}}

 Adding `.values('pk')` of course solves this as well.

-- 
Ticket URL: 
Django 
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-updates+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/065.64f6f82f6b1522b1edbe9d926e66d160%40djangoproject.com.


Re: [Django] #30685: Suboptimal QuerySet.distinct().count()

2019-08-06 Thread Django
#30685: Suboptimal QuerySet.distinct().count()
-+-
 Reporter:  Adam Sołtysik|Owner:  nobody
 Type:   |   Status:  new
  Cleanup/optimization   |
Component:  Database layer   |  Version:  2.2
  (models, ORM)  |
 Severity:  Normal   |   Resolution:
 Keywords:   | Triage Stage:  Accepted
Has patch:  0|  Needs documentation:  0
  Needs tests:  0|  Patch needs improvement:  0
Easy pickings:  0|UI/UX:  0
-+-
Changes (by Carlton Gibson):

 * status:  closed => new
 * resolution:  invalid =>
 * stage:  Unreviewed => Accepted


Comment:

 OK, fine, thank you for the input Simon. Let’s accept on that basis. 

-- 
Ticket URL: 
Django 
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-updates+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/065.1f9e6e38a41f1fedb8f93b31f676ee6f%40djangoproject.com.


Re: [Django] #30685: Suboptimal QuerySet.distinct().count()

2019-08-06 Thread Django
#30685: Suboptimal QuerySet.distinct().count()
-+-
 Reporter:  Adam Sołtysik|Owner:  nobody
 Type:   |   Status:  closed
  Cleanup/optimization   |
Component:  Database layer   |  Version:  2.2
  (models, ORM)  |
 Severity:  Normal   |   Resolution:  invalid
 Keywords:   | Triage Stage:
 |  Unreviewed
Has patch:  0|  Needs documentation:  0
  Needs tests:  0|  Patch needs improvement:  0
Easy pickings:  0|UI/UX:  0
-+-

Comment (by Simon Charette):

 It should feasible to adapt `sql.Query.get_count()` to perform these
 optimizations by setting a `.values` mask when no columns are specified.

 
https://github.com/django/django/blob/05964b2198e53a8d66e34d83d9123e3051720b28/django/db/models/sql/query.py#L505-L514

 You might have to tweak the suquery pushdown logic a bit as that's what
 kicks in when `distinct` is used but it should be feasible.

 
https://github.com/django/django/blob/05964b2198e53a8d66e34d83d9123e3051720b28/django/db/models/sql/query.py#L421-L457

-- 
Ticket URL: 
Django 
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-updates+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/065.2ca440dc0e25922cf9a55d9f893be9f4%40djangoproject.com.


Re: [Django] #30685: Suboptimal QuerySet.distinct().count()

2019-08-06 Thread Django
#30685: Suboptimal QuerySet.distinct().count()
-+-
 Reporter:  adamsol  |Owner:  nobody
 Type:   |   Status:  closed
  Cleanup/optimization   |
Component:  Database layer   |  Version:  2.2
  (models, ORM)  |
 Severity:  Normal   |   Resolution:  invalid
 Keywords:   | Triage Stage:
 |  Unreviewed
Has patch:  0|  Needs documentation:  0
  Needs tests:  0|  Patch needs improvement:  0
Easy pickings:  0|UI/UX:  0
-+-

Comment (by Carlton Gibson):

 I think the issue is that there's no way for the `distinct()` call to know
 that you're going to follow up with a `count()` — and that code to special
 case for that isn't going to worth the effort.

 If you know you only want to select a single field tell the ORM via
 
[https://docs.djangoproject.com/en/2.2/ref/models/querysets/#django.db.models.query.QuerySet.only
 `only()`]. This'll give you want you need, I think:
 `Model.objects.only('id').distinct().count()`.

 (Maybe the ORM experts can tell you more, but I suspect that be more or
 less the end of the story...)

-- 
Ticket URL: 
Django 
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-updates+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/065.df697d9dbcbe6b9cd933230a35bd1f05%40djangoproject.com.


Re: [Django] #30685: Suboptimal QuerySet.distinct().count()

2019-08-06 Thread Django
#30685: Suboptimal QuerySet.distinct().count()
-+-
 Reporter:  adamsol  |Owner:  nobody
 Type:   |   Status:  closed
  Cleanup/optimization   |
Component:  Database layer   |  Version:  2.2
  (models, ORM)  |
 Severity:  Normal   |   Resolution:  invalid
 Keywords:   | Triage Stage:
 |  Unreviewed
Has patch:  0|  Needs documentation:  0
  Needs tests:  0|  Patch needs improvement:  0
Easy pickings:  0|UI/UX:  0
-+-

Comment (by adamsol):

 I don't think it's a bug, but I think there is a place for optimization. I
 can't see a reason why all columns are SELECTed in a subquery just to
 perform a simple count. I've attached the SQLs generated and what I would
 expect.

 So, for any model, a query like this:

 {{{Model.objects.distinct().count()}}}

 should produce an SQL like this:

 {{{SELECT COUNT(*) FROM (SELECT DISTINCT "table"."id" AS Col1 FROM
 "table") subquery;}}}

 instead of:

 {{{SELECT COUNT(*) FROM (SELECT DISTINCT "table"."id" AS Col1, ... (all
 other columns) FROM "table") subquery;}}}

 which, depending on the number of columns and size of the table, can be
 several times slower (at least on PostgreSQL).

 If you think this is not worth the effort, then OK, I won't insist.

-- 
Ticket URL: 
Django 
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-updates+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/065.c601f2474d248b860127577aef97a8b3%40djangoproject.com.


Re: [Django] #30685: Suboptimal QuerySet.distinct().count()

2019-08-06 Thread Django
#30685: Suboptimal QuerySet.distinct().count()
-+-
 Reporter:  adamsol  |Owner:  nobody
 Type:   |   Status:  closed
  Cleanup/optimization   |
Component:  Database layer   |  Version:  2.2
  (models, ORM)  |
 Severity:  Normal   |   Resolution:  invalid
 Keywords:   | Triage Stage:
 |  Unreviewed
Has patch:  0|  Needs documentation:  0
  Needs tests:  0|  Patch needs improvement:  0
Easy pickings:  0|UI/UX:  0
-+-
Changes (by Carlton Gibson):

 * status:  new => closed
 * resolution:   => invalid


Comment:

 Hi Adam,

 From the description here it just looks like the expected behaviour, as
 per the
 
[https://docs.djangoproject.com/en/2.2/ref/models/querysets/#django.db.models.query.QuerySet.distinct
 various `Note` callouts in the `distinct()` documentation].

 (If you really think there's a bug here, can I ask you to narrow it down
 to something more specific? What are the exact models and ORM calls in
 play. What is the exact SQL generated? Vs What do you expect and why?
 Thanks.)

-- 
Ticket URL: 
Django 
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-updates+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/065.9e52660ab2c610e804c6844af292037a%40djangoproject.com.


[Django] #30685: Suboptimal QuerySet.distinct().count()

2019-08-06 Thread Django
#30685: Suboptimal QuerySet.distinct().count()
-+-
   Reporter:  adamsol|  Owner:  nobody
   Type: | Status:  new
  Cleanup/optimization   |
  Component:  Database   |Version:  2.2
  layer (models, ORM)|
   Severity:  Normal |   Keywords:
   Triage Stage: |  Has patch:  0
  Unreviewed |
Needs documentation:  0  |Needs tests:  0
Patch needs improvement:  0  |  Easy pickings:  0
  UI/UX:  0  |
-+-
 I have a PostgreSQL table with 100 000 records and 15 columns.

 A simple `.count()` query results in a fast SQL (execution time in seconds
 on the left):

 {{{(0.015) SELECT COUNT(*) AS "__count" FROM "table";}}}

 When we add `.distinct()`, a subquery is created with all columns
 SELECTed:

 {{{(0.178) SELECT COUNT(*) FROM (SELECT DISTINCT "table"."id" AS Col1, ...
 (15 columns) FROM "table") subquery;}}}

 When instead of `.distinct()` we write `.distinct('id')` and add
 `.order_by('id', 'col1', 'col2')`, the subquery is additionally ORDERed:

 {{{(0.151) SELECT COUNT(*) FROM (SELECT DISTINCT ON ("table"."id")
 "table"."id" AS Col1, ... (15 columns) FROM "table" ORDER BY "table"."id"
 ASC, "table"."col1" ASC, "table"."col2" ASC) subquery;}}}

 Funny thing is that without `.distinct('id')` we can write
 `.order_by('non_existing_column')` and it works without any exception.

 After adding `.values('id')` and an empty `.order_by()`, the query is as
 fast as it can be with DISTINCT:

 {{{(0.053) SELECT COUNT(*) FROM (SELECT DISTINCT ON ("table"."id")
 "table"."id" AS Col1 FROM "table") subquery;}}}

 I think that the subquery for `count()` should never contain additional
 columns nor be ordered (and the same probably goes for other
 aggregations).

-- 
Ticket URL: 
Django 
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-updates+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/050.e28f820276fe08ecb1d5fb8de4796627%40djangoproject.com.