Re: PostgreSQL aggregation and views through unmanaged models

2018-07-31 Thread vojtech.bocek via Django developers (Contributions to Django itself)
Hello,
the fix for https://code.djangoproject.com/ticket/27241 caused significant 
performance drop for my project, because it pretty much prevents posgresql 
from using indexes on the particular query. I use unmanaged tables because 
Django is used to display data from another project's database, so it has 
just read-only access to the database and I don't want to create any 
migrations for it.

Another solution I came up while writing this post would be to replace the
> feature flag by a callable that takes a model as a single parameter and 
> returns
> whether or not the optimization can be performed against it. The default
> implementation would return `mode._meta.managed` but it would make it 
> easier for
> users affected by this to override in order to opt-in or out based on their
> application logic.
>

This would be perfect for my (arguably niche) use case. Should I try to 
prepare a patch?

On Monday, May 22, 2017 at 5:05:38 AM UTC+2, charettes wrote:
>
> Hello fellow developers,
>
> As some of you may know PostgreSQL 9.1 added support for GROUP'ing BY
> selected table primary keys[0] only. Five years ago it was reported[1] that
> Django could rely on this feature to speed up aggregation on models backed
> up by tables with either many fields or a few large ones.
>
> Being affected by this slow down myself I decided to dive into the ORM 
> internals
> and managed to get a patch that made it in 1.9[2] thanks to Anssi's and 
> Josh's
> review[3].
>
> One subtle thing I didn't know back in the time is that PostgreSQL query 
> planner
> isn't able to introspect database views columns' functional dependency 
> like it
> does with tables and thus prevents the primary key GROUP'ing optimization 
> from
> being used.
>
> While Django doesn't support database views officially it documents that
> unmanaged models can be used to query them[4] and thereby perform 
> aggregation on
> them and generating an invalid query.
>
> This was initially reported as a crashing bug 9 months ago[5] and the 
> consensus
> at this time was that it was an esoteric edge case since there was few 
> reports
> of breakages and it went off my radar. Fast-forward to a month ago, this is
> reported again[6] and it takes the reporter quite a lot of effort to 
> determine
> the origin of the issue, pushing me to come up with a solution as I 
> introduced
> this behavior.
>
> Before Claude makes me realize this is a duplicate of the former report 
> (which I
> completely forgot about in the mean time) I implement a patch and commit 
> it once
> it's reviewed [7].
>
> When I closed the initial ticket as "fixed" the reporter brought to my 
> attention
> that this was now introducing a performance regression for unmanaged models
> relying on aggregation and that we should document how to disable this
> optimization by creating a backend subclass as a workaround instead.
>
> In my opinion the current situation is as follow. The optimization 
> introduced a
> break in backward compatibility in 1.9 as we've always documented that 
> database
> views could be queried against using unmanaged models. If this issue had 
> been
> discovered during the 1.9 release cycle it would have been eligible for a
> backport because it was a bug in a newly introduced feature. Turning this
> optimization off for unmanaged models by assuming they could be views is 
> only
> going to degrade performance of queries using unmanaged models to perform
> aggregation on tables with either a large number of columns or large 
> columns
> using PostgreSQL.
>
> Therefore I'd favor we keep the current adjustment in the master branch as 
> it
> restores backward compatibility but I don't have strong feelings about 
> reverting
> it either if it's deemed inappropriate.
>
> Another solution I came up while writing this post would be to replace the
> feature flag by a callable that takes a model as a single parameter and 
> returns
> whether or not the optimization can be performed against it. The default
> implementation would return `mode._meta.managed` but it would make it 
> easier for
> users affected by this to override in order to opt-in or out based on their
> application logic.
>
> Thank you for your time,
> Simon
>
> [0] https://www.postgresql.org/docs/9.1/static/sql-select.html#SQL-GROUPBY
> [1] https://code.djangoproject.com/ticket/19259
> [2] 
> https://github.com/django/django/commit/dc27f3ee0c3eb9bb17d6cb764788eeaf73a371d7
> [3] https://github.com/django/django/pull/4397
> [4] https://docs.djangoproject.com/en/1.11/ref/models/options/#managed
> [5] https://code.djangoproject.com/ticket/27241
> [6] https://code.djangoproject.com/ticket/28107
> [7] 
> https://github.com/django/django/commit/daf2bd3efe53cbfc1c9fd00222b8315708023792
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to 

Re: PostgreSQL aggregation and views through unmanaged models

2017-12-11 Thread Matthew Pava
You asked me to join in on this conversation as I had an issue using Django 
1.11.  My third party apps do not yet all support Django 2.0.

I have an unmanaged model that is connected to a view in PostgreSQL.  I 
attempted to create an annotated version of the model manager using the 
.annotate() method.  Unfortunately, I am getting an error that some fields 
“must appear in the GROUP BY clause or be used in an aggregate function.”  
It was a mysterious error message for me.

 

Upon further investigation, it appears that I need a PRIMARY KEY in the 
backend to avoid having to include columns in a GROUP BY clause, but 
PostgreSQL does not allow for such constraints to be added to a VIEW.  The 
“id” column is just a row_number().

On Sunday, May 21, 2017 at 10:05:38 PM UTC-5, charettes wrote:
>
> Hello fellow developers,
>
> As some of you may know PostgreSQL 9.1 added support for GROUP'ing BY
> selected table primary keys[0] only. Five years ago it was reported[1] that
> Django could rely on this feature to speed up aggregation on models backed
> up by tables with either many fields or a few large ones.
>
> Being affected by this slow down myself I decided to dive into the ORM 
> internals
> and managed to get a patch that made it in 1.9[2] thanks to Anssi's and 
> Josh's
> review[3].
>
> One subtle thing I didn't know back in the time is that PostgreSQL query 
> planner
> isn't able to introspect database views columns' functional dependency 
> like it
> does with tables and thus prevents the primary key GROUP'ing optimization 
> from
> being used.
>
> While Django doesn't support database views officially it documents that
> unmanaged models can be used to query them[4] and thereby perform 
> aggregation on
> them and generating an invalid query.
>
> This was initially reported as a crashing bug 9 months ago[5] and the 
> consensus
> at this time was that it was an esoteric edge case since there was few 
> reports
> of breakages and it went off my radar. Fast-forward to a month ago, this is
> reported again[6] and it takes the reporter quite a lot of effort to 
> determine
> the origin of the issue, pushing me to come up with a solution as I 
> introduced
> this behavior.
>
> Before Claude makes me realize this is a duplicate of the former report 
> (which I
> completely forgot about in the mean time) I implement a patch and commit 
> it once
> it's reviewed [7].
>
> When I closed the initial ticket as "fixed" the reporter brought to my 
> attention
> that this was now introducing a performance regression for unmanaged models
> relying on aggregation and that we should document how to disable this
> optimization by creating a backend subclass as a workaround instead.
>
> In my opinion the current situation is as follow. The optimization 
> introduced a
> break in backward compatibility in 1.9 as we've always documented that 
> database
> views could be queried against using unmanaged models. If this issue had 
> been
> discovered during the 1.9 release cycle it would have been eligible for a
> backport because it was a bug in a newly introduced feature. Turning this
> optimization off for unmanaged models by assuming they could be views is 
> only
> going to degrade performance of queries using unmanaged models to perform
> aggregation on tables with either a large number of columns or large 
> columns
> using PostgreSQL.
>
> Therefore I'd favor we keep the current adjustment in the master branch as 
> it
> restores backward compatibility but I don't have strong feelings about 
> reverting
> it either if it's deemed inappropriate.
>
> Another solution I came up while writing this post would be to replace the
> feature flag by a callable that takes a model as a single parameter and 
> returns
> whether or not the optimization can be performed against it. The default
> implementation would return `mode._meta.managed` but it would make it 
> easier for
> users affected by this to override in order to opt-in or out based on their
> application logic.
>
> Thank you for your time,
> Simon
>
> [0] https://www.postgresql.org/docs/9.1/static/sql-select.html#SQL-GROUPBY
> [1] https://code.djangoproject.com/ticket/19259
> [2] 
> https://github.com/django/django/commit/dc27f3ee0c3eb9bb17d6cb764788eeaf73a371d7
> [3] https://github.com/django/django/pull/4397
> [4] https://docs.djangoproject.com/en/1.11/ref/models/options/#managed
> [5] https://code.djangoproject.com/ticket/27241
> [6] https://code.djangoproject.com/ticket/28107
> [7] 
> https://github.com/django/django/commit/daf2bd3efe53cbfc1c9fd00222b8315708023792
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at 

Re: PostgreSQL aggregation and views through unmanaged models

2017-09-23 Thread Dylan Young
Is there any reason not to query the models on startup to determine which 
ones support the optimization and which do not (naively, which are proper 
tables and which are views, but perhaps postgresql can provide the 
information instead of relying on heuristics which could change in any 
release)?  Not only would this solve this issue, it would add a general 
mechanism for auto-configuring optimizations based on underlying 
model/table characteristics, essential for any ORM IMO. 

On Monday, 22 May 2017 00:05:38 UTC-3, charettes wrote:
>
> Hello fellow developers,
>
> As some of you may know PostgreSQL 9.1 added support for GROUP'ing BY
> selected table primary keys[0] only. Five years ago it was reported[1] that
> Django could rely on this feature to speed up aggregation on models backed
> up by tables with either many fields or a few large ones.
>
> Being affected by this slow down myself I decided to dive into the ORM 
> internals
> and managed to get a patch that made it in 1.9[2] thanks to Anssi's and 
> Josh's
> review[3].
>
> One subtle thing I didn't know back in the time is that PostgreSQL query 
> planner
> isn't able to introspect database views columns' functional dependency 
> like it
> does with tables and thus prevents the primary key GROUP'ing optimization 
> from
> being used.
>
> While Django doesn't support database views officially it documents that
> unmanaged models can be used to query them[4] and thereby perform 
> aggregation on
> them and generating an invalid query.
>
> This was initially reported as a crashing bug 9 months ago[5] and the 
> consensus
> at this time was that it was an esoteric edge case since there was few 
> reports
> of breakages and it went off my radar. Fast-forward to a month ago, this is
> reported again[6] and it takes the reporter quite a lot of effort to 
> determine
> the origin of the issue, pushing me to come up with a solution as I 
> introduced
> this behavior.
>
> Before Claude makes me realize this is a duplicate of the former report 
> (which I
> completely forgot about in the mean time) I implement a patch and commit 
> it once
> it's reviewed [7].
>
> When I closed the initial ticket as "fixed" the reporter brought to my 
> attention
> that this was now introducing a performance regression for unmanaged models
> relying on aggregation and that we should document how to disable this
> optimization by creating a backend subclass as a workaround instead.
>
> In my opinion the current situation is as follow. The optimization 
> introduced a
> break in backward compatibility in 1.9 as we've always documented that 
> database
> views could be queried against using unmanaged models. If this issue had 
> been
> discovered during the 1.9 release cycle it would have been eligible for a
> backport because it was a bug in a newly introduced feature. Turning this
> optimization off for unmanaged models by assuming they could be views is 
> only
> going to degrade performance of queries using unmanaged models to perform
> aggregation on tables with either a large number of columns or large 
> columns
> using PostgreSQL.
>
> Therefore I'd favor we keep the current adjustment in the master branch as 
> it
> restores backward compatibility but I don't have strong feelings about 
> reverting
> it either if it's deemed inappropriate.
>
> Another solution I came up while writing this post would be to replace the
> feature flag by a callable that takes a model as a single parameter and 
> returns
> whether or not the optimization can be performed against it. The default
> implementation would return `mode._meta.managed` but it would make it 
> easier for
> users affected by this to override in order to opt-in or out based on their
> application logic.
>
> Thank you for your time,
> Simon
>
> [0] https://www.postgresql.org/docs/9.1/static/sql-select.html#SQL-GROUPBY
> [1] https://code.djangoproject.com/ticket/19259
> [2] 
> https://github.com/django/django/commit/dc27f3ee0c3eb9bb17d6cb764788eeaf73a371d7
> [3] https://github.com/django/django/pull/4397
> [4] https://docs.djangoproject.com/en/1.11/ref/models/options/#managed
> [5] https://code.djangoproject.com/ticket/27241
> [6] https://code.djangoproject.com/ticket/28107
> [7] 
> https://github.com/django/django/commit/daf2bd3efe53cbfc1c9fd00222b8315708023792
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/64ee2617-76c0-44a1-8b45-59ef48f61ae4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: PostgreSQL aggregation and views through unmanaged models

2017-05-22 Thread jroes
Thanks for that extensive write up!

As the reporter of #27241 it seems I would be arguing against my own self 
interest when I say that I'm not in favour of the patch, but my reasonings 
are as follows:

* The current behaviour is preferable in the vast majority of cases, only a 
couple of projects are affected by the change of behaviour in Django 1.9.
* Anyone using unmanaged models on views in a way that stops working when 
upgrading Django to >= 1.9 is actively maintaining their code and should be 
able to implement a workaround.
* The easiest workaround is only ~5 lines of code and pretty much restores 
the behaviour of Django 1.8 so it's perfectly acceptable for people 
upgrading from <= 1.8.
* Other workarounds are also possible, especially with the new subquery and 
other ORM improvements that have been introduced since Django 1.8.
* Upgrading from <= 1.8 doesn't necessarily mean upgrading to 2.0 (i.e. 
ready to upgrade to the next LTS but not yet ready to migrate to Python 3). 
These people will still end up looking for a workaround anyway.
* Any project using unmanaged models that works with Django >= 1.9 will 
suddenly see a performance hit when they upgrade to 2.0.
* The reporter of #28107, #26758 (which is probably the same issue) and I 
(reporter of #27241) have worked around the issue, so we none of us will 
benefit from a patch.

So while I think that fixing this bug is noble, in this case I think 
there's way more downsides than upsides.

Thanks!

Jaap Roes

On Monday, May 22, 2017 at 5:05:38 AM UTC+2, charettes wrote:
>
> Hello fellow developers,
>
> As some of you may know PostgreSQL 9.1 added support for GROUP'ing BY
> selected table primary keys[0] only. Five years ago it was reported[1] that
> Django could rely on this feature to speed up aggregation on models backed
> up by tables with either many fields or a few large ones.
>
> Being affected by this slow down myself I decided to dive into the ORM 
> internals
> and managed to get a patch that made it in 1.9[2] thanks to Anssi's and 
> Josh's
> review[3].
>
> One subtle thing I didn't know back in the time is that PostgreSQL query 
> planner
> isn't able to introspect database views columns' functional dependency 
> like it
> does with tables and thus prevents the primary key GROUP'ing optimization 
> from
> being used.
>
> While Django doesn't support database views officially it documents that
> unmanaged models can be used to query them[4] and thereby perform 
> aggregation on
> them and generating an invalid query.
>
> This was initially reported as a crashing bug 9 months ago[5] and the 
> consensus
> at this time was that it was an esoteric edge case since there was few 
> reports
> of breakages and it went off my radar. Fast-forward to a month ago, this is
> reported again[6] and it takes the reporter quite a lot of effort to 
> determine
> the origin of the issue, pushing me to come up with a solution as I 
> introduced
> this behavior.
>
> Before Claude makes me realize this is a duplicate of the former report 
> (which I
> completely forgot about in the mean time) I implement a patch and commit 
> it once
> it's reviewed [7].
>
> When I closed the initial ticket as "fixed" the reporter brought to my 
> attention
> that this was now introducing a performance regression for unmanaged models
> relying on aggregation and that we should document how to disable this
> optimization by creating a backend subclass as a workaround instead.
>
> In my opinion the current situation is as follow. The optimization 
> introduced a
> break in backward compatibility in 1.9 as we've always documented that 
> database
> views could be queried against using unmanaged models. If this issue had 
> been
> discovered during the 1.9 release cycle it would have been eligible for a
> backport because it was a bug in a newly introduced feature. Turning this
> optimization off for unmanaged models by assuming they could be views is 
> only
> going to degrade performance of queries using unmanaged models to perform
> aggregation on tables with either a large number of columns or large 
> columns
> using PostgreSQL.
>
> Therefore I'd favor we keep the current adjustment in the master branch as 
> it
> restores backward compatibility but I don't have strong feelings about 
> reverting
> it either if it's deemed inappropriate.
>
> Another solution I came up while writing this post would be to replace the
> feature flag by a callable that takes a model as a single parameter and 
> returns
> whether or not the optimization can be performed against it. The default
> implementation would return `mode._meta.managed` but it would make it 
> easier for
> users affected by this to override in order to opt-in or out based on their
> application logic.
>
> Thank you for your time,
> Simon
>
> [0] https://www.postgresql.org/docs/9.1/static/sql-select.html#SQL-GROUPBY
> [1] https://code.djangoproject.com/ticket/19259
> [2] 
> 

Re: PostgreSQL aggregation and views through unmanaged models

2017-05-21 Thread Josh Smeaton
> Therefore I'd favor we keep the current adjustment in the master branch 
as it
> restores backward compatibility but I don't have strong feelings about 
reverting
> it either if it's deemed inappropriate.

Fixing the crash is the number 1 priority in my opinion, as it broke 
something that used to work.

Optimising aggregation for unmanaged models is a distant second goal. Make 
it right. Then make it fast.

It'd be nice to provide said optimisation for unmanaged models provided 
there was a palatable way of doing so. I'm not sure how users would be able 
to use your callable approach without subclassing the backend - unless that 
is the intention? Getting feedback from the reporters would be good of 
course.

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/16e9386f-856f-4730-85a1-69ff1545a04c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


PostgreSQL aggregation and views through unmanaged models

2017-05-21 Thread charettes
Hello fellow developers,

As some of you may know PostgreSQL 9.1 added support for GROUP'ing BY
selected table primary keys[0] only. Five years ago it was reported[1] that
Django could rely on this feature to speed up aggregation on models backed
up by tables with either many fields or a few large ones.

Being affected by this slow down myself I decided to dive into the ORM 
internals
and managed to get a patch that made it in 1.9[2] thanks to Anssi's and 
Josh's
review[3].

One subtle thing I didn't know back in the time is that PostgreSQL query 
planner
isn't able to introspect database views columns' functional dependency like 
it
does with tables and thus prevents the primary key GROUP'ing optimization 
from
being used.

While Django doesn't support database views officially it documents that
unmanaged models can be used to query them[4] and thereby perform 
aggregation on
them and generating an invalid query.

This was initially reported as a crashing bug 9 months ago[5] and the 
consensus
at this time was that it was an esoteric edge case since there was few 
reports
of breakages and it went off my radar. Fast-forward to a month ago, this is
reported again[6] and it takes the reporter quite a lot of effort to 
determine
the origin of the issue, pushing me to come up with a solution as I 
introduced
this behavior.

Before Claude makes me realize this is a duplicate of the former report 
(which I
completely forgot about in the mean time) I implement a patch and commit it 
once
it's reviewed [7].

When I closed the initial ticket as "fixed" the reporter brought to my 
attention
that this was now introducing a performance regression for unmanaged models
relying on aggregation and that we should document how to disable this
optimization by creating a backend subclass as a workaround instead.

In my opinion the current situation is as follow. The optimization 
introduced a
break in backward compatibility in 1.9 as we've always documented that 
database
views could be queried against using unmanaged models. If this issue had 
been
discovered during the 1.9 release cycle it would have been eligible for a
backport because it was a bug in a newly introduced feature. Turning this
optimization off for unmanaged models by assuming they could be views is 
only
going to degrade performance of queries using unmanaged models to perform
aggregation on tables with either a large number of columns or large columns
using PostgreSQL.

Therefore I'd favor we keep the current adjustment in the master branch as 
it
restores backward compatibility but I don't have strong feelings about 
reverting
it either if it's deemed inappropriate.

Another solution I came up while writing this post would be to replace the
feature flag by a callable that takes a model as a single parameter and 
returns
whether or not the optimization can be performed against it. The default
implementation would return `mode._meta.managed` but it would make it 
easier for
users affected by this to override in order to opt-in or out based on their
application logic.

Thank you for your time,
Simon

[0] https://www.postgresql.org/docs/9.1/static/sql-select.html#SQL-GROUPBY
[1] https://code.djangoproject.com/ticket/19259
[2] 
https://github.com/django/django/commit/dc27f3ee0c3eb9bb17d6cb764788eeaf73a371d7
[3] https://github.com/django/django/pull/4397
[4] https://docs.djangoproject.com/en/1.11/ref/models/options/#managed
[5] https://code.djangoproject.com/ticket/27241
[6] https://code.djangoproject.com/ticket/28107
[7] 
https://github.com/django/django/commit/daf2bd3efe53cbfc1c9fd00222b8315708023792

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/cf767186-8082-4553-ba85-2547169d5b53%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.