Re: Stop QuerySet repr from executing queries

2022-04-21 Thread Ian Foote
I've been working on the Kolo debugging tool and as part of that I've also
run into this issue. Generating unexpected queries when monitoring a django
project was a nasty surprise.

In Kolo's case I was also able to work around it with a monkeypatch, but
not needing this would be a nice performance win.

I also want to point to a similar case in Django where I think the default
behaviour is more helpful - that of forms. Calling `str` on a form can
trigger form validation, but calling `repr` avoids this. I'd be in favour
of moving the current `__repr__` implementation for `QuerySet` into
`__str__` and having an implementation of `__repr__` guaranteed not to
generate extra queries.

Regards,
Ian

On Fri, 15 Apr 2022 at 16:03, 'Kevin Weaver' via Django developers
(Contributions to Django itself)  wrote:

> I know this thread is more than a year old, but I was just bitten by this
> as well. We use the Elastic APM Python agent
>  which, like Sentry, calls
> `repr()` on local variables that it sends as part of stack traces. We use 
> Django
> REST Framework  and had a
> serializer that looked something like this (fake) example:
>
> ```
> class AccountSerializer(ModelSerializer):
>   owner = PrimaryKeyRelatedField(queryset=User.objects.all())
>   days_since_joined = serializers.SerializerMethodField()
>
>   def get_days_since_joined(self, obj):
> try:
>   ...
> except Exception:
>   logger.exception(...)
> ```
>
> What happened was:
>
>1. We have several places in the application that return a paginated
>list of, say, 50 accounts, using this serializer.
>2. Someone introduced the log line to catch exceptions when
>calculating `days_since_joined`, which is called for each of the 50
>accounts returned on the page. Due to a bug, the exception was almost
>always raised.
>3. Each time the exception is raised, the Elastic APM Python agent
>generates a stack trace and calls `repr()` on local variables in each stack
>frame. One of those variables in the `get_days_since_joined` frame is the
>`AccountSerializer` object `self`, which includes the `owner` property.
>Calling `repr()` on the `owner` property evaluates the `User.objects.all()`
>QuerySet, which fetches the first 21 User objects from the database.
>4. We now had `SELECT ... FROM user LIMIT 21` being called in a loop
>50 times each time someone tries to load a page of accounts, suddenly
>consuming significantly more database CPU and slowing down the application
>considerably until queries start timing out.
>
> To prevent this from happening again, we ended up applying something very
> similar to the suggestions above as a monkey patch:
>
> ```
> orig = QuerySet.__repr__
>
> def __repr__(self):
>   if settings.DEBUG or self._result_cache:
> return orig(self)
>
>   return f"<{self.__class__.__name__} [not evaluated]>"
>
> QuerySet.__repr__ = __repr__
> ```
>
> If folks are willing to reconsider their positions on this issue, I am
> more than willing to do what is necessary to contribute this change
> upstream, so that others are not impacted by this in the future.
>
> Thank you,
>
> Kevin
>
> On Tuesday, March 23, 2021 at 9:02:39 AM UTC-4 Adam Johnson wrote:
>
>> Would James' suggestion of reusing the result cache in __repr__ have
>> solved your issue? I would like to see that first.
>>
>> I'm not against the DEBUG-only fetching but there hasn't been any
>> suggestion what to show instead that could be of use.
>>
>> One can also mitigate against bad queries particularly well by setting a
>> statement timeout at the database level (at least MySQL and PostgreSQL
>> support this), which most users will generally want at a scale where repr()
>> becomes a problem.
>>
>> On Tue, 23 Mar 2021 at 12:40, Joachim Jablon  wrote:
>>
>>> Just been bitten by that same bug (combination of Sentry, using a
>>> Queryset.as_manager() that creates an unfiltered queryset as a local
>>> variable in the stack, a create signal that errored provoking a stacktrace
>>> that contained the queryset, a table that is always filtered by a field,
>>> and sorted by another field and an index that behaves poorly without the
>>> aformentionned filter).
>>>
>>> So would a contribution that would avoid evaluating the uncached
>>> queryset on repr when DEBUG is False likely to be accepted ? If so, I'm
>>> ready to get my hands dirty :)
>>>
>>> Cheers !
>>>
>>> Le dimanche 7 mars 2021 à 00:54:56 UTC+1, mic...@turbosys.com.br a
>>> écrit :
>>>
 I agree with Matt on that. Avoiding executing the queries on production
 would be helpful!

 Let me share my case. I use django-rest-framework in a specific project
 and DRF has a feature: a serializer has a fancier string representation
 when printed. I was using a serializer with a queryset in a view that had
 an exception further, which caused the serializer to be 

Re: Stop QuerySet repr from executing queries

2022-04-21 Thread 'Kevin Weaver' via Django developers (Contributions to Django itself)
I know this thread is more than a year old, but I was just bitten by this 
as well. We use the Elastic APM Python agent 
 which, like Sentry, calls 
`repr()` on local variables that it sends as part of stack traces. We use 
Django 
REST Framework  and had a 
serializer that looked something like this (fake) example:

```
class AccountSerializer(ModelSerializer):
  owner = PrimaryKeyRelatedField(queryset=User.objects.all())
  days_since_joined = serializers.SerializerMethodField()

  def get_days_since_joined(self, obj):
try:
  ...
except Exception:
  logger.exception(...)
```

What happened was:

   1. We have several places in the application that return a paginated 
   list of, say, 50 accounts, using this serializer.
   2. Someone introduced the log line to catch exceptions when calculating 
   `days_since_joined`, which is called for each of the 50 accounts returned 
   on the page. Due to a bug, the exception was almost always raised.
   3. Each time the exception is raised, the Elastic APM Python agent 
   generates a stack trace and calls `repr()` on local variables in each stack 
   frame. One of those variables in the `get_days_since_joined` frame is the 
   `AccountSerializer` object `self`, which includes the `owner` property. 
   Calling `repr()` on the `owner` property evaluates the `User.objects.all()` 
   QuerySet, which fetches the first 21 User objects from the database.
   4. We now had `SELECT ... FROM user LIMIT 21` being called in a loop 50 
   times each time someone tries to load a page of accounts, suddenly 
   consuming significantly more database CPU and slowing down the application 
   considerably until queries start timing out.

To prevent this from happening again, we ended up applying something very 
similar to the suggestions above as a monkey patch:

```
orig = QuerySet.__repr__

def __repr__(self):
  if settings.DEBUG or self._result_cache:
return orig(self)

  return f"<{self.__class__.__name__} [not evaluated]>"

QuerySet.__repr__ = __repr__
```

If folks are willing to reconsider their positions on this issue, I am more 
than willing to do what is necessary to contribute this change upstream, so 
that others are not impacted by this in the future.

Thank you,

Kevin

On Tuesday, March 23, 2021 at 9:02:39 AM UTC-4 Adam Johnson wrote:

> Would James' suggestion of reusing the result cache in __repr__ have 
> solved your issue? I would like to see that first.
>
> I'm not against the DEBUG-only fetching but there hasn't been any 
> suggestion what to show instead that could be of use.
>
> One can also mitigate against bad queries particularly well by setting a 
> statement timeout at the database level (at least MySQL and PostgreSQL 
> support this), which most users will generally want at a scale where repr() 
> becomes a problem.
>
> On Tue, 23 Mar 2021 at 12:40, Joachim Jablon  wrote:
>
>> Just been bitten by that same bug (combination of Sentry, using a 
>> Queryset.as_manager() that creates an unfiltered queryset as a local 
>> variable in the stack, a create signal that errored provoking a stacktrace 
>> that contained the queryset, a table that is always filtered by a field, 
>> and sorted by another field and an index that behaves poorly without the 
>> aformentionned filter).
>>
>> So would a contribution that would avoid evaluating the uncached queryset 
>> on repr when DEBUG is False likely to be accepted ? If so, I'm ready to get 
>> my hands dirty :)
>>
>> Cheers !
>>
>> Le dimanche 7 mars 2021 à 00:54:56 UTC+1, mic...@turbosys.com.br a 
>> écrit :
>>
>>> I agree with Matt on that. Avoiding executing the queries on production 
>>> would be helpful!
>>>
>>> Let me share my case. I use django-rest-framework in a specific project 
>>> and DRF has a feature: a serializer has a fancier string representation 
>>> when printed. I was using a serializer with a queryset in a view that had 
>>> an exception further, which caused the serializer to be logged and thus, 
>>> the queryset to be executed.
>>>
>>> There are more details in this discussion: 
>>> https://github.com/encode/django-rest-framework/discussions/7782
>>>
>>> The short story is: I was logging this serializer passively and it was 
>>> causing the execution of a query to a whole table with millions of records, 
>>> without any sorting optimization, creating hard to debug performance 
>>> problems.
>>>
>>> I understand it is an unusual situation, but repeating Matt's words: 
>>> Django should be reliable in production for any combination of those 
>>> conditions. 
>>>
>>> Thanks!
>>>
>>> On Sunday, October 20, 2019 at 4:47:25 PM UTC-3 Matt wrote:
>>>
 Perhaps we ought to just keep the current behavior when DEBUG is True 
 (it seems so obvious now, I can't believe it wasn't the first thing I 
 suggested)? Django does lots of helpful things in DEBUG mode at the 
 expense 
 of 

Re: Stop QuerySet repr from executing queries

2021-03-23 Thread 'Adam Johnson' via Django developers (Contributions to Django itself)
Would James' suggestion of reusing the result cache in __repr__ have solved
your issue? I would like to see that first.

I'm not against the DEBUG-only fetching but there hasn't been any
suggestion what to show instead that could be of use.

One can also mitigate against bad queries particularly well by setting a
statement timeout at the database level (at least MySQL and PostgreSQL
support this), which most users will generally want at a scale where repr()
becomes a problem.

On Tue, 23 Mar 2021 at 12:40, Joachim Jablon  wrote:

> Just been bitten by that same bug (combination of Sentry, using a
> Queryset.as_manager() that creates an unfiltered queryset as a local
> variable in the stack, a create signal that errored provoking a stacktrace
> that contained the queryset, a table that is always filtered by a field,
> and sorted by another field and an index that behaves poorly without the
> aformentionned filter).
>
> So would a contribution that would avoid evaluating the uncached queryset
> on repr when DEBUG is False likely to be accepted ? If so, I'm ready to get
> my hands dirty :)
>
> Cheers !
>
> Le dimanche 7 mars 2021 à 00:54:56 UTC+1, mic...@turbosys.com.br a écrit :
>
>> I agree with Matt on that. Avoiding executing the queries on production
>> would be helpful!
>>
>> Let me share my case. I use django-rest-framework in a specific project
>> and DRF has a feature: a serializer has a fancier string representation
>> when printed. I was using a serializer with a queryset in a view that had
>> an exception further, which caused the serializer to be logged and thus,
>> the queryset to be executed.
>>
>> There are more details in this discussion:
>> https://github.com/encode/django-rest-framework/discussions/7782
>>
>> The short story is: I was logging this serializer passively and it was
>> causing the execution of a query to a whole table with millions of records,
>> without any sorting optimization, creating hard to debug performance
>> problems.
>>
>> I understand it is an unusual situation, but repeating Matt's words:
>> Django should be reliable in production for any combination of those
>> conditions.
>>
>> Thanks!
>>
>> On Sunday, October 20, 2019 at 4:47:25 PM UTC-3 Matt wrote:
>>
>>> Perhaps we ought to just keep the current behavior when DEBUG is True
>>> (it seems so obvious now, I can't believe it wasn't the first thing I
>>> suggested)? Django does lots of helpful things in DEBUG mode at the expense
>>> of performance. I think this would be an innocuous change for most people.
>>>
>>> It is not the most important thing to remove behavior that most of users
 use because we want to fix an edge case that was reported twice in the last
 six years.

>>>
>>> I don't consider any of those *individual *conditions to trigger the
>>> problem "off the beaten path" for a bigger production Django site. All of
>>> them *combined *is obviously extremely rare, but it will effect someone
>>> else eventually*. *It doesn't sit well with me to not fix the issue.
>>> Django should be reliable in production for any combination of those
>>> conditions.
>>>
>>> On Wednesday, October 16, 2019 at 12:14:37 AM UTC-7, Mariusz Felisiak
>>> wrote:

 W dniu środa, 16 października 2019 07:53:05 UTC+2 użytkownik Harro
 napisał:
>
> Yes, it's a complicated issue, but isn't the SQL query the ultimate
> representation of which methods are called or not?
>
> Having the query evaluated during debugging has been shown to be
> harmful in certain situations, isn't that the most important thing to fix?
>

 Current behavior is extremely valuable IMO. It is not the most
 important thing to remove behavior that most of users use because we want
 to fix an edge case that was reported twice in the last six years. I agree
 that we can clarify this in docs. SQL query is not a solutions because not
 all of ORM users know SQL. Moreover the current `repr()` shows values from
 DB that we'll miss showing only SQL. You can check in the Django
 documentation how users use the current `repr()` e.g. in tutorials.

 Best,
 Mariusz

>>> --
> You received this message because you are subscribed to the Google Groups
> "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to django-developers+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-developers/050a8686-eaa5-47e7-a9ab-d8dd6fd14844n%40googlegroups.com
> 
> .
>


-- 
Adam

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to 

Re: Stop QuerySet repr from executing queries

2021-03-23 Thread Joachim Jablon
Just been bitten by that same bug (combination of Sentry, using a 
Queryset.as_manager() that creates an unfiltered queryset as a local 
variable in the stack, a create signal that errored provoking a stacktrace 
that contained the queryset, a table that is always filtered by a field, 
and sorted by another field and an index that behaves poorly without the 
aformentionned filter).

So would a contribution that would avoid evaluating the uncached queryset 
on repr when DEBUG is False likely to be accepted ? If so, I'm ready to get 
my hands dirty :)

Cheers !

Le dimanche 7 mars 2021 à 00:54:56 UTC+1, mic...@turbosys.com.br a écrit :

> I agree with Matt on that. Avoiding executing the queries on production 
> would be helpful!
>
> Let me share my case. I use django-rest-framework in a specific project 
> and DRF has a feature: a serializer has a fancier string representation 
> when printed. I was using a serializer with a queryset in a view that had 
> an exception further, which caused the serializer to be logged and thus, 
> the queryset to be executed.
>
> There are more details in this discussion: 
> https://github.com/encode/django-rest-framework/discussions/7782
>
> The short story is: I was logging this serializer passively and it was 
> causing the execution of a query to a whole table with millions of records, 
> without any sorting optimization, creating hard to debug performance 
> problems.
>
> I understand it is an unusual situation, but repeating Matt's words: 
> Django should be reliable in production for any combination of those 
> conditions. 
>
> Thanks!
>
> On Sunday, October 20, 2019 at 4:47:25 PM UTC-3 Matt wrote:
>
>> Perhaps we ought to just keep the current behavior when DEBUG is True (it 
>> seems so obvious now, I can't believe it wasn't the first thing I 
>> suggested)? Django does lots of helpful things in DEBUG mode at the expense 
>> of performance. I think this would be an innocuous change for most people. 
>>
>> It is not the most important thing to remove behavior that most of users 
>>> use because we want to fix an edge case that was reported twice in the last 
>>> six years.
>>>
>>
>> I don't consider any of those *individual *conditions to trigger the 
>> problem "off the beaten path" for a bigger production Django site. All of 
>> them *combined *is obviously extremely rare, but it will effect someone 
>> else eventually*. *It doesn't sit well with me to not fix the issue. 
>> Django should be reliable in production for any combination of those 
>> conditions.
>>
>> On Wednesday, October 16, 2019 at 12:14:37 AM UTC-7, Mariusz Felisiak 
>> wrote:
>>>
>>> W dniu środa, 16 października 2019 07:53:05 UTC+2 użytkownik Harro 
>>> napisał:

 Yes, it's a complicated issue, but isn't the SQL query the ultimate 
 representation of which methods are called or not?

 Having the query evaluated during debugging has been shown to be 
 harmful in certain situations, isn't that the most important thing to fix?

>>>
>>> Current behavior is extremely valuable IMO. It is not the most important 
>>> thing to remove behavior that most of users use because we want to fix an 
>>> edge case that was reported twice in the last six years. I agree that we 
>>> can clarify this in docs. SQL query is not a solutions because not all of 
>>> ORM users know SQL. Moreover the current `repr()` shows values from DB that 
>>> we'll miss showing only SQL. You can check in the Django documentation how 
>>> users use the current `repr()` e.g. in tutorials.
>>>
>>> Best,
>>> Mariusz
>>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/050a8686-eaa5-47e7-a9ab-d8dd6fd14844n%40googlegroups.com.


Re: Stop QuerySet repr from executing queries

2021-03-06 Thread Michel Sabchuk
I agree with Matt on that. Avoiding executing the queries on production 
would be helpful!

Let me share my case. I use django-rest-framework in a specific project and 
DRF has a feature: a serializer has a fancier string representation when 
printed. I was using a serializer with a queryset in a view that had an 
exception further, which caused the serializer to be logged and thus, the 
queryset to be executed.

There are more details in this discussion: 
https://github.com/encode/django-rest-framework/discussions/7782

The short story is: I was logging this serializer passively and it was 
causing the execution of a query to a whole table with millions of records, 
without any sorting optimization, creating hard to debug performance 
problems.

I understand it is an unusual situation, but repeating Matt's words: Django 
should be reliable in production for any combination of those conditions. 

Thanks!

On Sunday, October 20, 2019 at 4:47:25 PM UTC-3 Matt wrote:

> Perhaps we ought to just keep the current behavior when DEBUG is True (it 
> seems so obvious now, I can't believe it wasn't the first thing I 
> suggested)? Django does lots of helpful things in DEBUG mode at the expense 
> of performance. I think this would be an innocuous change for most people. 
>
> It is not the most important thing to remove behavior that most of users 
>> use because we want to fix an edge case that was reported twice in the last 
>> six years.
>>
>
> I don't consider any of those *individual *conditions to trigger the 
> problem "off the beaten path" for a bigger production Django site. All of 
> them *combined *is obviously extremely rare, but it will effect someone 
> else eventually*. *It doesn't sit well with me to not fix the issue. 
> Django should be reliable in production for any combination of those 
> conditions.
>
> On Wednesday, October 16, 2019 at 12:14:37 AM UTC-7, Mariusz Felisiak 
> wrote:
>>
>> W dniu środa, 16 października 2019 07:53:05 UTC+2 użytkownik Harro 
>> napisał:
>>>
>>> Yes, it's a complicated issue, but isn't the SQL query the ultimate 
>>> representation of which methods are called or not?
>>>
>>> Having the query evaluated during debugging has been shown to be harmful 
>>> in certain situations, isn't that the most important thing to fix?
>>>
>>
>> Current behavior is extremely valuable IMO. It is not the most important 
>> thing to remove behavior that most of users use because we want to fix an 
>> edge case that was reported twice in the last six years. I agree that we 
>> can clarify this in docs. SQL query is not a solutions because not all of 
>> ORM users know SQL. Moreover the current `repr()` shows values from DB that 
>> we'll miss showing only SQL. You can check in the Django documentation how 
>> users use the current `repr()` e.g. in tutorials.
>>
>> Best,
>> Mariusz
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/e020a9f4-13cc-458b-a916-05ceed41e984n%40googlegroups.com.


Re: Stop QuerySet repr from executing queries

2019-10-20 Thread Matt
Perhaps we ought to just keep the current behavior when DEBUG is True (it 
seems so obvious now, I can't believe it wasn't the first thing I 
suggested)? Django does lots of helpful things in DEBUG mode at the expense 
of performance. I think this would be an innocuous change for most people. 

It is not the most important thing to remove behavior that most of users 
> use because we want to fix an edge case that was reported twice in the last 
> six years.
>

I don't consider any of those *individual *conditions to trigger the 
problem "off the beaten path" for a bigger production Django site. All of 
them *combined *is obviously extremely rare, but it will effect someone 
else eventually*. *It doesn't sit well with me to not fix the issue. Django 
should be reliable in production for any combination of those conditions.

On Wednesday, October 16, 2019 at 12:14:37 AM UTC-7, Mariusz Felisiak wrote:
>
> W dniu środa, 16 października 2019 07:53:05 UTC+2 użytkownik Harro napisał:
>>
>> Yes, it's a complicated issue, but isn't the SQL query the ultimate 
>> representation of which methods are called or not?
>>
>> Having the query evaluated during debugging has been shown to be harmful 
>> in certain situations, isn't that the most important thing to fix?
>>
>
> Current behavior is extremely valuable IMO. It is not the most important 
> thing to remove behavior that most of users use because we want to fix an 
> edge case that was reported twice in the last six years. I agree that we 
> can clarify this in docs. SQL query is not a solutions because not all of 
> ORM users know SQL. Moreover the current `repr()` shows values from DB that 
> we'll miss showing only SQL. You can check in the Django documentation how 
> users use the current `repr()` e.g. in tutorials.
>
> Best,
> Mariusz
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/689bd01d-a949-4e65-9cac-357fefa56680%40googlegroups.com.


Re: Stop QuerySet repr from executing queries

2019-10-16 Thread Mariusz Felisiak
W dniu środa, 16 października 2019 07:53:05 UTC+2 użytkownik Harro napisał:
>
> Yes, it's a complicated issue, but isn't the SQL query the ultimate 
> representation of which methods are called or not?
>
> Having the query evaluated during debugging has been shown to be harmful 
> in certain situations, isn't that the most important thing to fix?
>

Current behavior is extremely valuable IMO. It is not the most important 
thing to remove behavior that most of users use because we want to fix an 
edge case that was reported twice in the last six years. I agree that we 
can clarify this in docs. SQL query is not a solutions because not all of 
ORM users know SQL. Moreover the current `repr()` shows values from DB that 
we'll miss showing only SQL. You can check in the Django documentation how 
users use the current `repr()` e.g. in tutorials.

Best,
Mariusz

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/dc658cc9-0974-4088-9ab7-f8eb24c74d36%40googlegroups.com.


Re: Stop QuerySet repr from executing queries

2019-10-15 Thread Harro
Yes, it's a complicated issue, but isn't the SQL query the ultimate 
representation of which methods are called or not?

Having the query evaluated during debugging has been shown to be harmful in 
certain situations, isn't that the most important thing to fix?

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/282c78ce-bcbe-4b74-ad8a-5d49cd2662e9%40googlegroups.com.


Re: Stop QuerySet repr from executing queries

2019-10-15 Thread charettes
I agree with what James eloquently said, the issue is more complicated it 
appears.

Simon

Le mardi 15 octobre 2019 05:59:27 UTC-4, James Bennett a écrit :
>
> This request is a bit more complicated than I think people
> realize. There's no practical way I can see to have the __repr__() of
> a QuerySet supply information sufficient to reconstruct it, at least
> not in a manner recognizable to most users of the ORM (there's no
> internal record of which query methods were called, or with which
> arguments -- only the manipulations of the nodes of the underlying
> Query object).
>
> And even if we decided to go to the trouble of having QuerySet store a
> record of query method calls, it's likely to be a considerable amount
> of trouble, since arguments to query methods may be objects that are
> difficult or impossible to safely store (such as iterators which would
> be consumed in creating a string representation).
>
> So having QuerySet.__repr__() produce the sort of "information to
> reconstruct the object" typical of the __repr__() of Python's builtins
> is not going to work.
>
> Which leaves the question of what to put in the __repr__(), if it's
> not going to be a way to reconstruct the QuerySet and it's not going
> to be the results of the query. We would also need to ensure QuerySet
> implements __str__() in a way that *does* provide the visual debugging
> aid of showing the results, since currently it doesn't, and relies on
> Python falling back to __repr__() to get the string representation.
>
> And... I don't have a good answer to this. Anything that isn't either
> the method chain to reconstruct the query, or the results of the
> query, is not going to be a sufficient debugging aid.
>
> Which brings us back around to the current behavior; it seems to me
> that this is the most useful thing we can do. It's already documented
> that a string representation of a QuerySet shows the results, which
> requires evaluating it and performing the query. There are some small
> optimizations we could do -- such as checking for a populated result
> cache and using it, rather than re-performing the query as __repr__()
> currently does -- and we could be more explicit about the fact that
> performing the query is a requirement for generating a string
> representation, but I think that's about the best we can do for this
> issue.
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/f0e3d894-1f00-45b0-8615-ffd211f51bcd%40googlegroups.com.


Re: Stop QuerySet repr from executing queries

2019-10-15 Thread James Bennett
This request is a bit more complicated than I think people
realize. There's no practical way I can see to have the __repr__() of
a QuerySet supply information sufficient to reconstruct it, at least
not in a manner recognizable to most users of the ORM (there's no
internal record of which query methods were called, or with which
arguments -- only the manipulations of the nodes of the underlying
Query object).

And even if we decided to go to the trouble of having QuerySet store a
record of query method calls, it's likely to be a considerable amount
of trouble, since arguments to query methods may be objects that are
difficult or impossible to safely store (such as iterators which would
be consumed in creating a string representation).

So having QuerySet.__repr__() produce the sort of "information to
reconstruct the object" typical of the __repr__() of Python's builtins
is not going to work.

Which leaves the question of what to put in the __repr__(), if it's
not going to be a way to reconstruct the QuerySet and it's not going
to be the results of the query. We would also need to ensure QuerySet
implements __str__() in a way that *does* provide the visual debugging
aid of showing the results, since currently it doesn't, and relies on
Python falling back to __repr__() to get the string representation.

And... I don't have a good answer to this. Anything that isn't either
the method chain to reconstruct the query, or the results of the
query, is not going to be a sufficient debugging aid.

Which brings us back around to the current behavior; it seems to me
that this is the most useful thing we can do. It's already documented
that a string representation of a QuerySet shows the results, which
requires evaluating it and performing the query. There are some small
optimizations we could do -- such as checking for a populated result
cache and using it, rather than re-performing the query as __repr__()
currently does -- and we could be more explicit about the fact that
performing the query is a requirement for generating a string
representation, but I think that's about the best we can do for this
issue.

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CAL13Cg8H0ja0J72vCkP7MFpoN5DsxefLiUu%3D%3DkHaV1FhgBaM5Q%40mail.gmail.com.


Re: Stop QuerySet repr from executing queries

2019-10-15 Thread 'Alexandr Aktsipetrov' via Django developers (Contributions to Django itself)
Current behavior is indeed usually useful but I actually remember being 
annoyed by it during interactive debugging - as watches affect sql log.  

What about reusing DEBUG instead of introducing brand new setting? I 
imagine all tutorials are in the context of scaffolded project with 
DEBUG=True as a default?

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/c59553a3-74f0-430a-8924-7ae0d238b73d%40googlegroups.com.


Re: Stop QuerySet repr from executing queries

2019-10-15 Thread Carlton Gibson
I'd be -1 on the change here too. 

That QuerySet shows you its value in the shell is beyond valuable. 

Look at the Django Tutorial 
, 
or the Django Girl Tutorial 
, or any of the other 
learning examples, and imagine making it so that these *playing with the 
API* examples required grasping *at which point QuerySets are evaluated*. 
Users don't get that quickly. (These exact issues come up all the time on 
DRF — Why isn't my QuerySet updating? Because you evaluated it in your 
class definition...) 

If you must have a different behaviour here, use a subclass.

Kind Regards,

Carlton


On Tuesday, 15 October 2019 10:21:14 UTC+2, Mariusz Felisiak wrote:
>
> I'm -1 to this change. ... to be honest even with all of this fetching the 
> first 20 rows should not crash your production database... I'm against this 
> change because it will have a negative impact for most of users ...It will 
> also have a big impact on our documentation ~ 200 examples to change.
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/e6c631ea-d0b0-4d39-a784-2169f1445952%40googlegroups.com.


Re: Stop QuerySet repr from executing queries

2019-10-15 Thread Mariusz Felisiak
Hi y'all,

I closed the original ticket because two responses is not enough to 
reopen it. I'm -1 to this change. IMO your issue is really niche and it's 
caused by multiple reasons, i.e. huge table, Meta.ordering by non-indexed 
column, uncaught exception, error reporting layer that calls `repr()` etc. 
and to be honest even with all of this fetching the first 20 rows should 
not crash your production database (I worked with tables that had hundreds 
of millions of records). I'm against this change because it will have a 
negative impact for most of users (like pointed by Luke[1]). It will also 
have a big impact on our documentation ~ 200 examples to change.

Best,
Mariusz

[1] https://code.djangoproject.com/ticket/20393#comment:5

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/c459aec4-23f1-475f-9b9f-0acb2c83b54c%40googlegroups.com.


Re: Stop QuerySet repr from executing queries

2019-10-14 Thread Matt
Created https://github.com/django/django/pull/11917

This patch will print the results of the query (if it has been evaluated). 
I did hit one odd case while writing the tests. Essentially, if you do:

queryset = SomeModel.objects.all()
list(queryset)
SomeModel.objects.create(...)
repr(queryset)

The newly created model won't appear in the repr. I believe the existing 
behavior will show the newly created object. I'm still not a huge fan of 
this repr behavior (I lean towards just displaying the SQL like 
RawQuerySet).

I just noticed felixxm closed the original ticket. Unfortunately, I don't 
think I'm advertising the severity of this issue very well and most people 
are probably never going to encounter it (and if they do, it's probably 
their own fault). So to be more explicit about the problem...

*How to crash your production database with Django:*

   1. Use an error reporting system like Sentry
   2. Have a database table with millions of rows (the more the "better")
   3. Have a Django model representing the above table, with a default sort 
   of a non-indexed field
   4. Have a local variable somewhere in the stack like `queryset = 
   HugeTable.objects.all()` (assume it doesn't get "refined down" until deeper 
   in your code)
   5. Trigger an uncaught exception
   6. Your error reporting layer will call repr() on that queryset local 
   variable.
   7. repr(queryset) will trigger your database server to do SELECT ... 
   FROM HugeTable ORDER BY NonIndexedColumn LIMIT 21 (locking it up)
   8. Bonus: Trigger all of the above on all your wsgi processes
   
In short, just call repr(BigTable.objects.all()) (where BigTable has an 
ordering on an non-indexed column and lots of rows)



On Thursday, October 10, 2019 at 8:50:31 AM UTC-7, Matt wrote:
>
> repr(some_queryset) will execute the query and display some results from 
> the query. This is done because it's (somewhat) helpful for debugging.
>
>
> https://github.com/django/django/blob/2a6f45e08e8cb8c7e5157915c378b453109424d2/django/db/models/query.py#L248
>
> This has caused at least two people to scratch their heads about odd 
> queries being generated, and it crashed my production database yesterday.
>
> See #20393  and #30863 
> 
>
> Luke Plant has said:
>
> ... you have a conflict between the goal of being information rich and 
>> with *always* being useful for debugging. In general, automatically 
>> seeing the results is information rich and is useful in debugging, but 
>> sometimes it causes further problems.
>>
>> I have thought about it, and with this conflict of goals, I think we are 
>> going to have to err on the side of the current behaviour. It is possible 
>> to work around by not calling repr on QuerySets in your middleware, and 
>> there are too many people who will be upset (or have problems with their 
>> own debugging facilities) by changing this.
>>
>
> One reason people love Django is because of its attention to things like 
> debug-ability. I can certainly see the argument here. However, I want to 
> press on the desirability of this feature.
>
> The argument that you can work around it by not calling repr is certainly 
> true, but in practice, I don't think it's reasonable. Production error 
> reporting tools (like Sentry and New Relic) will call repr when reporting 
> on exceptions.
>
> I see this behavior as being analogous to a file object printing the first 
> 21 characters of a file, which it doesn't do (for good reason, I would 
> imagine):
>
> >>> f = open("foo.py")
> >>> repr(f)
> "<_io.TextIOWrapper name='foo.py' mode='r' encoding='UTF-8'>"
> >>>
>
> I think we ought to reconsider the behavior of repr on QuerySets. I'm of 
> the opinion that we should scrap it completely. It could be replaced by 
> something that generates the SQL that would be executed (although that may 
> be tricky), or some kind of representation of the query filter. 
>
> Any thoughts?
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/fef2d187-196c-43cc-8935-8b0f4779d727%40googlegroups.com.


Re: Stop QuerySet repr from executing queries

2019-10-11 Thread Ryan Hiebert
On Fri, Oct 11, 2019 at 9:29 PM Matt  wrote:

> I think it should just show  instead of .
>

I agree, I think this makes the most sense.

I think it can be argued that QuerySet should be consistent with [RawQuerySet,
> which just uses a string of the query in the repr]
>

I can see the argument I suppose, but I think most of the time that's just
noise. When you want to see that, you'll say you want to see that by
getting the repr of the query itself. It seems like too much for a default
repr for QuerySet, IMO.

>
> Regarding other things that have been mentioned: I'm not too keen about
> introducing a setting for this behavior, or doing things differently in an
> interactive console. Django already has a lot of settings, and this seems
> like such a minor thing. And I don't know of any precedent for changing the
> behavior of Django in an interactive shell (and you can tell I'm not a fan
> of inconsistent behaviors).
>

Me either, and I'm one that mentioned it. I personally have no misgivings
about changing the repr unconditionally, and it would be my preference. I
was trying to consider potential dissent to give balanced feedback.

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CABpHFHTF0%2BNHWhtdhziqtsKPeJo1nih77WMt%3DzY4kM0j%3DGTcSg%40mail.gmail.com.


Re: Stop QuerySet repr from executing queries

2019-10-11 Thread Matt
 

I reviewed the patch that was originally created for this:

 

https://github.com/django/django/pull/1055/commits/7d53d428c0323a7eca93b7b56968a895b031e2ae

 

Essentially, it only includes the results of the queryset in the repr *if* 
the QuerySet._result_cache has been populated. There is one minor thing I 
don't like about the particular implementation: it shows the empty list if 
the result cache is not populated. I think it should just show  
instead of . That way you can differentiate between 
_result_cache=None, and _result_cache=[].

 

This is an OK approach, but the inconsistency in the representation smells 
a little off.

 

RawQuerySet simply does:

 

def __repr__(self):

return "<%s: %s>" % (self.__class__.__name__, self.query)

 

which is dead simple, and useful. I think it can be argued that QuerySet 
should be consistent with it (otherwise, why isn't RawQuerySet executing 
its SQL to show the repr?)

 

Regarding other things that have been mentioned: I'm not too keen about 
introducing a setting for this behavior, or doing things differently in an 
interactive console. Django already has a lot of settings, and this seems 
like such a minor thing. And I don't know of any precedent for changing the 
behavior of Django in an interactive shell (and you can tell I'm not a fan 
of inconsistent behaviors).


I'm going to re-open the ticket and hope other Django devs chime in.



On Thursday, October 10, 2019 at 8:50:31 AM UTC-7, Matt wrote:
>
> repr(some_queryset) will execute the query and display some results from 
> the query. This is done because it's (somewhat) helpful for debugging.
>
>
> https://github.com/django/django/blob/2a6f45e08e8cb8c7e5157915c378b453109424d2/django/db/models/query.py#L248
>
> This has caused at least two people to scratch their heads about odd 
> queries being generated, and it crashed my production database yesterday.
>
> See #20393  and #30863 
> 
>
> Luke Plant has said:
>
> ... you have a conflict between the goal of being information rich and 
>> with *always* being useful for debugging. In general, automatically 
>> seeing the results is information rich and is useful in debugging, but 
>> sometimes it causes further problems.
>>
>> I have thought about it, and with this conflict of goals, I think we are 
>> going to have to err on the side of the current behaviour. It is possible 
>> to work around by not calling repr on QuerySets in your middleware, and 
>> there are too many people who will be upset (or have problems with their 
>> own debugging facilities) by changing this.
>>
>
> One reason people love Django is because of its attention to things like 
> debug-ability. I can certainly see the argument here. However, I want to 
> press on the desirability of this feature.
>
> The argument that you can work around it by not calling repr is certainly 
> true, but in practice, I don't think it's reasonable. Production error 
> reporting tools (like Sentry and New Relic) will call repr when reporting 
> on exceptions.
>
> I see this behavior as being analogous to a file object printing the first 
> 21 characters of a file, which it doesn't do (for good reason, I would 
> imagine):
>
> >>> f = open("foo.py")
> >>> repr(f)
> "<_io.TextIOWrapper name='foo.py' mode='r' encoding='UTF-8'>"
> >>>
>
> I think we ought to reconsider the behavior of repr on QuerySets. I'm of 
> the opinion that we should scrap it completely. It could be replaced by 
> something that generates the SQL that would be executed (although that may 
> be tricky), or some kind of representation of the query filter. 
>
> Any thoughts?
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/ddce11f2-6971-484c-a9bf-28f65421488f%40googlegroups.com.


Re: Stop QuerySet repr from executing queries

2019-10-10 Thread Adam Johnson
I’m in favour of changing repr to not execute. Luke’s reasoning seems
reasonable.

On Thu, 10 Oct 2019 at 17:15, Ryan Hiebert  wrote:

>
>
> On Thu, Oct 10, 2019 at 10:50 AM Matt  wrote:
>
>>
>> I think we ought to reconsider the behavior of repr on QuerySets. I'm of
>> the opinion that we should scrap it completely. It could be replaced by
>> something that generates the SQL that would be executed (although that may
>> be tricky), or some kind of representation of the query filter.
>>
>
> Just speaking for myself, I think I'm roughly in agreement with the
> general sentiment and reasoning behind your proposal. There's one place
> that I think there's common utility, and which I think is the most likely
> the cause disruption to people, the interactive prompt. The issues that
> you've mentioned are also present on the interactive prompt as well, so the
> change would be optimal there as well. Perhaps it would be possible to ease
> the pain somewhat if there were a setting to do reprs as currently, perhaps
> _only_ on the interactive prompt, to avoid the issues you've pointed out
> with tools like Sentry. I'm not sure what the default setting value should
> be, though I do think new projects should include whatever settings would
> be needed to make this the default behavior for new projects, even if the
> setting default was to maintain backward compatibility, as has been done
> for other settings.
>
> tl;dr: I like the motivation, but I'm not sure how the backward
> compatibility aspects will play out for the community.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to django-developers+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-developers/CABpHFHS1%3DirRaQVWgiUnmTX%2By%3DB7Vy%2BpGLha-aoYmqTGUSj20g%40mail.gmail.com
> 
> .
>
-- 
Adam

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CAMyDDM0XNtKYEVy3kMGz0mFD5JHcOdLzEb5fDj22jpsKAr4vjQ%40mail.gmail.com.


Re: Stop QuerySet repr from executing queries

2019-10-10 Thread Ryan Hiebert
On Thu, Oct 10, 2019 at 10:50 AM Matt  wrote:

>
> I think we ought to reconsider the behavior of repr on QuerySets. I'm of
> the opinion that we should scrap it completely. It could be replaced by
> something that generates the SQL that would be executed (although that may
> be tricky), or some kind of representation of the query filter.
>

Just speaking for myself, I think I'm roughly in agreement with the general
sentiment and reasoning behind your proposal. There's one place that I
think there's common utility, and which I think is the most likely the
cause disruption to people, the interactive prompt. The issues that you've
mentioned are also present on the interactive prompt as well, so the change
would be optimal there as well. Perhaps it would be possible to ease the
pain somewhat if there were a setting to do reprs as currently, perhaps
_only_ on the interactive prompt, to avoid the issues you've pointed out
with tools like Sentry. I'm not sure what the default setting value should
be, though I do think new projects should include whatever settings would
be needed to make this the default behavior for new projects, even if the
setting default was to maintain backward compatibility, as has been done
for other settings.

tl;dr: I like the motivation, but I'm not sure how the backward
compatibility aspects will play out for the community.

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CABpHFHS1%3DirRaQVWgiUnmTX%2By%3DB7Vy%2BpGLha-aoYmqTGUSj20g%40mail.gmail.com.


Stop QuerySet repr from executing queries

2019-10-10 Thread Matt

repr(some_queryset) will execute the query and display some results from 
the query. This is done because it's (somewhat) helpful for debugging.

https://github.com/django/django/blob/2a6f45e08e8cb8c7e5157915c378b453109424d2/django/db/models/query.py#L248

This has caused at least two people to scratch their heads about odd 
queries being generated, and it crashed my production database yesterday.

See #20393  and #30863 


Luke Plant has said:

... you have a conflict between the goal of being information rich and with 
> *always* being useful for debugging. In general, automatically seeing the 
> results is information rich and is useful in debugging, but sometimes it 
> causes further problems.
>
> I have thought about it, and with this conflict of goals, I think we are 
> going to have to err on the side of the current behaviour. It is possible 
> to work around by not calling repr on QuerySets in your middleware, and 
> there are too many people who will be upset (or have problems with their 
> own debugging facilities) by changing this.
>

One reason people love Django is because of its attention to things like 
debug-ability. I can certainly see the argument here. However, I want to 
press on the desirability of this feature.

The argument that you can work around it by not calling repr is certainly 
true, but in practice, I don't think it's reasonable. Production error 
reporting tools (like Sentry and New Relic) will call repr when reporting 
on exceptions.

I see this behavior as being analogous to a file object printing the first 
21 characters of a file, which it doesn't do (for good reason, I would 
imagine):

>>> f = open("foo.py")
>>> repr(f)
"<_io.TextIOWrapper name='foo.py' mode='r' encoding='UTF-8'>"
>>>

I think we ought to reconsider the behavior of repr on QuerySets. I'm of 
the opinion that we should scrap it completely. It could be replaced by 
something that generates the SQL that would be executed (although that may 
be tricky), or some kind of representation of the query filter. 

Any thoughts?

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/29051be5-2988-4295-a8d9-1c5ae9555ee1%40googlegroups.com.