Should django.contrib.auth.authenticate check is_active?

2019-04-11 Thread Daniel Tao
This is essentially an RFC. I've become more familiar with Django's 
authentication backend system lately, and something has stood out to me 
that I'd like to draw attention to.

A new site built with Django is likely to use the default ModelBackend, 
which includes a user_can_authenticate 

 
method that returns False for inactive users (where is_active = False). 
This can create the expectation that inactive users should not be able to 
authenticate as a general rule.

The development team might later choose to introduce some custom 
authentication backends, for example to authenticate with an external 
service or using API tokens, without realizing that these also need to 
check the user's is_active field in order to preserve existing behavior 
(which may have become enshrined in the team's policies, for example if 
they had been setting is_active = False on accounts that had been detected 
performing malicious activity).

In my opinion it would be desirable in a large percentage of cases to move 
the logic in user_can_authenticate out of ModelBackend specifically and 
into the authenticate method itself, so that inactive users are blocked 
from authenticating using *any* backend. This would remove the burden from 
developers of authentication backends to remember to check for is_active in 
each one.

An obvious problem with changing the default behavior is that it would not 
be backwards compatible. How each application interprets is_active is up to 
that application, and some may have authentication backends that allow 
inactive users to authenticate by design. One option for to addressing that 
would be to introduce a setting such as AUTHENTICATION_REQUIRE_ACTIVE, set 
to False by default (but perhaps explicitly set to True for new Django 
projects created with the startproject command), which could be enabled for 
a given application to apply this requirements consistently in a 
future-proof way.

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/19cbe676-ab7a-4dc9-98ea-d2cfe767695b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Proposal to re-open #27017 (updating only dirty fields in save())

2019-03-14 Thread Daniel Tao
I agree with this:

> Therefore it seems like it would be a breaking change which is hard to 
communicate to users, a complicated situation.

This is why I'm recommending that this behavior be opt-in via a Django 
setting such as SAVE_UPDATE_DIRTY_FIELDS_ONLY. Application developers would 
then need to make a determination for themselves whether or not to enable 
it.

Adam is right to point out an edge case where this behavior could lead to 
subtle bugs. Not surprisingly, it's related to concurrency (which is where 
subtle bugs tend to live)! In response, I would point out that the whole 
point of this proposal is to address a *different* edge case, also related 
to concurrency, which I personally believe is more common, though it's 
admittedly hard to know without a lot of empirical data. (Actually it's 
also present in the scenario Adam describes, and he addresses this: "Sure 
there's a race condition and the results of either process 1 or 2 can be 
lost.")

>From the perspective of an application developer, if this functionality 
were available to me, I'd want to look at the code and ask myself this 
question: which is more work?

   - Are there lots of places where the code is setting a small number of 
   attributes and calling save() without update_fields, where concurrent 
   requests (or background jobs) could override each other's writes and user 
   data could be lost? Should I leave SAVE_UPDATE_DIRTY_FIELDS_ONLY 
   *disabled*, enumerate these code paths and add update_fields to all of 
   them?
   - Are there lots of places where the code is setting model attributes 
   whose values are derived from a combination of other attributes, not all of 
   which are set? Should I *enable* SAVE_UPDATE_DIRTY_FIELDS_ONLY, 
   enumerate these code paths and wrap them in select_for_update to ensure the 
   data doesn't get into an inconsistent state?


On Monday, January 28, 2019 at 8:00:21 AM UTC-6, Daniel Tao wrote:
>
> Hi!
>
> This is my first post on this list. I recently left a comment on #27017 
> <https://code.djangoproject.com/ticket/27017#comment:3> requesting to 
> re-open the topic of only saving dirty fields in save(). Tim Graham 
> helpfully directed to #4102 <https://code.djangoproject.com/ticket/4102> and 
> advised that I make the proposal on the dev mailing list, so that's what 
> I'm doing :)
>
> I've gone through the history of #4102 and taken notes on the challenges 
> that arose when this was first attempted 12 years ago (!). The TL;DR is 
> that, while there were indeed quite a few complications, I don't see 
> anything that came up that should be considered a flat-out showstopper. 
> Rather, what happened was that at about the 69th comment 
> <https://code.djangoproject.com/ticket/4102#comment:69>, back in 2012, 
> the conversation shifted as Matt Long made this observation:
>
> what started as a simple opt-in feature request of adding a field 
>> white-list to the Model's save function morphed into a complicated dirty 
>> flag approach that obviously has many edge cases and performance 
>> implications given that this ticket has been open for 5 years now
>
>
> From here it seems things progressed towards the current solution of 
> supporting the update_fields argument, and that's where things ended. I 
> would like to point out that Matt did *not* advocate for completely 
> abandoning all efforts to support dirty field tracking; to the contrary, in 
> the same comment he said this (emphasis mine):
>
> Clearly some people feel differently and favor the dirty flag approach for 
>> a more hands-off approach. As such, I propose adding support for *both 
>> methods*
>
>
> With that in mind, I believe it's worth re-opening this discussion. For a 
> fairly lengthy justification, see my aforementioned comment on #27017 
> <https://code.djangoproject.com/ticket/27017#comment:3>. I'll copy the 
> effective TL;DR of the proposal here for convenience:
>
> In my opinion Django could make most code bases inherently more resilient 
>> against latent race conditions by implementing some form of dirty field 
>> tracking and effectively providing the functionality of update_fields 
>> automatically. I would like to propose a new setting, something like 
>> SAVE_UPDATE_DIRTY_FIELDS_ONLY, to change the ORM's default behavior so 
>> that calls to Model.save() only update the fields that have been set on 
>> the model instance. Naturally for backwards compatibility this setting 
>> would be False by default.
>
>
> As for the concerns that were raised when this was first attempted, I will 
> now attempt to summarize what I found along with, in most cases, a bit of 
> editorializing from me.
>
> Performance
>
> The performance angle was first explored in a comm

Proposal to re-open #27017 (updating only dirty fields in save())

2019-01-28 Thread Daniel Tao
Hi!

This is my first post on this list. I recently left a comment on #27017 
 requesting to 
re-open the topic of only saving dirty fields in save(). Tim Graham 
helpfully directed to #4102  and 
advised that I make the proposal on the dev mailing list, so that's what 
I'm doing :)

I've gone through the history of #4102 and taken notes on the challenges 
that arose when this was first attempted 12 years ago (!). The TL;DR is 
that, while there were indeed quite a few complications, I don't see 
anything that came up that should be considered a flat-out showstopper. 
Rather, what happened was that at about the 69th comment 
, back in 2012, the 
conversation shifted as Matt Long made this observation:

what started as a simple opt-in feature request of adding a field 
> white-list to the Model's save function morphed into a complicated dirty 
> flag approach that obviously has many edge cases and performance 
> implications given that this ticket has been open for 5 years now


>From here it seems things progressed towards the current solution of 
supporting the update_fields argument, and that's where things ended. I 
would like to point out that Matt did *not* advocate for completely 
abandoning all efforts to support dirty field tracking; to the contrary, in 
the same comment he said this (emphasis mine):

Clearly some people feel differently and favor the dirty flag approach for 
> a more hands-off approach. As such, I propose adding support for *both 
> methods*


With that in mind, I believe it's worth re-opening this discussion. For a 
fairly lengthy justification, see my aforementioned comment on #27017 
. I'll copy the 
effective TL;DR of the proposal here for convenience:

In my opinion Django could make most code bases inherently more resilient 
> against latent race conditions by implementing some form of dirty field 
> tracking and effectively providing the functionality of update_fields 
> automatically. I would like to propose a new setting, something like 
> SAVE_UPDATE_DIRTY_FIELDS_ONLY, to change the ORM's default behavior so 
> that calls to Model.save() only update the fields that have been set on 
> the model instance. Naturally for backwards compatibility this setting 
> would be False by default.


As for the concerns that were raised when this was first attempted, I will 
now attempt to summarize what I found along with, in most cases, a bit of 
editorializing from me.

Performance

The performance angle was first explored in a comment 
 that said it 
"doesn't look good" and provided some benchmarks showing a performance hit 
from 0.17s to 2.64s for setting an attribute using the timeit 
 package. I didn't see 
anyone point out that the timeit method defaults to executing code *a 
million times*; so the throughput of the operation went from about 6 
million to closer to 400 thousand times per second. (The percentage change 
is indeed significant, but this doesn't *smell* like a potential 
bottleneck.)

It was noted in a couple 
 places 
 that it seems 
potentially shortsighted to focus so much on the performance of getting and 
setting attributes without taking into account the potential performance 
*benefits* of executing smaller UPDATE statements that write fewer columns. 
As far as I can tell, no one in the thread on #4102 actively investigated 
the latter.

Based on the unlikelihood of attribute setting representing a performance 
bottleneck, and the lack of data for the performance impact of executing 
smaller updates, I would consider the performance discussion largely a 
distraction. Though I do think it's worth measuring the latter.

Compatibility

It was observed that the two approaches considered on the ticket 
(overriding __setattr__ or defining custom property setters) would not work 
with obj.__dict__.update 
, which is 
apparently an optimization you can find prescribed on some blogs out int he 
wild. This supports the premise that this behavior should be opt in, so 
devs who are using incompatible techniques can stay away (unless/until 
they've removed those optimizations from their code).

I had already suggested putting this functionality behind a setting, so I 
think we're good here.

Correctness

There were some bugs related to 
 multi-table 
inheritance . It 
appears these were both followed by patches with proposed fixes.

Another bug was that fields updated in pre_save such as