Re: Model-level validation

2022-10-01 Thread Aaron Smith
David -

All of your points are accurate. A usable ORM will probably never be 
perfectly safe, and none of the Django workarounds are particularly 
difficult. But requiring extra steps to get the save level of data safety 
as other ORMs will, just by nature of human nature and scale, make Django a 
riskier choice as well as increase the cost and risk of maintaining it. I 
think that unnecessary risk damages Django's long-term viability as a 
project and a technology choice for an organization.

On Saturday, October 1, 2022 at 6:50:56 PM UTC-7 shang.xia...@gmail.com 
wrote:

> I'm not really interested in debating whether the ORM validates or not but 
> I thought it might be worth pointing out a few things that haven't been 
> touched on yet:
>
> > It's not right.
>
> Design decisions are often neither outright right nor wrong but more 
> tradeoffs of varying values.
>
>
> > The data store layer should protect the validity of the data.
>
> I disagree that the ORM is the data store layer - that's the database. I 
> never put any guarantees in ORM validation because there's always a myriad 
> of ways to get around it.
>
> If you want guarantees I suggest you look into setting up constraints, 
> they're quite easy with Django nowadays. Some examples aside from the usual 
> unique constraint:
>
>- Validation of choices? Setup a check constraint to check the value 
>exists in the TextChoices `values` attribute.
>- Validation of non-overlapping date ranges? Use range types with 
>exclusion constraints.
>- Only 1 column from a set of columns should be set? Use a check 
>constraint with an xor not null test.
>- There are plenty more of these :)
>
> Only the database can protect the data.
>
> --
> David
>
> On Fri, 30 Sept 2022 at 10:12, Aaron Smith  wrote:
>
>> Why doesn't Django validate Models on save()?
>>
>> I am aware that full_clean() is called when using ModelForms. But most 
>> web app development these days, and every django app I've ever worked with, 
>> are headless APIs. The default behavior is dangerous for the naive 
>> developer.
>>
>> Bringing View-level concepts such as forms or serializers down into 
>> celery tasks and management commands breaks separation of concerns, and 
>> having multiple validation implementations at different layers in the app 
>> is fraught with divergence and unexpected behavior.
>>
>> It's not right. The data store layer should protect the validity of the 
>> data.
>>
>> -- 
>>
> You received this message because you are subscribed to the Google Groups 
>> "Django developers (Contributions to Django itself)" group.
>>
> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to django-develop...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/django-developers/37ec0c58-2561-4300-9ead-05160410c389n%40googlegroups.com
>>  
>> 
>> .
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/7ad09662-2fca-47c2-a1cf-e552856be204n%40googlegroups.com.


Re: Model-level validation

2022-10-01 Thread Aaron Smith
I'm glad you brought up tests, because automated tests for these issues are 
quite difficult and require tribal knowledge. A first-hand example from a 
while back: The application had over 90% test coverage, including all of 
the validation rules in the DRF serializer. One of those rules was that a 
`status` could only be certain values. At some point we decided to start 
ingesting this data from a CSV (polled/downloaded from S3) instead of POST 
to an endpoint. Nothing about the shape of the data changed, so no new 
tests were written around validity. A later bug in the CSV generation 
resulted in invalid values, which got ingested and saved directly with the 
Model. Cue a lost week of firefighting and data-fixing. Yes, this could 
have been unit tested. But requiring a full set of unit tests for object 
validity specific to every data source is an expanding set of tests that 
need to be maintained, if you have many sources and an evolving model.

Contrast to the other ORMs I've worked with: Your model tests validity, and 
the testing burden is far, far lower and less error prone as the 
application evolves.

I would be happy with any step in the direction of treating validation as a 
first-class feature of the ORM, even if it's not by default. A `validate` 
kwarg to `save()`, even if it's defaulted to False. Something plain, 
obvious, easy to communicate and review.
On Saturday, October 1, 2022 at 5:16:51 PM UTC-7 timog...@gmail.com wrote:

> > Among the dozen of Django applications I have worked on, at 3 companies, 
> not a single one was actually running any kind of validation. It has always 
> been a mistake, 100% of the time, *never* the desired behavior.
>
> Besides not taking time to understand how Django works, it seems they 
> weren't doing any manual testing or writing tests for invalid data either, 
> so for me, this doesn't add much weight to the argument.
>
> > I was able to find was that it was for "performance reasons", and I 
> refuse to believe that a mature web framework like Django would prioritize 
> performance (let's face it - even my millions-of-requests-per-day API 
> service doesn't care about a few extra milliseconds here and there) over 
> the most basic data safety concerns by default. 
>
> I'm not sure it's correct to dismiss performance considerations, 
> particularly when Model.full_clean() could add database queries for 
> validating unique or other constraints. I believe doing validation 
> redundantly (e.g. with form validation) or unnecessarily (e.g. bulk loading 
> good data) would add undesired overhead. I think that forcing the entire 
> Django ecosystem to opt out of automatic model validation as needed would 
> be a heavy requirement to impose at this point. And apparently it was 
> considered too heavy a requirement to impose when model validation was 
> added in Django 1.2, released May 2010. 
>
> I would try to keep an open mind to a concrete proposal, but I'm skeptical 
> and it's surely non-trivial.
> On Friday, September 30, 2022 at 7:53:24 PM UTC-4 aa...@aaronsmith.co 
> wrote:
>
>> If `ModelForm` were truly where validation logic lived, django would not 
>> even use foreign keys. All constraints would be handled at the Form layer. 
>> But we do use FKs, and also do other database-level consistency and 
>> validity features, because data objects should understand and enforce their 
>> own constraints. So what we have now is some types of validation happen 
>> below Model, and some live above in the Form. This means that the data 
>> store is not a single thing that's implemented with a simple interface, it 
>> is a network of things which in inherently more difficult to work with, 
>> understand, and maintain.
>>
>> If Forms were truly the validation layer, why am I able to specify things 
>> like maximum length and allowed choices on the Model? Shouldn't those 
>> things be specified at the Form layer?
>> On Friday, September 30, 2022 at 4:27:13 PM UTC-7 Aaron Smith wrote:
>>
>>> Jorg,
>>>
>>> I do not believe it violates any separation of concerns. `full_clean()` 
>>> is already a method on the Model class itself. The Model is already where 
>>> all validation logic lives, except for the actual *triggering* of the 
>>> validation.
>>>
>>> What I believe violates separation of concerns is that models do not run 
>>> something which is already internal to itself, i.e. they are not actually 
>>> fully functional as a data store layer, unless an external thing 
>>> (ModelForm) is implemented. That feels wrong to me.
>>> On Friday, September 30, 2022 at 2:19:35 PM UTC-7 
>>> j.bre...@netzkolchose.de wrote:
>>>
 @Aaron 

 Oh well, if anecdotal personal evidence counts for you - here is mine: 

 Working here since 2008 with Django in tons of projects in different 
 positions. The project sizes were from small websites to big API-driven 
 SPA cluster installations (with and w'o DRF). Ofc it is not all 
 rainbows 
 and ponies 

Re: Model-level validation

2022-10-01 Thread David Sanders
I'm not really interested in debating whether the ORM validates or not but
I thought it might be worth pointing out a few things that haven't been
touched on yet:

> It's not right.

Design decisions are often neither outright right nor wrong but more
tradeoffs of varying values.

> The data store layer should protect the validity of the data.

I disagree that the ORM is the data store layer - that's the database. I
never put any guarantees in ORM validation because there's always a myriad
of ways to get around it.

If you want guarantees I suggest you look into setting up constraints,
they're quite easy with Django nowadays. Some examples aside from the usual
unique constraint:

   - Validation of choices? Setup a check constraint to check the value
   exists in the TextChoices `values` attribute.
   - Validation of non-overlapping date ranges? Use range types with
   exclusion constraints.
   - Only 1 column from a set of columns should be set? Use a check
   constraint with an xor not null test.
   - There are plenty more of these :)

Only the database can protect the data.

--
David

On Fri, 30 Sept 2022 at 10:12, Aaron Smith  wrote:

> Why doesn't Django validate Models on save()?
>
> I am aware that full_clean() is called when using ModelForms. But most web
> app development these days, and every django app I've ever worked with, are
> headless APIs. The default behavior is dangerous for the naive developer.
>
> Bringing View-level concepts such as forms or serializers down into celery
> tasks and management commands breaks separation of concerns, and having
> multiple validation implementations at different layers in the app is
> fraught with divergence and unexpected behavior.
>
> It's not right. The data store layer should protect the validity of the
> data.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to django-developers+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-developers/37ec0c58-2561-4300-9ead-05160410c389n%40googlegroups.com
> 
> .
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CADyZw-6BoxxsR4q7dUfA7KwgPXc7%3D86nUWdJb8jXPQ_AvVwa4g%40mail.gmail.com.


Re: Model-level validation

2022-10-01 Thread Tim Graham
> Among the dozen of Django applications I have worked on, at 3 companies, 
not a single one was actually running any kind of validation. It has always 
been a mistake, 100% of the time, *never* the desired behavior.

Besides not taking time to understand how Django works, it seems they 
weren't doing any manual testing or writing tests for invalid data either, 
so for me, this doesn't add much weight to the argument.

> I was able to find was that it was for "performance reasons", and I 
refuse to believe that a mature web framework like Django would prioritize 
performance (let's face it - even my millions-of-requests-per-day API 
service doesn't care about a few extra milliseconds here and there) over 
the most basic data safety concerns by default. 

I'm not sure it's correct to dismiss performance considerations, 
particularly when Model.full_clean() could add database queries for 
validating unique or other constraints. I believe doing validation 
redundantly (e.g. with form validation) or unnecessarily (e.g. bulk loading 
good data) would add undesired overhead. I think that forcing the entire 
Django ecosystem to opt out of automatic model validation as needed would 
be a heavy requirement to impose at this point. And apparently it was 
considered too heavy a requirement to impose when model validation was 
added in Django 1.2, released May 2010. 

I would try to keep an open mind to a concrete proposal, but I'm skeptical 
and it's surely non-trivial.
On Friday, September 30, 2022 at 7:53:24 PM UTC-4 aa...@aaronsmith.co wrote:

> If `ModelForm` were truly where validation logic lived, django would not 
> even use foreign keys. All constraints would be handled at the Form layer. 
> But we do use FKs, and also do other database-level consistency and 
> validity features, because data objects should understand and enforce their 
> own constraints. So what we have now is some types of validation happen 
> below Model, and some live above in the Form. This means that the data 
> store is not a single thing that's implemented with a simple interface, it 
> is a network of things which in inherently more difficult to work with, 
> understand, and maintain.
>
> If Forms were truly the validation layer, why am I able to specify things 
> like maximum length and allowed choices on the Model? Shouldn't those 
> things be specified at the Form layer?
> On Friday, September 30, 2022 at 4:27:13 PM UTC-7 Aaron Smith wrote:
>
>> Jorg,
>>
>> I do not believe it violates any separation of concerns. `full_clean()` 
>> is already a method on the Model class itself. The Model is already where 
>> all validation logic lives, except for the actual *triggering* of the 
>> validation.
>>
>> What I believe violates separation of concerns is that models do not run 
>> something which is already internal to itself, i.e. they are not actually 
>> fully functional as a data store layer, unless an external thing 
>> (ModelForm) is implemented. That feels wrong to me.
>> On Friday, September 30, 2022 at 2:19:35 PM UTC-7 
>> j.bre...@netzkolchose.de wrote:
>>
>>> @Aaron 
>>>
>>> Oh well, if anecdotal personal evidence counts for you - here is mine: 
>>>
>>> Working here since 2008 with Django in tons of projects in different 
>>> positions. The project sizes were from small websites to big API-driven 
>>> SPA cluster installations (with and w'o DRF). Ofc it is not all rainbows 
>>> and ponies with Django, but python-side data validation never crossed my 
>>> way as seriously flawed in Django. NOT EVEN ONCE. (Could list at least 
>>> 5-7 other topics that are somewhat tedious to get done with Django, but 
>>> thats offtopic here.) 
>>>
>>> Plz dont jump from personal frustration about poor development processes 
>>> you have observed to all-conclusions, that depict most Django users as 
>>> total noobs. (Still funny to read, as it reminded me on those flaming 
>>> wars between Perl and Python folks ~18ys ago, which abruptly ended when 
>>> Perl6 finally made its debut.) 
>>>
>>> > The question is not whether you /can/ compose validation into django 
>>> > models. The concern is that it should be done /by default/ to protect 
>>> > the average naive newbie developer from mistakes. 
>>>
>>> I'm sorry if I didn't answer that more directly for you - nope, imho it 
>>> should not be done by default on `Model.save`. It violates the path in 
>>> separation of concerns Django has chosen with form validation, thus the 
>>> -1 from my side. 
>>>
>>>
>>> Regards, 
>>> Jörg 
>>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/2331d0d6-a890-42dc-8c6d-c627e540f0ean%40googlegroups.com.