Re: Discuss ticket 20264: URLValidator should allow underscores in local hostname

2020-04-16 Thread Pavel Savchenko
Thank you Adam,

This is more or less what I ended up doing, sans the replace calls. Very
neat!

And thanks a lot for the expert advice, everyone!

For the time being at least, it seems we have agreement on not allowing
non-strict validation into Django, and I have to admit it just makes sense
to keep the stricter default.

Stay safe,
Pavel

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/8fd1991e-d779-47e0-b3f9-cf97c1856e7d%40googlegroups.com.


Re: Discuss ticket 20264: URLValidator should allow underscores in local hostname

2020-04-16 Thread Adam Johnson
>
> Folks wanting this can subclass URLValidator.
>

For anyone who does want this, the subclass is not so much work. You can
inherit the regex pieces from URLValidator and edit them to insert _ as a
valid character:

    import re

    from django.core.validators import URLValidator


    class LenientURLValidator(URLValidator):
        # Add "_" to the character classes of the hostname and domain
        # labels. The TLD pattern (tld_re) is left unchanged.
        hostname_re = URLValidator.hostname_re.replace('0-9]', '0-9_]').replace('0-9-]', '0-9-_]')
        domain_re = URLValidator.domain_re.replace('0-9-]', '0-9-_]')
        host_re = '(' + hostname_re + domain_re + URLValidator.tld_re + '|localhost)'

        # Same as URLValidator.regex, but built from the lenient host_re.
        regex = re.compile(
            r'^(?:[a-z0-9.+-]*)://'  # scheme is validated separately
            r'(?:[^\s:@/]+(?::[^\s:@/]*)?@)?'  # user:pass authentication
            r'(?:' + URLValidator.ipv4_re + '|' + URLValidator.ipv6_re + '|' + host_re + ')'
            r'(?::\d{2,5})?'  # port
            r'(?:[/?#][^\s]*)?'  # resource path
            r'\Z', re.IGNORECASE)


    # No ValidationError is raised:
    LenientURLValidator()('http://online_casino_news.hundredpercentgambling.com/')

It's a little tricky in the re.compile step, which is copied from the
superclass, but it works.
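The string edits above can be sanity-checked without installing Django. The sketch below copies the host pieces of URLValidator as we believe they appear in Django 3.0 (an assumption; compare against django/core/validators.py in your installed version), applies the same .replace() calls, and compiles host-only strict and lenient patterns. host_pattern, strict_host and lenient_host are illustrative names, not part of Django.

```python
import re

# ASSUMPTION: copies of the host pieces of django.core.validators.URLValidator
# as of Django 3.0; other versions may differ.
ul = '\u00a1-\uffff'  # unicode letters range (deliberately not a raw string)
hostname_re = r'[a-z' + ul + r'0-9](?:[a-z' + ul + r'0-9-]{0,61}[a-z' + ul + r'0-9])?'
domain_re = r'(?:\.(?!-)[a-z' + ul + r'0-9-]{1,63}(?<!-))*'
tld_re = (
    r'\.'                         # dot
    r'(?!-)'                      # can't start with a dash
    r'(?:[a-z' + ul + '-]{2,63}'  # domain label
    r'|xn--[a-z0-9]{1,59})'       # or punycode label
    r'(?<!-)'                     # can't end with a dash
    r'\.?'                        # may have a trailing dot
)

# The same edits LenientURLValidator makes: add "_" to the label classes.
lenient_hostname_re = hostname_re.replace('0-9]', '0-9_]').replace('0-9-]', '0-9-_]')
lenient_domain_re = domain_re.replace('0-9-]', '0-9-_]')


def host_pattern(hostname, domain):
    """Compile only the host part: hostname + domains + TLD, or localhost."""
    return re.compile('(?:' + hostname + domain + tld_re + '|localhost)', re.IGNORECASE)


strict_host = host_pattern(hostname_re, domain_re)
lenient_host = host_pattern(lenient_hostname_re, lenient_domain_re)

sample = 'online_casino_news.hundredpercentgambling.com'
print(bool(strict_host.fullmatch(sample)))           # False: "_" not allowed
print(bool(lenient_host.fullmatch(sample)))          # True
print(bool(lenient_host.fullmatch('example.co_m')))  # False: TLD unchanged
```

Because tld_re is left untouched, an underscore in the last label is still rejected, so the leniency stays confined to the hostname and domain labels.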

-- 
Adam


Re: Discuss ticket 20264: URLValidator should allow underscores in local hostname

2020-03-26 Thread James Bennett
I'm also in the "I don't think this should be allowed" camp. People
who really need it can set up their own validator easily enough, and I
worry about the security implications of supporting non-standard
behavior in something as crucial as hostname validation -- Django's
been bitten by that sort of thing several times in the past.


Re: Discuss ticket 20264: URLValidator should allow underscores in local hostname

2020-03-26 Thread Carlton Gibson
> By all means add a lenient=False flag which can be turned to True to
> enable lenient parsing...

I don't think we should even allow this. The extra API surface area
complicates the matter for all users, almost all of whom are never going to
set the new flag to anything but the default. (Of those who do, how many
wouldn't really have thought it through/mean it?)

Folks wanting this can subclass URLValidator. 

C. 


Re: Discuss ticket 20264: URLValidator should allow underscores in local hostname

2020-03-26 Thread Florian Apolloner
Hi Adam,

On Wednesday, March 25, 2020 at 7:27:58 PM UTC+1, Adam Johnson wrote:
>
> I think that would make Florian happy, although it *has* been seven years 
> since his closing comment on the ticket.
>

You should know me better :D No, this would not make Florian happy, and he is
still against it. By all means add a lenient=False flag which can be turned
to True to enable lenient parsing, but the defaults should imo stay.

It might be true that, for the sole purpose of __displaying__ URLs, an
underscore will not hurt, but in the greater scheme of things it simply
does not work:

 * java.net.URI will not parse it: new
java.net.URI("http://test_host.com").getHost() -> null
 * While you may laugh at me for mentioning Java, the more relevant argument
is that we are moving towards an HTTPS world, and there you have to play by
a different set of rules, namely the CA/Browser Forum Baseline Requirements.
These require you to follow the RFCs (especially RFC 5280), which in turn
require subjectAltNames to follow the preferred name syntax of RFC 1034,
and that syntax disallows underscores. For this reason CAs won't issue
certs for those hostnames; you can only make them work via wildcard certs,
which in turn only work for subdomains and not TLDs.
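The wildcard point above can be illustrated with a simplified matcher. This is a sketch assuming RFC 6125-style matching, where "*" may only be the entire left-most label and stands for exactly one label; wildcard_covers is a hypothetical helper, not any TLS library's API, and it does not model the CA rules that generally refuse wildcards directly under a public suffix such as *.com.

```python
def wildcard_covers(pattern: str, host: str) -> bool:
    """Simplified wildcard check (assumption: RFC 6125-style rules).

    '*' is only honoured as the entire left-most label and matches exactly
    one label; every other label must match case-insensitively.
    """
    p_labels = pattern.lower().rstrip('.').split('.')
    h_labels = host.lower().rstrip('.').split('.')
    if len(p_labels) != len(h_labels):
        return False  # a wildcard never stands for zero or several labels
    if p_labels[0] != '*' and p_labels[0] != h_labels[0]:
        return False
    return p_labels[1:] == h_labels[1:]


# The underscore label is absorbed by the wildcard, so no underscore has to
# appear in the certificate itself.
print(wildcard_covers('*.example.com', 'test_host.example.com'))    # True
# A wildcard covers one label only, so deeper names are not matched.
print(wildcard_covers('*.example.com', 'a.test_host.example.com'))  # False
```

Under these assumptions, the wildcard route only helps when the underscore label sits directly under a domain you can obtain a wildcard certificate for.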

So this limits the usefulness of underscores in URLs mainly to HTTP-only
sites, or sites that jumped through extra hoops to get it working. In that
sense, I do not see a strong requirement to be lenient in parsing by default.

Cheers,
Florian


Re: Discuss ticket 20264: URLValidator should allow underscores in local hostname

2020-03-25 Thread Adam Johnson
You're right, there are two use cases here. It does sound like the pragmatic
approach is to allow underscores in URLs normally, but to preserve the
existing behaviour for those with stricter use cases, like you say.

> I can also propose a solution that would still work for both: (deprecate
> and) rename the current class to StrictURLValidator (or
> URLValidatorRFC1034), to still be easily used for the less common scenarios.


This sounds reasonable to me. I'm not sure we'd need the deprecation
period, given we'd only be adding one character to URLValidator. A release
note is typically enough in this situation, but I normally defer to the
fellows for this.

I think that would make Florian happy, although it *has* been seven years
since his closing comment on the ticket.

-- 
Adam


Re: Discuss ticket 20264: URLValidator should allow underscores in local hostname

2020-03-24 Thread Pavel Savchenko
Hey folks,

Sorry for not providing a more specific scenario before; I was short on time
and just wanted to kick this off.

The most common scenario that I can think of (and the one most similar
to our usage) would be a *form field* on a Django site that allows users
to input a URL, which is saved and later displayed *as a link* to other
users (e.g. in blogs, comments, CMS systems, etc.).

Here's an example of a site, though clearly not a very reputable one:
http://online_casino_news.hundredpercentgambling.com/ . Note that Google
Groups automatically converted this one to a link for me, and I was able to
click and follow it in both Chrome and Firefox.

In the above use case, by validating the correctness of the URL, we protect
a user from making a mistake, but we don't really care about adhering to
standards beyond that; usability wins.

There are other use cases that might care about the RFC 952/1034
guidelines on hostnames: for example, if we're building a hosting or name
server management system, or maybe an SSL certificate vendor.
In such cases, it might actually benefit users if the platform alerts them
about the validity of the hostname they chose (at the very least, to
advise them).
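For the stricter use cases described above, a hostname check against RFC 1034's preferred name syntax might look roughly like the following sketch. It assumes the RFC 1123 relaxation that lets a label start with a digit, plus the 63-character label and 253-character name limits; is_preferred_hostname and LDH_LABEL are hypothetical names, the check is ASCII-only (no IDN/punycode handling), and it is not Django's implementation.

```python
import re

# One label in RFC 1034 "preferred name syntax" (section 3.5), relaxed per
# RFC 1123 so that a label may start with a digit: letters, digits and
# hyphens only, no leading/trailing hyphen, 1 to 63 characters. No "_".
LDH_LABEL = re.compile(r'[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?')


def is_preferred_hostname(host: str) -> bool:
    """Return True if every label of an ASCII hostname is a valid LDH label."""
    if host.endswith('.'):
        host = host[:-1]  # allow a single trailing dot (absolute name)
    if not host or len(host) > 253:
        return False
    # An empty label (e.g. "a..b") fails fullmatch, so the whole name fails.
    return all(LDH_LABEL.fullmatch(label) for label in host.split('.'))


print(is_preferred_hostname('hundredpercentgambling.com'))                     # True
print(is_preferred_hostname('online_casino_news.hundredpercentgambling.com'))  # False
```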

However, I would guess that the first use case, taking a URL to store
and render as a link, is more common, and thus the one where people more
often need to override the class.

I can also propose a solution that would still work for both: (deprecate 
and) rename the current class to StrictURLValidator (or 
URLValidatorRFC1034), to still be easily used for the less common scenarios.

What do you think?

Best Regards,
Pavel


Re: Discuss ticket 20264: URLValidator should allow underscores in local hostname

2020-03-24 Thread '1337 Shadow Hacker' via Django developers (Contributions to Django itself)
> when there are many sites in the wild that use underscore in their domain 
> name.

Can you share some examples, please?

In general, we should abide by standards unless we have a really good reason.

In my experience, I have always had to replace underscores with dashes, for
one reason or another, in hostnames set up by people who don't read RFCs
anyway, so I'm not sure Django itself can make a big difference.

Nonetheless, can't you override the validation on your side?

Best


Re: Discuss ticket 20264: URLValidator should allow underscores in local hostname

2020-03-24 Thread Adam Johnson
Hi Pavel

The ticket ( https://code.djangoproject.com/ticket/20264 ) doesn't mention
any specific use cases, and nor have you. What has this behaviour blocked
for you?

Thanks,

Adam

-- 
Adam


Discuss ticket 20264: URLValidator should allow underscores in local hostname

2020-03-24 Thread Pavel Savchenko
Hi Folks,

I've just encountered this issue, and it seems Django's URLValidator regex
for host is trying to abide by the RFC 1034 recommendation, when there are
many sites in the wild that use underscores in their domain names.

Can we please discuss this issue here, so we can eventually decide to 
reopen the ticket (or not) and perhaps allow for a pull-request to fix it?

I found this stackoverflow question helpful, with many answers/comments 
with additional references: 
https://stackoverflow.com/questions/2180465/can-domain-name-subdomains-have-an-underscore-in-it

Best regards,
Pavel
