[issue43882] [security] CVE-2022-0391: urllib.parse should sanitize urls containing ASCII newline and tabs.

2022-02-06 Thread Mike Lissner


Mike Lissner  added the comment:

Looks like that CVE isn't public yet.

https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-0391

Any chance I can get access? (I originally reported this vuln.) My email is 
m...@free.law, if it's possible and my email is needed.

Thanks!

--

___
Python tracker 
<https://bugs.python.org/issue43882>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43882] [security] urllib.parse should sanitize urls containing ASCII newline and tabs.

2021-05-06 Thread Mike Lissner


Mike Lissner  added the comment:

>  With the fix for this bug, urlsplit silently removes (some of) those 
> characters before we can replace them, modifying the output of our 
> sanitisation code

I don't have any good solutions for 3.9.5, but going forward, this feels like 
another example of why we should just do parsing right (the way browsers do). 
That'd maintain tabs and whatnot in your output, and it'd fix the security 
issue by putting `java\nscript` into the scheme attribute instead of the path.

> One solution that presents itself to me: add a `strip_insecure_characters: 
> bool = True` parameter.

Doesn't this lose sight of what this tool is supposed to do? It's not supposed 
to have a good (new, correct) and a bad (old, obsolete) way of parsing. Woe 
unto whoever has to write the documentation for that parameter. 

Also, I should reiterate that these aren't "insecure" characters so if we did 
have a parameter for this, it'd be more like `do_rfc_3986_parsing` or maybe 
`do_naive_parsing`. The chars aren't insecure in themselves. They're fine. 
Python just gets tripped up on them.

--

___
Python tracker 
<https://bugs.python.org/issue43882>
___



[issue43882] [security] urllib.parse should sanitize urls containing ASCII newline and tabs.

2021-05-05 Thread Mike Lissner


Mike Lissner  added the comment:

> I'd wonder how to pass through valid exceptions without urlparse raising 
> something.

Oops, meant to say "valid URLs", not valid exceptions, sorry.

--

___
Python tracker 
<https://bugs.python.org/issue43882>
___



[issue43882] [security] urllib.parse should sanitize urls containing ASCII newline and tabs.

2021-05-05 Thread Mike Lissner


Mike Lissner  added the comment:

> Instead of the patches as you see them, we could've raised an exception.

In my mind the definition of a valid URL is what browsers recognize. They're 
moving towards the WHATWG definition, and so too must we. 

If we make Python raise an exception when a URL has a newline in the scheme 
(e.g., "htt\np"), we'd be raising exceptions for *valid* URLs as browsers 
define them. That doesn't seem right at all to me. I'd be frustrated to have to 
catch such an exception, and I'd wonder how to pass through valid exceptions 
without urlparse raising something.


> Making the output 'sanitized' means that invalid input is converted into 
> valid output.  This goes against the principle of least surprise.

Well, not quite, right? The URLs this fixes *are* valid according to browsers. 
Browsers say these tabs and newlines are OK. 



I agree though that there's an issue with the approach of stripping input in a 
way that affects output. That doesn't seem right. 

I think the solution I'd favor (and I imagine what's coming in 43883) is to do 
this properly so that newlines are preserved in the output, but so that the 
scheme is also placed properly in the scheme attribute. 

So instead of this (from the initial report):

> In [9]: from urllib.parse import urlsplit
> In [10]: urlsplit("java\nscript:alert('bad')")
> Out[10]: SplitResult(scheme='', netloc='', path="java\nscript:alert('bad')", 
> query='', fragment='')

We get something like this:

> In [10]: urlsplit("java\nscript:alert('bad')")
> Out[10]: SplitResult(scheme='java\nscript', netloc='', path="alert('bad')", 
> query='', fragment='')

In other words, keep the funky characters and parse properly.
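
To make the idea concrete, here is a minimal sketch of "keep the funky characters and parse properly". It is an illustration only: `split_scheme_preserving` is a hypothetical helper, not part of urllib.parse and not a WHATWG-conformant parser. It assumes the only characters to ignore while recognising a scheme are ASCII tab, carriage return and newline.

```python
import re

# RFC 3986 scheme syntax: a letter, then letters, digits, "+", "-" or ".".
_SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*")
# Translation table that deletes ASCII tab, carriage return and newline.
_DROP_TAB_NEWLINE = str.maketrans("", "", "\t\r\n")


def split_scheme_preserving(url):
    """Return (scheme, rest), leaving the original characters untouched.

    Hypothetical helper: the text before the first ":" is treated as a
    scheme if it looks like one once tabs and newlines are ignored.
    """
    head, sep, rest = url.partition(":")
    cleaned = head.translate(_DROP_TAB_NEWLINE)
    if sep and _SCHEME_RE.fullmatch(cleaned):
        return head, rest
    return "", url


print(split_scheme_preserving("java\nscript:alert('bad')"))
# ('java\nscript', "alert('bad')")
```

The newline survives in the returned scheme, so a caller comparing against a blocklist would still need to normalise it before the comparison.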

--

___
Python tracker 
<https://bugs.python.org/issue43882>
___



[issue43882] [security] urllib.parse should sanitize urls containing ASCII newline and tabs.

2021-05-04 Thread Mike Lissner


Mike Lissner  added the comment:

I haven't watched that Blackhat presentation yet, but from the slides, it seems 
like the fix is to get all languages parsing URLs the same as the browsers. 
That's what @orsenthil has been doing here and plans to do in 
https://bugs.python.org/issue43883.

Should we get a bug filed with requests/urllib3 too? Seems like a good idea if 
it suffers from the same problems.

--

___
Python tracker 
<https://bugs.python.org/issue43882>
___



[issue43882] urllib.parse should sanitize urls containing ASCII newline and tabs.

2021-04-19 Thread Mike Lissner


Change by Mike Lissner :


--
nosy: +Mike.Lissner

___
Python tracker 
<https://bugs.python.org/issue43882>
___



[issue43883] Making urlparse WHATWG conformant

2021-04-19 Thread Mike Lissner


Change by Mike Lissner :


--
nosy: +Mike.Lissner

___
Python tracker 
<https://bugs.python.org/issue43883>
___



[issue29315] \b requires raw strings or to be escaped. Update docs with that hint?

2017-01-18 Thread Mike Lissner

New submission from Mike Lissner:

I just ran into a funny corner case I imagine others are aware of. When you 
write "\b" in Python, it is a single character: "\x08". So if you try to write 
a regex like:

words = '\b(.*)\b'

That won't work. But using a raw string will:

words = r'\b(.*)\b'

As will escaping it in this horrible fashion:

words = '\\b(.*)\\b'

I believe this doesn't affect any of the other regex flags, so I wonder if it's 
worth adding something to the docs to warn about this. I just spent a bunch of 
time trying to figure out why it seemed like \b wasn't working. A little tip in 
the docs would have gone a LONG way.
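
A short demonstration of the difference, as a sketch. The sample text and variable names are made up for illustration.

```python
import re

# In a normal string literal, "\b" is the single backspace character.
assert "\b" == "\x08"
assert len("\b") == 1

# In a raw string literal the backslash is kept, so the regex engine
# receives the two characters "\" and "b" and treats them as a word boundary.
assert len(r"\b") == 2

text = "hello world"
plain = re.findall("\bworld\b", text)  # searches for backspace + "world" + backspace
raw = re.findall(r"\bworld\b", text)   # searches for "world" at word boundaries

print(plain)  # []
print(raw)    # ['world']
```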

--
assignee: docs@python
components: Documentation
messages: 285751
nosy: Mike.Lissner, docs@python
priority: normal
severity: normal
status: open
title: \b requires raw strings or to be escaped. Update docs with that hint?
type: enhancement
versions: Python 2.7, Python 3.3, Python 3.4, Python 3.5, Python 3.6, Python 3.7

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue29315>
___



[issue10682] With '*args' or even bare '*' in def/call argument list, trailing comma causes SyntaxError

2016-03-29 Thread Mike Lissner

Mike Lissner added the comment:

This is an old issue, but where I run into it frequently is when I use the 
format function and string interpolation. For example, this throws a 
SyntaxError:

"The name of the person is {name_first} {name_last}".format(
    **my_obj.__dict__,
)

Because strings tend to be fairly long, it's pretty common that the arguments 
to format end up on their own line. 

I was always taught to use trailing commas in Python, and I'm fanatical about 
ensuring they're there. It's a smart part of the language that saves you from 
many bugs and much typing when copy/pasting/tweaking. 

This is the first time I've ever run into an implementation bug in CPython, and 
at least from the post on StackOverflow, this looks like the parser isn't 
obeying the grammar: 
https://stackoverflow.com/questions/16950394/python-why-is-this-invalid-syntax
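
On interpreters that reject the trailing comma after `**`, one possible workaround is to skip the unpacking and hand the mapping to `str.format_map` (available since Python 3.2). This is only a sketch: the `Person` class and its values are made up to stand in for `my_obj`.

```python
class Person:
    """Hypothetical stand-in for the object behind `my_obj`."""

    def __init__(self, name_first, name_last):
        self.name_first = name_first
        self.name_last = name_last


my_obj = Person("Ada", "Lovelace")

template = "The name of the person is {name_first} {name_last}"
# vars(my_obj) returns my_obj.__dict__; format_map takes it as an ordinary
# positional argument, so a trailing comma after it is always accepted.
message = template.format_map(
    vars(my_obj),
)
print(message)  # The name of the person is Ada Lovelace
```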

--
nosy: +Mike.Lissner

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue10682>
___



[issue22118] urljoin fails with messy relative URLs

2014-08-11 Thread Mike Lissner

Mike Lissner added the comment:

Just hopping in here to say that the work going down here is beautiful. I've 
filed a lot of bugs. This one's not particularly difficult, but damn, I 
appreciate the speed and quality going into fixing it. 

Glad to see the Python language is a happy place with fast, quality fixes.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22118
___



[issue22118] urljoin fails with messy relative URLs

2014-08-05 Thread Mike Lissner

Mike Lissner added the comment:

@pitrou, I haven't delved into URLs in a long while, but the general idea is:

scheme://domain:port/path?query_string#fragment_id

When would it ever make sense to have something up a level from the root of the 
domain?
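
For reference, here is how those pieces map onto the attributes returned by `urlsplit` in Python 3. The URL is a made-up example.

```python
from urllib.parse import urlsplit

parts = urlsplit("https://example.com:8080/some/path?key=value#section")

print(parts.scheme)    # https
print(parts.hostname)  # example.com
print(parts.port)      # 8080
print(parts.path)      # /some/path
print(parts.query)     # key=value
print(parts.fragment)  # section
```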

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22118
___



[issue22118] urljoin fails with messy relative URLs

2014-08-05 Thread Mike Lissner

Mike Lissner added the comment:

@demian.brecht, that'd make me very pleased if you took this over. I won't have 
time to devote to it, unfortunately.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22118
___



[issue22118] urljoin fails with messy relative URLs

2014-08-01 Thread Mike Lissner

New submission from Mike Lissner:

Not sure if this is desired behavior, but it's making my code break, so I 
figured I'd get it filed.

I'm trying to crawl this website: 
https://www.appeals2.az.gov/ODSPlus/recentDecisions2.cfm

Unfortunately, most of the URLs in the HTML are relative, taking the form:

'../../some/path/to/some/pdf.pdf'

I'm using lxml's make_links_absolute() function, which calls urljoin creating 
invalid urls like:

https://www.appeals2.az.gov/../Decisions/CR20130096OPN.pdf

If you put that into Firefox or wget or whatever, it works, despite being 
invalid and making no sense. 

**It works because those clients fix the problem,** joining the invalid path 
and the URL into:

https://www.appeals2.az.gov/Decisions/CR20130096OPN.pdf

I know this will mean making urljoin have a workaround to fix bad HTML, but 
this seems to be what wget, Chrome, Firefox, etc. all do. 

I've never filed a Python bug before, but is this something we could consider?
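
The clean-up those clients perform resembles the "remove dot segments" step of RFC 3986, section 5.2.4. Below is a simplified sketch of that step, written for Python 3 and assuming an absolute path. `remove_dot_segments` and `normalize_url` are hypothetical helper names, not existing library functions.

```python
from urllib.parse import urlsplit, urlunsplit


def remove_dot_segments(path):
    """Simplified take on RFC 3986 section 5.2.4 for absolute paths."""
    segments = path.split("/")
    output = []
    for index, segment in enumerate(segments):
        is_last = index == len(segments) - 1
        if segment == "..":
            # Step up one level, but never above the root (the leading "").
            if len(output) > 1:
                output.pop()
            if is_last:
                output.append("")  # keep the trailing slash
        elif segment == ".":
            if is_last:
                output.append("")
        else:
            output.append(segment)
    return "/".join(output)


def normalize_url(url):
    """Rebuild the URL with dot segments removed from its path."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=remove_dot_segments(parts.path)))


print(normalize_url("https://www.appeals2.az.gov/../Decisions/CR20130096OPN.pdf"))
# https://www.appeals2.az.gov/Decisions/CR20130096OPN.pdf
```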

--
components: Library (Lib)
messages: 224500
nosy: Mike.Lissner
priority: normal
severity: normal
status: open
title: urljoin fails with messy relative URLs
type: behavior
versions: Python 2.7

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22118
___



[issue22118] urljoin fails with messy relative URLs

2014-08-01 Thread Mike Lissner

Mike Lissner added the comment:

FWIW, the workaround that I've just created for this problem is this:

import re
from urllib.parse import urlsplit, urlunsplit  # Python 2.7: from urlparse import ...

u = 'https://www.appeals2.az.gov/../Decisions/CR20130096OPN.pdf'
# Split the url and rejoin it, nuking any '/..' patterns at the
# beginning of the path.
s = urlsplit(u)
urlunsplit(s[:2] + (re.sub(r'^(/\.\.)+', '', s.path),) + s[3:])

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22118
___