[issue43882] [security] CVE-2022-0391: urllib.parse should sanitize urls containing ASCII newline and tabs.

2022-02-06 Thread Mike Lissner
Mike Lissner added the comment: Looks like that CVE isn't public yet. https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-0391 Any chance I can get access? (I originally reported this vuln.) My email is m...@free.law, if it's possible and my email is needed. Thanks

[issue43882] [security] urllib.parse should sanitize urls containing ASCII newline and tabs.

2021-05-06 Thread Mike Lissner
Mike Lissner added the comment: > With the fix for this bug, urlsplit silently removes (some of) those > characters before we can replace them, modifying the output of our > sanitisation code I don't have any good solutions for 3.9.5, but going forward, this feels like anothe
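A minimal sketch of the behavior quoted in this comment, assuming an interpreter that includes the fix for this issue (3.9.5 or later on the 3.9 branch). The URL and the `example.com` host are made up for illustration. It shows that a caller who wants to see the tab and newline characters has to check for them before calling `urlsplit`, because they are no longer present afterwards.

```python
from urllib.parse import urlsplit

# Hypothetical input containing an ASCII newline and a tab.
raw = "https://example.com/pa\nth?q=\t1"

# Inspect the raw string first if these characters matter to the caller.
had_control_chars = any(ch in raw for ch in "\t\r\n")

# On patched interpreters, urlsplit strips tab, CR and LF from the URL.
parts = urlsplit(raw)
still_has_control_chars = any(ch in parts.geturl() for ch in "\t\r\n")

print(had_control_chars, still_has_control_chars)
```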

[issue43882] [security] urllib.parse should sanitize urls containing ASCII newline and tabs.

2021-05-05 Thread Mike Lissner
Mike Lissner added the comment: > I'd wonder how to pass through valid exceptions without urlparse raising > something. Oops, meant to say "valid URLs", not valid exceptions, sorry.

[issue43882] [security] urllib.parse should sanitize urls containing ASCII newline and tabs.

2021-05-05 Thread Mike Lissner
Mike Lissner added the comment: > Instead of the patches as you see them, we could've raised an exception. In my mind the definition of a valid URL is what browsers recognize. They're moving towards the WHATWG definition, and so too must we. If we make python raise an exception when a

[issue43882] [security] urllib.parse should sanitize urls containing ASCII newline and tabs.

2021-05-04 Thread Mike Lissner
Mike Lissner added the comment: I haven't watched that Blackhat presentation yet, but from the slides, it seems like the fix is to get all languages parsing URLs the same as the browsers. That's what @orsenthil has been doing here and plans to do in https://bugs.python.org/issue43883

[issue43882] urllib.parse should sanitize urls containing ASCII newline and tabs.

2021-04-19 Thread Mike Lissner
Change by Mike Lissner: nosy: +Mike.Lissner. Python tracker <https://bugs.python.org/issue43882>

[issue43883] Making urlparse WHATWG conformant

2021-04-19 Thread Mike Lissner
Change by Mike Lissner: nosy: +Mike.Lissner. Python tracker <https://bugs.python.org/issue43883>

[issue29315] \b requires raw strings or to be escaped. Update docs with that hint?

2017-01-18 Thread Mike Lissner
New submission from Mike Lissner: I just ran into a funny corner case I imagine others are aware of. When you write "\b" in Python, it is a single character: "\x08". So if you try to write a regex like: words = '\b(.*)\b' That won't work. But using a raw string will:
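The corner case in this report can be reproduced with a short sketch. The pattern is the one from the message; the sample text `hello world` is an assumption added for illustration.

```python
import re

# In a normal string literal, "\b" is the single backspace character.
is_backspace = "\b" == "\x08"

text = "hello world"  # sample input, not from the original report

# Non-raw pattern: the regex engine sees two literal backspace characters.
plain_match = re.search('\b(.*)\b', text)

# Raw pattern: the regex engine sees \b and treats it as a word boundary.
raw_match = re.search(r'\b(.*)\b', text)

print(is_backspace, plain_match, raw_match.group(1))
```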

[issue10682] With '*args' or even bare '*' in def/call argument list, trailing comma causes SyntaxError

2016-03-29 Thread Mike Lissner
Mike Lissner added the comment: This is an old issue, but where I run into it frequently is when I use the format function and string interpolation. For example, this throws a SyntaxError: "The name of the person is {name_first} {name_last}".format( **my_obj.__dict__, ) Becau
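For comparison, a small sketch of the same call shape, assuming a current Python 3 interpreter, where a trailing comma after `**kwargs` in a call is accepted. The `Person` class and the names passed to it are hypothetical stand-ins for `my_obj`.

```python
class Person:
    """Hypothetical stand-in for the object behind my_obj in the comment."""

    def __init__(self, name_first, name_last):
        self.name_first = name_first
        self.name_last = name_last


my_obj = Person("Ada", "Lovelace")

# Trailing comma after the ** unpacking, as in the example from the comment.
message = "The name of the person is {name_first} {name_last}".format(
    **my_obj.__dict__,
)

print(message)
```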

[issue22118] urljoin fails with messy relative URLs

2014-08-11 Thread Mike Lissner
Mike Lissner added the comment: Just hopping in here to say that the work going down here is beautiful. I've filed a lot of bugs. This one's not particularly difficult, but damn, I appreciate the speed and quality going into fixing it. Glad to see the Python language is a happy place

[issue22118] urljoin fails with messy relative URLs

2014-08-05 Thread Mike Lissner
Mike Lissner added the comment: @pitrou, I haven't delved into URLs in a long while, but the general idea is: scheme://domain:port/path?query_string#fragment_id When would it ever make sense to have something up a level from the root of the domain

[issue22118] urljoin fails with messy relative URLs

2014-08-05 Thread Mike Lissner
Mike Lissner added the comment: @demian.brecht, that'd make me very pleased if you took this over. I won't have time to devote to it, unfortunately.

[issue22118] urljoin fails with messy relative URLs

2014-08-01 Thread Mike Lissner
New submission from Mike Lissner: Not sure if this is desired behavior, but it's making my code break, so I figured I'd get it filed. I'm trying to crawl this website: https://www.appeals2.az.gov/ODSPlus/recentDecisions2.cfm Unfortunately, most of the URLs in the HTML are relative, taking

[issue22118] urljoin fails with messy relative URLs

2014-08-01 Thread Mike Lissner
Mike Lissner added the comment: FWIW, the workaround that I've just created for this problem is this: u = 'https://www.appeals2.az.gov/../Decisions/CR20130096OPN.pdf' # Split the url and rejoin it, nuking any '/..' patterns at the # beginning of the path. s = urlsplit(u) urlunsplit(s[:2
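The snippet above is cut off, so the following is only a sketch of the idea it describes (split the URL, drop any `/..` patterns at the beginning of the path, rejoin), not the original workaround. The helper name `strip_leading_parent_refs` and the regular expression are assumptions.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def strip_leading_parent_refs(url):
    """Remove '/..' segments from the start of the URL's path (hypothetical helper)."""
    parts = urlsplit(url)
    # Match one or more '/..' segments only at the very beginning of the path.
    cleaned_path = re.sub(r'^(/\.\.)+(?=/|$)', '', parts.path)
    return urlunsplit(parts._replace(path=cleaned_path))


u = 'https://www.appeals2.az.gov/../Decisions/CR20130096OPN.pdf'
fixed = strip_leading_parent_refs(u)
print(fixed)
```

Only leading `/..` segments are touched; a `..` further along the path is left alone, which matches the narrow scope described in the comment.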