[issue9873] urllib.parse: Allow bytes in some APIs that use string literals internally

2010-11-30 Thread Nick Coghlan
Nick Coghlan added the comment: Committed in r86889. The docs changes should soon be live at: http://docs.python.org/dev/library/urllib.parse.html. If anyone would like to suggest changes to the wording of the docs for post beta1, or finds additional corner cases that the new bytes handling can

[issue9873] urllib.parse: Allow bytes in some APIs that use string literals internally

2010-11-29 Thread Nick Coghlan
Nick Coghlan added the comment: New patch which addresses my last two comments (i.e. some basic explicit tests of the encode/decode methods on result objects, and urldefrag returns a named tuple like urlsplit and urlparse already did). A natural consequence of this patch is that mixed argumen
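As an illustration of the result objects mentioned above, here is a minimal sketch run against a current Python 3 `urllib.parse` (an assumption; it is not code from the patch itself). The example URLs are made up.

```python
from urllib.parse import urldefrag, urlsplit

# urldefrag returns a named tuple with `url` and `fragment` fields,
# and it still unpacks like a plain 2-tuple.
result = urldefrag("http://host/path#frag")
url, fragment = result

# bytes input yields a bytes-flavoured result object; decode() converts
# it to the str flavour and encode() converts it back.
split_bytes = urlsplit(b"http://host/path?q=1")
split_str = split_bytes.decode()
round_tripped = split_str.encode()
```

Both `encode()` and `decode()` default to ASCII, which matches the strict handling discussed elsewhere in this thread.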

[issue9873] urllib.parse: Allow bytes in some APIs that use string literals internally

2010-11-09 Thread Hallvard B Furuseth
Hallvard B Furuseth added the comment: urlunparse(url or params = bytes object) produces a result with the repr of the bytes object if params is set. urllib.parse.urlunparse(['http', 'host', '/dir', b'params', '', '']) --> "http://host/dir;b'params'" That's confusing since urllib/parse.py goes
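For comparison, a short sketch of how the same call behaves on a current Python 3 release (an assumption about the reader's interpreter, not a statement about the patch under review at the time): mixing str and bytes components is rejected, while all-str and all-bytes inputs both work.

```python
from urllib.parse import urlunparse

# Mixed str/bytes components are rejected instead of embedding a repr.
try:
    urlunparse(["http", "host", "/dir", b"params", "", ""])
    mixed_rejected = False
except TypeError:
    mixed_rejected = True

# Consistent input types produce a result of the same type.
all_str = urlunparse(["http", "host", "/dir", "params", "", ""])
all_bytes = urlunparse([b"http", b"host", b"/dir", b"params", b"", b""])
```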

[issue9873] urllib.parse: Allow bytes in some APIs that use string literals internally

2010-11-08 Thread Éric Araujo
Éric Araujo added the comment: Related issue in msg120647.

[issue9873] urllib.parse: Allow bytes in some APIs that use string literals internally

2010-11-07 Thread Nick Coghlan
Nick Coghlan added the comment: Just a note for myself when I next update the patch: the 2-tuple returned by defrag needs to be turned into a real result type of its own, and the decode/encode methods on result objects should be tested explicitly.

[issue9873] urllib.parse: Allow bytes in some APIs that use string literals internally

2010-10-24 Thread Nick Coghlan
Nick Coghlan added the comment: Unless I hear some reasonable objections within the next week or so, I'm going to document and commit the ascii-strict coercion approach for beta 1. The difference in code clarity is such that I'm not even going to try to benchmark the two approaches against ea

[issue9873] urllib.parse: Allow bytes in some APIs that use string literals internally

2010-10-24 Thread Nick Coghlan
Nick Coghlan added the comment: Attached a second version of the patch. Notable features:
- uses a coercion-to-str-and-back strategy (using ascii-strict)
- a significantly more comprehensive pass through the urlparse test suite.
I'm happy that the test suite mods are complete with this pass. T
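The coercion-to-str-and-back idea can be sketched roughly as follows. This is a simplified illustration, not the code from the patch: `coerce_to_str` and `strip_fragment` are hypothetical names invented for the example.

```python
def coerce_to_str(*args):
    """Hypothetical helper: return (str_args, restore).

    str arguments pass through unchanged; bytes arguments are decoded
    as strict ASCII and `restore` encodes a result back to bytes.
    """
    if all(isinstance(a, str) for a in args):
        return args, lambda s: s
    if not all(isinstance(a, bytes) for a in args):
        raise TypeError("cannot mix str and bytes arguments")
    decoded = tuple(a.decode("ascii", "strict") for a in args)
    return decoded, lambda s: s.encode("ascii", "strict")


def strip_fragment(url):
    """Hypothetical str-only algorithm wrapped by the coercion helper."""
    (text,), restore = coerce_to_str(url)
    return restore(text.partition("#")[0])
```

With this shape the string-manipulation logic is written once against str, and bytes containing anything outside 7-bit ASCII fail early with `UnicodeDecodeError` rather than being silently reinterpreted.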

[issue9873] urllib.parse: Allow bytes in some APIs that use string literals internally

2010-10-08 Thread Stephen J. Turnbull
Changes by Stephen J. Turnbull: nosy: +sjt

[issue9873] urllib.parse: Allow bytes in some APIs that use string literals internally

2010-10-08 Thread Nick Coghlan
Nick Coghlan added the comment: I've been pondering the idea of adopting a more conservative approach here, since there are actually two issues: 1. Properly quoted URLs are transferred as pure 7-bit ASCII (due to percent-encoding of everything else). However, most of the manipulation functio
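The first point, that a properly quoted URL is pure 7-bit ASCII, can be checked with the standard `quote` and `unquote` functions. The path below is an invented example.

```python
from urllib.parse import quote, unquote

# quote() percent-encodes the UTF-8 bytes of non-ASCII characters
# (and the space), leaving only ASCII in the result.
original = "/caf\u00e9 menu"
quoted = quote(original)

# Encoding as strict ASCII succeeds because nothing outside 7 bits remains.
as_bytes = quoted.encode("ascii")

# unquote() reverses the percent-encoding.
restored = unquote(quoted)
```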

[issue9873] urllib.parse: Allow bytes in some APIs that use string literals internally

2010-10-05 Thread Senthil Kumaran
Senthil Kumaran added the comment: I wonder if Option2 (ascii+surrogateescape vs latin1) is only about performance. How about escapes that might occur if the Option2 is adopted. That might take higher priority than performance. Do we know 'how tight' that approach is?

[issue9873] urllib.parse: Allow bytes in some APIs that use string literals internally

2010-10-05 Thread Nick Coghlan
Nick Coghlan added the comment: On Tue, Oct 5, 2010 at 5:32 PM, STINNER Victor wrote:
> STINNER Victor added the comment:
>> If you were worried about performance, then surrogateescape is certainly
>> much slower than latin1.
> If you were really worried about performance, the bytes type

[issue9873] urllib.parse: Allow bytes in some APIs that use string literals internally

2010-10-05 Thread STINNER Victor
STINNER Victor added the comment:
> If you were worried about performance, then surrogateescape is certainly
> much slower than latin1.
If you were really worried about performance, the bytes type is maybe faster than: decode bytes to str using latin-1, process str strings, encode str to byte

[issue9873] urllib.parse: Allow bytes in some APIs that use string literals internally

2010-10-04 Thread Nick Coghlan
Nick Coghlan added the comment: Yeah, I'll have to time it to see how much difference latin-1 vs surrogateescape makes when the MSB is set in any bytes.

[issue9873] urllib.parse: Allow bytes in some APIs that use string literals internally

2010-10-03 Thread Antoine Pitrou
Antoine Pitrou added the comment:
> As per RDM's email to python-dev, a better way to create the
> pseudo_str values would be by decoding as ascii with a surrogate
> escape error handler rather than by decoding as latin-1.
If you were worried about performance, then surrogateescape is certainly

[issue9873] urllib.parse: Allow bytes in some APIs that use string literals internally

2010-10-02 Thread Nick Coghlan
Nick Coghlan added the comment: As per RDM's email to python-dev, a better way to create the pseudo_str values would be by decoding as ascii with a surrogate escape error handler rather than by decoding as latin-1.
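To make the two candidate decodings concrete, here is a small sketch using only the built-in codecs; the byte string is an invented example. Both approaches round-trip arbitrary bytes, but they differ in what the intermediate str contains.

```python
raw = b"/caf\xe9"

# latin-1 maps every byte to the code point with the same value,
# so 0xE9 becomes U+00E9 and looks like ordinary text.
via_latin1 = raw.decode("latin-1")

# ascii + surrogateescape maps each non-ASCII byte to a lone surrogate
# (0xE9 becomes U+DCE9), which marks it as undecoded binary data.
via_escape = raw.decode("ascii", "surrogateescape")

# Both strategies recover the original bytes exactly.
back_latin1 = via_latin1.encode("latin-1")
back_escape = via_escape.encode("ascii", "surrogateescape")
```

One practical difference is that a surrogate-escaped string cannot be encoded with a strict codec by accident, whereas a latin-1 decoded string can be encoded as UTF-8 and then carries different bytes than the input.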

[issue9873] urllib.parse: Allow bytes in some APIs that use string literals internally

2010-10-01 Thread STINNER Victor
Changes by STINNER Victor: title: Allow bytes in some APIs that use string literals internally -> urllib.parse: Allow bytes in some APIs that use string literals internally