[issue3270] test_multiprocessing: test_listener_client flakiness
Trent Nelson [EMAIL PROTECTED] added the comment: I was thinking about this on the way home last night and concluded that my last suggestion (s/0.0.0.0/127.0.0.1/) is a terrible one as well. I'd be happy with a mention in the documentation (for now) stating that if you listen on '0.0.0.0', Listener._address won't be a connectable end-point (and you'll have to explicitly connect to 127.0.0.1, for example). As for the original issue, Jesse I'm +1 on your connection_v2.patch. ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3270 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
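A minimal sketch of the behaviour described above, assuming a current Python 3 where multiprocessing.connection.Listener exposes its bound address through the public `address` attribute. The helper name `loopback_roundtrip` is made up for illustration. Whether the wildcard address can be used as a connection target varies by platform, so the client names the loopback address explicitly.

```python
from multiprocessing.connection import Client, Listener


def loopback_roundtrip(message):
    """Listen on all interfaces, connect via loopback, echo one message."""
    # Port 0 asks the OS for any free port; "0.0.0.0" means "all interfaces".
    with Listener(("0.0.0.0", 0)) as listener:
        wildcard_host, port = listener.address
        # The reported host is the wildcard, not a concrete destination,
        # so the client substitutes 127.0.0.1 and reuses only the port.
        with Client(("127.0.0.1", port)) as client:
            with listener.accept() as server_side:
                client.send(message)
                return wildcard_host, server_side.recv()


host, echoed = loopback_roundtrip("ping")
```

The point is only that the address reported by the listener cannot be handed to a client unchanged when the listener was bound to the wildcard.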
[issue3362] locale.getpreferredencoding() gives bus error on Mac OS X 10.4.11 PPC
Martin v. Löwis [EMAIL PROTECTED] added the comment: Lists of possible string encodings are here: http://developer.apple.com/documentation/CoreFoundation/Reference/CFStringRef/Reference/reference.html#//apple_ref/c/tdef/CFStringBuiltInEncodings and http://developer.apple.com/documentation/CoreFoundation/Reference/CFStringRef/Reference/reference.html#//apple_ref/doc/constant_group/External_String_Encodings So it would be interesting to know what CFStringGetSystemEncoding returns on your system. Notice the special value kCFStringEncodingInvalidId, which it might also return. I think printf("Encoding is %x\n", enc); should do. I think mac_getscript is fine as it stands: if name is NULL, it tries CFStringConvertEncodingToIANACharSetName, which should perform a lookup in the Apple database. ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3362 ___
[issue3544] expection typo
Georg Brandl [EMAIL PROTECTED] added the comment: This is already fixed in SVN. -- resolution: -> out of date status: open -> closed ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3544 ___
[issue2389] Array pickling exposes internal memory representation of elements
Hrvoje Nikšić [EMAIL PROTECTED] added the comment: Unfortunately dumping the internal representation of non-long arrays won't work, for several reasons. First, it breaks when porting pickles between platforms of different endianness such as Intel and SPARC. Then, it ignores the considerable work put into correctly pickling floats, including the support for IEEE 754 special values. Finally, it will break when unpickling Unicode character arrays pickled on different Python versions -- wchar_t is 2 bytes wide on Windows, 4 bytes on Unix. I believe pickling arrays to compact strings is the right approach on the grounds of efficiency and I wouldn't change it. We must only be careful to pickle to a string with a portable representation of values. The straightforward way to do this is to pick a standard size for types (much like the struct module does) and endianness and use it in the pickled array. Ints are simple, and the code for handling floats is already there, for example _PyFloat_Pack8 used by cPickle. Pickling arrays as lists is probably a decent workaround for the pending release because it's backward and forward compatible (old pickles will work as well as before and new pickles will be correctly read by old Python versions), but for the next release I would prefer to handle this the right way. If there is agreement on this, I can start work on a patch in the following weeks. ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue2389 ___
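To illustrate the "standard size and endianness" idea, here is a small sketch using the struct module in current Python 3. It is not the patch discussed in this issue; it only contrasts the native buffer of an array with a layout whose width and byte order are fixed explicitly.

```python
import struct
from array import array

values = array("i", [1, -2, 300])

# Native layout: item width and byte order depend on the host platform.
native = values.tobytes()

# Standard layout: the "<" prefix selects little-endian and struct's
# standard sizes, so "i" is always 4 bytes regardless of the host.
portable = struct.pack("<%di" % len(values), *values)

# Reading the standard layout back gives the same values on any host.
restored = array("i", struct.unpack("<%di" % (len(portable) // 4), portable))

# Floats in a standard layout are IEEE 754, special values included.
inf_bytes = struct.pack(">d", float("inf"))
```

Only `portable` and `inf_bytes` are safe to move between machines; `native` is the kind of representation the comment above warns about.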
[issue3545] Python turning off assertions (Windows)
New submission from Anders Bensryd [EMAIL PROTECTED]: We are using Windows XP SP2, Visual Studio 2005 and Python 2.5.2. In Objects/exceptions.c the following code turns off all assertions:

    #if defined _MSC_VER && _MSC_VER >= 1400 && defined(__STDC_SECURE_LIB__)
        /* Set CRT argument error handler */
        prevCrtHandler = _set_invalid_parameter_handler(InvalidParameterHandler);
        /* turn off assertions in debug mode */
        prevCrtReportMode = _CrtSetReportMode(_CRT_ASSERT, 0);
    #endif

As far as I understand, this is to make sure that no assertion dialogs pop up during the internal Python tests. For ordinary users, this is not an issue. However, we are using the Python DLL in our product and when developing we always use the debug version of the Python DLL (as recommended). When we call Py_Initialize(), all assertions are turned off, even our own, and this is not what we want. The current workaround is as follows (this is in our code):

    prevCrtReportMode=_CrtSetReportMode(_CRT_ASSERT,_CRTDBG_REPORT_MODE);
    Py_Initialize();
    prevCrtReportMode=_CrtSetReportMode(_CRT_ASSERT,prevCrtReportMode);

I am not certain if this is a bug or a feature and I really do not have a suggested solution since I do not know the real reasons for turning off assertions. Perhaps there already is a way to avoid this problem that I have not found? All comments are appreciated.
-- components: Windows messages: 71049 nosy: abe severity: normal status: open title: Python turning off assertions (Windows) type: behavior versions: Python 2.5 ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3545 ___
[issue2389] Array pickling exposes internal memory representation of elements
Martin v. Löwis [EMAIL PROTECTED] added the comment: I'd like to challenge the view of what correct behavior is here. If I pickle an array of 32-bit integer values on one system, and unpickle it as an array of 64-bit integer values on a different system, is that correct, or incorrect? IMO, correct behavior would preserve the width as much as possible. For integers, this should be straightforward, as it should be for floats and doubles (failing to unpickle them if the target system doesn't support a certain format). For Unicode, I think the array module should grow platform-independent widths, for both 2-byte and 4-byte Unicode. When pickling, the pickle should always use network byte order; alternatively, the pickle should contain a byte order marker (presence of which could also be used as an indication that the new array pickle format is used). IOW, i would indicate little-endian four byte integers, and so on. -- nosy: +loewis ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue2389 ___
[issue2389] Array pickling exposes internal memory representation of elements
Hrvoje Nikšić [EMAIL PROTECTED] added the comment: I think preserving integer width is a good idea because it saves us from having to throw overflow errors when unpickling to machines with different width of C types. The cost is that pickling/unpickling the array might change the array's typecode, which can be a problem for C code that processes the array's buffer and expects the C type to remain invariant. Instead of sticking to network byte order, I propose to include byte order information in the pickle (for example as '<' or '>' like struct does), so that pickling/unpickling between the same-endianness architectures doesn't have to convert at all. Floats are always pickled as IEEE754, but the same optimization (not having to convert anything) would apply when unpickling a float array on an IEEE754 architecture. Preserving widths and including endianness information would allow pickling to be as fast as it is now (with the exception of unicode chars and floats on non-IEEE754 platforms). It would also allow unpickling to be as fast between architectures with equal endianness, and correct between others. ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue2389 ___
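The proposal above (record the byte order and the item width alongside the data) could look roughly like the following sketch. The two-byte header and the names `dump_ints` and `load_ints` are invented for illustration; this is not the array module's actual pickle format.

```python
import struct
import sys

# Hypothetical layout: one byte-order mark ('<' or '>'), one byte giving the
# item width, then the items as signed integers of exactly that width.
_CODES = {1: "b", 2: "h", 4: "i", 8: "q"}
_NATIVE_MARK = "<" if sys.byteorder == "little" else ">"


def dump_ints(values, width, mark=_NATIVE_MARK):
    """Serialize integers using the writer's byte order and a fixed width."""
    header = mark.encode("ascii") + bytes([width])
    fmt = "%s%d%s" % (mark, len(values), _CODES[width])
    return header + struct.pack(fmt, *values)


def load_ints(data):
    """Read the header, then decode the items with the recorded byte order."""
    mark, width = chr(data[0]), data[1]
    count = (len(data) - 2) // width
    fmt = "%s%d%s" % (mark, count, _CODES[width])
    return list(struct.unpack(fmt, data[2:]))


same_order = load_ints(dump_ints([1, -2, 70000], 4, "<"))
other_order = load_ints(dump_ints([1, -2, 70000], 4, ">"))
```

A writer always uses its native order, so a reader with the same endianness has nothing to swap, while a reader with the other endianness still decodes the correct values because the mark travels with the data.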
[issue3139] bytearrays are not thread safe
Martin v. Löwis [EMAIL PROTECTED] added the comment: I have now committed the patch to 2.6 as r65654, adding changes for the bz2module. I also decided to make the Py_buffer structure own its reference, as I was running out of arguments why not to. In the process, I removed PyObject_ReleaseBuffer, as it is redundant and would have unclear semantics (what if the object passed directly and the object passed indirectly were different?). ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3139 ___
[issue3546] Missing linebreak in ext.doctest output
New submission from Robert Schuppenies [EMAIL PROTECTED]: There is a linebreak missing in the doctest extension. See attached patch. -- assignee: georg.brandl components: Documentation tools (Sphinx) files: linebreak.patch keywords: patch messages: 71053 nosy: georg.brandl, schuppenies severity: normal status: open title: Missing linebreak in ext.doctest output type: behavior Added file: http://bugs.python.org/file11102/linebreak.patch ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3546 ___
[issue3300] urllib.quote and unquote - Unicode issues
Matt Giuca [EMAIL PROTECTED] added the comment: Bill, this debate is getting snippy, and going nowhere. We could argue about what is the pure and correct thing to do, but we have a limited time frame here, so I suggest we just look at the important facts.

1. There is an overwhelming consensus (including from me) that a str->bytes version is acceptable to have in the library (whether or not it's the correct solution).
2. There is an overwhelming consensus (including from you) that a str->str version is acceptable to have in the library (whether or not it's the correct solution).
3. By default, the str->str version breaks much less code, so both of us decided to use it by default.

To this end, both of our patches:

1. Have a str->bytes version available.
2. Have a str->str version available.
3. Have quote and unquote functions call the str->str version.

So it seems we have agreed on that. Therefore, there should be no more arguing about which is more right. So all your arguments seem to be essentially saying "the str->bytes methods work perfectly; I don't care about if the str->str methods are correct or not." The fact that your string versions quote UTF-8 and unquote Latin-1 shows just how un-seriously you take the str->str methods. Well the fact is that a) a great many users do NOT SHARE your ideals and will default to using quote and unquote rather than the bytes functions, and b) all of the rest of the library uses quote and unquote. So from a practical sense, how these methods behave is of the utmost importance - they are more important than any new functions we introduce at this point. For example, the cgi.FieldStorage and the http.server modules will implicitly call unquote and quote. That means whether you, or I, or Guido, or The King Of The Internet likes it or not, we have to have a most reasonable solution to the problem of quoting and unquoting strings.

> Good thing we don't need to [handle unescaped non-ASCII characters in unquote]; URIs consist of ASCII characters.
Once again, practicality beats purity. I'd argue that it's a *good* (not strictly required) idea to not mangle input unless we have to.

>> * Question: How does unquote_bytes deal with unescaped characters?
> Not sure I understand this question...

I meant unescaped non-ASCII characters, as discussed above (eg. unquote_bytes('\u0123')).

> Your test cases probably aren't testing things I feel it's necessary to test. I'm interested in having the old test cases for urllib pass, as well as providing the ability to unquote_to_bytes().

I'm sorry, but you're missing the point of test-driven development. If you think there is a bug, you don't just fix it and say "look, the old test cases still pass!" You write new FAILING test cases to demonstrate the bug. Then you change the code to make the test cases pass. All your test suite proves is that you're happy with things the way they are.

> Matt, your patch is not some God-given thing here.

No, I am merely suggesting that it's had a great deal more thought put into it -- not just my thought, but all the other people in the past month who've suggested different approaches and brought up discussion points. Including yourself -- it was your suggestion in the first place to have the str->bytes functions, which I agree are important.

<snip>

>> - Quote uses cache
> I see no real advantage there, except that it has a built-in memory leak. Just use a function.

Good point. Well the merits of using a cache are completely independent from the behavioural aspects. I simply changed the existing code as little as possible. Hence this patch will have the same performance strengths/weaknesses as all previous versions, and the performance can be tuned after 3.0 if necessary. (Not urgent).

On statistics about UTF-8 versus other encodings. Yes, I agree, there are lots of URIs floating around out there, in many different encodings. Unfortunately, we can't implicitly handle them all (and I'm talking once more explicitly about the str->str transform here). 
We need to pick one as the default. Whether Latin-1 is more popular than UTF-8 *for the time being* is no good reason to pick Latin-1. It is called a legacy encoding for a reason. It is being phased out and should NOT be supported from here on in as the default encoding in a major web programming language. (Also there is no point in claiming to be Unicode compliant then turning around and supporting a charset with 256 symbols by default). Because Python's urllib will mostly be used in the context of building web apps, it is up to the programmer to decide what encoding to use for h(is|er) web app. For future apps, this should almost certainly be UTF-8 (if it isn't, the website won't be able to accept form input across all characters, so isn't Unicode compliant anyway). The problem you mention of browsers submitting URIs encoded based on the charset is simply something we have to live with. A server will never be able to deal with that unless the URIs are coming
[issue3300] urllib.quote and unquote - Unicode issues
Matt Giuca [EMAIL PROTECTED] added the comment: By the way, what is the current status of this bug? Is anybody waiting on me to do anything? (Re: Patch 9)

To recap my previous list of outstanding issues raised by the review:

Should unquote accept a bytes/bytearray as well as a str? Currently, does not. I think it's meaningless to do so (and how to handle >127 bytes, if so?)

Lib/email/utils.py: Should encode_rfc2231 with charset=None accept strings with non-ASCII characters, and just encode them to UTF-8? Currently does. Suggestion to restrict to ASCII on the review tracker; simple fix.

Should quote raise a TypeError if given a bytes with encoding/errors arguments? (Motivation: TypeError is what you usually raise if you supply too many args to a function). Resolved. Raises TypeError.

Lib/urllib/parse.py: (As discussed above) Should quote accept safe characters outside the ASCII range (thereby potentially producing invalid URIs)? Resolved? Implemented, but too messy and not worth it just to produce invalid URIs, so NOT in patch.

That's only two very minor yes/no issues remaining. Please comment. ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3300 ___
[issue3300] urllib.quote and unquote - Unicode issues
Antoine Pitrou [EMAIL PROTECTED] added the comment: I agree that given two similar patches, the one with more tests earns some bonus points. Also, it seems to me that round-trippability of quote()/unquote() is a logical and semantic requirement: in particular, if there is a default encoding, it should be the same for both.

> For future apps, this should almost certainly be UTF-8 (if it isn't, the website won't be able to accept form input across all characters, so isn't Unicode compliant anyway).

Actually, it will be able to accept such form input, as characters not supported by the charset should be entity-encoded by the browser (e.g. &#123;). I have no strong opinion on the remaining points you listed, except that IMHO encode_rfc2231 with charset=None should not try to use UTF8 by default. But someone with more mail protocol skills should comment :) ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3300 ___
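The round-trip requirement can be illustrated with urllib.parse as it exists in current Python 3, which postdates this thread and may differ from any of the patches under discussion. The sample path is invented.

```python
from urllib.parse import quote, unquote, unquote_to_bytes

original = "/caf\u00e9/na\u00efve file.txt"

# quote() percent-encodes the UTF-8 bytes of each non-ASCII character.
encoded = quote(original)

# unquote() uses the same default encoding, so the pair round-trips.
decoded = unquote(encoded)

# Decoding with a different charset than the one used for encoding
# quietly produces different text rather than raising an error.
mismatched = unquote(encoded, encoding="latin-1")

# The bytes-level function leaves the charset decision to the caller.
raw = unquote_to_bytes(encoded)
```

If the two functions had different defaults, `decoded` would look like `mismatched`, which is the failure the comment above argues against.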
[issue2054] add ftp-tls support to ftplib - RFC 4217
Bill Janssen [EMAIL PROTECTED] added the comment: I think I'm just going to bring the unwrap already in the _ssl.c code out to the ssl.py module, that seems to be the simplest fix. Still not sure you can do a proper fix to ftplib here, but that seems to be a good thing to do anyway, rather than having people call directly into the _ssl module to get at it. ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue2054 ___
[issue3139] bytearrays are not thread safe
Martin v. Löwis [EMAIL PROTECTED] added the comment: I also started working on porting it to 3.0, but couldn't complete that port yet - the memoryview object doesn't play nicely. ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3139 ___
[issue3547] Ctypes is confused by bitfields of varying integer types
New submission from Tim Maxwell [EMAIL PROTECTED]: Steps to reproduce:

Python 2.5.2 (r252:60911, Feb 22 2008, 07:57:53)
[GCC 4.0.1 (Apple Computer, Inc. build 5363)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from ctypes import *
>>> fields = [('a', c_short, 4), ('b', c_short, 4), ('c', c_long, 24)]
>>> class Foo(Structure):
...     _fields_ = fields
...
>>> Foo.a
<Field type=c_short, ofs=0:0, bits=4>
>>> Foo.b
<Field type=c_short, ofs=0:4, bits=4>
>>> Foo.c
<Field type=c_long, ofs=-2:8, bits=24> # Wrong!

More about my machine:

>>> sizeof(c_short)
2
>>> sizeof(c_long)
4

This particular example comes from a 32-bit Mac OS X Intel machine. The bug has been reproduced on Linux as well, but could not be reproduced on Windows XP.
-- assignee: theller components: ctypes messages: 71060 nosy: theller, tim.maxwell severity: normal status: open title: Ctypes is confused by bitfields of varying integer types type: behavior versions: Python 2.5 ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3547 ___
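A quick way to check a given interpreter for this class of problem is to store values in the bitfields and read them back, and to confirm that no field reports a negative byte offset. The sketch below assumes a current Python 3 and uses c_int in place of the report's c_long, so that the wide field is 4 bytes on common 64-bit platforms as well; the helper `field_offsets` is made up for illustration.

```python
from ctypes import Structure, c_int, c_short


class Foo(Structure):
    # Two 4-bit fields declared as c_short, followed by a 24-bit field
    # declared with a wider integer type, mirroring the report's layout.
    _fields_ = [("a", c_short, 4), ("b", c_short, 4), ("c", c_int, 24)]


def field_offsets(struct_type):
    """Map each field name to the byte offset ctypes reports for it."""
    return {
        name: getattr(struct_type, name).offset
        for name, *_rest in struct_type._fields_
    }


# All three values fit their signed bit widths (4, 4 and 24 bits).
foo = Foo(a=5, b=-3, c=0x123456)
offsets = field_offsets(Foo)
```

On an affected build, one would expect either a negative entry in `offsets` or values that do not survive the round trip.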
[issue2776] urllib2.urlopen() gets confused with path with // in it
Senthil [EMAIL PROTECTED] added the comment: I could reproduce this issue on trunk and p3k branch. The patch attached by Adrianna Pinska appropriately fixes this issue. I agree with the logic. Attaching the patch for py3k with the same fix. Thanks, Senthil Added file: http://bugs.python.org/file11103/issue2776-py3k.diff ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue2776 ___
[issue2054] add ftp-tls support to ftplib - RFC 4217
Bill Janssen [EMAIL PROTECTED] added the comment: OK, I think I've done the minimal fix necessary to the SSL module to allow this work to proceed. ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue2054 ___
[issue3548] subprocess.pipe function
New submission from Miki Tebeka [EMAIL PROTECTED]: Attached is a patch that adds a pipe command to the subprocess module. pipe(["ls"], ["grep", "test_"]) will return the output of ls | grep test_. -- components: Library (Lib) files: pipe.patch keywords: patch messages: 71062 nosy: tebeka severity: normal status: open title: subprocess.pipe function type: feature request versions: Python 2.6, Python 3.0 Added file: http://bugs.python.org/file11104/pipe.patch ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3548 ___
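The attached patch is not reproduced here. As an illustration only, a helper with the described behaviour could be built on subprocess.Popen along the following lines; the function body is an assumption, not the submitted code, and the usage lines run Python subprocesses instead of ls and grep so that they work on any platform.

```python
import subprocess
import sys


def pipe(*commands):
    """Run the commands as a shell-style pipeline; return the last stdout."""
    if not commands:
        raise ValueError("at least one command is required")
    procs = []
    upstream = None
    for argv in commands:
        proc = subprocess.Popen(argv, stdin=upstream, stdout=subprocess.PIPE)
        if upstream is not None:
            # Drop the parent's copy so only the child holds the read end.
            upstream.close()
        upstream = proc.stdout
        procs.append(proc)
    output = procs[-1].communicate()[0]
    for proc in procs[:-1]:
        proc.wait()
    return output


producer = [sys.executable, "-c",
            "print('alpha'); print('test_beta'); print('gamma')"]
consumer = [sys.executable, "-c",
            "import sys; sys.stdout.writelines(l for l in sys.stdin if 'test_' in l)"]
filtered = pipe(producer, consumer)
```

Each command is an argument list, as in the submission's example, and the result is the raw bytes written by the last process.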
[issue3548] subprocess.pipe function
Miki Tebeka [EMAIL PROTECTED] added the comment: Not sure about the name, maybe "chain" would be better? ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3548 ___
[issue3300] urllib.quote and unquote - Unicode issues
Bill Janssen [EMAIL PROTECTED] added the comment: Larry Masinter is off on vacation, but I did get a brief message saying that he will dig up similar discussions that he was involved in when he gets back. Out of curiosity, I sent a note off to the www-international mailing list, and received this: ``For the authority (server name) portion of a URI, RFC 3986 is pretty clear that UTF-8 must be used for non-ASCII values (assuming, for a moment, that IDNA addresses are not Punycode encoded already). For the path portion of URIs, a large-ish proportion of them are, indeed, UTF-8 encoded because that has been the de facto standard in Web browsers for a number of years now. For the query and fragment parts, however, the encoding is determined by context and often depends on the encoding of some page that contains the form from which the data is taken. Thus, a large number of URIs contain non-UTF-8 percent-encoded octets.'' http://lists.w3.org/Archives/Public/www-international/2008JulSep/0041.html ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3300 ___
[issue3300] urllib.quote and unquote - Unicode issues
Bill Janssen [EMAIL PROTECTED] added the comment: For Antoine: I think the problem that Barry is facing with the email package is that Unicode strings are an ambiguous representation of a sequence of bytes; that is, there are a number of different byte sequences a Unicode string may have come from. His ingenious use of raw-unicode-escape is an attempt to conform to the requirement of having to produce a string, but without losing any data, so that an application program can, if it needs to, still reprocess that string and retrieve the original data. Naive application programs that sort of expected the result to be an ASCII string will be unaffected. Not sure it's the best idea; this is all about just where to force unexpected runtime failures. ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3300 ___
[issue3545] Python turning off assertions (Windows)
Martin v. Löwis [EMAIL PROTECTED] added the comment: As you must be building your own Python DLL, anyway, can't you just simply remove that code if you don't want it? -- nosy: +loewis ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3545 ___
[issue2389] Array pickling exposes internal memory representation of elements
Guido van Rossum [EMAIL PROTECTED] added the comment:

> Instead of sticking to network byte order, I propose to include byte order information in the pickle (for example as '<' or '>' like struct does), so that pickling/unpickling between the same-endianness architectures doesn't have to convert at all. Floats are always pickled as IEEE754, but the same optimization (not having to convert anything) would apply when unpickling a float array on an IEEE754 architecture. Preserving widths and including endianness information would allow pickling to be as fast as it is now (with the exception of unicode chars and floats on non-IEEE754 platforms). It would also allow unpickling to be as fast between architecture with equal endianness, and correct between others.

This sounds like the best approach yet -- it can be made backwards compatible (so 2.6 can read 2.5 pickles at least on the same platform) and can be just as fast when unpickling on the same platform, and only slightly slower on a different platform. ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue2389 ___
[issue3419] multiprocessing module is racy
Ismail Donmez [EMAIL PROTECTED] added the comment: With trunk when running test_multiprocessing in a tight loop I saw another problem (the output of three processes is interleaved):

test_multiprocessing
Process Process-61:
Traceback (most recent call last):
  File "/Users/cartman/Sources/py3k/Lib/multiprocessing/process.py", line 229, in _bootstrap
Process Process-60:
Traceback (most recent call last):
  File "/Users/cartman/Sources/py3k/Lib/multiprocessing/process.py", line 229, in _bootstrap
Process Process-62:
Traceback (most recent call last):
  File "/Users/cartman/Sources/py3k/Lib/multiprocessing/process.py", line 229, in _bootstrap
    util._run_after_forkers()
  File "/Users/cartman/Sources/py3k/Lib/multiprocessing/util.py", line 138, in _run_after_forkers
    util._run_after_forkers()
  File "/Users/cartman/Sources/py3k/Lib/multiprocessing/util.py", line 138, in _run_after_forkers
    util._run_after_forkers()
  File "/Users/cartman/Sources/py3k/Lib/multiprocessing/util.py", line 138, in _run_after_forkers
    items = list(_afterfork_registry.items())
    items = list(_afterfork_registry.items())
  File "/Users/cartman/Sources/py3k/Lib/weakref.py", line 103, in items
  File "/Users/cartman/Sources/py3k/Lib/weakref.py", line 103, in items
    items = list(_afterfork_registry.items())
  File "/Users/cartman/Sources/py3k/Lib/weakref.py", line 103, in items
    for key, wr in self.data.items():
RuntimeError: dictionary changed size during iteration
    for key, wr in self.data.items():
RuntimeError: dictionary changed size during iteration
    for key, wr in self.data.items():
RuntimeError: dictionary changed size during iteration

The original problem itself seems to be fixed, so cheers for that! ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3419 ___
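The RuntimeError in the traceback is the general failure that occurs when a dict gains or loses entries while it is being iterated. The sketch below reproduces it with a plain dict and shows the usual remedy of iterating over a snapshot. It does not model the weak-reference callbacks that presumably resized the registry in the report; both function names are made up.

```python
def iterate_and_mutate(registry):
    """Add an entry while iterating; return the error message, if any."""
    try:
        for key in registry:
            registry[("extra", key)] = None
    except RuntimeError as exc:
        return str(exc)
    return None


def iterate_over_snapshot(registry):
    """Iterate over a copied item list, so mutation cannot break the loop."""
    for key, _value in list(registry.items()):
        registry[("extra", key)] = None
    return len(registry)


message = iterate_and_mutate({"a": 1})
size_after = iterate_over_snapshot({"a": 1, "b": 2})
```

Taking the snapshot only helps if the copy itself cannot be interrupted by a resize, which is the part that went wrong inside weakref.py in the traceback above.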
[issue3300] urllib.quote and unquote - Unicode issues
Bill Janssen [EMAIL PROTECTED] added the comment: Here's another thought: Let's put string_to_bytes and string_from_bytes into the binascii module, as a2b_percent and b2a_percent, respectively. Then parse.py would import them as

    from binascii import a2b_percent as percent_decode_as_bytes
    from binascii import b2a_percent as percent_encode_from_bytes

and add two more functions:

    def percent_encode(string, encoding="UTF-8", error="strict", plus=False)
    def percent_decode(string, encoding="UTF-8", error="strict", plus=False)

and would add backwards-compatible but deprecated functions for quote and unquote:

    def quote(s):
        warnings.warn("urllib.parse.quote should be replaced by percent_encode or percent_encode_from_bytes", FutureDeprecationWarning)
        if isinstance(s, str):
            return percent_encode(s)
        else:
            return percent_encode_from_bytes(s)

    def unquote(s):
        warnings.warn("urllib.parse.unquote should be replaced by percent_decode or percent_decode_to_bytes", FutureDeprecationWarning)
        if isinstance(s, str):
            return percent_decode(s)
        else:
            return percent_decode(str(s, "ASCII", "strict"))

___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3300 ___
[issue2275] urllib2 header capitalization
John J Lee [EMAIL PROTECTED] added the comment: The CaseInsensitive dict class fails to preserve its invariants (implied invariants, since there are no tests for it). There are also problems with the documentation in the patch. I will submit a modified patch, I hope later this week. ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue2275 ___
[issue2275] urllib2 header capitalization
John J Lee [EMAIL PROTECTED] added the comment: By the way, this is a feature addition, not a bug fix. The first beta releases for 2.6 and 3.0 came out some time ago, so according to PEP 361, this change should not be committed to trunk until after the 2.6 / 3.0 maintenance branches have been created. ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue2275 ___
[issue3300] urllib.quote and unquote - Unicode issues
Antoine Pitrou [EMAIL PROTECTED] added the comment: On Tuesday 12 August 2008 at 19:37 +, Bill Janssen wrote:

> Let's put string_to_bytes and string_from_bytes into the binascii module, as a2b_percent and b2a_percent, respectively.

Well, it's my personal opinion, but I think we should focus on a simple and straightforward solution for the present issue before beta3 is released (which is in 8 days now). It has already been difficult to find a (quasi-)consensus for a simple patch to adapt quote()/unquote() to the realities of bytes/unicode separation in py3k: witness the length of the present discussion. (perhaps a sophisticated solution could still be adopted for 3.1, especially if it has backwards compatibility in mind) ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3300 ___
[issue3300] urllib.quote and unquote - Unicode issues
Guido van Rossum [EMAIL PROTECTED] added the comment:

> Matt Giuca [EMAIL PROTECTED] added the comment: By the way, what is the current status of this bug? Is anybody waiting on me to do anything? (Re: Patch 9)

I'll be reviewing it today or tomorrow. From looking at it briefly I worry that the implementation is pretty slow -- a method call for each character and a map() call sounds pretty bad.

> To recap my previous list of outstanding issues raised by the review: Should unquote accept a bytes/bytearray as well as a str? Currently, does not. I think it's meaningless to do so (and how to handle >127 bytes, if so?)

The bytes > 127 would be translated as themselves; this follows logically from how stuff is parsed -- %% and %FF are translated, everything else is not. But I don't really care, I doubt there's a need.

> Lib/email/utils.py: Should encode_rfc2231 with charset=None accept strings with non-ASCII characters, and just encode them to UTF-8? Currently does. Suggestion to restrict to ASCII on the review tracker; simple fix.

I think I agree with that comment; it seems wrong to return UTF8 without setting that in the header. The alternative would be to default charset to utf8 if there are any non-ASCII chars in the input. I'd be okay with that too.

> Should quote raise a TypeError if given a bytes with encoding/errors arguments? (Motivation: TypeError is what you usually raise if you supply too many args to a function). Resolved. Raises TypeError. Lib/urllib/parse.py: (As discussed above) Should quote accept safe characters outside the ASCII range (thereby potentially producing invalid URIs)? Resolved? Implemented, but too messy and not worth it just to produce invalid URIs, so NOT in patch.

Agreed, safe should be ASCII chars only.

> That's only two very minor yes/no issues remaining. Please comment.

I believe patch 9 still has errors defaulting to strict for quote(). Weren't you going to change that? 
Regarding using UTF-8 as the default encoding, I still think this is the right thing to do -- while the tables shown by Bill indicate that there's still a lot of Latin-1 out there, UTF-8 is definitely gaining on it, and I expect that Python apps, especially Py3k apps, are much more likely to follow (and hopefully reinforce! :-) this trend than to lag behind. ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3300 ___
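For reference, several of the points reviewed above can be checked against urllib.parse in current Python 3. That module was finalized after this exchange, so the sketch shows today's behaviour rather than the behaviour of patch 9; the helper `quote_bytes_with_encoding` is made up for illustration.

```python
from urllib.parse import quote, unquote


def quote_bytes_with_encoding():
    """Return the exception raised when bytes are given an encoding."""
    try:
        quote(b"\xe9", encoding="latin-1")
    except TypeError as exc:
        return exc
    return None


# bytes input is percent-encoded byte for byte, with no charset involved.
from_bytes = quote(b"\xe9")

# Non-ASCII characters listed in `safe` are ignored and still escaped.
non_ascii_safe = quote("\u00e9", safe="\u00e9")

# unquote() defaults to errors="replace": an escape that is not valid
# UTF-8 becomes U+FFFD instead of raising.
undecodable = unquote("%E9")

# Unescaped non-ASCII characters in the input pass through unchanged.
passthrough = unquote("\u00e9%41")

bytes_error = quote_bytes_with_encoding()
```

The asymmetry between the two `errors` defaults (strict when quoting, replace when unquoting) is the setting Guido's closing question is about.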
[issue3362] locale.getpreferredencoding() gives bus error on Mac OS X 10.4.11 PPC
cfr [EMAIL PROTECTED] added the comment: Interesting. At least the 39 makes sense. I don't understand the documentation well enough to know what the 79 is about. I'm sorry but I can't work out what I should do with: printf("Encoding is %x\n", enc); Am I meant to use this in python, a standard shell or something else? I tried in a bash shell and a python interpreter (after undoing my workaround) and both gave errors - a syntax error in the case of bash; a complaint about printf being unrecognised in python. I also tried import os, sys, locale first just in case.

bash: syntax error near unexpected token `"Encoding is %x\n",'

(python)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'printf' is not defined

Sorry for being dumb about this. ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3362 ___
[issue3362] locale.getpreferredencoding() gives bus error on Mac OS X 10.4.11 PPC
cfr [EMAIL PROTECTED] added the comment: Just realised what I'm meant to do with it. Sorry - it is late (early, actually). Will report back when I get a chance to recompile. ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3362 ___