Re: [Python-Dev] Python-3.0, unicode, and os.environ
> If the Unicode APIs only have correct unicode, sure. If not you'll
> get errors translating to UTF-8 (and the byte APIs are supposed to
> pass bad names through unaltered.) Kinda ironic, no?

As far as I can see all Python Unicode strings can be encoded to UTF-8,
even things like lone surrogates because Python doesn't care about them.
So both the Unicode API and the binary API would be fail-safe on Windows.

- Hagen
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Python-3.0, unicode, and os.environ
On Sun, Dec 7, 2008 at 2:07 AM, Hagen Fürstenau <[EMAIL PROTECTED]> wrote:
>> If the Unicode APIs only have correct unicode, sure. If not you'll
>> get errors translating to UTF-8 (and the byte APIs are supposed to
>> pass bad names through unaltered.) Kinda ironic, no?
>
> As far as I can see all Python Unicode strings can be encoded to UTF-8,
> even things like lone surrogates because Python doesn't care about them.
> So both the Unicode API and the binary API would be fail-safe on Windows.

Python is broken and needs to be fixed.

http://bugs.python.org/issue3672
http://bugs.python.org/issue3297

--
Adam Olsen, aka Rhamphoryncus
Re: [Python-Dev] Python-3.0, unicode, and os.environ
>> As far as I can see all Python Unicode strings can be encoded to UTF-8,
>> even things like lone surrogates because Python doesn't care about them.
>> So both the Unicode API and the binary API would be fail-safe on Windows.
>
> Python is broken and needs to be fixed.
>
> http://bugs.python.org/issue3672
> http://bugs.python.org/issue3297

But the question of whether Python should care about lone surrogates or
not is at best tangential to the issue at hand. If you have lone
surrogates in the Unicode API (and didn't raise an exception on the way
getting there), then the sensible thing is to encode them into lone
UTF-8 surrogates. Even if you wanted to prevent lone surrogates,
encoding to UTF-8 for the binary API would not be the place to enforce it.

- Hagen
Re: [Python-Dev] Rewrite map for old URLs in place
Georg Brandl wrote:
> Hi,
>
> with a bit of delay I finally got around to creating a mod_rewrite map of
> the 2.5 URLs. URLs like http://docs.python.org/tut/node3.html will now
> point permanently to the new URL.
>
> Let me know if you find a problem.

Excellent news!

Cheers,
Nick.

--
Nick Coghlan | [EMAIL PROTECTED] | Brisbane, Australia
[Python-Dev] Rewrite map for old URLs in place
Hi,

with a bit of delay I finally got around to creating a mod_rewrite map of
the 2.5 URLs. URLs like http://docs.python.org/tut/node3.html will now
point permanently to the new URL.

Let me know if you find a problem.

Georg

--
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.
Re: [Python-Dev] 3.0.1 possibilities
Brett Cannon wrote:
> On Sat, Dec 6, 2008 at 15:41, Barry Warsaw <[EMAIL PROTECTED]> wrote:
>> On Dec 6, 2008, at 6:25 PM, Guido van Rossum wrote:
>>
>>> On Sat, Dec 6, 2008 at 3:18 PM, Benjamin Peterson
>>> <[EMAIL PROTECTED]> wrote:
>>>> Since the release of 3.0, several critical issues have come to our
>>>> attention. Namely, the builtin cmp function wasn't removed [1] and
>>>> the new IO library proved to be (as expected) abysmally slow
>>>> [2][3][4]. Christian proposed that we release 3.0.1 within the next
>>>> week to patch up these critical issues. Thoughts?
>>>>
>>>> [1] http://bugs.python.org/1717
>>>> [2] http://bugs.python.org/4533
>>>> [3] http://bugs.python.org/4561
>>>> [4] http://bugs.python.org/4565
>>
>> I've set the priority on all these to release blockers, but I have my
>> reservations about 4561 and 4565. Resolution of those seem like more than a
>> week or so away.
>>
>> If we want to do a bug fix release for 3.0.1, I'd like to do it no later
>> than the 19th.
>>
>
> +1 just to get rid of cmp(). And if io speedups can happen, great, but
> they can also wait for 3.0.2.
>

A point release just to remove a function whose withdrawal has been
advertised as a 3.0 change hardly seems worth the substantial effort of
cutting a release. If cmp() shouldn't have been in 3.0 and was then
there's surely no problem about removing it later as promised: anyone
who uses it in 3.0 code shouldn't be.

If it doesn't have to wait for a major release then is there any real
need to cut the minor release immediately?

regards
 Steve
--
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC              http://www.holdenweb.com/
[Python-Dev] distutils patches, request for review
Hi,

I am looking for a core developer to review a few patches for distutils.

#1 is mandatory (it removes a bad bug)
#2 is very nice to have
#3 to #5 are test coverage and code beautification

In order:

1. #4400 : the default generated .pypirc is broken. This patch fixes it:
   http://bugs.python.org/issue4400
2. #4394 : no need to store the password in pypirc anymore : using the
   prompt if not stored. http://bugs.python.org/issue4394
3. #2461 : more test coverage. http://bugs.python.org/issue2461
4. #3992 : removes custom log implementation -> uses logging instead.
   http://bugs.python.org/issue3992
5. #3985 : more cleanup. http://bugs.python.org/issue3985
6. #3986 : http://bugs.python.org/issue3986

Some of them are a few months old so I can refresh the patch on the
current trunk(s) as soon as they are picked.

Regards
Tarek

--
Tarek Ziadé | Association AfPy | www.afpy.org
Blog FR | http://programmation-python.org
Blog EN | http://tarekziade.wordpress.com/
Re: [Python-Dev] 3.0.1 possibilities
On Sun, Dec 7, 2008 at 5:38 AM, Steve Holden <[EMAIL PROTECTED]> wrote:
> A point release just to remove a function whose withdrawal has been
> advertised as a 3.0 change hardly seems worth the substantial effort of
> cutting a release. If cmp() shouldn't have been in 3.0 and was then
> there's surely no problem about removing it later as promised: anyone
> who uses it in 3.0 code shouldn't be.
>
> If it doesn't have to wait for a major release then is there any real
> need to cut the minor release immediately?

Well, since 2to3 doesn't remove cmp, and it actually works, it's
likely that people will be accidentally depending on it in code
converted from 2.x. In the past, where there was a discrepancy between
docs and code, we've often ruled in favor of the code using arguments
like "it always worked like this so we'll break working code if we
change it now". There's clearly an argument of timeliness there, which
is why we'd like to get this fixed ASAP. The alternative, which nobody
likes, would be to keep it around, deprecate it in 3.1, and remove it
in 3.2 or 3.3.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
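For readers porting 2.x code that still calls cmp(): a minimal sketch of the usual replacement, the expression (a > b) - (a < b) suggested in the "What's New in Python 3.0" notes. The helper name cmp_compat is hypothetical, and functools.cmp_to_key only exists in later releases than the 3.0 under discussion here.

```python
import functools

def cmp_compat(a, b):
    """Return -1, 0 or 1 for orderable values, like the 2.x builtin cmp().

    Hypothetical helper name; not part of the standard library.
    """
    return (a > b) - (a < b)

# An old-style comparison function can still drive sorting via cmp_to_key
# (available in later Python releases, not in 3.0 itself).
words = ["pear", "Apple", "fig"]
by_length = sorted(
    words,
    key=functools.cmp_to_key(lambda x, y: cmp_compat(len(x), len(y))),
)
```

Code that only ever used cmp() inside sort calls is usually better served by a plain key function; the shim is mainly useful as a stopgap during conversion.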
Re: [Python-Dev] Python-3.0, unicode, and os.environ
On Sun, Dec 7, 2008 at 2:35 AM, Hagen Fürstenau <[EMAIL PROTECTED]> wrote:
>>> As far as I can see all Python Unicode strings can be encoded to UTF-8,
>>> even things like lone surrogates because Python doesn't care about them.
>>> So both the Unicode API and the binary API would be fail-safe on Windows.
>>
>> Python is broken and needs to be fixed.
>>
>> http://bugs.python.org/issue3672
>> http://bugs.python.org/issue3297
>
> But the question of whether Python should care about lone surrogates or
> not is at best tangential to the issue at hand. If you have lone
> surrogates in the Unicode API (and didn't raise an exception on the way
> getting there), then the sensible thing is to encode them into lone
> UTF-8 surrogates. Even if you wanted to prevent lone surrogates,
> encoding to UTF-8 for the binary API would not be the place to enforce it.

No. Unicode *requires* them to be treated as errors. If you want to
pass them through then you're creating a custom encoding... which you
might argue for in this case, but it needs to be clearly separate from
the real UTF-8.

--
Adam Olsen, aka Rhamphoryncus
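To make the disagreement concrete, here is a small sketch of how later CPython 3.x releases ended up separating the two behaviours. This is not how 3.0 behaved at the time of this thread. The strict UTF-8 codec refuses a lone surrogate, while the separately named 'surrogatepass' error handler lets it through. The helper try_encode is a hypothetical name used only for illustration.

```python
lone = "\ud800"  # a lone high surrogate, not a valid Unicode scalar value

def try_encode(text, errors):
    """Encode to UTF-8 with the given error handler; None if it is refused."""
    try:
        return text.encode("utf-8", errors)
    except UnicodeEncodeError:
        return None

# Strict UTF-8 treats the lone surrogate as an error ...
strict_result = try_encode(lone, "strict")
# ... while the explicit opt-in handler emits the three-byte form.
passthrough = try_encode(lone, "surrogatepass")
```

Keeping the permissive behaviour behind an explicit handler name is one way to keep it "clearly separate from the real UTF-8", as asked for above.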
Re: [Python-Dev] Python-3.0, unicode, and os.environ
[EMAIL PROTECTED] wrote:
>
> On 06:07 am, [EMAIL PROTECTED] wrote:
>> Most apps aren't file managers or ftp clients but when they interact
>> with files (for instance, a file selection dialog) they need to be able
>> to show the user all the relevant files. So on an app-by-app basis the
>> need for this is high.
>
> While I tend to agree emphatically with this, the *real* solution here
> is a path-abstraction library.
Why don't you send me some information offlist. I'm not sure I agree
that a path-abstraction library can work correctly but if it can it
would be nice to have that at a level higher than the file-dialog
libraries that I was envisioning.
[snip]
>> ... but that still
>> doesn't help me identify when someone would expect that asking python
>> for a list of all files in a directory or a specific set of files in a
>> directory should, without warning, return only a subset of them. In
>> what situations is this appropriate behaviour?
>
> If you say listdir(unicode) on a POSIX OS, your program is saying "I
> only know how to deal with unicode results from this function, so please
> only give me those.".
No. (explained below)
> If your program is smart enough to deal with
> bytes, then you would have asked for bytes, no?
Yes (explained below)
> Returning only
> filenames which can be properly decoded makes sense. Otherwise everyone
> needs to learn about this highly confusing issue, even for the simplest
> scripts.
>
os.listdir(unicode) (currently) means that the *programmer* is asking
that the stdlib return the decodable filenames from this directory. The
question is whether the programmer understood that this is what they
were asking for and whether it is what they most likely want. I would
make the following statements with respect to this:
1) The programmer most likely does not want decodable filenames and only
decodable filenames. If they did, we'd see a lot of python2.x code that
turns pathnames into unicode and discards everything that wasn't
decodable. No one has given a use case for finding only the *decodable*
subset of files. If I request to see all *.py files in a directory, I
want to see all of the *.py files in the directory, decodable or not.
If you can show how programmers intend "90%" of their calls to
os.listdir()/glob.glob('*.txt') to show only the decodable subset of the
results, then the foundation of my arguments is gone. So please, give
examples to prove this wrong.
- If this is true, a definition of os.listdir(str) that would
better meet programmer expectation would be: "Give me all files in a
directory with the output as str type". The definition of
os.listdir(bytes) would be "Give me all files in a directory
with the output as bytes type". Raising an exception when the filenames
are undecodable is perfectly reasonable in this situation.
2) For the programmer to understand the difference between
os.listdir(str) and os.listdir(bytes) they have to
understand the "highly confusing issue" and what it means for their
code. So the current method is forcing programmers to understand it
even for the simplest scripts if their environment is not uniform with
no clue from the interpreter that there is an issue.
- Similarly, raising an exception on undecodable values means that the
programmer can ignore the issue in any scripts in sane environments and
will be told that they need to deal with it (via an exception) when
their script runs in a non-sane environment.
3) The usage of unicode vs bytes is easy to miss for someone starting
with py2.x or windows and moving to a multi-platform or unix project.
Even simple testing won't reveal the problem unless the programmer knows
that they have to test what happens when encodings are mixed. Once
again, this is requiring the programmer to understand the encoding issue
without help from the interpreter.
> Skipping undecodable values is good enough that it will work 90% of the
> time.
You and Guido have now made this claim to defend not raising an
exception but I still don't have a use case.
Here are use cases that I see:
* Bill is coding an application for use inside his company. His company
only uses utf-8. His code naively uses os.listdir().
- The code does not throw an exception whether we use the current
os.listdir() or one that could throw an exception because the system
admins have sanitised the environment. Bill did not need to understand
the implications of encoding for his code to work in this script whether
simple or complex.
* Mary is coding an application for use inside her company. It finds
all html files on a system and updates her company's copyright, privacy
policy, and other legal boilerplate. Her expectation is that after her
program runs every file will have been updated. Her environment is a
mixture of different filename encodings due to having many legacy
documents for users in different locales. Mary's code also naively uses
os.listdir(). Her test case checks that the code does the
right thing on m
Re: [Python-Dev] Python-3.0, unicode, and os.environ
On Sun, Dec 7, 2008 at 11:35, Adam Olsen <[EMAIL PROTECTED]> wrote:
>>> http://bugs.python.org/issue3672
>>> http://bugs.python.org/issue3297
>
> No. Unicode *requires* them to be treated as errors. If you want to
> pass them through then you're creating a custom encoding... which you
> might argue for in this case, but it needs to be clearly separate from
> the real UTF-8.

I suspect it is a common and convenient but (according to what you
say) misconceived expectation that using UTF-8 to encode any Unicode
string will not raise an exception. This behavior is not something
which should be discarded lightly.

I see little reason that this couldn't be a new codec or error handler
that allowed people to choose between correct pure UTF-8 behavior or
the technically incorrect but very practical behavior it currently
has.

[My apologies, Adam, for sending this only to you the first time]

--
Michael Urman
Re: [Python-Dev] Python-3.0, unicode, and os.environ
On Sun, Dec 7, 2008 at 11:18 AM, Michael Urman <[EMAIL PROTECTED]> wrote:
> On Sun, Dec 7, 2008 at 11:35, Adam Olsen <[EMAIL PROTECTED]> wrote:
>>>> http://bugs.python.org/issue3672
>>>> http://bugs.python.org/issue3297
>>
>> No. Unicode *requires* them to be treated as errors. If you want to
>> pass them through then you're creating a custom encoding... which you
>> might argue for in this case, but it needs to be clearly separate from
>> the real UTF-8.
>
> I suspect it is a common and convenient but (according to what you
> say) misconceived expectation that using UTF-8 to encode any Unicode
> string will not raise an exception. This behavior is not something
> which should be discarded lightly.

It is *not* a valid Unicode string in the first place. Therein lies the
problem.

> I see little reason that this couldn't be a new codec or error handler
> that allowed people to choose between correct pure UTF-8 behavior or
> the technically incorrect but very practical behavior it currently
> has.

Note that many of the restrictions were added for security reasons.
You might receive a UTF-8 encoded file name from a malicious user,
check if it contains something dangerous (like
"../../../../../etc/password"), then decode it. If your decoder isn't
compliant (ie doesn't check for overly long sequences) then a
b'\xC0\xAF' gets translated into u'/', bypassing your previous check.

However, in this context we only need to allow lone surrogates.
CESU-8 comes to mind. (It is a perverse world we live in.)

--
Adam Olsen, aka Rhamphoryncus
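The overlong-sequence attack described above can be sketched as follows. The idea is to decode strictly first and validate the decoded text afterwards, so that no byte-level trick can slip past the check. The function name decode_then_check and its validation policy are hypothetical, chosen only to illustrate the ordering; the strict rejection of b'\xC0\xAF' is the behaviour of the standard UTF-8 codec in current Python 3.

```python
def decode_then_check(raw_name):
    """Return the decoded name if it is acceptable, else None.

    Hypothetical policy: a single path component, no separators,
    no '.' or '..'. Decoding happens *before* validation.
    """
    try:
        # Strict decoding rejects overlong forms such as b'\xc0\xaf'.
        name = raw_name.decode("utf-8")
    except UnicodeDecodeError:
        return None
    if "/" in name or name in ("", ".", ".."):
        return None
    return name

overlong_slash = b"\xc0\xaf"        # non-compliant two-byte encoding of '/'
rejected = decode_then_check(overlong_slash)
accepted = decode_then_check(b"notes.txt")
```

Checking the raw bytes and decoding afterwards, as in the scenario quoted above, is only safe when the decoder is compliant; validating the decoded string avoids depending on that.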
Re: [Python-Dev] "as" keyword woes
On Sat Dec 6 21:29:09 CET 2008, Guido van Rossum wrote:
>
> On Sat, Dec 6, 2008 at 11:38 AM, Warren DeLano wrote:
> > As someone somewhat knowledgeable of how parsers work, I do not
> > understand why a method/attribute name "object_name.as(...)" must
> > necessarily conflict with a standalone keyword " as ". It seems to me
> > that it should be possible to unambiguously separate the two without
> > ambiguity or undue complication of the parser.
>
> That's possible with sufficiently powerful parser technology, but
> that's not how the Python parser (and most parsers, in my experience)
> treat reserved words. Reserved words are reserved in all contexts,
> regardless of whether ambiguity could arise.

Just a quick aside from someone who merely lurks on this list: in SQL,
it's quite possible to use keywords in a fashion similar to that desired
by the inquirer, and it's actually possible to double-quote keywords and
use them as names for things. I'm not advocating more complicated parsing
technology for any Python implementation, but I think it's pertinent to
point out that the technology isn't particularly obscure.

Apologies for the interruption,

Paul
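A small sketch of what "reserved in all contexts" means in practice, and of the usual workaround when an attribute really has to be called "as". The Record class and the variable names are hypothetical; keyword.iskeyword, setattr and getattr are standard.

```python
import keyword

class Record:
    """Hypothetical object that wants an attribute named like a keyword."""

rec = Record()

# 'as' is a reserved word, so `rec.as(int)` cannot even be compiled ...
is_reserved = keyword.iskeyword("as")
try:
    compile("rec.as(int)", "<demo>", "eval")
    dotted_form_compiles = True
except SyntaxError:
    dotted_form_compiles = False

# ... but attribute names are plain strings, so setattr/getattr still work.
setattr(rec, "as", lambda kind: kind("42"))
value = getattr(rec, "as")(int)
```

The more common convention is to rename the attribute with a trailing underscore (as_), which keeps the ordinary dotted syntax available.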
Re: [Python-Dev] 3.0.1 possibilities
> There's clearly an argument of timeliness there, which
> is why we'd like to get this fixed ASAP.

I think it is still timely when fixed in January or February.
In fact, releasing it still in December might not be possible,
due to the limited time available.

Regards,
Martin
Re: [Python-Dev] Python-3.0, unicode, and os.environ
Toshio Kuratomi wrote:

> - If this is true, a definition of os.listdir() that would
> better meet programmer expectation would be: "Give me all files in a
> directory with the output as str type". The definition of
> os.listdir() would be "Give me all files in a directory
> with the output as bytes type". Raising an exception when the filenames
> are undecodable is perfectly reasonable in this situation.

Your examples (snipped) pretty well convince me that there is a use case
for raising exceptions. We should move beyond arguing over which one way
is right. I think there should be a second argument 'ignorebad=False' to
ignore undecodable files rather than raise the exception (or
'strict=True' to stop and raise exception on non-decodable names -- then
code is 'if strict: raise ...'). I believe other functions have a
similar parameter.

tjr
Re: [Python-Dev] Python-3.0, unicode, and os.environ
On Sun, Dec 7, 2008 at 1:20 PM, Terry Reedy <[EMAIL PROTECTED]> wrote:
> Toshio Kuratomi wrote:
>
>> - If this is true, a definition of os.listdir() that would
>> better meet programmer expectation would be: "Give me all files in a
>> directory with the output as str type". The definition of
>> os.listdir() would be "Give me all files in a directory
>> with the output as bytes type". Raising an exception when the filenames
>> are undecodable is perfectly reasonable in this situation.
>
> Your examples (snipped) pretty well convince me that there is a use case for
> raising exceptions. We should move beyond arguing over which one way is
> right. I think there should be a second argument 'ignorebad=False' to
> ignore undecodable files rather than raise the exception (or 'strict=True'
> to stop and raise exception on non-decodable names -- then code is 'if
> strict: raise ...'). I believe other functions have a similar parameter.

If you want the exceptions, just use the bytes API and try to decode
the byte strings using the system encoding.

My problem with raising exceptions *by default* when an undecodable
name exists is that it may render an app completely useless in a
situation where the developer is no longer around. This happened all
the time with the 2.x Unicode API, where the developer hadn't
anticipated a particular input potentially containing non-ASCII bytes,
and the user fed the application non-ASCII text. Making os.listdir
raise an exception when a directory contains a single undecodable file
means that the entire directory can't be read, and most likely the
entire app crashes at that point. Most likely the developer never
anticipated this situation (since in most places it is either
impossible or very unlikely) -- after all, if they had anticipated it
they would have used the bytes API in the first place. (It's worse
because the exception being raised would be UnicodeError -- most
people expect os.listdir to raise OSError, not other errors.)

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
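The suggestion above, "use the bytes API and try to decode the byte strings using the system encoding", could look roughly like this. It is a sketch under assumptions: listdir_strict is a hypothetical name, and os.fsencode comes from later Python 3 releases than the 3.0 being discussed.

```python
import os
import sys
import tempfile

def listdir_strict(path):
    """List a directory through the bytes API, decoding every name strictly.

    Raises UnicodeDecodeError for the first name that does not decode
    with the filesystem encoding. Hypothetical helper, not a stdlib API.
    """
    encoding = sys.getfilesystemencoding()
    raw_names = os.listdir(os.fsencode(path))   # bytes in, bytes out
    return [raw.decode(encoding) for raw in raw_names]

# Small demonstration on a directory with one well-behaved file name.
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "a.txt"), "w").close()
    names = listdir_strict(demo_dir)
```

As noted in the message, the exception a caller sees here is a UnicodeError subclass and not OSError, which is part of the objection to making this the default.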
[Python-Dev] Nonlocal shortcut
Hi,

I'm currently implementing a parser to handle Python 3.0, and one of
the points I found conflicting with the grammar specification is the
PEP 3104.

It says that a shortcut would be added to Python 3.0 so that "nonlocal
x = 0" can be written. However, the latest grammar specification
(http://docs.python.org/dev/3.0/reference/grammar.html?highlight=full%20grammar)
doesn't seem to take that into account... So, can someone enlighten me
on what should be the correct treatment for that on a grammar that
wants to support Python 3.0?

Thanks,

Fabio
Re: [Python-Dev] Python-3.0, unicode, and os.environ
Terry Reedy wrote:
> Toshio Kuratomi wrote:
>
>> - If this is true, a definition of os.listdir() that would
>> better meet programmer expectation would be: "Give me all files in a
>> directory with the output as str type". The definition of
>> os.listdir() would be "Give me all files in a directory
>> with the output as bytes type". Raising an exception when the filenames
>> are undecodable is perfectly reasonable in this situation.
>
> Your examples (snipped) pretty well convince me that there is a use case
> for raising exceptions. We should move beyond arguing over which one
> way is right. I think there should be a second argument
> 'ignorebad=False' to ignore undecodable files rather than raise the
> exception (or 'strict=True' to stop and raise exception on non-decodable
> names -- then code is 'if strict: raise ...'). I believe other
> functions have a similar parameter.

If we were going to do anything like that for os.listdir() and other
filesystem APIs (like glob) that return multiple paths, we'd probably be
best advised to just have a normal Unicode 'errors' parameter which allowed:

'strict' - raise an Exception for malformed binary data
'replace' - insert '?' or some other symbol in place of malformed binary
data
'ignore' - simply leave out the malformed binary data
'skip' - run the underlying codec in strict mode, but skip over any
items which raise UnicodeDecodeError (default/current Py3k behaviour)

Obviously, 'skip' doesn't make any sense for APIs like getcwd() that
return a single value - a case could be made for those defaulting to
either replace or strict.

Cheers,
Nick.

--
Nick Coghlan | [EMAIL PROTECTED] | Brisbane, Australia
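The four modes listed above can be sketched on top of the bytes API. Everything here is illustrative: decode_names and its errors argument are hypothetical names modelling the proposal, not an existing os.listdir() parameter. The function works on a list of raw names so the behaviour can be seen without creating oddly named files. Note that the codec's own 'replace' handler inserts U+FFFD on decoding, not '?'.

```python
def decode_names(raw_names, encoding="utf-8", errors="skip"):
    """Decode a list of bytes file names according to the proposed modes.

    'strict', 'replace' and 'ignore' are passed straight to the codec;
    'skip' (hypothetical, modelling the current behaviour described
    above) decodes strictly and drops names that fail.
    """
    names = []
    for raw in raw_names:
        if errors == "skip":
            try:
                names.append(raw.decode(encoding, "strict"))
            except UnicodeDecodeError:
                continue  # leave the undecodable entry out entirely
        else:
            names.append(raw.decode(encoding, errors))
    return names

# One clean name and one Latin-1 encoded name that is not valid UTF-8.
raw = [b"ok.txt", b"caf\xe9.txt"]
skipped = decode_names(raw, errors="skip")
replaced = decode_names(raw, errors="replace")
ignored = decode_names(raw, errors="ignore")
```

With errors="strict" the same input raises UnicodeDecodeError on the second name, which is the case the thread is arguing about.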
Re: [Python-Dev] Nonlocal shortcut
Hello,

Fabio Zadrozny wrote:
> Hi,
>
> I'm currently implementing a parser to handle Python 3.0, and one of
> the points I found conflicting with the grammar specification is the
> PEP 3104.
>
> It says that a shortcut would be added to Python 3.0 so that "nonlocal
> x = 0" can be written. However, the latest grammar specification
> (http://docs.python.org/dev/3.0/reference/grammar.html?highlight=full%20grammar)
> doesn't seem to take that into account... So, can someone enlighten me
> on what should be the correct treatment for that on a grammar that
> wants to support Python 3.0?

An issue was already filed about this:
http://bugs.python.org/issue4199
It should be ready for inclusion in 3.0.1.

--
Amaury Forgeot d'Arc
Re: [Python-Dev] Python-3.0, unicode, and os.environ
Nick Coghlan wrote:

> For binary wrappers around the Windows Unicode APIs, I was thinking
> specifically of using UTF-8, since that should be able to encode
> anything the Unicode APIs can handle.

Why shouldn't the binary interface just expose the raw utf16 as bytes?

--
Greg
Re: [Python-Dev] Python-3.0, unicode, and os.environ
Guido van Rossum wrote:
> On Sun, Dec 7, 2008 at 1:20 PM, Terry Reedy <[EMAIL PROTECTED]> wrote:
>> Toshio Kuratomi wrote:
>>
>>> - If this is true, a definition of os.listdir() that would
>>> better meet programmer expectation would be: "Give me all files in a
>>> directory with the output as str type". The definition of
>>> os.listdir() would be "Give me all files in a directory
>>> with the output as bytes type". Raising an exception when the filenames
>>> are undecodable is perfectly reasonable in this situation.
>>
>> Your examples (snipped) pretty well convince me that there is a use case for
>> raising exceptions. We should move beyond arguing over which one way is
>> right. I think there should be a second argument 'ignorebad=False' to
>> ignore undecodable files rather than raise the exception (or 'strict=True'
>> to stop and raise exception on non-decodable names -- then code is 'if
>> strict: raise ...'). I believe other functions have a similar parameter.

I was thinking of the "normal Unicode 'errors' parameter", as described
by Nick.

> If you want the exceptions, just use the bytes API and try to decode
> the byte strings using the system encoding.

If it was a matter of adding a new method, I might agree. But:

1. We already have a method that does exactly what you describe. It is
only a matter of adding flexibility to the response to problems, for
which there is already precedent.

2. Suggesting that people who want strings and not bytes should have to
deal with bytes, just to get an error notification, seems to negate the
point of moving to 3.0.

3. A builtin would probably do so better than most programmers would,
with little touches such as the one suggested below.

4. An error parameter would ALERT programmers to the possibility of a
PROBLEM, both in the present and future. As you say below, people need
to better anticipate the future.

> My problem with raising exceptions *by default* when an undecodable
> name exists is that it may render an app completely useless in a
> situation where the developer is no longer around. This happened all
> the time with the 2.x Unicode API, where the developer hadn't
> anticipated a particular input potentially containing non-ASCII bytes,
> and the user fed the application non-ASCII text. Making os.listdir
> raise an exception when a directory contains a single undecodable file
> means that the entire directory can't be read, and most likely the
> entire app crashes at that point. Most likely the developer never
> anticipated this situation (since in most places it is either
> impossible or very unlikely) -- after all, if they had anticipated it
> they would have used the bytes API in the first place. (It's worse
> because the exception being raised would be UnicodeError -- most
> people expect os.listdir to raise OSError, not other errors.)

This to me is an argument for keeping the default the current behavior,
but not for rejecting flexibility. The computing world seems to be
messier than we would like and worse than I realized until this week. As
you say below, people need to better anticipate the future, and an
errors parameter would help do that.

Is Windows really immune? What about when it reads the directory of
possibly old removable media with whatever byte name encodings? Is this
a possible source of 'unanticipated' problems?

As to your last sentence, os.listdir() with an errors parameter could
convert a decoding UnicodeError to "OSError: undecodable file name ",
thereby supplying the expected exception as well as an extractable
representation of the problematic raw bytes.

Here is a possible use case: I want filenames as 3.0 strings and I
anticipate no problems at present but, as you say above, something might
happen years in the future. I am using 3.0 *because* of the strings ==
unicode feature. I would like to write

  try:
      files = os.listdir(somedir, errors='strict')
  except OSError as e:
      log()
      files = os.listdir(somedir)

and go on without the problem file but not without logging the problem
so a future maintainer can consider what to do about it, but only when
there is an actual need to think about it.

Terry Jan Reedy
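The use case above can be approximated today without any change to os.listdir(). The following is only a sketch of the idea: the helper names are hypothetical, it works on a list of raw byte names, not on a real directory, and wrapping the decoding failure in OSError models the conversion proposed in the message. It is not an existing behaviour.

```python
import logging

log = logging.getLogger("listdir-demo")

def decode_or_oserror(raw_names, encoding="utf-8"):
    """Decode every name strictly; report failures as OSError.

    Hypothetical helper modelling the proposed errors='strict' mode.
    """
    names = []
    for raw in raw_names:
        try:
            names.append(raw.decode(encoding))
        except UnicodeDecodeError as exc:
            # Carry a representation of the raw bytes in the message.
            raise OSError("undecodable file name %r" % raw) from exc
    return names

def decode_skipping(raw_names, encoding="utf-8"):
    """Decode what can be decoded and silently drop the rest."""
    names = []
    for raw in raw_names:
        try:
            names.append(raw.decode(encoding))
        except UnicodeDecodeError:
            continue
    return names

def names_with_logging(raw_names):
    """Try the strict mode first; on failure, log and fall back."""
    try:
        return decode_or_oserror(raw_names)
    except OSError as exc:
        log.warning("falling back to skipping: %s", exc)
        return decode_skipping(raw_names)

files = names_with_logging([b"a.py", b"\xff.py", b"b.py"])
```

The fallback keeps the program running while leaving a trace for a future maintainer, which is the behaviour asked for in the message.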
Re: [Python-Dev] Nonlocal shortcut
Fabio Zadrozny wrote:
> Hi,
>
> I'm currently implementing a parser to handle Python 3.0, and one of
> the points I found conflicting with the grammar specification is the
> PEP 3104.
>
> It says that a shortcut would be added to Python 3.0 so that "nonlocal
> x = 0" can be written.

As near as I can tell from testing, that did not happen. The PEP needs
revision to delete that or push it to a later version.

> However, the latest grammar specification
> (http://docs.python.org/dev/3.0/reference/grammar.html?highlight=full%20grammar)
> doesn't seem to take that into account... So, can someone enlighten me
> on what should be the correct treatment for that on a grammar that
> wants to support Python 3.0?
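A parser author can check what the running interpreter accepts by feeding both spellings to compile(). The sketch below assumes a CPython 3 interpreter in which the shortcut was not added, which matches the testing reported above; the helper name compiles and the two sample sources are illustrative only.

```python
def compiles(source):
    """Return True if the interpreter accepts the source, else False."""
    try:
        compile(source, "<demo>", "exec")
        return True
    except SyntaxError:
        return False

# The shortcut form mentioned in PEP 3104.
shortcut = (
    "def outer():\n"
    "    x = 1\n"
    "    def inner():\n"
    "        nonlocal x = 0\n"
    "    return inner\n"
)

# The two-statement form that the published grammar describes.
two_step = (
    "def outer():\n"
    "    x = 1\n"
    "    def inner():\n"
    "        nonlocal x\n"
    "        x = 0\n"
    "    return inner\n"
)

shortcut_ok = compiles(shortcut)
two_step_ok = compiles(two_step)
```

A grammar that follows the published specification would therefore accept only a comma-separated list of names after nonlocal.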
Re: [Python-Dev] 3.0.1 possibilities
Martin v. Löwis wrote:
> I think it is still timely when fixed in January or February. In fact, releasing it still in December might not be possible, due to the limited time available.

The cmp() / PyObject_Compare() removal patch is almost done. With some help I can finish it by Tuesday evening. We can have another release by Monday Dec 15th. Python 3.0.0 has some defects that should be fixed before people are spending their Xmas holidays with 3.0. The defects include

* cmp(), PyObject_Compare() and friends
* global/nonlocal shortcuts (global x = 0) aren't working
* unnecessary slowdown of read() due to slow buffer resizing.

An early 3.0.1 release makes it possible to sync 2.6 and 3.0 releases again. If we release it now we can have a combined release of 2.6.2 and 3.0.2 two months from now. Two months are quite some time to fix the performance issue of the new IO library.

If Guido and Barry are fine with a lax policy on performance fixes we can integrate more tweaks. I believe performance patches were considered as features in the past. For this reason they weren't allowed for minor releases. Mark's work on long integer optimizations and json speedup are good candidates.

Christian
Re: [Python-Dev] 3.0.1 possibilities
On Sun, Dec 7, 2008 at 6:05 PM, Christian Heimes <[EMAIL PROTECTED]> wrote: > Martin v. Löwis wrote: >> >> I think it is still timely when fixed in January or February. >> In fact, releasing it still in December might not be possible, >> due to the limited time available. > > The cmp() / PyObject_Compare() removal patch is almost done. With some help > I can finish it until Tuesday evening. We can have another release by Monday > Dec 15th. Python 3.0.0 has some defects that should be fixed before people > are spending their Xmas holidays with 3.0. The defects include > > * cmp(), PyObject_Compare() and frieds > * global/nonlocal shortcuts (global x = 0) aren't working I have a patch for this [1], but I don't think this should be considered a release blocker or even backported to 3.0. It's merely a convenience feature and doesn't inhibit the usefulness of the PEP in any way. > * unnecessary slowdown of read() due slow buffer resizing. -- Cheers, Benjamin Peterson "There's nothing quite as beautiful as an oboe... except a chicken stuck in a vacuum cleaner." ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] 3.0.1 possibilities
Benjamin Peterson wrote: I have a patch for this [1], but I don't think this should be considered a release blocker or even backported to 3.0. It's merely a convenience feature and doesn't inhibit the usefulness of the PEP in any way. Amaury said: An issue was already filed about this: http://bugs.python.org/issue4199 It should be ready for inclusion in 3.0.1. I'm +0 for the patch. Given the nature of Python 3.0 I'm fine with getting it right. Christian ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] 3.0.1 possibilities
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Dec 7, 2008, at 7:05 PM, Christian Heimes wrote: Martin v. Löwis wrote: I think it is still timely when fixed in January or February. In fact, releasing it still in December might not be possible, due to the limited time available. The cmp() / PyObject_Compare() removal patch is almost done. With some help I can finish it until Tuesday evening. We can have another release by Monday Dec 15th. Python 3.0.0 has some defects that should be fixed before people are spending their Xmas holidays with 3.0. The defects include * cmp(), PyObject_Compare() and frieds * global/nonlocal shortcuts (global x = 0) aren't working * unnecessary slowdown of read() due slow buffer resizing. An early 3.0.1 release makes it possible to sync 2.6 and 3.0 relases again. If we release it now we can have an combined release of 2.6.2 and 3.0.2 in two months from now. Two months are quite some time to fix the performance issue of the new IO library. If Guido and Barry are fine with a lax policy on performance fixes we can integrate more tweaks. I believe performances patches were considered as features in the past. For this reason they weren't allowed for minor releases. Mark's work on long integer optimizations and json speedup are good candidates. I'm personally okay with performance fixes in point releases, as long it doesn't change API or add additional features. - -Barry -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSTxv5XEjvBPtnXfVAQIu6AQAkxyGwhapcREx5/E3yHUf8lWvM4lh/FdR AfHwwp7hs+yX8rR05CWAUfllY9dHcHKHvBCwTCgfuIrc4GJWbJHcx9/b19GTpzre 7fcikjQ0sk6zUq85DiJah7qL5AkA6Jmiby+rol7iudHlmQO/+6F6+aeL+vSKG8IC vYbLILAFapI= =ScYg -END PGP SIGNATURE- ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] 3.0.1 possibilities
Barry Warsaw wrote:
> I'm personally okay with performance fixes in point releases, as long it doesn't change API or add additional features.

Does your okay include or exclude new internal APIs like new helper functions or new C modules?

Christian
Re: [Python-Dev] Nonlocal shortcut
>> I'm currently implementing a parser to handle Python 3.0, and one of >> the points I found conflicting with the grammar specification is the >> PEP 3104. >> >> It says that a shortcut would be added to Python 3.0 so that "nonlocal >> x = 0" can be written. However, the latest grammar specification >> (http://docs.python.org/dev/3.0/reference/grammar.html?highlight=full%20grammar) >> doesn't seem to take that into account... So, can someone enlighten me >> on what should be the correct treatment for that on a grammar that >> wants to support Python 3.0? > > An issue was already filed about this: > http://bugs.python.org/issue4199 > It should be ready for inclusion in 3.0.1. > Thanks for pointing that out. Fabio ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Python-3.0, unicode, and os.environ
On approximately 12/7/2008 10:56 AM, came the following characters from the keyboard of Adam Olsen:
> You might receive a UTF-8 encoded file name from a malicious user, check if it contains something dangerous (like "../../../../../etc/password"), then decode it. If your decoder isn't compliant (ie doesn't check for overly long sequences) then a b'\xC0\xAF' gets translated into u'/', bypassing your previous check.

You might indeed.

But if you are interested in checking for security issues, shouldn't you _first_ decode into some canonical form, specifying what sorts of Unicode strictness (such as overlong sequences) to check for during the decode process, and once the string is in canonical form, _then_ do checks for various attacks, such as the ../ sequence you mention?

And with that order of operation, even if you don't reject overlong sequences, you have canonized them, and can recognize the resulting characters as good or bad.

-- Glenn -- http://nevcal.com/ === A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking
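The order of operations argued for in this message, decode first and then check, can be sketched as follows. The function name `is_safe_relative_name` is hypothetical, POSIX-style separators are assumed, and the traversal test is deliberately simple; it is an illustration, not a complete path sanitizer.

```python
import posixpath

def is_safe_relative_name(raw: bytes) -> bool:
    """Decode strictly first, then check the decoded text for traversal."""
    try:
        name = raw.decode("utf-8")   # strict mode: overlong forms raise here
    except UnicodeDecodeError:
        return False
    # Only after decoding do we look at the characters themselves.
    normalized = posixpath.normpath(name)
    if normalized.startswith("/"):
        return False
    return normalized != ".." and not normalized.startswith("../")
```

With this ordering the check sees the same characters the rest of the program will see, whatever byte sequence produced them.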
Re: [Python-Dev] RELEASED Python 3.0 final
[EMAIL PROTECTED] writes:

> But still, you can't honestly expect me to recommend 3.0 until someone has gotten at least a basic skeleton of Twisted up and running under it :). My own attempts to do so have failed miserably, to the point where I can't even produce a useful bug report without a lot more work.

How about an issue in the Python tracker---or the Twisted one, with an xref from the Python tracker to the Twisted tracker where the work will be done---that says "Twisted wants to be ported but we don't have enough developers, please help"? Maybe with some encouraging statement about how you can provide X amount of advice. In general, maybe there should be some sort of (semi-)formal process for proposing ports of libraries and coordinating work on them. Even just a focal point for where to make such requests, and a way to search for them so you can find others with similar interests.

> I don't think there's anything about the 3.0 language which couldn't be supported in a VM that understood both 2 and 3.

Strings vs. bytes. It can't do both 2-style "bytes are text" and 3-style "no way are bytes text" simultaneously AFAICS.

> I also don't think 3.0 is perfect, and five years on, there will be a temptation to make more "just this once" incompatible changes. Of course, you've promised these changes won't be made, and *this* set of design mistakes will be with us forever.

For values of "forever" approximating ten years.

> It would be nice if there were a way for evolution to continue without another reboot of the world.

Stephen J. Gould says not. I think Java is a very different case from Python. It is the product of a language evolution that goes back to the early 1970s or so, and the standardization effort was carefully shepherded by a powerful company which provided resources to ensure that things went its way.
For that reason, I think it's a remarkable compliment to Python and to Python 3 in particular that you consider Java an appropriate standard of comparison for Python. There's also the danger of stasis. I think Lisp will never die, and Common Lisp has done a good job of avoiding reboots. But for precisely that reason there continues to be a lively evolution of seriously incompatible dialects, both Lisp-1 (Scheme) and Lisp-2. I see Python 3 as an attempt to bridle and ride this tiger, without turning the rope into a noose and strangling the beast. > >If they're that easily convinced that Java is better they probably > >were a lost cause anyway, so I won't mourn their departure too much. > > I really believe that *all* new users are fickle, if they don't have a > mandate as to what they need to be learning. Personally, I learned > Python because of a memory leak in Swing. Sure, but what Guido is saying, I think, is that as long as prominent Python developers don't announce its funeral, the other things we could do to encourage them are going to get lost in the noise of inherent fickleness. Which isn't just random, it depends on things like availability of just the right library for one's app, etc. But there are too many of those to do them all, or even just to list them up and try to prioritize them "objectively"---might as well be random. ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Python-3.0, unicode, and os.environ
Glenn Linderman writes: > But if you are interested in checking for security issues, shouldn't you > _first_ decode into some canonical form, Yes. That's all that is being asked for: that Python do strict decoding to a canonical form by default. That's a lot to ask, as it turns out, but that is what we (the minority of strict Unicode adherents, that is) want. If you want the convenience and risk, I believe you should ask for it by name (I suggest a name like "own_me" for the relaxed decoding flag). Failing that, it would be nice to have a global flag to change the default. ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] 3.0.1 possibilities
>> I think it is still timely when fixed in January or February. >> In fact, releasing it still in December might not be possible, >> due to the limited time available. > > The cmp() / PyObject_Compare() removal patch is almost done. I wasn't (primarily) talking about fixing this particular issue. Time needs to be made available also for the upcoming 2.4.6 and 2.5.3 releases (which should, IMO, get priority over a 3.0 bugfix release at this point) > With some > help I can finish it until Tuesday evening. We can have another release > by Monday Dec 15th. Python 3.0.0 has some defects that should be fixed > before people are spending their Xmas holidays with 3.0. The defects > include > > * cmp(), PyObject_Compare() and frieds > * global/nonlocal shortcuts (global x = 0) aren't working > * unnecessary slowdown of read() due slow buffer resizing. I think 3.0.1 should also address other serious bugs in 3.0, such as - various IDLE bugs with non-ASCII characters (2827, 4008, 4323, 4410) - various ways to crash Python through the buffer protocol (4583, 4509; also 4580) > An early 3.0.1 release makes it possible to sync 2.6 and 3.0 relases > again. IIUC, you want the bugfix version number to be sync'ed. I don't think that is a useful thing to have. > If Guido and Barry are fine with a lax policy on performance fixes we > can integrate more tweaks. I believe performances patches were > considered as features in the past. For this reason they weren't allowed > for minor releases. Mark's work on long integer optimizations and json > speedup are good candidates. I don't recall such policy, and I can't see anything wrong with including performance fixes in a bug fix release. Maybe you were confusing this with whether performance fixes can be considered release-critical (which they shouldn't, IMO)? 
Regards, Martin
Re: [Python-Dev] Python-3.0, unicode, and os.environ
On approximately 12/7/2008 8:13 PM, came the following characters from the keyboard of Stephen J. Turnbull: Glenn Linderman writes: > But if you are interested in checking for security issues, shouldn't you > _first_ decode into some canonical form, Yes. That's all that is being asked for: that Python do strict decoding to a canonical form by default. That's a lot to ask, as it turns out, but that is what we (the minority of strict Unicode adherents, that is) want. I have no problem with having strict validation available. But doesn't validation take significantly longer than decoding? So I think it should be logically decoupled... do validation when/where it is needed for security reasons, and allow internal [de]coding to be faster. I'm mostly indifferent about which should be the default... maybe there shouldn't be a default! Use the "vUTF-8" decoder for strict validation, and the "fUTF-8" decoder for the faster, non-validating version. Or something like that. With appropriate documentation. Of course, "UTF-8" already exists... as "fUTF-8", so for compatibility, I guess it shouldn't change... but it could be deprecated. You didn't address the issue that if the decoding to a canonical form is done first, many of the insecurities just go away, so why throw errors? -- Glenn -- http://nevcal.com/ === A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Python-3.0, unicode, and os.environ
On Sun, Dec 7, 2008 at 9:45 PM, Glenn Linderman <[EMAIL PROTECTED]> wrote: > On approximately 12/7/2008 8:13 PM, came the following characters from the > keyboard of Stephen J. Turnbull: >> >> Glenn Linderman writes: >> >> > But if you are interested in checking for security issues, shouldn't >> you > _first_ decode into some canonical form, >> >> Yes. That's all that is being asked for: that Python do strict >> decoding to a canonical form by default. That's a lot to ask, as it >> turns out, but that is what we (the minority of strict Unicode >> adherents, that is) want. > > > I have no problem with having strict validation available. But doesn't > validation take significantly longer than decoding? So I think it should be > logically decoupled... do validation when/where it is needed for security > reasons, and allow internal [de]coding to be faster. I'd like to see benchmarks of such a claim. > I'm mostly indifferent about which should be the default... maybe there > shouldn't be a default! Use the "vUTF-8" decoder for strict validation, and > the "fUTF-8" decoder for the faster, non-validating version. Or something > like that. With appropriate documentation. Of course, "UTF-8" already > exists... as "fUTF-8", so for compatibility, I guess it shouldn't change... > but it could be deprecated. > > > You didn't address the issue that if the decoding to a canonical form is > done first, many of the insecurities just go away, so why throw errors? Unicode is intended to allow interaction between various bits of software. It may be that a library checked it in UTF-8, then passed it to python. It would be nice if the library validated too, but a major advantage of UTF-8 is older libraries (or protocols!) intended for ASCII need only be 8-bit clean to be repurposed for UTF-8. Their security checks continue to work, so long as nobody down stream introduces problems with a non-validating decoder. 
-- Adam Olsen, aka Rhamphoryncus
Re: [Python-Dev] Python-3.0, unicode, and os.environ
On approximately 12/7/2008 9:11 PM, came the following characters from the keyboard of Adam Olsen: On Sun, Dec 7, 2008 at 9:45 PM, Glenn Linderman <[EMAIL PROTECTED]> wrote: On approximately 12/7/2008 8:13 PM, came the following characters from the keyboard of Stephen J. Turnbull: Glenn Linderman writes: > But if you are interested in checking for security issues, shouldn't you > _first_ decode into some canonical form, Yes. That's all that is being asked for: that Python do strict decoding to a canonical form by default. That's a lot to ask, as it turns out, but that is what we (the minority of strict Unicode adherents, that is) want. I have no problem with having strict validation available. But doesn't validation take significantly longer than decoding? So I think it should be logically decoupled... do validation when/where it is needed for security reasons, and allow internal [de]coding to be faster. I'd like to see benchmarks of such a claim. "significantly" seems to be the only word at question; it seems that there are a fair number of validation checks that could be performed; the numeric part of UTF-8 decoding is just a sequence of shifts, masks, and ORs, so can be coded pretty tightly in C or assembly language. Anything extra would be slower; how much slower is hard to predict prior to the implementation. My "significantly" was just the expectation that the larger code with more conditional branches that is required for validation is less likely to stay in cache, and take longer to load into cache, and take longer to execute. This also seems to be supported by Stephen's comment "That's a lot to ask, as it turns out." Once upon a time I did write an unvalidated UTF-8 encoder/decoder in C, I wonder if I could find that code? Can you supply a validated decoder? Then we could run some benchmarks, eh? I'm mostly indifferent about which should be the default... maybe there shouldn't be a default! 
Use the "vUTF-8" decoder for strict validation, and the "fUTF-8" decoder for the faster, non-validating version. Or something like that. With appropriate documentation. Of course, "UTF-8" already exists... as "fUTF-8", so for compatibility, I guess it shouldn't change... but it could be deprecated. You didn't address the issue that if the decoding to a canonical form is done first, many of the insecurities just go away, so why throw errors? Unicode is intended to allow interaction between various bits of software. It may be that a library checked it in UTF-8, then passed it to python. It would be nice if the library validated too, but a major advantage of UTF-8 is older libraries (or protocols!) intended for ASCII need only be 8-bit clean to be repurposed for UTF-8. Their security checks continue to work, so long as nobody down stream introduces problems with a non-validating decoder. So I don't understand how this is responsive to the "decoding removes many insecurities" issue? Yes, you might use libraries. Either they have insecurities, or not. Either they validate, or not. Either they decode, or not. They may be immune to certain attacks, because of their structure and code, or not. So when you examine a library for potential use, you have documentation or code to help you set your expectations about what it does, and whether or not it may have vulnerabilities, and whether or not those vulnerabilities are likely or unlikely, whether you can reduce the likelihood or prevent the vulnerabilities by wrapping the API, etc. And so you choose to use the library, or not. This whole discussion about libraries seems somewhat irrelevant to the question at hand, although it is certainly true that understanding how a library handles Unicode is an important issue for the potential user of a library. So how does a non-validating decoder introduce problems? I can see that it might not solve all problems, but how does it introduce problems? 
Wouldn't the problems be introduced by something else, and the use of a non-validating decoder may not catch the problem... but not be the cause of the problem? And then, if you would like to address the original issue, that would be fine too. -- Glenn -- http://nevcal.com/ === A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
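The "shifts, masks, and ORs" description in this message can be made concrete. The sketch below is a non-validating decoder written in Python for readability rather than speed; the name `decode_utf8_unchecked` is hypothetical, and it is not the C code mentioned above. It performs no checks on continuation bytes, overlong forms, or surrogates, and truncated input raises IndexError.

```python
def decode_utf8_unchecked(data: bytes) -> str:
    """Non-validating sketch: shifts, masks and ORs only, no range checks."""
    out = []
    i = 0
    n = len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:            # 0xxxxxxx: single byte
            cp, extra = lead, 0
        elif lead < 0xE0:          # 110xxxxx: two bytes (stray 10xxxxxx lands here too)
            cp, extra = lead & 0x1F, 1
        elif lead < 0xF0:          # 1110xxxx: three bytes
            cp, extra = lead & 0x0F, 2
        else:                      # 11110xxx: four bytes
            cp, extra = lead & 0x07, 3
        for j in range(1, extra + 1):
            # Each following byte contributes its low six bits, unchecked.
            cp = (cp << 6) | (data[i + j] & 0x3F)
        out.append(chr(cp))
        i += extra + 1
    return "".join(out)
```

On well-formed input this agrees with the built-in codec. On the overlong pair C0 AF it yields "/", where `bytes.decode("utf-8")` raises UnicodeDecodeError. A validating decoder adds range and continuation checks on top of the same arithmetic, which is the extra work whose cost is under discussion.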
Re: [Python-Dev] Python-3.0, unicode, and os.environ
On Sun, Dec 7, 2008 at 11:04 PM, Glenn Linderman <[EMAIL PROTECTED]> wrote: > On approximately 12/7/2008 9:11 PM, came the following characters from the > keyboard of Adam Olsen: >> On Sun, Dec 7, 2008 at 9:45 PM, Glenn Linderman <[EMAIL PROTECTED]> >> wrote: > > Once upon a time I did write an unvalidated UTF-8 encoder/decoder in C, I > wonder if I could find that code? Can you supply a validated decoder? Then > we could run some benchmarks, eh? There is no point for me, as the behaviour of a real UTF-8 codec is clear. It is you who needs to justify a second non-standard UTF-8-ish codec. See below. >>> You didn't address the issue that if the decoding to a canonical form is >>> done first, many of the insecurities just go away, so why throw errors? >> >> Unicode is intended to allow interaction between various bits of >> software. It may be that a library checked it in UTF-8, then passed >> it to python. It would be nice if the library validated too, but a >> major advantage of UTF-8 is older libraries (or protocols!) intended >> for ASCII need only be 8-bit clean to be repurposed for UTF-8. Their >> security checks continue to work, so long as nobody down stream >> introduces problems with a non-validating decoder. > > > So I don't understand how this is responsive to the "decoding removes many > insecurities" issue? > > Yes, you might use libraries. Either they have insecurities, or not. Either > they validate, or not. Either they decode, or not. They may be immune to > certain attacks, because of their structure and code, or not. > > So when you examine a library for potential use, you have documentation or > code to help you set your expectations about what it does, and whether or > not it may have vulnerabilities, and whether or not those vulnerabilities > are likely or unlikely, whether you can reduce the likelihood or prevent the > vulnerabilities by wrapping the API, etc. And so you choose to use the > library, or not. 
>
> This whole discussion about libraries seems somewhat irrelevant to the question at hand, although it is certainly true that understanding how a library handles Unicode is an important issue for the potential user of a library.
>
> So how does a non-validating decoder introduce problems? I can see that it might not solve all problems, but how does it introduce problems? Wouldn't the problems be introduced by something else, and the use of a non-validating decoder may not catch the problem... but not be the cause of the problem?
>
> And then, if you would like to address the original issue, that would be fine too.

Your non-validating decoder is translating an invalid sequence into a valid one, thus you are introducing the problem. A completely naive environment (8-bit clean ASCII) would leave it as an invalid sequence throughout.

This is not a theoretical problem. See http://tools.ietf.org/html/rfc3629#section-10 . We MUST reject invalid sequences, or else we are not using UTF-8. There is no wiggle room, no debate.

(The absoluteness is why the standard behaviour doesn't need a benchmark. You are essentially arguing that, when logging in as root over the internet, it's a lot faster if you use telnet rather than ssh. One is simply not an option.)

-- Adam Olsen, aka Rhamphoryncus
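The scenario in this reply, where an upstream check on bytes is followed by a lax decode downstream, can be traced with a few lines. This is an illustrative sketch: the byte string is made up, and the "lax" result is computed by hand for the two-byte sequence C0 AF rather than taken from any real decoder.

```python
raw = b"..\xc0\xaf..\xc0\xafetc"

# An 8-bit-clean, ASCII-era filter looks for '/' in the bytes and finds none.
passes_ascii_check = b"/" not in raw

# What a non-validating decoder computes for the overlong pair C0 AF:
# five payload bits from the lead byte, six from the continuation byte.
lax_code_point = ((0xC0 & 0x1F) << 6) | (0xAF & 0x3F)

# The strict built-in decoder refuses the sequence instead.
try:
    raw.decode("utf-8")
    strict_rejects = False
except UnicodeDecodeError:
    strict_rejects = True
```

Here the bytes pass the upstream filter, the hand-computed code point is the one for "/", and the strict decoder raises. The slash only comes into existence at the lax decoding step.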
