Re: [Python-Dev] Adding the 'path' module (was Re: Some RFE for review)
M.-A. Lemburg:
> > 2) Return unicode when the text can not be represented in ASCII.
> > This will cause a change of behaviour for existing code which deals
> > with non-ASCII data.
>
> +1 on this one (s/ASCII/Python's default encoding).

I assume you mean the result of sys.getdefaultencoding() here. Unless much of the Python library is modified to use the default encoding, this will break. The problem is that different implicit encodings are being used for reading data and for accessing files. When calling a function such as open() with a byte string, Python passes that byte string through to Windows, which interprets it as being encoded in CP_ACP. When this differs from sys.getdefaultencoding() there will be a mismatch.

Say I have been working on a machine set up for Australian English (or another Western European locale) but am working with Russian data, so I have set Python's default encoding to cp1251. With this simple script, g.py:

    import sys
    print file(sys.argv[1]).read()

I process a file called '€.txt' with contents "European Euro" to produce:

    C:\zed>python_d g.py €.txt
    European Euro

With the proposed modification, sys.argv[1] u'\u20ac.txt' is converted through cp1251 to '\x88.txt', as the Euro is located at 0x88 in cp1251. The operating system is then asked to open '\x88.txt', which it interprets through CP_ACP as u'\u02c6.txt' ('ˆ.txt'), so the call fails. If you are very unlucky there will be a file called 'ˆ.txt', so the call will succeed and produce bad data. Simulating with str(sys.argv[1]):

    C:\zed>python_d g.py €.txt
    Traceback (most recent call last):
      File "g.py", line 2, in ?
        print file(str(sys.argv[1])).read()
    IOError: [Errno 2] No such file or directory: '\x88.txt'

> -1: code pages are evil and the reason why Unicode was invented in the
> first place. This would be a step back in history.

Features used to specify files (sys.argv, os.environ, ...) should match functions used to open and perform other operations on files, as they do currently.
This means their encodings should match.

Neil

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
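Neil's cp1251/CP_ACP round trip can be reproduced directly. This is an illustrative sketch in modern bytes/str syntax (the encoding tables, not the 2005-era API, are the point):

```python
# The proposed conversion: the Unicode argument is encoded with
# Python's default encoding (cp1251 in Neil's setup).
name = u'\u20ac.txt'                   # '€.txt'
encoded = name.encode('cp1251')
assert encoded == b'\x88.txt'          # the Euro sign is 0x88 in cp1251

# Windows then reinterprets those bytes through CP_ACP (cp1252 on a
# Western European locale), yielding a different file name entirely.
reinterpreted = encoded.decode('cp1252')
assert reinterpreted == u'\u02c6.txt'  # 'ˆ.txt' -- so open() fails
```

The same bytes mean different characters under the two code pages, which is exactly the mismatch the message describes.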
Re: [Python-Dev] [C++-sig] GCC version compatibility
On Tue, Jul 12, 2005 at 01:07:56AM +0200, Martin v. Löwis wrote:
> Christoph Ludwig wrote:
> > Yes, but on ELF/Linux the default configuration should be
> > --without-cxx in the first place. If the build instructions make it
> > sufficiently clear that you should prefer this configuration whenever
> > possible then this should be a non-issue on platforms like ELF/Linux.
>
> Some users will complain about this. Specifying --without-cxx also
> causes configure not to look for a C++ compiler, meaning that distutils
> won't know what the C++ compiler is, meaning that it will link
> extension modules with the C compiler instead.

If I understood Dave Abrahams' reply somewhere above in this thread correctly, then you can build different C++ extension modules with different C++ compilers on ELF/Linux. (I don't have the time right now to actually try it, sorry.) There is no need to fix the C++ compiler as soon as Python is built.

If distutils builds C++ extensions with the C compiler then I consider this a bug in distutils, because it is unlikely to work. (Unless the compiler can figure out from the source file suffixes in the compilation step *and* some info in the object files in the linking step that it is supposed to act like a C++ compiler. None of the compilers I am familiar with does the latter.) distutils should rather look for a C++ compiler in the PATH or explicitly ask the user to specify the command that calls the C++ compiler.

It is different if --with-cxx=compiler was used. I agree that in this case distutils should use that compiler to build C++ extensions.

(distutils does not behave correctly when building C++ extensions anyway. It calls the C compiler to compile the C++ source files and passes options that gcc accepts only in C mode. The compiler version I am using is docile and only issues warnings. But these warnings are unnecessary, and I would not blame gcc if the next compiler release refused to compile C++ sources when the command line contains C-specific options.
But the distutils mailing list is a better place to eventually bring this up, I guess.)

Regards

Christoph
--
http://www.informatik.tu-darmstadt.de/TI/Mitarbeiter/cludwig.html
LiDIA: http://www.informatik.tu-darmstadt.de/TI/LiDIA/Welcome.html
Re: [Python-Dev] Adding the 'path' module (was Re: Some RFE for review)
Hi Neil,

> > > 2) Return unicode when the text can not be represented in ASCII.
> > > This will cause a change of behaviour for existing code which
> > > deals with non-ASCII data.
> >
> > +1 on this one (s/ASCII/Python's default encoding).
>
> I assume you mean the result of sys.getdefaultencoding() here.

Yes. The default encoding is the encoding that Python assumes when auto-converting a string to Unicode. It is normally set to ASCII, but a user may want to use a different encoding. However, we've always made it very clear that the user is on his own when changing the ASCII default to something else.

> Unless much of the Python library is modified to use the default
> encoding, this will break. The problem is that different implicit
> encodings are being used for reading data and for accessing files.
> When calling a function, such as open, with a byte string, Python
> passes that byte string through to Windows which interprets it as
> being encoded in CP_ACP. When this differs from
> sys.getdefaultencoding() there will be a mismatch.

As I said: code pages are evil :-)

> Say I have been working on a machine set up for Australian English
> (or other Western European locale) but am working with Russian data
> so have set Python's default encoding to cp1251. With this simple
> script, g.py:
>
>     import sys
>     print file(sys.argv[1]).read()
>
> I process a file called '€.txt' with contents "European Euro" to
> produce:
>
>     C:\zed>python_d g.py €.txt
>     European Euro
>
> With the proposed modification, sys.argv[1] u'\u20ac.txt' is
> converted through cp1251

Actually, it is not: if you pass in a Unicode argument to one of the file I/O functions, and the OS supports Unicode directly or at least provides the notion of a file system encoding, then the file I/O should use the Unicode APIs of the OS or convert the Unicode argument to the file system encoding. AFAIK, this is how posixmodule.c already works (more or less).
I was suggesting that OS filename output APIs such as os.listdir() should return strings if the filename matches the default encoding, and Unicode if not. On input, file I/O APIs should accept both strings using the default encoding and Unicode. How these inputs are then converted to suit the OS is up to the OS abstraction layer, e.g. posixmodule.c.

Note that the posixmodule currently does not recode string arguments: it simply passes them to the OS as-is, assuming that they are already encoded using the file system encoding. Changing this is easy, though: instead of using the "et" getargs format specifier, you'd have to use "es". The latter recodes strings, based on the default encoding assumption, to whatever other encoding you specify.

> to '\x88.txt' as the Euro is located at 0x88 in CP1251. The operating
> system is then asked to open '\x88.txt' which it interprets through
> CP_ACP to be u'\u02c6.txt' ('ˆ.txt') which then fails. If you are very
> unlucky there will be a file called 'ˆ.txt' so the call will succeed
> and produce bad data. Simulating with str(sys.argv[1]):
>
>     C:\zed>python_d g.py €.txt
>     Traceback (most recent call last):
>       File "g.py", line 2, in ?
>         print file(str(sys.argv[1])).read()
>     IOError: [Errno 2] No such file or directory: '\x88.txt'

See above: this is what I'd consider a bug in posixmodule.c.

> -1: code pages are evil and the reason why Unicode was invented in the
> first place. This would be a step back in history. Features used to
> specify files (sys.argv, os.environ, ...) should match functions used
> to open and perform other operations with files as they do currently.
> This means their encodings should match.

Right. However, most of these APIs currently either don't make any assumption about the strings' contents and simply pass them around, or they assume that these strings use the file system encoding - which, like in the example you gave above, can be different from the default encoding.
To untie this Gordian Knot, we should use strings and Unicode like they are supposed to be used (in the context of text data):

* strings are fine for text data that is encoded using the default
  encoding
* Unicode should be used for all text data that is not or cannot be
  encoded in the default encoding

Later on in Py3k, all text data should be stored in Unicode and all binary data in some new binary type.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jul 12 2005)
Python/Zope Consulting and Support ...        http://www.egenix.com/
mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/

::: Try mxODBC.Zope.DA for Windows, Linux, Solaris, FreeBSD for free!
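The listdir behaviour M.-A. Lemburg proposes above (byte strings when the name fits the default encoding, Unicode otherwise) can be sketched as follows. This is an illustrative helper, not actual posixmodule code; the function name and `default_encoding` parameter are hypothetical, and the sketch uses modern bytes/str syntax:

```python
def os_name_result(name, default_encoding='ascii'):
    """Return a byte string if `name` is representable in the default
    encoding, otherwise return it unchanged as Unicode (the proposed
    behaviour for os.listdir() and friends)."""
    try:
        return name.encode(default_encoding)
    except UnicodeEncodeError:
        # Not representable: hand the caller Unicode instead.
        return name

# ASCII names come back as byte strings; '€.txt' stays Unicode.
assert os_name_result(u'readme.txt') == b'readme.txt'
assert os_name_result(u'\u20ac.txt') == u'\u20ac.txt'
```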
Re: [Python-Dev] [C++-sig] GCC version compatibility
Christoph Ludwig wrote:
> If I understood Dave Abrahams' reply somewhere above in this thread
> correctly then you can build different C++ extension modules with
> different C++ compilers on ELF/Linux. (I don't have the time right now
> to actually try it, sorry.) There is no need to fix the C++ compiler
> as soon as Python is built.

There is, somewhat: how do you know the name of the C++ compiler?

> If distutils builds C++ extensions with the C compiler then I consider
> this a bug in distutils because it is unlikely to work. (Unless the
> compiler can figure out from the source file suffixes in the
> compilation step *and* some info in the object files in the linking
> step that it is supposed to act like a C++ compiler. None of the
> compilers I am familiar with does the latter.) distutils should rather
> look for a C++ compiler in the PATH or explicitly ask the user to
> specify the command that calls the C++ compiler.

How should it do that? The logic is quite involved, and currently, distutils relies on configure to figure it out. If you think this should be changed, please contribute a patch.

> (distutils does not behave correctly when building C++ extensions
> anyway. It calls the C compiler to compile the C++ source files and
> passes options that gcc accepts only in C mode. The compiler version I
> am using is docile and only issues warnings. But these warnings are
> unnecessary, and I would not blame gcc if the next compiler release
> refused to compile C++ sources if the command line contains C-specific
> options. But the distutils mailing list is a better place to
> eventually bring this up, I guess.)

The best way to bring this up is to contribute a patch. Bringing it up in the sense of sending an email message to some mailing list likely has no effect whatsoever.

Regards,
Martin
Re: [Python-Dev] Linux Python linking with G++?
Tim Peters [EMAIL PROTECTED] writes:
> [Michael Hudson]
> > --with-fpectl, for example. Does anyone lurking here actually use
> > that, know what it does and require the functionality? Inquiring
> > minds want to know.
>
> I know what it intends to do: Surprise! fpectlmodule.c intends to
> enable the HW FPU divide-by-0, overflow, and invalid operation traps;
> if any of those traps trigger, raise the C-level SIGFPE signal; and
> convert SIGFPE to a Python-level FloatingPointError exception. The
> comments in pyfpe.h explain this best.

But do you use it? I know what it intends to do too, but I don't use it. The questions I asked were in the order they were for a reason.

Cheers,
mwh

--
<cube> If you are anal, and you love to be right all the time, C++
       gives you a multitude of mostly unimportant details to fret
       about so you can feel good about yourself for getting them
       right, while missing the big picture entirely
   -- from Twisted.Quotes
Re: [Python-Dev] Possible context managers in stdlib
Skip Montanaro wrote:
> After seeing so many messages about with statements my eyes began to
> glaze over, so I stopped following that thread. Then I saw mention of
> context manager with no reference to any PEPs or to the with statement
> to provide context.

The main outcome of the PEP 343 terminology discussion was some proposed documentation I put on the SourceForge patch tracker ([1]). The patch is currently assigned to Raymond (since he started the terminology discussion), but any other reviews would be welcome. Since SF currently doesn't want to play, and the proposed documentation isn't that long, I've included the latest version below for anyone who wants to read it.

> None of the context-providing messages seemed to have been indexed by
> Google when I checked, so searching for Python context manager failed
> to return anything useful. Hence the post.

Google appears to have spidered the list archives some time today, so anyone else doing the same search should get some relevant hits.

Cheers,
Nick.

[1] http://www.python.org/sf/1234057

==
With Statements and Context Management

A frequent need in programming is to ensure a particular action is taken after a specific section of code has been executed (such as closing a file or releasing a lock). Traditionally, this is handled using 'try'/'finally' statements. However, that approach can lead to the reproduction of non-trivial amounts of boilerplate whenever the action needs to be invoked. A simpler way to achieve this in Python is to use the 'with' statement along with the appropriate context manager.

Context managers define an action which is taken to enter the context and a second action to exit the context (usually restoring the environment that existed before the context was entered). The 'with' statement ensures that the context is entered and exited at the appropriate times (that is, before and after the execution of the suite contained in the 'with' statement).
The precise behaviour of the 'with' statement is governed by the supplied context manager - an object which supports the context management protocol. This protocol consists of two methods:

__enter__(self):
    Context managers use this method to enter the desired context
    before the execution of the contained suite. This method is called
    without arguments before execution of the contained suite starts.
    If the 'as' clause of the 'with' statement is used, the value
    returned from this method is assigned to the specified target.
    Many context managers will return self from this method, but
    returning a different object may make sense for some managers (for
    example, see the 'closing' context manager described below).

__exit__(self, exc_type, exc_value, exc_traceback):
    Context managers use this method to exit the context after
    execution of the contained suite. This method is called after
    execution of the contained suite is completed. If execution
    completed due to an exception, the details of that exception are
    passed as arguments. Otherwise, all three arguments are set to
    None. If exception details are passed in, and this method returns
    without incident, then the original exception continues to
    propagate. Otherwise, the exception raised by this method will
    replace the original exception.

Using Contexts to Manage Resources

The simplest use of context management is to strictly control the handling of key resources (such as files, generators, database connections, synchronisation locks). These resource managers will generally acquire the resource in their __enter__ method, although some resource managers may accept the resource to be managed as an argument to the constructor, or acquire it during construction. Resource managers will then release the resource in their __exit__ method. For example, the following context manager allows prompt closure of any resource with a 'close' method (e.g. a generator or file):

    class closing(object):
        def __init__(self, resource):
            self.resource = resource
        def __enter__(self):
            return self.resource
        def __exit__(self, *exc_info):
            self.resource.close()

    with closing(my_generator()) as g:
        # my_generator() is assigned to g via call to __enter__()
        for item in g:
            print item
    # g is closed as the with statement ends

Some resources (such as threading.Lock) support the context management protocol natively, allowing them to be used directly in 'with' statements. The meaning of the established context will depend on the specific resource. In the case of threading.Lock, the lock is acquired by the __enter__ method and released by the __exit__ method:

    with the_lock:
        # Suite is executed with the_lock held
    # the_lock is released as the with statement ends

More
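The __enter__/__exit__ sequencing described in the proposed docs can be checked with a minimal tracing manager. This is an illustrative sketch (the `recorder` class is made up for the demonstration, and it is written in modern Python, where the protocol works as drafted: __exit__ receives None arguments on a clean exit and the exception details otherwise, and the exception keeps propagating):

```python
class recorder(object):
    """Records every protocol call so the sequencing is visible."""
    def __init__(self):
        self.calls = []
    def __enter__(self):
        self.calls.append('enter')
        return self
    def __exit__(self, exc_type, exc_value, exc_tb):
        # Returning None lets any in-flight exception keep propagating.
        self.calls.append(('exit', exc_type))

# Clean exit: __exit__ gets None for all three arguments.
r = recorder()
with r:
    pass
assert r.calls == ['enter', ('exit', None)]

# Exceptional exit: __exit__ sees the exception type, which then
# continues to propagate out of the 'with' statement.
r2 = recorder()
try:
    with r2:
        raise ValueError('boom')
except ValueError:
    pass
assert r2.calls == ['enter', ('exit', ValueError)]
```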
Re: [Python-Dev] Adding the 'path' module (was Re: Some RFE for review)
M.-A. Lemburg writes:
> we should use strings and Unicode like they are supposed to be used
> (in the context of text data):
>
> * strings are fine for text data that is encoded using the default
>   encoding
> * Unicode should be used for all text data that is not or cannot be
>   encoded in the default encoding
>
> Later on in Py3k, all text data should be stored in Unicode and all
> binary data in some new binary type.

Wow. That is the most succinct and clear explanation of how to use Unicode in Python that I think I've ever heard. It might even be simple enough for _me_ to understand it! I think I'm going to go frame this and have it posted in my cubicle.

-- Michael Chermside
Re: [Python-Dev] Adding the 'path' module (was Re: Some RFE for review)
QOTF candidate; should add that the default encoding is usually ASCII.

On 7/12/05, Michael Chermside [EMAIL PROTECTED] wrote:
> M.-A. Lemburg writes:
> > we should use strings and Unicode like they are supposed to be used
> > (in the context of text data):
> >
> > * strings are fine for text data that is encoded using the default
> >   encoding
> > * Unicode should be used for all text data that is not or cannot be
> >   encoded in the default encoding
> >
> > Later on in Py3k, all text data should be stored in Unicode and all
> > binary data in some new binary type.
>
> Wow. That is the most succinct and clear explanation of how to use
> Unicode in Python that I think I've ever heard. It might even be
> simple enough for _me_ to understand it! I think I'm going to go frame
> this and have it posted in my cubicle.
>
> -- Michael Chermside

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] Linux Python linking with G++?
Nobody uses it. It should be ripped out. If someone disagrees, let them speak up.

On 7/12/05, Michael Hudson [EMAIL PROTECTED] wrote:
> Tim Peters [EMAIL PROTECTED] writes:
> > [Michael Hudson]
> > > --with-fpectl, for example. Does anyone lurking here actually use
> > > that, know what it does and require the functionality? Inquiring
> > > minds want to know.
> >
> > I know what it intends to do: Surprise! fpectlmodule.c intends to
> > enable the HW FPU divide-by-0, overflow, and invalid operation
> > traps; if any of those traps trigger, raise the C-level SIGFPE
> > signal; and convert SIGFPE to a Python-level FloatingPointError
> > exception. The comments in pyfpe.h explain this best.
>
> But do you use it? I know what it intends to do too, but I don't use
> it. The questions I asked were in the order they were for a reason.
>
> Cheers,
> mwh

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] [C++-sig] GCC version compatibility
On 7/12/05, Christoph Ludwig [EMAIL PROTECTED] wrote:
> If distutils builds C++ extensions with the C compiler then I consider
> this a bug in distutils because it is unlikely to work. (Unless the
> compiler can figure out from the source file suffixes in the
> compilation step *and* some info in the object files in the linking
> step that it is supposed to act like a C++ compiler. None of the
> compilers I am familiar with does the latter.) distutils should rather
> look for a C++ compiler in the PATH or explicitly ask the user to
> specify the command that calls the C++ compiler.

You practically always have to use --compiler with distutils when building C++ extensions anyhow, and even then it rarely does what I would consider 'The Right Thing(tm)'. The problem is that the distutils core assumption - that you want to build extension modules with the same compiler options that you built Python with - is in many cases the wrong thing for C++ extension modules, even if you built Python with --with-cxx. This is even worse on Windows, where the MSVC compiler, until very recently, was crap for C++, and you really needed to use another compiler for C++, but Python was always built using MSVC (unless you jumped through hoops of fire).

The problem is that this is much more complicated than it seems: you can't just ask the user for the C++ compiler; you really need to provide an abstraction layer for all of the compiler and linker flags, so that a user could specify what those flags are for their compiler of choice. Of course, once you've done that, the user might as well have just written a new Compiler class for distutils, which wouldn't pay any attention to how Python was built (other than where Python.h is).

-- Nick
Re: [Python-Dev] Possible context managers in stdlib
FWIW, I've updated PEP 343 to use @contextmanager and class ContextWrapper. Please proofread.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
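For readers unfamiliar with the decorator style the PEP now uses, here is a minimal sketch, shown with the contextlib.contextmanager decorator as it eventually shipped in the stdlib (the `managed` function and `events` list are illustrative, not from the PEP):

```python
from contextlib import contextmanager

events = []

@contextmanager
def managed(resource):
    events.append('enter')      # runs as the __enter__ action
    try:
        yield resource          # value bound by the 'as' clause
    finally:
        events.append('exit')   # runs as the __exit__ action

with managed('res') as r:
    events.append('use ' + r)

# enter/suite/exit run in the guaranteed order.
assert events == ['enter', 'use res', 'exit']
```

The generator body before the yield plays the role of __enter__, and the code after it (here in a finally clause) plays the role of __exit__.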
Re: [Python-Dev] Possible context managers in stdlib
Nick Coghlan [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]
> The main outcome of the PEP 343 terminology discussion was some
> proposed documentation I put on the Sourceforge patch tracker ([1]).

Is this a proposal for the Language Reference manual?

> [1] http://www.python.org/sf/1234057
>
> ==
> With Statements and Context Management
>
> A frequent need in programming ... A simpler way to achieve this in
> Python is to use the 'with' statement along with the appropriate
> context manager.

Somewhere about here we need the syntax itself.

> Context managers define an ... the contained suite starts. If the 'as'
> clause of the 'with'

Else this does not mean much.

> ... The simplest use of context management is to strictly control the
> handling of key resources (such as files, generators, database
> connections, synchronisation locks).

I have a little trouble seeing generators (as opposed to iterables) as resources needing management.

> For example, the following context manager allows prompt closure of
> any resource with a 'close' method (e.g. a generator or file):

And I was not aware that they had close methods. Do you mean an iterable (not just a file) with both an associated generator and a close? Or are generators gaining close methods (which make no sense to me)? Or are you using 'generator' in a different sense?

> class closing(object):
>     def __init__(self, resource):
>         self.resource = resource
>     def __enter__(self):
>         return self.resource
>     def __exit__(self, *exc_info):
>         self.resource.close()
>
> with closing(my_generator()) as g:
>     # my_generator() is assigned to g via call to __enter__()
>     for item in g:
>         print item
> # g is closed as the with statement ends

To me, this should be with closing(my_iterable())..., with 'for' calling g.__iter__ to get the iterator that is possibly a generator. Otherwise, I don't understand it. The rest is pretty clear.

Terry J. Reedy
[Python-Dev] Terminology for PEP 343
Probably late in the game, especially for an outsider, but I read the terminology discussion with interest. FWIW, I do like Philip's use of context, though I feel that it is a very generic word that may clash with many application-level classes... For that reason, I also liked scope a lot, though it was an... expansion of that term's usual meaning beyond namespaces.

Anyway, what really struck me all along is that, when reading the keyword with, I always felt that I would replace it with within, which imho fits the context/scope terminology better. Thus within a context, we do certain actions... which are fenced with __begincontext and __endcontext. (Oh, yes, fences... What was the original precise computer science meaning of that word, again?)

Cheers,
Marc-Antoine
Re: [Python-Dev] Possible context managers in stdlib
Terry Reedy wrote:
> Nick Coghlan [EMAIL PROTECTED] wrote in message
> news:[EMAIL PROTECTED]
> > The main outcome of the PEP 343 terminology discussion was some
> > proposed documentation I put on the Sourceforge patch tracker ([1]).
>
> Is this a proposal for the Language Reference manual?

No - it's for an entry in the Library Reference under 'built-in types', as a sibling to the current documentation of the iteration protocol. The 'with' statement itself would have to be documented along with the rest of the grammar.

> > A simpler way to achieve this in Python is to use the 'with'
> > statement along with the appropriate context manager.
>
> Somewhere about here we need the syntax itself.

I'm not sure. We don't reproduce the 'for' loop syntax in the documentation of iterators, so should we reproduce the 'with' statement syntax in the documentation of context managers? Again, modelling on the existing documentation of the iteration protocol, I would expect the statement syntax to be introduced in the tutorial (e.g. as part of Section 8.6, where try/finally is introduced).

> > Context managers define an ... the contained suite starts. If the
> > 'as' clause of the 'with'
>
> Else this does not mean much.
>
> > ... The simplest use of context management is to strictly control
> > the handling of key resources (such as files, generators, database
> > connections, synchronisation locks).
>
> I have a little trouble seeing generators (as opposed to iterables)
> as resources needing management.

PEP 342 adds this, in order to allow 'yield' inside try/finally blocks.

> > For example, the following context manager allows prompt closure of
> > any resource with a 'close' method (e.g. a generator or file):
>
> And I was not aware that they had close methods. Do you mean an
> iterable (not just a file) with both an associated generator and a
> close? Or are generators gaining close methods (which make no sense
> to me)? Or are you using 'generator' in a different sense?

Sorry - these docs assume PEP 342 has been implemented, so generators have close() methods. I was trying to steer clear of files, since we don't know yet whether there is going to be an opening or closing implementation in the standard library, or whether files will become context managers. The latter is my preference, but Guido didn't seem too keen on the idea last time it was brought up.

> > with closing(my_generator()) as g:
> >     # my_generator() is assigned to g via call to __enter__()
> >     for item in g:
> >         print item
> > # g is closed as the with statement ends
>
> To me, this should be with closing(my_iterable())..., with 'for'
> calling g.__iter__ to get the iterator that is possibly a generator.
> Otherwise, I don't understand it. The rest is pretty clear.
>
> Terry J. Reedy

--
Nick Coghlan | [EMAIL PROTECTED] | Brisbane, Australia
http://boredomandlaziness.blogspot.com
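Nick's point that PEP 342 gives generators a close() method means the closing() wrapper applies to them directly. A small illustrative sketch (in modern Python, where PEP 342 is implemented; the `numbers` generator and `log` list are made up for the demonstration):

```python
log = []

def numbers():
    try:
        yield 1
        yield 2
    finally:
        # Runs when close() injects GeneratorExit at the paused yield.
        log.append('closed')

g = numbers()
assert next(g) == 1
g.close()                 # PEP 342: raises GeneratorExit inside the generator
assert log == ['closed']  # the finally clause ran promptly
```

This is exactly the prompt-cleanup guarantee the closing() context manager packages up for a 'with' statement.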
Re: [Python-Dev] Adding the 'path' module (was Re: Some RFE for review)
Hi Marc-Andre,

> > With the proposed modification, sys.argv[1] u'\u20ac.txt' is
> > converted through cp1251
>
> Actually, it is not: if you pass in a Unicode argument to one of the
> file I/O functions and the OS supports Unicode directly or at least
> provides the notion of a file system encoding, then the file I/O
> should use the Unicode APIs of the OS or convert the Unicode argument
> to the file system encoding. AFAIK, this is how posixmodule.c already
> works (more or less).

Yes it is. The initial stage is reading the command line arguments. The proposed modification is to change behaviour when constructing sys.argv, os.environ, or when calling os.listdir, to return Unicode when the text can not be represented in Python's default encoding. I take this to mean that when the value can be represented in Python's default encoding, then it is returned as a byte string in the default encoding. Therefore, for the example, the code that sets up sys.argv has to encode the Unicode command line argument into cp1251.

> On input, file I/O APIs should accept both strings using the default
> encoding and Unicode. How these inputs are then converted to suit the
> OS is up to the OS abstraction layer, e.g. posixmodule.c.

This looks to me to be insufficiently compatible with current behaviour, which accepts byte strings outside the default encoding. Existing code may call open("€.txt"). This is perfectly legitimate current Python (with a coding declaration), as "€.txt" is a byte string and file systems will accept byte string names. Since the standard default encoding is ASCII, should such code raise UnicodeDecodeError?

> Changing this is easy, though: instead of using the "et" getargs
> format specifier, you'd have to use "es". The latter recodes strings
> based on the default encoding assumption to whatever other encoding
> you specify.

Don't you want to convert these into Unicode rather than another byte string encoding?
It looks to me as though the "es" format always produces byte strings, and the only byte string encoding that can be passed to the operating system is the file system encoding, which may not contain all the characters in the default encoding.

Neil