On Tue, Sep 30, 2008 at 12:22 AM, Georg Brandl <[EMAIL PROTECTED]> wrote:
> Victor Stinner wrote:
>> On Monday 29 September 2008 18:45:28, Georg Brandl wrote:
>>> If I had to choose, I'd still argue for the modified UTF-8 as filesystem
>>> encoding (if it were UTF-8 otherwise), despite
Victor Stinner wrote:
> On Monday 29 September 2008 18:45:28, Georg Brandl wrote:
>> If I had to choose, I'd still argue for the modified UTF-8 as filesystem
>> encoding (if it were UTF-8 otherwise), despite possible surprises when a
>> such-encoded filename escapes from Python.
>
> I
> Changing the default file system encoding to store bytes in Unicode is like
> introducing a new Python type: .
Exactly. Seems like the best solution to me, despite your polemics.
Regards,
Martin
___
Python-3000 mailing list
Python-3000@python.org
http
On Mon, Sep 29, 2008 at 5:47 PM, Victor Stinner
<[EMAIL PROTECTED]> wrote:
> Hi,
>
> See attached patch: python3_bytes_filename.patch
>
Patches should go on the tracker, not the mailing list. Otherwise they
will just get lost.
-Brett
James Y Knight writes:
> On Sep 29, 2008, at 3:32 AM, Adam Olsen wrote:
> > UTF-8b doesn't work as intended. It produces an invalid unicode
> > object (garbage surrogates) that cannot be used with external APIs or
> > libraries that require unicode.
>
> I'd be interested to hear more detai
Guido van Rossum writes:
> On Mon, Sep 29, 2008 at 4:29 PM, Victor Stinner
> <[EMAIL PROTECTED]> wrote:
> > It would be hard for a newbie programmer to understand why he's
> > unable to find his very important file ("important r?port.doc")
> > using os.listdir().
> *Every* failure in this s
Hi,
See attached patch: python3_bytes_filename.patch
Using the patch, you will get:
- open() supports bytes
- listdir(unicode) -> only unicode, *skips* invalid filenames
(as asked by Guido)
- os.getcwdu() is removed
- os.getcwdb() is created -> bytes
- glob.glob() supports bytes
- fnmatch.filter()
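A minimal sketch of the bytes-in/bytes-out behaviour listed above, written against the os module as it exists in current Python 3 releases. This is an assumption for illustration: the patch itself is not shown here, and the temporary directory and the file name "report.txt" are made up.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one file so the directory listing is not empty.
    with open(os.path.join(tmp, "report.txt"), "wb") as f:
        f.write(b"x")

    # A str argument yields str names, a bytes argument yields bytes names.
    str_names = os.listdir(tmp)
    bytes_names = os.listdir(os.fsencode(tmp))

    # open() accepts a bytes path as well.
    bytes_path = os.path.join(os.fsencode(tmp), bytes_names[0])
    with open(bytes_path, "rb") as f:
        data = f.read()

cwd_text = os.getcwd()    # str
cwd_bytes = os.getcwdb()  # bytes
```

The point of the split is that a program which must handle every file can stay in bytes from end to end and never has to decode a name at all.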
On Tuesday 30 September 2008 01:31:45, Adam Olsen wrote:
> The alternative is to not be valid unicode, but since we can't use such
> objects with external libs, can't even print them, we might as well
> call them something else. We already have a name for that: bytes.
:-)
On Mon, Sep 29, 2008 at 5:29 PM, Victor Stinner
<[EMAIL PROTECTED]> wrote:
> On Monday 29 September 2008 19:06:01, Guido van Rossum wrote:
>> >> - listdir(unicode) -> unicode and raise an error on invalid filename
>>
>> I know I keep flipflopping on this one, but the more I think about
> import os
> import os.path
> import sys
> if os.path.supports_unicode_filenames:
>     cwd = os.getcwd()
> else:
>     cwd = os.getcwdb()
> encoding = sys.getfilesystemencoding()
> for filename in os.listdir(cwd):
>     if os.path.supports_unicode_filenames:
>         text = str(fil
On Monday 29 September 2008 18:45:28, Georg Brandl wrote:
> If I had to choose, I'd still argue for the modified UTF-8 as filesystem
> encoding (if it were UTF-8 otherwise), despite possible surprises when a
> such-encoded filename escapes from Python.
If I understand correctly this sol
On Mon, Sep 29, 2008 at 5:33 PM, James Y Knight <[EMAIL PROTECTED]> wrote:
> On Sep 29, 2008, at 7:23 PM, Adam Olsen wrote:
>>
>> An ugly hack, but more correct than UTF-8b or any similar attempt to
>> do "unicode but not quite unicode"; either it's lossy, or it's not
>> unicode. There's no in bet
On Mon, Sep 29, 2008 at 4:29 PM, Victor Stinner
<[EMAIL PROTECTED]> wrote:
> On Monday 29 September 2008 19:06:01, Guido van Rossum wrote:
>> >> - listdir(unicode) -> unicode and raise an error on invalid filename
>>
>> I know I keep flipflopping on this one, but the more I think about
On Mon, Sep 29, 2008 at 5:31 PM, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
>>> ISTM that 8859-1 is all about decoding, so I don't understand why
>>> you say it is a way not to decode.
>>
>> 8859-1 has no invalid bytes and is a 1-to-1 mapping. If you have an
>> API that always returns unicode bu
On Sep 29, 2008, at 7:23 PM, Adam Olsen wrote:
An ugly hack, but more correct than UTF-8b or any similar attempt to
do "unicode but not quite unicode"; either it's lossy, or it's not
unicode. There's no in between.
Promoting the use of 8859-1 to decode mostly-utf-8 data seems like a
very poo
>> ISTM that 8859-1 is all about decoding, so I don't understand why
>> you say it is a way not to decode.
>
> 8859-1 has no invalid bytes and is a 1-to-1 mapping. If you have an
> API that always returns unicode but accepts an encoding you can use
> it, then reencode using 8859-1 to get back the
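The round trip described in the quoted text can be sketched as follows. This is an illustration only, not code from the thread: it decodes every possible byte value with ISO-8859-1, re-encodes, and compares, then shows that the same bytes are rejected by a strict UTF-8 decode. The variable names are made up for the example.

```python
# Every possible byte value, in order.
raw = bytes(range(256))

# ISO-8859-1 never fails: each byte maps to the code point of the same value.
text = raw.decode("iso-8859-1")
code_points = [ord(ch) for ch in text]

# Re-encoding with the same codec returns the original bytes unchanged.
restored = text.encode("iso-8859-1")

# The same data is not valid UTF-8, so a strict UTF-8 decode raises.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```

The decoded string is a legal unicode object but its characters carry no meaning; it is the original bytes in disguise, which is the sense in which this amounts to "not decoding".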
On Monday 29 September 2008 19:06:01, Guido van Rossum wrote:
> >> - listdir(unicode) -> unicode and raise an error on invalid filename
>
> I know I keep flipflopping on this one, but the more I think about it
> the more I believe it is better to drop those names than to raise an
> exce
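The two behaviours being weighed here (drop undecodable names, or raise on them) can be sketched with a small helper. This is a hypothetical illustration, not the os.listdir implementation: decode_names, its skip_invalid parameter, and the sample names are all invented for the example.

```python
def decode_names(raw_names, encoding="utf-8", skip_invalid=True):
    """Decode byte filenames; drop or reject the ones that fail to decode."""
    decoded = []
    for raw in raw_names:
        try:
            decoded.append(raw.decode(encoding))
        except UnicodeDecodeError:
            if not skip_invalid:
                # "Raise" policy: the first bad name aborts the whole listing.
                raise
            # "Skip" policy: the bad name is left out of the result.
    return decoded


# Hypothetical directory content: the second name is Latin-1, not UTF-8.
names = [b"notes.txt", b"important r\xe9port.doc", b"caf\xc3\xa9.txt"]

visible = decode_names(names)

try:
    decode_names(names, skip_invalid=False)
    raised = False
except UnicodeDecodeError:
    raised = True
```

With the skip policy the caller still sees the two decodable names; with the raise policy one bad name hides the whole directory, which is the trade-off the thread is arguing about.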
On Mon, Sep 29, 2008 at 5:14 PM, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> Adam Olsen wrote:
>> There's no solution except to not
>> decode, and 8859-1 is the way to do that.
>
> I think you need to elaborate on that. What does ISO-8859-1 have to do
> with a Python datatype in this context: which
Adam Olsen wrote:
> There's no solution except to not
> decode, and 8859-1 is the way to do that.
I think you need to elaborate on that. What does ISO-8859-1 have to do
with a Python datatype in this context: which datatype, and what
algorithm on it are you specifically referring to?
When I do (in 2.
> The default behaviour should be to use unicode and raise an error if
> conversion to unicode fails. It should also be possible to use bytes using
> bytes arguments and optional arguments (for getcwd).
I'm still opposed to allowing bytes as file names at all in 3k. Python
should really strive
On Sep 29, 2008, at 6:17 PM, Adam Olsen wrote:
I suspect Linux will eventually take this route as well. If ext3 had
an option for UTF-8 validation I know I'd want it on. That'd move the
error to the program creating bogus file names, rather than those
trying to read, display, and manage them.
On Mon, Sep 29, 2008 at 11:06 AM, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> On Mon, Sep 29, 2008 at 9:45 AM, Georg Brandl <[EMAIL PROTECTED]> wrote:
>
>> This approach (changing all path-handling functions to accept either bytes
>> or string, but not both) is doomed in my eyes. First, there are
On Mon, Sep 29, 2008 at 10:00 AM, Victor Stinner
<[EMAIL PROTECTED]> wrote:
> On Monday 29 September 2008 17:16:47, Steven Bethard wrote:
>> > - getcwd() -> unicode
>> > - getcwd(bytes=True) -> bytes
>>
>> Please let's not introduce boolean flags like this. How about
>> ``getcwdb`` in
On Mon, Sep 29, 2008 at 5:12 AM, Antoine Pitrou <[EMAIL PROTECTED]> wrote:
> Adam Olsen writes:
>>
>> UTF-8b doesn't work as intended. It produces an invalid unicode
>> object (garbage surrogates) that cannot be used with external APIs or
>> libraries that require unicode.
>
> At least
> Victor Stinner wrote:
(Thanks Victor for moving this to the list. Having a discussion in the
tracker is really painful, I find.)
>> POSIX OS
>>
>>
>> The default behaviour should be to use unicode and raise an error if
>> conversion to unicode fails. It should also be possible to use
Victor Stinner wrote:
> POSIX OS
>
>
> The default behaviour should be to use unicode and raise an error if
> conversion to unicode fails. It should also be possible to use bytes using
> bytes arguments and optional arguments (for getcwd).
>
> - listdir(unicode) -> unicode and rais
On Sep 29, 2008, at 3:32 AM, Adam Olsen wrote:
On Sun, Sep 28, 2008 at 10:43 PM, James Y Knight <[EMAIL PROTECTED]>
wrote:
[1] UTF-8b has a similar property to 8859-1, in that all byte
strings can be
successfully round-tripped. It's not currently implemented in
python core,
but it's a pretty
On Monday 29 September 2008 17:16:47, Steven Bethard wrote:
> > - getcwd() -> unicode
> > - getcwd(bytes=True) -> bytes
>
> Please let's not introduce boolean flags like this. How about
> ``getcwdb`` in parallel with the old ``getcwdu``?
Yeah, you're right. So I wrote a new patch: os_
On Mon, Sep 29, 2008 at 6:07 AM, Victor Stinner
<[EMAIL PROTECTED]> wrote:
> The default behaviour should be to use unicode and raise an error if
> conversion to unicode fails. It should also be possible to use bytes using
> bytes arguments and optional arguments (for getcwd).
>
> - listdir(unicod
Patches are already available in issue #3187 (os.listdir):
On Monday 29 September 2008 14:07:55, Victor Stinner wrote:
> - listdir(unicode) -> unicode and raise an error on invalid filename
Need raise_decoding_errors.patch (don't clear Unicode error
> - listdir(bytes) -> bytes
Al
On Monday 29 September 2008 06:43:55, you wrote:
> It will make users happy, and it's simple enough to implement for
> python 3.0.
I dislike your argument. A "quick and dirty hack" is always faster to
implement than a real solution, but we may hit new issues later if we don't
choose the
Hi,
After reading the previous discussion, here is a new proposal.
Python 2.x and Windows are not affected by this issue. Only Python 3 on POSIX
(e.g. Linux or *BSD) is affected.
Some systems are broken, but Python has to be able to open/copy/move/remove
files with an "invalid filename".
The i
Adam Olsen writes:
>
> UTF-8b doesn't work as intended. It produces an invalid unicode
> object (garbage surrogates) that cannot be used with external APIs or
> libraries that require unicode.
At least it works with all Python operations supported by the unicode type
(methods, concat
On Sun, Sep 28, 2008 at 10:43 PM, James Y Knight <[EMAIL PROTECTED]> wrote:
> [1] UTF-8b has a similar property to 8859-1, in that all byte strings can be
> successfully round-tripped. It's not currently implemented in python core,
> but it's a pretty trivial encoding, and is available under the BS
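The round-trip property claimed for UTF-8b can be tried out with the surrogateescape error handler that later Python 3 releases (3.1 onward) provide, which follows the same idea. That handler did not exist when this thread was written, so the sketch below is an after-the-fact illustration rather than the implementation being discussed. The file name is borrowed from an example elsewhere in the thread, and the byte 0xE9 in it is an assumption.

```python
# A Latin-1 encoded name: the byte 0xE9 makes it invalid as UTF-8.
raw = b"important r\xe9port.doc"

# Undecodable bytes are mapped to lone surrogates in the range U+DC80..U+DCFF.
text = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
restored = text.encode("utf-8", "surrogateescape")

# The objection raised in the thread: a strict encoder rejects the string,
# so it cannot be handed to code that expects well-formed unicode.
try:
    text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

Both sides of the argument are visible here: the bytes survive the round trip, and the intermediate string is not something a strict UTF-8 encoder will accept.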