Re: [Python-Dev] Unicode Imports

2006-09-09 Thread Martin v. Löwis
Nick Coghlan wrote:
 So this is taking something that *already works properly on POSIX
 systems* and making it work on Windows as well.

I doubt it does without side effects. For example, an application that
would go through sys.path, and encode everything with
sys.getfilesystemencoding() currently works, but will break if the patch
is applied and non-mbcs strings are put on sys.path.
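
A minimal sketch of that failure mode, written for modern Python 3 rather than the 2.5 under discussion. The code page "cp1252" is only an assumed stand-in for the Windows 'mbcs' encoding, and the helper name unencodable_entries and the sample paths are hypothetical:

```python
def unencodable_entries(paths, encoding):
    """Return the entries of `paths` that cannot be encoded with `encoding`."""
    failures = []
    for entry in paths:
        try:
            entry.encode(encoding)
        except UnicodeEncodeError:
            failures.append(entry)
    return failures


# Hypothetical path list; "cp1252" stands in for a Western European 'mbcs'.
sample_paths = [
    "C:\\Python25\\Lib",
    "C:\\caf\u00e9\\lib",               # encodable in cp1252
    "C:\\\u65e5\u672c\u8a9e\\lib",      # not encodable in cp1252
]
broken = unencodable_entries(sample_paths, "cp1252")
```

A tool that encodes every entry unconditionally would raise on the third entry; with a list restricted to encodable strings it never sees such a value.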

Also, what will be the effect on __file__? What value will it have
if the module originates from a sys.path entry that is a non-mbcs
unicode string? I haven't tested the patch, but it looks like
__file__ becomes a unicode string on Windows, and remains a byte
string encoded with the file system encoding elsewhere. That's also
a change in behavior.

Regards,
Martin

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] 2.5 status

2006-09-09 Thread Brett Cannon
On 9/7/06, Neal Norwitz [EMAIL PROTECTED] wrote:
 On 9/5/06, Brett Cannon [EMAIL PROTECTED] wrote:
   [MAL]
   The proper fix would be to introduce a tp_unicode slot and let
   this decide what to do, ie. call .__unicode__() methods on instances
   and use the .__name__ on classes.

  That was my bug reaction and what I said on the bug report.  Kind of
  surprised one doesn't already exist.

   I think this would be the right way to go for Python 2.6. For
   Python 2.5, just dropping this .__unicode__ method on exceptions
   is probably the right thing to do.

  Neal, do you want to rip it out or should I?

 Is removing __unicode__ backwards compatible with 2.4 for both
 instances and exception classes?

 Does everyone agree this is the proper approach?  I'm not familiar
 with this code.  Brett, if everyone agrees (ie, remains silent),
 please fix this and add tests and a NEWS entry.

Done.  Even updated PEP 356 for you while I was at it. =)

-Brett


Re: [Python-Dev] Signals, threads, blocking C functions

2006-09-09 Thread Nick Maclaren
I was hoping to have stopped, but here are a few comments.

I agree with Jan Kanis.  That is the way to tackle this one.

Adam Olsen [EMAIL PROTECTED] wrote:
 
 I don't think we should let this die, at least not yet.  Nick seems to
 be arguing that ANY signal handler is prone to random crashes or
 corruption (due to bugs).  However, we already have a signal handler,
 so we should already be exposed to the random crashes/corruption.

No.  I am afraid that is a common myth and often a catastrophic mistake.
In this sort of area, NEVER assume that even apparently unrelated changes
won't cause 'working' code to misbehave.  Yes, Python is already exposed,
but it would be easy to turn a very rare failure into a more common one.

What I was actually arguing for was defensive programming.

 If we're going to rely on signal handling being correct then I think
 we should also rely on write() being correct.  Note that I'm not
 suggesting an API that allows arbitrary signal handlers, but rather
 one that calls write() on an array of prepared file descriptors
 (ignoring errors).

For your interpretation of 'correct'.  The cause of this chaos is that
the C and POSIX standards are inconsistent, even internally, and they
are wildly incompatible.  So, even if things 'work' today, don't bet on
the next release of your favourite system behaving the same way.

It wouldn't matter if there was a de facto standard (i.e. a consensus),
but there isn't.

 Ensuring modifications to that array are atomic would be tricky, but I
 think it would be doable if we use a read-copy-update approach (with
 two alternating signal handler functions).  Not sure how to ensure
 there's no currently running signal handlers in another thread though.
  Maybe have to rip the atomic read/write stuff out of the Linux
 sources to ensure it's *always* defined behavior.

Yes.  But even that wouldn't solve the problem, as that code is very
gcc-specific.

 Looking into the existing signalmodule.c, I see no attempts to ensure
 atomic access to the Handlers data structure.  Is the current code
 broken, at least on non-x86 platforms?

Well, at a quick glance at the actual handler (the riskiest bit):

1) It doesn't check the signal range - bad practice, as systems
do sometimes generate wayward numbers.

2) Handlers[sig_num].tripped = 1; is formally undefined, but
actually pretty safe.  If that breaks, nothing much will work.  It
would be better to make the int sig_atomic_t, as you say.

3) is_tripped++; and Py_AddPendingCall(checksignals_witharg, NULL);
will work only because the handler ignores all signals in subthreads
(which is definitely NOT right, as the comments say).

Despite the implication, the code of Py_AddPendingCall is NOT safe
against simultaneous registration.  It is just plain broken, I am
afraid.  The note starting "Darn" should be a LOT stronger :-)

[ For example, think of two threads calling the function at exactly
the same time, in almost perfect step.  Oops. ]
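
The "almost perfect step" scenario can be replayed deterministically with a toy model. This is a sketch under assumptions, not the CPython source: PendingQueue, add_steps and run_interleaved are hypothetical names, and each yield simply marks a point where another thread could be scheduled.

```python
class PendingQueue:
    """Toy ring buffer guarded only by a plain 'busy' flag (no real lock)."""

    def __init__(self, size=4):
        self.slots = [None] * size
        self.first = 0
        self.last = 0
        self.busy = False

    def add_steps(self, func):
        """Register `func`; every yield is a possible thread switch."""
        if self.busy:
            return False
        yield  # switched out after the check, before the flag is set
        self.busy = True
        i = self.last
        j = (i + 1) % len(self.slots)
        if j == self.first:
            self.busy = False
            return False  # queue full
        yield  # switched out after choosing a slot, before storing
        self.slots[i] = func
        self.last = j
        self.busy = False
        return True


def run_interleaved(first, second):
    """Advance both generators one step at a time, in strict alternation."""
    pending = {"first": first, "second": second}
    results = {}
    while pending:
        for name in list(pending):
            try:
                next(pending[name])
            except StopIteration as stop:
                results[name] = stop.value  # the generator's return value
                del pending[name]
    return results


queue = PendingQueue()
results = run_interleaved(queue.add_steps("handler_A"),
                          queue.add_steps("handler_B"))
queued = (queue.last - queue.first) % len(queue.slots)
```

Both callers report success, yet only one entry ends up queued: the second registration silently overwrites the first.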

I can't honestly promise to put any time into this in the foreseeable
future, but will try (sometime).  If anyone wants to tackle this,
please ask me for comments/help/etc.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] Signals, threads, blocking C functions

2006-09-09 Thread Gustavo Carneiro
On 9/9/06, Jan Kanis [EMAIL PROTECTED] wrote:
 At the risk of waking up a thread that was already declared dead,
 perhaps this is useful.

 So, what happens is that Python's signal handler sets a flag and registers a
 callback. Then the main thread should check the flag and make the callback
 to actually do something with the signal. However the main thread is
 blocked in GTK and can't check the flag.

 Nick Maclaren wrote:
 ...lots of reasons why you can't do anything reliably from within a signal
 handler...

 As far as I understand it, what could work is this:
 - PyGTK registers a callback.
 - Python's signal handler does not change at all.
 - All threads that run in the Python interpreter occasionally check the
   flag which the signal handler sets, like the main thread does nowadays. If
   it is set, the thread calls PyGTK's callback. It does not do anything else
   with the signal.
 - PyGTK's callback wakes up the main thread, which actually handles the
   signal just like it does now.

 PyGTK's callback could be called from any thread, but it would be called in
 a normal context, not in a signal handler. As the signal handler does not
 change, the risk of breaking anything or causing chaos is as large/small
 as it is under the current scheme.

 However, PyGTK's problem does get
 solved, as long as there is _a_ thread that returns to the interpreter
 within some timeframe. It seems plausible that this will happen.

  No, it is not plausible at all.  For instance, the GnomeVFS library
usually has a pool of threads, not doing anything, waiting for some VFS
task.  It is likely that a signal will be delivered to one of these
threads, which know nothing about Python, and sit idle most of the
time.

  Regards.


Re: [Python-Dev] Signals, threads, blocking C functions

2006-09-09 Thread Gustavo Carneiro
On 9/9/06, Adam Olsen [EMAIL PROTECTED] wrote:
 On 9/8/06, Adam Olsen [EMAIL PROTECTED] wrote:
  Ensuring modifications to that array are atomic would be tricky, but I
  think it would be doable if we use a read-copy-update approach (with
  two alternating signal handler functions).  Not sure how to ensure
  there's no currently running signal handlers in another thread though.
   Maybe have to rip the atomic read/write stuff out of the Linux
  sources to ensure it's *always* defined behavior.

 Doh, except that's exactly what sig_atomic_t is for.  Ah well, can't
 win them all.

From the glibc manual:

To avoid uncertainty about interrupting access to a variable, you can
use a particular data type for which access is always atomic:
sig_atomic_t. Reading and writing this data type is guaranteed to
happen in a single instruction, so there's no way for a handler to run
in the middle of an access.


  So, no, this is certainly not the same as the Linux kernel's atomic
operations, which allow you to do more interesting things, like
test-and-clear or decrement-and-test, atomically.  GLib has those too,
and so does Mozilla's NSPR, but only on a few architectures do they work
without using mutexes.  For instance, i686 onwards doesn't require
mutexes, only special instructions, but i386 requires mutexes.  And we
all know mutexes in signal handlers cause deadlocks :-(

  And, yes, Py_AddPendingCall and Py_MakePendingCalls are most
certainly not async safe!  Just look at the source code of
Py_MakePendingCalls and you'll see an interesting comment...
Therefore, discussions about signal safety in whatever new API we may
add to Python should be taken with a grain of salt.

  Regards.


Re: [Python-Dev] Signals, threads, blocking C functions

2006-09-09 Thread Gustavo Carneiro
On 9/9/06, Nick Maclaren [EMAIL PROTECTED] wrote:
 I was hoping to have stopped, but here are a few comments.

 I agree with Jan Kanis.  That is the way to tackle this one.

  Alas, it doesn't work in practice, as I already replied.

[...]
 Despite the implication, the code of Py_AddPendingCall is NOT safe
 against simultaneous registration.  It is just plain broken, I am
 afraid.  The note starting "Darn" should be a LOT stronger :-)

  Considering that this code has existed for a very long time, and
that it isn't really safe, should we even bother to try to make
signals 100% reliable?

  I remember a security-related module (bastion?) that was at first
claimed to allow execution of malicious code while protecting the
system; later, it turned out that it wasn't really safe, and couldn't be
made safe, so the documentation was simply changed to say not to use that
module if you need real security.

  I see the same problem here.  Python signal handling isn't _really_
100% reliable.  And it would be very hard to make Py_AddPendingCall /
Py_MakePendingCalls completely reliable.

But let's think for a moment.  Do we really _need_ to make Python unix
signal handling 100% reliable?  What are the uses for signals?  I can
only understand a couple of uses: handling of SIGINT for generating
KeyboardInterrupt [1], and handling of fatal errors like SIGSEGV in
order to show a crash dialog and bug reporting tool.  The second use
case doesn't demand 100% reliability, and it is already handled
in recent Ubuntu Linux through
/proc/sys/kernel/crashdump-helper.  Other notable uses that I see of
signals are sending SIGUSR1 or SIGHUP to a daemon to make it reload
its configuration.  But any competent programmer already knows how to
make the program use local sockets instead.
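
As a rough illustration of the local-socket alternative, here is a minimal sketch in modern Python 3. The Daemon class, the "reload" command and the reply strings are all hypothetical; a real daemon would listen on a named local socket rather than use socketpair().

```python
import socket


class Daemon:
    """Toy daemon that reloads its configuration on a control command."""

    def __init__(self, control_sock):
        self.control_sock = control_sock
        self.reload_count = 0

    def handle_one_command(self):
        """Read a single command from the control socket and act on it."""
        command = self.control_sock.recv(64).strip()
        if command == b"reload":
            self.reload_count += 1  # a real daemon would re-read its config here
            self.control_sock.sendall(b"ok\n")
        else:
            self.control_sock.sendall(b"unknown\n")


# socketpair() gives two connected local sockets without touching the filesystem.
daemon_end, admin_end = socket.socketpair()
daemon = Daemon(daemon_end)
admin_end.sendall(b"reload\n")
daemon.handle_one_command()
reply = admin_end.recv(64)
daemon_end.close()
admin_end.close()
```

The request is handled in ordinary program context, so none of the restrictions on signal handlers apply.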


[1] Although ideally Python wouldn't even have KeyboardInterrupt and
just die on Ctrl-C like any normal program.


Re: [Python-Dev] Unicode Imports

2006-09-09 Thread Steve Holden
Martin v. Löwis wrote:
 Nick Coghlan wrote:
 
  So this is taking something that *already works properly on POSIX
  systems* and making it work on Windows as well.
 
 
 I doubt it does without side effects. For example, an application that
 would go through sys.path, and encode everything with
 sys.getfilesystemencoding() currently works, but will break if the patch
 is applied and non-mbcs strings are put on sys.path.
 
 Also, what will be the effect on __file__? What value will it have
 if the module originates from a sys.path entry that is a non-mbcs
 unicode string? I haven't tested the patch, but it looks like
 __file__ becomes a unicode string on Windows, and remains a byte
 string encoded with the file system encoding elsewhere. That's also
 a change in behavior.
 
Just to summarise my feeling having read the words of those more 
familiar with the issues than me: it looks like this should be a 2.6 
enhancement if it's included at all. I'd like to see it go in, but there 
do seem to be problems ensuring consistent behaviour across inconsistent 
platforms.

regards
  Steve
-- 
Steve Holden   +44 150 684 7255  +1 800 494 3119
Holden Web LLC/Ltd  http://www.holdenweb.com
Skype: holdenweb   http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden


Re: [Python-Dev] Signals, threads, blocking C functions

2006-09-09 Thread Jan Kanis
On Sat, 09 Sep 2006 12:59:23 +0200, Gustavo Carneiro  
[EMAIL PROTECTED] wrote:

 On 9/9/06, Jan Kanis [EMAIL PROTECTED] wrote:
 However, PyGTKs problem does get
 solved, as long as there is _a_ thread that returns to the interpreter
 within some timeframe. It seems plausible that this will happen.

   No, it is not plausible at all.  For instance, the GnomeVFS library
 usually has a pool of threads, not doing anything, waiting for some VFS
 task.  It is likely that a signal will be delivered to one of these
 threads, which know nothing about Python, and sit idle most of the
 time.

   Regards.

Well, perhaps it isn't plausible in all cases. However, it is dependent on
the libraries you're using and debuggable, which broken signal handlers  
apparently aren't. The approach would work if you don't use libraries that  
block threads, and if the libraries that do, co-operate with the  
interpreter. Open source libraries can be made to co-operate, and if you  
don't have the source and a library doesn't work correctly, all bets are  
off anyway.
But having the signal handler itself write to a pipe seems to be a cleaner
solution, if it can work reliably enough for some value of 'reliable'.

Jan
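
The write-to-a-pipe idea mentioned in this thread can be sketched as follows. This Python function only models the shape of the proposal (one byte written to each prepared descriptor, errors ignored); an actual implementation would have to live in the C-level signal handler. The name notify_descriptors is hypothetical.

```python
import os


def notify_descriptors(write_fds):
    """Write one byte to each prepared descriptor, ignoring any errors."""
    for fd in write_fds:
        try:
            os.write(fd, b"\0")
        except OSError:
            pass  # closed or invalid descriptor: deliberately ignored


read_fd, write_fd = os.pipe()
# -1 is an invalid descriptor; the error it causes is swallowed.
notify_descriptors([write_fd, -1])
woken = os.read(read_fd, 16)  # the main loop would select() on read_fd
os.close(read_fd)
os.close(write_fd)
```

A main loop blocked in select() or poll() on the read end wakes up as soon as the byte arrives. Later Python releases added signal.set_wakeup_fd(), which is built around the same idea.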


Re: [Python-Dev] Unicode Imports

2006-09-09 Thread David Hopwood
Martin v. Löwis wrote:
 Nick Coghlan wrote:
 
  So this is taking something that *already works properly on POSIX
  systems* and making it work on Windows as well.
 
 I doubt it does without side effects. For example, an application that
 would go through sys.path, and encode everything with
 sys.getfilesystemencoding() currently works, but will break if the patch
 is applied and non-mbcs strings are put on sys.path.

Huh? It won't break on any path for which it is not already broken.

You seem to be saying "Paths with non-mbcs strings shouldn't work on Windows,
because they haven't worked in the past."

-- 
David Hopwood [EMAIL PROTECTED]





Re: [Python-Dev] Unicode Imports

2006-09-09 Thread Martin v. Löwis
David Hopwood wrote:
 I doubt it does without side effects. For example, an application that
 would go through sys.path, and encode everything with
 sys.getfilesystemencoding() currently works, but will break if the patch
 is applied and non-mbcs strings are put on sys.path.
 
 Huh? It won't break on any path for which it is not already broken.
 
 You seem to be saying "Paths with non-mbcs strings shouldn't work on Windows,
 because they haven't worked in the past."

That's not what I'm saying. I'm saying that it shouldn't work in 2.5.x,
because it didn't in 2.5.0. Changing it in 2.6 is fine, along with the
incompatibilities it causes.

Regards,
Martin



Re: [Python-Dev] Unicode Imports

2006-09-09 Thread Nick Coghlan
David Hopwood wrote:
 Martin v. Löwis wrote:
 Nick Coghlan wrote:

 So this is taking something that *already works properly on POSIX
 systems* and making it work on Windows as well.
 I doubt it does without side effects. For example, an application that
 would go through sys.path, and encode everything with
 sys.getfilesystemencoding() currently works, but will break if the patch
 is applied and non-mbcs strings are put on sys.path.
 
 Huh? It won't break on any path for which it is not already broken.
 
 You seem to be saying "Paths with non-mbcs strings shouldn't work on Windows,
 because they haven't worked in the past."

I think MvL is looking at it from the point of view of consumers of the list 
of strings in sys.path, such as PEP 302 importer and loader objects, and tools 
like modulefinder. Currently, the list of values in sys.path is limited to:

1. 8-bit strings
2. Unicode strings containing only characters which can be encoded using the 
default file system encoding

For PEP 302 loaders, it is currently correct for them to take the 8-bit string 
they receive and do path.decode(sys.getfilesystemencoding())
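
A minimal sketch of that convention, written for modern Python 3, where byte strings and text are distinct types. The helper path_entry_as_text and the sample entries are hypothetical, and the explicit "utf-8" argument is only there to make the example independent of the local platform:

```python
import sys


def path_entry_as_text(entry, encoding=None):
    """Return a sys.path-style entry as text, decoding byte strings."""
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    if isinstance(entry, bytes):
        return entry.decode(encoding)
    return entry  # already text


entries = [b"/usr/lib/python", "/opt/caf\u00e9"]
as_text = [path_entry_as_text(e, "utf-8") for e in entries]
```

In Python 3 the standard library offers os.fsdecode() and os.fsencode() for this conversion.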

Kristján's patch works nicely for his application because he doesn't have to 
worry about compatibility with existing loaders and utilities. The core 
doesn't have that luxury.

We *might* be able to find a backwards compatible way to do it that could be 
put into 2.5.x, but that is effort that could more profitably be spent 
elsewhere, particularly since the state of the import system in Py3k will be 
for it to be based entirely on Unicode (as GvR pointed out last time this 
topic came up [1]).

Cheers,
Nick.

[1] http://mail.python.org/pipermail/python-dev/2006-June/066225.html



-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org


Re: [Python-Dev] Unicode Imports

2006-09-09 Thread Martin v. Löwis
Nick Coghlan wrote:
 I think MvL is looking at it from the point of view of consumers of the list
 of strings in sys.path, such as PEP 302 importer and loader objects, and
 tools like module_finder. Currently, the list of values in sys.path is
 limited to:

That, and all kinds of inspection tools. For example, when __file__ of a
module object changes to be a Unicode string (which it does under the
proposed patch), then these tools break. They currently don't break in
that way because putting arbitrary Unicode strings on sys.path doesn't
work in the first place.

Regards,
Martin


Re: [Python-Dev] Interest in a Python 2.3.6?

2006-09-09 Thread Martin v. Löwis
Barry Warsaw wrote:
 Thoughts?  I don't want to waste my time if nobody thinks a 2.3.6 would
 be useful, but I'm happy to do it if there's community support.  I'll
 also need the usual help with Windows installers and documentation updates.

I personally would consider it a waste of time. Since it wouldn't waste
*my* time, I'm -0 :-)

I think everybody has come to terms with whatever quirks Python 2.3 has.
Distributors of Python 2.3 have added whatever patches they think are
absolutely necessary. Making another release could cause confusion;
at worst, it may cause people to add special cases for 2.3.6 in
case the release contains some incompatible change that affects
existing applications.

Regards,
Martin


Re: [Python-Dev] Unicode Imports

2006-09-09 Thread David Hopwood
Nick Coghlan wrote:
 David Hopwood wrote:
 Martin v. Löwis wrote:
 Nick Coghlan wrote:

 So this is taking something that *already works properly on POSIX
 systems* and making it work on Windows as well.

 I doubt it does without side effects. For example, an application that
 would go through sys.path, and encode everything with
 sys.getfilesystemencoding() currently works, but will break if the patch
 is applied and non-mbcs strings are put on sys.path.

 Huh? It won't break on any path for which it is not already broken.

 You seem to be saying "Paths with non-mbcs strings shouldn't work on
 Windows, because they haven't worked in the past."
 
 I think MvL is looking at it from the point of view of consumers of the
 list of strings in sys.path, such as PEP 302 importer and loader
 objects, and tools like module_finder. Currently, the list of values in
 sys.path is limited to:
 
 1. 8-bit strings
 2. Unicode strings containing only characters which can be encoded using
 the default file system encoding

On Windows, file system pathnames can contain arbitrary Unicode characters
(well, almost). Despite the existence of ANSI filesystem APIs, and
regardless of what 'sys.getfilesystemencoding()' returns, the underlying
file system encoding for NTFS and FAT filesystems is UTF-16LE.

Thus, either:
 - the fact that sys.getfilesystemencoding() returns a non-Unicode encoding
   on Windows is a bug, or
 - any program that relies on sys.getfilesystemencoding() being able to
   encode arbitrary Windows pathnames has a bug.

We need to decide which of these is the case.
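
The gap between the two encodings can be made concrete with a short sketch. It assumes modern Python 3, uses "cp1252" purely as a stand-in for one possible Windows ANSI code page, and the helper name can_round_trip and the sample file name are hypothetical:

```python
def can_round_trip(name, encoding):
    """True if `name` survives encode/decode with `encoding` unchanged."""
    try:
        return name.encode(encoding).decode(encoding) == name
    except UnicodeEncodeError:
        return False


# A Greek file name: valid on NTFS, but outside the cp1252 repertoire.
name = "\u0394\u03bf\u03ba\u03b9\u03bc\u03ae.py"
results = {enc: can_round_trip(name, enc)
           for enc in ("cp1252", "utf-16-le", "utf-8")}
```

Only the encodings that cover all of Unicode round-trip the name; the code page does not.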

-- 
David Hopwood [EMAIL PROTECTED]





Re: [Python-Dev] Unicode Imports

2006-09-09 Thread Martin v. Löwis
David Hopwood wrote:
 On Windows, file system pathnames can contain arbitrary Unicode characters
 (well, almost). Despite the existence of ANSI filesystem APIs, and
 regardless of what 'sys.getfilesystemencoding()' returns, the underlying
 file system encoding for NTFS and FAT filesystems is UTF-16LE.
 
 Thus, either:
  - the fact that sys.getfilesystemencoding() returns a non-Unicode encoding
    on Windows is a bug, or
  - any program that relies on sys.getfilesystemencoding() being able to
    encode arbitrary Windows pathnames has a bug.
 
 We need to decide which of these is the case.

There is a third option:
- the operating system has a bug

It is actually this option that rules out the other two.
sys.getfilesystemencoding() returns 'mbcs' on Windows, which means
CP_ACP. The file system encoding is an encoding that converts a
file name into a byte string. Unfortunately, on Windows, there are
file names which cannot be converted into a byte string in a standard
manner. This is an operating system bug (or mis-design; they should
have chosen UTF-8 as the byte encoding of file names, instead of
making it depend on the system locale, but they of course did so
for backwards compatibility with Windows 3.1 and 9x).

As a side note: every encoding in Python is a Unicode encoding;
so there aren't any non-Unicode encodings.

Programs that rely on sys.getfilesystemencoding() being able to
represent arbitrary file names on Windows might have a bug;
programs that rely on sys.getfilesystemencoding() being able
to encode all elements of sys.path do not (at least not for
Python 2.5 and earlier).

Regards,
Martin



Re: [Python-Dev] Interest in a Python 2.3.6?

2006-09-09 Thread Barry Warsaw
On Sep 9, 2006, at 2:10 PM, Martin v. Löwis wrote:

 Barry Warsaw wrote:
  Thoughts?  I don't want to waste my time if nobody thinks a 2.3.6 would
  be useful, but I'm happy to do it if there's community support.  I'll
  also need the usual help with Windows installers and documentation updates.

 I personally would consider it a waste of time. Since it wouldn't waste
 *my* time, I'm -0 :-)

 I think everybody has arranged with whatever quirks Python 2.3 has.
 Distributors of Python 2.3 have added whatever patches they think are
 absolutely necessary. Making another release could cause confusion;
 at worst, it may cause people to special-case people for 2.3.6 in
 case the release contains some incompatible change that affects
 existing applications.

Well, there certainly hasn't been an overwhelming chorus of support  
for the idea, so I think I'll waste my time elsewhere ;).  Consider  
the offer withdrawn.

-Barry



Re: [Python-Dev] Unicode Imports

2006-09-09 Thread David Hopwood
Martin v. Löwis wrote:
 David Hopwood wrote:
 
  On Windows, file system pathnames can contain arbitrary Unicode characters
  (well, almost). Despite the existence of ANSI filesystem APIs, and
  regardless of what 'sys.getfilesystemencoding()' returns, the underlying
  file system encoding for NTFS and FAT filesystems is UTF-16LE.

  Thus, either:
   - the fact that sys.getfilesystemencoding() returns a non-Unicode encoding
     on Windows is a bug, or
   - any program that relies on sys.getfilesystemencoding() being able to
     encode arbitrary Windows pathnames has a bug.

  We need to decide which of these is the case.
 
 There is a third option:
 - the operating system has a bug

This behaviour is by design. If it is a bug, then it is a "won't ever fix --
no way, no how" bug, that Python must accommodate if it is to properly support
Unicode on Windows.

 It is actually this option that rules out the other two.
 sys.getfilesystemencoding() returns mbcs on Windows, which means
 CP_ACP. The file system encoding is an encoding that converts a
 file name into a byte string. Unfortunately, on Windows, there are
 file names which cannot be converted into a byte string in a standard
 manner. This is an operating system bug (or mis-design; they should
 have chosen UTF-8 as the byte encoding of file names, instead of
 making it depend on the system locale, but they of course did so
 for backwards compatibility with Windows 3.1 and 9x).

Although UTF-8 was invented (in September 1992) technically before the release
of the first version of NT supporting NTFS (NT 3.1 in July 1993), it had not
been invented before the decision to use Unicode in NTFS, or in Windows NT's
file APIs, had been made.

(I believe OS/2 HPFS had not supported Unicode, even though NTFS was otherwise
almost identical to it.)

At that time, the decision to use Unicode at all was quite forward-looking;
the final version of Unicode 1.0 had only been published in June 1992
(although it had been approved earlier; see http://www.unicode.org/history/).

UTF-8 was only officially added to the Unicode standard in an appendix of
Unicode 2.0 (published July 1996), and only given essentially equal status to
UTF-16 and UTF-32 in Unicode 3.0 (September 1999).

 As a side note: every encoding in Python is a Unicode encoding;
 so there aren't any non-Unicode encodings.

It was clear from context that I meant "encoding capable of representing
all Unicode characters".

 Programs that rely on sys.getfilesystemencoding() being able to
 represent arbitrary file names on Windows might have a bug;
 programs that rely on sys.getfilesystemencoding() being able
 to encode all elements of sys.path do not (at least not for
 Python 2.5 and earlier).

Elements of sys.path can be Unicode strings in Python 2.5, and should be
pathnames supported by the underlying OS. Where is it documented that there
is any further restriction on them? And why should there be any further
restriction on them?

-- 
David Hopwood [EMAIL PROTECTED]





Re: [Python-Dev] Unicode Imports

2006-09-09 Thread Martin v. Löwis
David Hopwood wrote:
 Elements of sys.path can be Unicode strings in Python 2.5, and should be
 pathnames supported by the underlying OS. Where is it documented that there
 is any further restriction on them? And why should there be any further
 restriction on them?

It's not documented in that detail; if people think it should be
documented more thoroughly, that should be done (contributions are
welcome). Changing the import machinery to deal with Unicode strings
differently cannot be done for Python 2.5, though: it cannot be done
for 2.5.0 as the release candidate has already been published, and there
is no acceptable patch available at this moment. It cannot be added
to 2.5.x as it may reasonably break existing applications.

Regards,
Martin



Re: [Python-Dev] Python 2.4.4 was: Interest in a Python 2.3.6?

2006-09-09 Thread Josiah Carlson

Barry Warsaw [EMAIL PROTECTED] wrote:
 Well, there certainly hasn't been an overwhelming chorus of support  
 for the idea, so I think I'll waste my time elsewhere ;).  Consider  
 the offer withdrawn.

I hope someone tries to fix, in 2.4.4, one of the two bugs I listed that
were problems for 2.3 and 2.4:

http://www.python.org/sf/780714
http://www.python.org/sf/1548687

The former involves stack allocation errors in subthreads that exist
even in 2.5, which may not be fixable on Windows, and very likely is not
fixable on Linux.

The latter is fixable on all platforms.

 - Josiah



Re: [Python-Dev] Unicode Imports

2006-09-09 Thread Nick Coghlan
David Hopwood wrote:
 Martin v. Löwis wrote:
 Programs that rely on sys.getfilesystemencoding() being able to
 represent arbitrary file names on Windows might have a bug;
 programs that rely on sys.getfilesystemencoding() being able
 to encode all elements of sys.path do not (at least not for
 Python 2.5 and earlier).
 
 Elements of sys.path can be Unicode strings in Python 2.5, and should be
 pathnames supported by the underlying OS. Where is it documented that there
 is any further restriction on them? And why should there be any further
 restriction on them?

There's no suggestion that this limitation shouldn't be fixed - merely that 
fixing it is likely to break some applications which rely on sys.path for 
importing or introspection purposes. A 2.5.x maintenance release typically 
shouldn't break anything that worked correctly on 2.5.0, hence fixing this 
becomes a project for either 2.6 or 3.0.

To put it another way: fixing this is likely to require changes to more than 
just the interpreter core. It will also potentially require changes to all 
applications which currently expect to be able to use 
's.encode(sys.getfilesystemencoding())' to convert any Unicode path entry or 
__file__ attribute to an 8-bit string.

Doing that qualifies as correcting a language design error or limitation, but 
it would require a real stretch of the definition to qualify as a bug fix.

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org