Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-11 Thread Ulrich Eckhardt
On Wednesday 10 December 2008, Adam Olsen wrote:
 On Wed, Dec 10, 2008 at 3:39 AM, Ulrich Eckhardt
 [EMAIL PROTECTED] wrote:
  On Tuesday 09 December 2008, Adam Olsen wrote:
  The only thing separating this from a bikeshed discussion is that a
  bikeshed has many equally good solutions, while we have no good
  solutions.  Instead we're trying to find the least-bad one.  The
  unicode/bytes separation is pretty close to that.  Adding a warning
  gets even closer.  Adding magic makes it worse.
 
  Well, I see two cases:
  1. Converting from an uncertain representation to a known one.
  2. Converting from a known representation to a known one.

 Not quite:
 1. Using a garbage file name locally (within a single process, not
 talking to any libs)
 2. Using a unicode filename everywhere (libs, saved to config files,
 displayed to the user, etc.)

I think there is some misunderstanding. I was referring to conversions and 
whether it is good to perform them implicitly. For that, I saw the above two 
cases.

 On linux the bytes/unicode separation is perfect for this.  You decide
 which approach you're using and use it consistently.  If you mess up
 (mixing bytes and unicode) you'll consistently get an error.

 We currently don't follow this model on windows, so a garbage file
 name gets passed around as if it was unicode, but fails when passed to
 a lib, saved to a config file, is displayed to a user, etc.

I'm not sure I agree with this. Facts I know are:
1. On POSIX systems, there is no reliable encoding for filenames while the 
system APIs use char/byte strings.
2. On MS Windows, the encoding for filenames is Unicode/UTF-16.

Returning Unicode strings from readdir() is wrong because it can't handle 
case 1 above. Returning byte strings is wrong because it can't handle case 2 
above: it gives you useless roundtrips from UTF-16 to either UTF-8 or, 
worst case, to the locale-dependent MBCS. Returning something different 
depending on the system is also broken because that would make Python code 
that uses this function and assumes a certain type unportable.

Note that this doesn't get much better if you provide a separate readdirb() 
API or one that simply returns a byte string or Unicode string depending on 
its argument. It just shifts the brokenness from readdir() to the code that 
uses it, unless this code makes a distinction between the target systems. 
Since way too many programmers are not aware of the problem, they will not 
handle these systems differently, so code will become non-portable.

What I'd just like some feedback on is the approach to return a distinct type 
(neither a byte string nor a Unicode string) from readdir(). In order to use 
this, a programmer will have to convert it explicitly, otherwise e.g. 
printing it will just produce <env_string at 0x01234567>. This will 
immediately confront every programmer with the issue of unknown encodings, 
and they will have to make the application-specific choice whether an 
approximation of the filename, an exception or ignoring the file is the 
right choice. Also, it presents the options for doing this conversion in a 
single class, which I personally find much better than providing overloads 
for hundreds of functions.
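
A minimal sketch of what such a distinct type might look like, assuming a hypothetical class named EnvString (the name only echoes the env_string mentioned above; nothing like it exists in the standard library). It shows the three conversion policies named above: an exception, an approximation, and ignoring the entry.

```python
import sys


class EnvString:
    """Hypothetical opaque wrapper around a name in its native representation.

    It is deliberately neither bytes nor str: callers must pick a conversion.
    """

    def __init__(self, raw):
        if not isinstance(raw, (bytes, str)):
            raise TypeError("raw must be bytes (POSIX) or str (Windows)")
        self._raw = raw

    def native(self):
        """Return the untouched value, suitable for handing back to the OS."""
        return self._raw

    def to_text(self, encoding=None, errors="strict"):
        """Explicit conversion to text.

        errors="strict" raises on undecodable names; errors="replace"
        yields an approximation that is good enough for display.
        """
        if isinstance(self._raw, str):
            return self._raw
        return self._raw.decode(encoding or sys.getfilesystemencoding(), errors)

    # No __str__ or __repr__ on purpose: printing shows the default
    # <... EnvString object at 0x...> form, which forces an explicit choice.


def readable_names(entries, encoding="utf-8"):
    """Third policy: silently skip names that cannot be decoded."""
    names = []
    for entry in entries:
        try:
            names.append(entry.to_text(encoding))
        except UnicodeDecodeError:
            continue
    return names


good = EnvString(b"report.txt")
bad = EnvString(b"caf\xe9.txt")  # Latin-1 bytes, invalid as UTF-8

print(good.to_text("utf-8"))            # exact name
print(bad.to_text("utf-8", "replace"))  # approximation with U+FFFD
print(readable_names([good, bad]))      # the undecodable entry is ignored
```

Calling bad.to_text("utf-8") without an error policy raises UnicodeDecodeError, which is the "exception" choice.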


Sorry for ranting, but I'm a bit confused and desperate, because either I'm 
unable to explain what I mean or I'm really not understanding something that 
everybody else here seems to agree upon. I just know that using a distinct 
path type has helped me in C++ in the past, and I don't see why it shouldn't 
in Python.

Uli

-- 
Sator Laser GmbH
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932

**
   Visit our website at http://www.satorlaser.de/
**
This e-mail, including all attachments, is intended only for the addressee 
and may contain confidential information. Please notify the sender 
immediately if you are not the intended recipient. In that case the e-mail 
must be deleted and may not be read, forwarded, published or otherwise used.
E-mails can be read by third parties and may contain viruses and 
unauthorized changes. Sator Laser GmbH is not responsible for these 
consequences.

**

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Trap SIGSEGV and SIGFPE

2008-12-11 Thread Victor Stinner
On Wednesday 10 December 2008 20:04:00, Terry Reedy wrote:
  Recovering after a segfault is dangerous, but my first goal was to get the
  Python backtrace instead of just one line: "Segmentation fault". It helps a
  lot for debugging!
 
  Exactly! That's why it doesn't belong in the Python core. We can't
  guarantee anything about its effects or encourage it.

 Would it be safe to catch SIGSEGV, output a trace, and then exit?
 IE, make the 'first goal' the only goal?

Oh yeah, good idea :-) Does it mean that the Python interpreter can't be used 
to display the trace? It would be nice to at least use the Python stderr 
(which is written in pure Python for Python 3). It would be better if the user 
could set up a callback, like sys.excepthook. But if, as many people wrote, 
Python is totally broken after a segfault, it is maybe not a good idea :-)

I guess that the sigsetjmp() and siglongjmp() hack can be avoided in 
PyEval_EvalFrameEx(), so ceval.c could be unchanged.

New pseudocode:
  set checkpoint
  if error:
    get the backtrace
    display the backtrace
    fast exit (e.g. don't call atexit, don't free memory, ...)
  else:
    normal execution
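
For comparison, the "display the backtrace, then exit" behaviour can be sketched with the faulthandler module. Note the assumption: faulthandler only exists in later Python versions (3.3 and up), so this is not the patch discussed in this thread, just an illustration of the same goal. The temporary log file is an arbitrary choice for the example.

```python
import faulthandler
import tempfile

# faulthandler needs a real file descriptor, because the handler writes with
# low-level calls while the interpreter state may already be corrupted.
crash_log = tempfile.TemporaryFile()

# Install handlers for fatal signals such as SIGSEGV and SIGFPE that dump the
# Python traceback of every thread. The signal still terminates the process
# afterwards; no attempt is made to resume execution.
faulthandler.enable(file=crash_log, all_threads=True)

print(faulthandler.is_enabled())
```

In an interactive session one would more likely pass sys.stderr, so that the traceback appears where "Segmentation fault" would otherwise be the only output.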

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/


Re: [Python-Dev] Trap SIGSEGV and SIGFPE

2008-12-11 Thread Maciej Fijalkowski

 If we could calculate how much stack is left we'd have a much more
 robust way of doing recursion limits.  I suppose this could be done by
 reading a byte from each page with a temporary SIGSEGV handler
 installed, but I'm not convinced you can't ask the platform directly
 somehow.  I'd also be concerned about thread-safety.


It's about as hard as taking the address of a local variable at the
beginning of the program and again at any arbitrary point. Of course,
'how much is left' means additional arithmetic.

Cheers,
fijal


Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-11 Thread Steve Holden
Ulrich Eckhardt wrote:
 On Wednesday 10 December 2008, Adam Olsen wrote:
 On Wed, Dec 10, 2008 at 3:39 AM, Ulrich Eckhardt
 [EMAIL PROTECTED] wrote:
 On Tuesday 09 December 2008, Adam Olsen wrote:
 The only thing separating this from a bikeshed discussion is that a
 bikeshed has many equally good solutions, while we have no good
 solutions.  Instead we're trying to find the least-bad one.  The
 unicode/bytes separation is pretty close to that.  Adding a warning
 gets even closer.  Adding magic makes it worse.
 Well, I see two cases:
 1. Converting from an uncertain representation to a known one.
 2. Converting from a known representation to a known one.
 Not quite:
 1. Using a garbage file name locally (within a single process, not
 talking to any libs)
 2. Using a unicode filename everywhere (libs, saved to config files,
 displayed to the user, etc.)
 
 I think there is some misunderstanding. I was referring to conversions and 
 whether it is good to perform them implicitly. For that, I saw the above two 
 cases.
 
 On linux the bytes/unicode separation is perfect for this.  You decide
 which approach you're using and use it consistently.  If you mess up
 (mixing bytes and unicode) you'll consistently get an error.

 We currently don't follow this model on windows, so a garbage file
 name gets passed around as if it was unicode, but fails when passed to
 a lib, saved to a config file, is displayed to a user, etc.
 
 I'm not sure I agree with this. Facts I know are:
 1. On POSIX systems, there is no reliable encoding for filenames while the 
 system APIs use char/byte strings.
 2. On MS Windows, the encoding for filenames is Unicode/UTF-16.
 
 Returning Unicode strings from readdir() is wrong because it can't handle the 
 case 1 above. Returning byte strings is wrong because it can't handle case 2 
 above because it gives you useless roundtrips from UTF-16 to either UTF-8 or, 
 worst case, to the locale-dependent MBCS. Returning something different 
 depending on the system is also broken because that would make Python code 
 that uses this function and assumes a certain type unportable.
 
 Note that this doesn't get much better if you provide a separate readdirb() 
 API or one that simply returns a byte string or Unicode string depending on 
 its argument. It just shifts the brokenness from readdir() to the code that 
 uses it, unless this code makes a distinction between the target systems. 
 Since way too many programmers are not aware of the problem, they will not 
 handle these systems differently, so code will become non-portable.
 
 What I'd just like some feedback on is the approach to return a distinct type 
 (neither a byte string nor a Unicode string) from readdir(). In order to use 
 this, a programmer will have to convert it explicitly, otherwise e.g. 
 printing it will just produce <env_string at 0x01234567>. This will 
 immediately bump each programmer with their heads on the issue of unknown 
 encodings and they will have to make the application-specific choice whether 
 an approximation of the filename, an exception or ignoring the file is the 
 right choice. Also, it presents the options for doing this conversion in a 
 single class, which I personally find much better than providing overloads 
 for hundreds of functions.
 
 
 Sorry for ranting, but I'm a bit confused and desperate, because either I'm 
 unable to explain what I mean or I'm really not understanding something that 
 everybody else here seems to agree upon. I just know that using a distinct 
 path type has helped me in C++ in the past, and I don't see why it shouldn't 
 in Python.
 
Seems to me this just threatens to add to the confusion.

If you know what your filesystem produces, you can take the appropriate
action to convert it into a type that makes sense to the user. If you
don't, then at least if you have the string in its bytes form you can
re-present it to the filesystem to manipulate the file. What are we
supposed to do with the special type?
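
The bytes round trip described here can be sketched as follows, assuming a current Python 3 on a POSIX-like system (os.fsencode() was only added later, in Python 3.2; the file name example.txt is made up for the demonstration).

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one file so the directory listing is not empty.
    with open(os.path.join(tmp, "example.txt"), "w") as handle:
        handle.write("hello")

    # A bytes argument makes os.listdir() return bytes names.
    directory = os.fsencode(tmp)
    names = os.listdir(directory)

    # The bytes name goes straight back to the OS; nothing is decoded.
    sizes = {name: os.stat(os.path.join(directory, name)).st_size
             for name in names}

print(sizes)
```

The names never pass through a text type, so an undecodable name would be handled the same way as a well-formed one.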

regards
 Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC  http://www.holdenweb.com/



Re: [Python-Dev] Merging flow

2008-12-11 Thread Nick Coghlan
Martin v. Löwis wrote:
 Jeffrey Yasskin wrote:
 Was there ever a conclusion to this? I need to merge the patches
 associated with issue 4597 from trunk to all the maintenance branches,
 and I'd like to avoid messing anyone up if possible. If I don't hear
 back, I'll plan to svnmerge directly from trunk to each of the
 branches, and then block my merge to py3k from being merged again to
 release30-maint.
 
 No - you should merge from the py3k branch to the release30-maint branch.

I believe that's difficult when you previously merged from the trunk to
the py3k branch - the merged change to the svnmerge related properties
on the root directory gets in the way when svnmerge attempts to update
them on the maintenance branch.

That's what started this thread, and so far nobody has come up with a
workaround. It seems to me that svnmerge.py should just be able to do a
svn revert on the affected properties in the maintenance branch before
it attempts to modify them, but my svn-fu isn't strong enough for me to
say that for sure.

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia


Re: [Python-Dev] Trap SIGSEGV and SIGFPE

2008-12-11 Thread skip

Simon> Some indication of what Python was executing when the segfault
Simon> occurred would help narrow down the possibilities rapidly.

The Python distribution comes with a Misc/gdbinit file (you can grab it from
the Subversion source tree via the web as well) that defines a pystack
command.  It will work with core files as well as running processes and
should give you a very good idea where your Python code was executing when
the segfault occurred.

-- 
Skip Montanaro - [EMAIL PROTECTED] - http://smontanaro.dyndns.org/


Re: [Python-Dev] Trap SIGSEGV and SIGFPE

2008-12-11 Thread Antoine Pitrou
skip at pobox.com writes:
 
 The Python distribution comes with a Misc/gdbinit file (you can grab it from
 the Subversion source tree via the web as well) that defines a pystack
 command.  It will work with core files as well as running processes and
 should give you a very good idea where your Python code was executing when
 the segfault occurred.

Still, it would be much better if the stack trace could be printed by Python
itself rather than having to resort to gdb wizardry. Especially if the problem
is reported by one of your non-developer users.




Re: [Python-Dev] Trap SIGSEGV and SIGFPE

2008-12-11 Thread skip

Antoine> Still, it would be much better if the stack trace could be
Antoine> printed by Python itself rather than having to resort to gdb
Antoine> wizardry. Especially if the problem is reported by one of your
Antoine> non-developer users.

I understand.  The guy has a problem today for which there is a solution
that I posted.  If he's been meaning to look into the problem and he's
posting to python-dev I presume he knows at least a little about running gdb
if he's operating in a Unix environment.  These two gdb commands

source .gdbinit
pystack

shouldn't be too much of a barrier.

Skip


Re: [Python-Dev] Trap SIGSEGV and SIGFPE

2008-12-11 Thread Antoine Pitrou
skip at pobox.com writes:
 
 I understand.  The guy has a problem today for which there is a solution
 that I posted.  If he's been meaning to look into the problem and he's
 posting to python-dev I presume he knows at least a little about running gdb
 if he's operating in a Unix environment.  These two gdb commands
 
 source .gdbinit
 pystack
 
 shouldn't be too much of a barrier.

Well, but sometimes you don't have a core file (because you didn't run ulimit
before launching Python and the crash wasn't expected; if the crash is very
erratic, by the time you've fixed the system limits, you don't manage to
reproduce it anymore, or it takes hours because it's at the end of a very long
workload). Sometimes you don't have the gdbinit file around (for example,
Mandriva doesn't ship it with any Python-related package). Sometimes you are
under Windows.

etc. :-)
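
On Unix, the core-file limit mentioned above can also be raised from inside the process, which avoids forgetting ulimit before launch. This is only a sketch under assumptions: it uses the Unix-only resource module, the helper name enable_core_dumps is made up, and it cannot exceed the hard limit, so it does nothing useful if the administrator has set that to zero.

```python
import resource


def enable_core_dumps():
    """Raise the soft core-file size limit to the current hard limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit up to the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


soft_limit, hard_limit = enable_core_dumps()
print(soft_limit == hard_limit)
```

It does not help on Windows or when the crash has already happened, which are the other cases listed above.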




Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-11 Thread Ulrich Eckhardt
On Thursday 11 December 2008, Steve Holden wrote:
 Ulrich Eckhardt wrote:
  What I'd just like some feedback on is the approach to return a distinct
  type (neither a byte string nor a Unicode string) from readdir(). In
  order to use this, a programmer will have to convert it explicitly,
  otherwise e.g. printing it will just produce <env_string at 0x01234567>.
  This will immediately bump each programmer with their heads on the issue
  of unknown encodings and they will have to make the application-specific
  choice whether an approximation of the filename, an exception or ignoring
  the file is the right choice. Also, it presents the options for doing
  this conversion in a single class, which I personally find much better
  than providing overloads for hundreds of functions.
[...]

 Seems to me this just threatens to add to the confusion.

 If you know what your filesystem produces, you can take the appropriate
 action to convert it into a type that makes sense to the user. If you
 don't, then at least if you have the string in its bytes form you can
   ^^^

There are operating systems that don't use bytes to represent a file path, 
namely all the MS Windows variants. Even worse, when you use a byte string 
there, it typically means that you want to use the obsolete encoding that is 
based on codepages.

Why can we not preserve the representation of a path as it is? Why do we 
_have_ to convert it to anything at all, without even knowing if this 
conversion is needed? I just want to do something to a file's content, why 
does its path have to be converted to something and then be converted back in 
order for the system to digest it?

 re-present it to the filesystem to manipulate the file. What are we
 supposed to do with the special type?

You receive from readdir() and pass it to stat(), simple as that. No 
conversions from the native representation needed. If you need a textual 
representation, then you have to convert it and you have to do so explicitly 
according to whatever logic your application requires.

If readdir() returned Unicode text, people would start taking that for 
granted. If it returned bytes, just the same. Returning a completely 
unrelated type will give them enough hint that for this thing they have to 
rethink their assumptions. This runs along the lines of "In the face of 
ambiguity, refuse the temptation to guess.", as it makes guessing rather 
impossible.

I just don't see a case where using a separate path class would break things. 
Further, the special handling that is required would be made even clearer by 
using such a class.

Uli



Re: [Python-Dev] Trap SIGSEGV and SIGFPE

2008-12-11 Thread Ivan Krstić

Hi Martin,

On Dec 11, 2008, at 12:12 AM, Martin v. Löwis wrote:

Several people already said (essentially) that: -1. I don't think such
code should be added to the Python core, no matter how smart or correct
it is.

Does your -1 apply only to attempts to resume execution after SIGSEGV,
or also to the idea of dumping the stack and immediately exiting? The
former strikes me as crazy talk, while the latter is genuinely useful.


Cheers,

--
Ivan Krstić [EMAIL PROTECTED] | http://radian.org



Re: [Python-Dev] Trap SIGSEGV and SIGFPE

2008-12-11 Thread Victor Stinner
On Thursday 11 December 2008 13:57:03, [EMAIL PROTECTED] wrote:
 Simon> Some indication of what Python was executing when the segfault
 Simon> occurred would help narrow down the possibilities rapidly.

 The Python distribution comes with a Misc/gdbinit file

Hum, do you really run *all* programs in gdb? Most of the time, you don't 
expect a crash (because you trust your software). You will have to try to 
reproduce the crash, but sometimes it's very hard (e.g. Heisenbugs!).

My new proposal is to display the backtrace instead of just the 
message "Segmentation fault". It's not a problem if displaying the backtrace 
produces a new fault, because that is still better than just the 
message "Segmentation fault". Even with my SIGSEGV handler, you can still use 
gdb because gdb catches the signal before the program does.

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/


Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-11 Thread Isaac Morland

On Thu, 11 Dec 2008, Ulrich Eckhardt wrote:


On Thursday 11 December 2008, Steve Holden wrote:

Ulrich Eckhardt wrote:
Seems to me this just threatens to add to the confusion.

If you know what your filesystem produces, you can take the appropriate
action to convert it into a type that makes sense to the user. If you
don't, then at least if you have the string in its bytes form you can

  ^^^

There are operating systems that don't use bytes to represent a file path,
namely all the MS Windows variants. Even worse, when you use a byte string
there, it typically means that you want to use the obsolete encoding that is
based on codepages.

Why can we not preserve the representation of a path as it is? Why do we
_have_ to convert it to anything at all, without even knowing if this
conversion is needed? I just want to do something to a file's content, why
does its path have to be converted to something and then be converted back in
order for the system to digest it?


re-present it to the filesystem to manipulate the file. What are we
supposed to do with the special type?


You receive from readdir() and pass it to stat(), simple as that. No
conversions from the native representation needed. If you need a textual
representation, then you have to convert it and you have to do so explicitly
according to whatever logic your application requires.


Not only would this address the issue with the local filesystem, it would 
also provide a principled way to deal with remote filesystems.  For 
example, an FTP interface library for Python could use this type to 
return paths of the sort actually supported by the raw FTP protocol.


Thinking of "the" filesystem is actually a misconception; always 
referring to "a" filesystem opens up all sorts of possibilities.  There is 
a lot of coding to do to allow this, but allowing programs to work with 
paths and files in the local filesystem, remote filesystems, and 
filesystems constructed from others (e.g., by expanding symlinks, changing 
the root similar to chroot, or encoding/decoding pathnames) would enable 
a great deal, including better test environments.
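
A rough sketch of the "a filesystem" idea, under stated assumptions: the class names LocalFilesystem and RootedFilesystem and their two methods are invented for this example and are not an existing API. The second class is a filesystem constructed from another one by changing the root, in the spirit of chroot.

```python
import os
import posixpath
import tempfile


class LocalFilesystem:
    """Hypothetical filesystem object backed by the real OS calls."""

    def listdir(self, path):
        return sorted(os.listdir(path))

    def read(self, path):
        with open(path, "rb") as handle:
            return handle.read()


class RootedFilesystem:
    """Hypothetical filesystem built from another one, chroot-style."""

    def __init__(self, inner, root):
        self._inner = inner
        self._root = root

    def _resolve(self, path):
        # Normalise against "/" first so that ".." cannot climb above the
        # root. Symlinks are not handled: this is not a security boundary.
        relative = posixpath.normpath("/" + path.lstrip("/")).lstrip("/")
        if not relative:
            return self._root
        return os.path.join(self._root, *relative.split("/"))

    def listdir(self, path):
        return self._inner.listdir(self._resolve(path))

    def read(self, path):
        return self._inner.read(self._resolve(path))


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "data.txt"), "wb") as handle:
        handle.write(b"payload")

    fs = RootedFilesystem(LocalFilesystem(), tmp)
    listing = fs.listdir("/")
    content = fs.read("/../data.txt")  # ".." stays inside the root

print(listing, content)
```

Code written against such an object can be pointed at a temporary directory in tests, which is the kind of test environment mentioned above.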


This is an interesting case of separating byte strings from character 
strings.  As long as the two are conflated, everything appears simple. 
But when they are separated, not only are there two types where before 
there was only one, it turns out that which type is correct in some 
circumstances depends on the platform.  Also, many objects which are byte 
strings at the protocol level are usually or always meant to be character 
strings of some sort, but how to translate them simply cannot be nailed 
down once and for all.
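
A small illustration of why the translation cannot be fixed once and for all: the same bytes are a valid name under two encodings and give two different character strings. The byte string is a made-up example.

```python
raw_name = b"caf\xc3\xa9.txt"  # bytes as they might arrive from a protocol

as_utf8 = raw_name.decode("utf-8")
as_latin1 = raw_name.decode("latin-1")

print(as_utf8)               # café.txt
print(as_latin1)             # cafÃ©.txt
print(as_utf8 == as_latin1)  # False
```

Nothing in the bytes themselves says which of the two readings the sender intended.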


Isaac Morland   CSCF Web Guru
DC 2554C, x36650        WWW Software Specialist


Re: [Python-Dev] Trap SIGSEGV and SIGFPE

2008-12-11 Thread skip

 The Python distribution comes with a Misc/gdbinit file

Victor> Hum, do you really run *all* programs in gdb? Most of the time,
Victor> you don't expect a crash (because you trust your software). You
Victor> will have to try to reproduce the crash, but sometimes it's very
Victor> hard (e.g. Heisenbugs!).

Please folks!  Get real.  I was trying to help out a guy who responded to
this thread saying that he gets intermittent segfaults in his PyGTK
programs.  I don't presume that he runs his app in gdb.  If he has a core
file this will work.  I apologize profusely for any implication that a set
of gdb commands is in any way superior to your patch.

OTOH, it works today if you have a core file and are running Python at least
as far back as 2.4.  It doesn't require any changes to the interpreter.  I
use it frequently at work (a couple times a month anyway).  We get
notifications of all core files dropped each day.  I make at least a cursory
check of all core files dumped by Python.  For that I use the pystack
command defined in Misc/gdbinit.

Victor> My new proposal is to display the backtrace instead of just
Victor> the message "Segmentation fault". It's not a problem if
Victor> displaying the backtrace produces a new fault because it's already
Victor> better than just the message "Segmentation fault". Even with my
Victor> SIGSEGV handler, you can still use gdb because gdb catches the
Victor> signal before the program.

Again, I meant no disrespect to your proposal.  I was *simply trying to help
the guy out*.

Skip
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Trap SIGSEGV and SIGFPE

2008-12-11 Thread Jeffrey Yasskin
On Thu, Dec 11, 2008 at 1:34 AM, Victor Stinner
[EMAIL PROTECTED] wrote:
 But if -as many people wrote-
 Python is totally broken after a segfault, it is maybe not a good idea :-)

While it's true that after a segfault or unexpected longjmp, there are
no guarantees whatsoever about the state of the python program, the
program will often just happen to work, and there are at least some
programs I've worked on that would rather take the risk in order to
try to shut down gracefully. For example, an interactive app may want
to give the user a chance to save her (not necessarily corrupted) work
into a new file rather than unconditionally losing it. Or a webserver
might want to catch the segfault, finish replying to the other
requests that were in progress at the time, maybe reply to the request
that caused the segfault, and then restart. Yes there's a possibility
that the events around the segfault exposed some secret internal data
(and they may do so even without segfaulting), but when the
alternative is not replying to the users at all, this may be a risk
the app wants to take. It would be nice for Python to at least expose
the option so that developers (who are consenting adults, remember)
can make their own decisions. It should _not_ be on by default, but
something like sys.dangerous_turn_C_crashes_into_exceptions() would be
useful.

Jeffrey


Re: [Python-Dev] Merging flow

2008-12-11 Thread Jeffrey Yasskin
On Thu, Dec 11, 2008 at 4:18 AM, Nick Coghlan [EMAIL PROTECTED] wrote:
 Martin v. Löwis wrote:
 Jeffrey Yasskin wrote:
 Was there ever a conclusion to this? I need to merge the patches
 associated with issue 4597 from trunk to all the maintenance branches,
 and I'd like to avoid messing anyone up if possible. If I don't hear
 back, I'll plan to svnmerge directly from trunk to each of the
 branches, and then block my merge to py3k from being merged again to
 release30-maint.

 No - you should merge from the py3k branch to the release30-maint branch.

 I believe that's difficult when you previously merged from the trunk to
 the py3k branch - the merged change to the svnmerge related properties
 on the root directory gets in the way when svnmerge attempts to update
 them on the maintenance branch.

 That's what started this thread, and so far nobody has come up with a
 workaround. It seems to me that svnmerge.py should just be able to do a
 svn revert on the affected properties in the maintenance branch before
 it attempts to modify them, but my svn-fu isn't strong enough for me to
 say that for sure.

Yeah, that's why I asked. I tried what Martin suggested with r67698 by
just saying I'd resolved the conflict, which added the single revision
I was merging from to the svnmerge-integrated property. It didn't add
the two original revisions. I don't know enough about how svnmerge
works to know if that's the right outcome or who it's going to cause
trouble for.

Jeffrey


Re: [Python-Dev] Trap SIGSEGV and SIGFPE

2008-12-11 Thread James Y Knight


On Dec 11, 2008, at 11:08 AM, Jeffrey Yasskin wrote:


On Thu, Dec 11, 2008 at 1:34 AM, Victor Stinner
[EMAIL PROTECTED] wrote:

But if -as many people wrote-
Python is totally broken after a segfault, it is maybe not a good idea :-)


While it's true that after a segfault or unexpected longjmp, there are
no guarantees whatsoever about the state of the python program, the
program will often just happen to work, and there are at least some
programs I've worked on that would rather take the risk in order to
try to shut down gracefully.


I ran an interactive game for years (written in C, mind you, not
python), where the SIGSEGV handler simply recursively reinvoked the
main loop, after disabling the command that caused a SEGV if it had
caused a SEGV twice already. It almost always worked and continued
running without issue. YMMV, of course. :)


James


Re: [Python-Dev] Trap SIGSEGV and SIGFPE

2008-12-11 Thread Benjamin Peterson
On Thu, Dec 11, 2008 at 10:08 AM, Jeffrey Yasskin jyass...@gmail.com wrote:
 On Thu, Dec 11, 2008 at 1:34 AM, Victor Stinner
 victor.stin...@haypocalc.com wrote:
 But if -as many people wrote-
 Python is totally broken after a segfault, it is maybe not a good idea :-)

 While it's true that after a segfault or unexpected longjmp, there are
 no guarantees whatsoever about the state of the python program, the
 program will often just happen to work, and there are at least some
 programs I've worked on that would rather take the risk in order to
 try to shut down gracefully. For example, an interactive app may want
 to give the user a chance to save her (not necessarily corrupted) work
 into a new file rather than unconditionally losing it. Or a webserver
 might want to catch the segfault, finish replying to the other
 requests that were in progress at the time, maybe reply to the request
 that caused the segfault, and then restart. Yes there's a possibility
 that the events around the segfault exposed some secret internal data
 (and they may do so even without segfaulting), but when the
 alternative is not replying to the users at all, this may be a risk
 the app wants to take. It would be nice for Python to at least expose
 the option so that developers (who are consenting adults, remember)
 can make their own decisions. It should _not_ be on by default, but
 something like sys.dangerous_turn_C_crashes_into_exceptions() would be
 useful.

Trying to recover (or save work etc.) is incredibly unpredictable,
though. It could very well end up making the situation worse!

I'm -1 on putting this in the core.



-- 
Cheers,
Benjamin Peterson
There's nothing quite as beautiful as an oboe... except a chicken
stuck in a vacuum cleaner.


Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-11 Thread Adam Olsen
On Thu, Dec 11, 2008 at 6:41 AM, Ulrich Eckhardt
eckha...@satorlaser.com wrote:
 On Thursday 11 December 2008, Steve Holden wrote:
 re-present it to the filesystem to manipulate the file. What are we
 supposed to do with the special type?

 You receive from readdir() and pass it to stat(), simple as that. No
 conversions from the native representation needed. If you need a textual
 representation, then you have to convert it and you have to do so explicitly
 according to whatever logic your application requires.

The simplest solution there is to have Windows bytes APIs that return
raw UTF-16 bytes (note that Windows does NOT guarantee these to be valid
Unicode, although that is much more likely than on Linux).  The only real
issue I see is that UTF-16 isn't an ASCII superset, so it won't print
nicely.

In other words, bytes can be your special type.
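
A minimal sketch of the readdir()-to-stat() round trip with bytes as the
opaque type, assuming Python 3, where passing a bytes path to os.listdir()
yields bytes names. The helper name stat_entries is made up for
illustration.

```python
import os


def stat_entries(directory: bytes) -> dict:
    """Map each raw (bytes) entry name in `directory` to its lstat() result.

    The names are never decoded, so an entry whose bytes are not valid in
    the current locale is handled like any other entry.
    """
    results = {}
    for name in os.listdir(directory):  # bytes in, bytes out
        results[name] = os.lstat(os.path.join(directory, name))
    return results
```

Decoding then happens only where the application asks for it, for example
name.decode("utf-8", "replace") when a name has to be displayed.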


-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] Trap SIGSEGV and SIGFPE

2008-12-11 Thread Adam Olsen
On Thu, Dec 11, 2008 at 2:34 AM, Victor Stinner
victor.stin...@haypocalc.com wrote:
 On Wednesday 10 December 2008 20:04:00, Terry Reedy wrote:
  Recover after a segfault is dangerous, but my first goal was to get the
  Python backtrace instead just one line: Segmentation fault. It helps a
  lot for debug!
 
  Exactly! That's why it doesn't belong in the Python core. We can't
  guarantee anything about its effects or encourage it.

 Would it be safe to catch SIGSEGV, output a trace, and then exit?
 IE, make the 'first goal' the only goal?

 Oh yeah, good idea :-) Does it mean that Python interpreter can't be used to
 display the trace? It would be nice to -at least- use the Python stderr
 (which is written in pure Python for Python3). It would be better if the user
 can setup a callback, like sys.excepthook. But if -as many people wrote-
 Python is totally broken after a segfault, it is maybe not a good idea :-)

You have to use the low-level stderr, nothing that invokes Python.
I'd hate to get a second segfault while printing the first.

Just think about how indirect refcounting bugs tend to be.  Another
example is messing up GIL handling.  There's heaps of things for which
we'd want good stack traces, which can't be done from Python.


-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] Trap SIGSEGV and SIGFPE

2008-12-11 Thread Daniel Stutzbach
On Thu, Dec 11, 2008 at 12:15 PM, Adam Olsen rha...@gmail.com wrote:

 You have to use the low-level stderr, nothing that invokes Python.
 I'd hate to get a second segfault while printing the first.

 Just think about how indirect refcounting bugs tend to be.  Another
 example is messing up GIL handling.  There's heaps of things for which
 we'd want good stack traces, which can't be done from Python.


+1 on functionality to print a stack trace on a fault
-1 on translating the fault into an exception

I suggest exposing some functions to control the functionality.  Here are
some things the user may wish to control:

1. Disable/enable the functionality altogether
2. Set the file descriptor that the stack trace should be written to
3. Set a file name that should be created and written to instead
4. Specify whether a core dump should be generated
5. Specify a program to run after the stack trace has been printed

#3 combined with #5 would be very useful for automated bug reporting.

For what it's worth, the functionality could be implemented under Windows
using Structured Exception Handling.
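
For reference, later CPython releases (3.3 and newer) ship a standard
library module, faulthandler, that covers roughly items 1 and 2 of the
list: it can be switched on and off, and it writes the Python traceback to
a chosen file when the process receives SIGSEGV, SIGFPE or a similar fatal
signal. A minimal sketch; the helper names and the log path are
placeholders, not an established convention.

```python
import faulthandler


def enable_fault_trace(path):
    """Dump Python tracebacks of all threads to `path` if the process faults.

    The file object is returned because it must stay open for as long as
    the handler is enabled: faulthandler keeps writing to its descriptor.
    """
    log = open(path, "w")
    faulthandler.enable(file=log, all_threads=True)
    return log


def disable_fault_trace(log):
    """Turn the handler off again and close the log file."""
    faulthandler.disable()
    log.close()
```

The process still terminates after the trace is written; nothing is
translated into an exception.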

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC http://stutzbachenterprises.com


Re: [Python-Dev] Merging flow

2008-12-11 Thread Martin v. Löwis
 I believe that's difficult when you previously merged from the trunk to
 the py3k branch - the merged change to the svnmerge related properties
 on the root directory gets in the way when svnmerge attempts to update
 them on the maintenance branch.
 
 That's what started this thread, and so far nobody has come up with a
 workaround.

The work-around is fairly straight-forward:

- inspect the conflict file (I forgot its name - something like
  dir-props), and verify that the only conflict is in the missing
  merge info from trunk to py3k
- svn resolved .

 It seems to me that svnmerge.py should just be able to do a
 svn revert on the affected properties in the maintenance branch before
 it attempts to modify them, but my svn-fu isn't strong enough for me to
 say that for sure.

See above. svnmerge overwrites the property after it has conflicted,
so the only additional action to take is to declare that a resolution.

Regards,
Martin


Re: [Python-Dev] Trap SIGSEGV and SIGFPE

2008-12-11 Thread Martin v. Löwis
 On Dec 11, 2008, at 12:12 AM, Martin v. Löwis wrote:
 Several people already said (essentially) that: -1. I don't think such
 code should be added to the Python core, no matter how smart or correct
 it is.
 
 
 does your -1 apply only to attempts to resume execution after SIGSEGV,
 or also to the idea of dumping the stack and immediately exiting? The
 former strikes me as crazy talk, while the latter is genuinely useful.

Only to the former. If it is actually possible to print a stack trace,
that could be useful indeed. I'm then skeptical that this is possible
in the general case (i.e. displaying the full C stack), but displaying
(parts of) the Python stack might be possible. I think it should still
proceed to dump core, so that you can then inspect the core with a
proper debugger.

Regards,
Martin


Re: [Python-Dev] Merging flow

2008-12-11 Thread Martin v. Löwis
 Yeah, that's why I asked. I tried what Martin suggested with r67698 by
 just saying I'd resolved the conflict, which added the single revision
 I was merging from to the svnmerge-integrated property. It didn't add
 the two original revisions. 

Can you elaborate? What are the two original revisions it didn't add?

If you are referring to the trunk revisions - that's fine. As far
as svnmerge is concerned, we merge revisions from the 3k branch
to the 3.0 maintenance branch. The original revisions don't exist
on the 3k branch (they have an empty changeset), so it's not a
problem that they didn't get recorded as merged.

Regards,
Martin


Re: [Python-Dev] Merging flow

2008-12-11 Thread Eric Smith

Nick Coghlan wrote:

Martin v. Löwis wrote:

I believe that's difficult when you previously merged from the trunk to
the py3k branch - the merged change to the svnmerge related properties
on the root directory gets in the way when svnmerge attempts to update
them on the maintenance branch.

That's what started this thread, and so far nobody has come up with a
workaround.

The work-around is fairly straight-forward:

- inspect the conflict file (I forgot its name - something like
  dir-props), and verify that the only conflict is in the missing
  merge info from trunk to py3k
- svn resolved .


Ah, that's the missing piece of info - thanks :)

This should probably go in the dev FAQ somewhere though.


Indeed! Preferably with an example, if someone who understands it has
the time. I have some changes I've been holding off on checking in until I
see how someone else handles this.


Eric.


Re: [Python-Dev] Trap SIGSEGV and SIGFPE

2008-12-11 Thread Martin v. Löwis
 The Python distribution comes with a Misc/gdbinit file
 
 Hum, do you really run *all* programs in gdb? Most of the time, you don't 
 expect a crash (because you trust your softwares). You will have to try to 
 reproduce the crash, but sometimes it's very hard (eg. Heisenbugs!).

You don't have to run the program in gdb. You can also use the core dump
that the operating system will generate, and study the crash after it
happened.
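
Whether the operating system actually writes a core file usually depends
on the process's core size limit. A sketch for POSIX systems only, using
the standard resource module to raise the soft limit as far as the hard
limit allows; the function name is illustrative, and where the core file
ends up is system configuration that this does not control.

```python
import resource


def allow_core_dumps():
    """Raise the soft core-size limit to the hard limit; return the new pair."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit up to the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)
```

If the hard limit is zero, no core file will be produced regardless, and
the limit has to be changed outside the process (for example in the shell
that starts it).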

Regards,
Martin


Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-11 Thread Stephen J. Turnbull
Steve Holden writes:
  Ulrich Eckhardt writes:

   What I'd just like some feedback on is the approach to return a
   distinct type (neither a byte string nor a Unicode string) from
   readdir().

This is presumably unacceptable on the grounds that it will break
existing code that does something more or less useful more or less
some of the time. <wink>

  If you know what your filesystem produces, you can take the appropriate
  action to convert it into a type that makes sense to the user.

Unfortunately, even programmers experienced in I18N like Martin, and
those with intuition-that-has-the-force-of-law <wink> like Guido,
express deliberate disbelief on this point.  They say that filesystem
names and environment variable values are text, which is true from the
semantic viewpoint but can't be fully supported by any implementation.

The implementation issue is why you want bytes, but I don't think it
is going to overcome the tide of (semantically-oriented) pragmatism.

  If you don't, then at least if you have the string in its bytes
  form you can re-present it to the filesystem to manipulate the
  file. What are we supposed to do with the special type?

Trivially convert it back to bytes and re-present it to the
filesystem, of course.

I gather that the BFDL's line on this thread of discussion is that
forcing programmers to think about encodings every time they call out
to the OS is unacceptable when most programs will work acceptably
almost all of the time with a rather naive approach.  This means that
almost all Python programs will be technically broken for the
foreseeable future, sorry, Ulrich.

And for the same pragmatic reasons, these functions are going to
return strings (ie, Unicode), not bytes, I expect.  Sorry, Steve.

What needs to be determined here is the best way to provide
reliability to those who will go to the effort of asking for it if
it's available.  I don't think "just return bytes" fits the bill for
the reason above.

What I would like to see is a type that is derived from string (so if
you present it to an API expecting string, it is silently treated as
string), but from which the original bytes can always be extracted on
request.  If the original bytes cannot be sensibly decoded to a
string, then the string field in the object would either contain
something that should normally cause an error in a string API, or some
made-up string (presumably it would attempt to be a more or less
faithful representation of the bytes) at the caller's option.
Probably they'd also contain some metadata useful in guessing
encodings (the read time locale in particular).

These objects probably shouldn't support string-like operations in a
general way (ie, maintaining both the string representation and the
bytes correctly).  Rather, using proper string operations on them
would use the string content and produce strings.  People who really
want to handle mixed-encoding pathnames and the like would have to
keep collections of these objects and handle them in an ad-hoc way.
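
A rough sketch of such a type, under several assumptions: the class name
OSString, the attributes raw and decoded_ok, and the use of U+FFFD
replacement characters for the made-up string are all invented here for
illustration, and the locale metadata is left out.

```python
class OSString(str):
    """A str that remembers the exact bytes the OS handed over."""

    def __new__(cls, raw: bytes, encoding: str = "utf-8"):
        try:
            text = raw.decode(encoding)
            decoded_ok = True
        except UnicodeDecodeError:
            # Made-up but readable stand-in for an undecodable name.
            text = raw.decode(encoding, errors="replace")
            decoded_ok = False
        self = super().__new__(cls, text)
        self.raw = raw                # original bytes, always recoverable
        self.encoding = encoding
        self.decoded_ok = decoded_ok
        return self
```

With this sketch, OSString(b"caf\xe9.txt") (Latin-1 bytes read as UTF-8)
compares equal to "caf\ufffd.txt" while its raw attribute still holds the
original bytes. Ordinary string methods such as upper() return a plain
str, so the bytes are only available on the original object.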

Unfortunately, implementing this is way beyond my skills and time
availability.


[Python-Dev] The endless GIL debate: why not remove thread support instead?

2008-12-11 Thread Sturla Molden
Last month there was a discussion on Python-Dev regarding removal of
reference counting to remove the GIL. I hope you forgive me for continuing
the debate.

I think reference counting is a good feature. It prevents huge piles of
garbage from building up. It makes the interpreter run more smoothly. It
is not just important for games and multimedia applications, but also
servers under high load. Python does not pause to look for garbage like
Java or .NET. It only pauses to look for dead reference cycles. This can
be safely turned off temporarily; it can be turned off completely if you
do not create reference cycles. With Java and .NET, no garbage is ever
reclaimed except by the intermittent garbage collection. Python always
reclaims an object when the reference count drops to zero – whether the GC
is enabled or not. This makes Python programs well-behaved. For this
reason, I think removing reference counting is a genuinely bad idea. Even
if the GIL is evil, this remedy is even worse.
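
This behaviour can be observed directly in CPython (it is an
implementation detail, not a language guarantee). A small sketch using
weak references; the class and function names are illustrative.

```python
import gc
import weakref


class Node:
    pass


def freed_without_gc():
    """Return True if an acyclic object is freed while the cycle GC is off."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        obj = Node()
        probe = weakref.ref(obj)
        del obj                      # refcount drops to zero here
        return probe() is None
    finally:
        if was_enabled:
            gc.enable()


def cycle_needs_gc():
    """Return (alive_before, dead_after) for a self-referencing object."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        obj = Node()
        obj.me = obj                 # reference cycle keeps the count above zero
        probe = weakref.ref(obj)
        del obj
        alive_before = probe() is not None
        gc.collect()                 # an explicit collection still works
        return alive_before, probe() is None
    finally:
        if was_enabled:
            gc.enable()
```

On CPython the first function returns True and the second returns
(True, True): only the cyclic object has to wait for the collector.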

I am not a Python core developer; I am a research scientist who uses Python
because Matlab is (or used to be) a bad programming language, albeit a
good computing environment. As most people who have worked with scientific
computing know, there are better paradigms for concurrency than threads.
In particular, there are message-passing systems like MPI and Erlang, and
there are autovectorizing compilers for OpenMP and Fortran 90/95. There
are special LAPACK, BLAS and FFT libraries for parallel computer
architectures. There are fork-join systems like cilk and
java.util.concurrent. Threads seem to be used only because mediocre
programmers don't know what else to use.

I genuinely think the use of threads should be discouraged. It leads to
code that is full of bugs and difficult to maintain - race conditions,
deadlocks, and livelocks are common pitfalls. Very few developers are
capable of implementing efficient load-balancing by hand. Multi-threaded
programs tend to scale badly because they are badly written. If the GIL
discourages the abuse of threads, it serves a purpose, despite being evil
like the Linux kernel's BKL.

Python could be better off doing what Tcl does. Allow each process to
embed multiple interpreters; run each interpreter in its own thread.
Implement a fast message-passing system between the interpreters (e.g.
copy-on-write by making communicated objects immutable), and Python would
be closer to Erlang than Java.
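
The message-passing style can be sketched with today's standard library,
with the caveat that the worker below is an ordinary thread standing in
for an embedded interpreter: it does not get its own interpreter state and
does not escape the GIL. The only shared objects are the two queues, and
the messages are immutable tuples. All names are illustrative.

```python
import queue
import threading


def interpreter(inbox, outbox):
    """Stand-in for an isolated interpreter: it only reacts to messages."""
    while True:
        msg = inbox.get()
        if msg is None:              # sentinel: shut down
            break
        outbox.put(("squared", msg * msg))


def run(values):
    """Send each value to the worker and collect the replies in order."""
    inbox, outbox = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=interpreter, args=(inbox, outbox))
    worker.start()
    for value in values:
        inbox.put(value)
    inbox.put(None)
    worker.join()
    return [outbox.get() for _ in values]
```

No lock appears in user code because no mutable state is shared; the
queues are the whole interface between the two sides.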

I thus think the main offender is the thread and threading modules - not
the GIL. Without thread support in the interpreter, there would be no
threads. Without threads, there would be no need for a GIL. Both sources
of evil can be removed by just removing thread support from the Python
interpreter. In addition, it would make Python faster at executing linear
code. Just copy the concurrency model of Erlang instead of Java and get
rid of those nasty threads. In the meanwhile, I'll continue to experiment
with multiprocessing.

Removing reference counting to encourage the use of threads is like
shooting ourselves in the leg twice. That’s my two cents on this issue.

There is another issue to note as well: If you can endure a 200x loss of
efficiency by using Python instead of Fortran, scalability on dual or
quad-core processors may not be that important. Just move the bottlenecks
out of Python and you are much better off.


Regards,
Sturla Molden




Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-11 Thread Adam Olsen
On Thu, Dec 11, 2008 at 6:55 PM, Stephen J. Turnbull step...@xemacs.org wrote:
 Unfortunately, even programmers experienced in I18N like Martin, and
 those with intuition-that-has-the-force-of-law <wink> like Guido,
 express deliberate disbelief on this point.  They say that filesystem
 names and environment variable values are text, which is true from the
 semantic viewpoint but can't be fully supported by any implementation.

With all the focus on backup tools and file managers I think we've
lost perspective.  They're an important use case, but hardly the
dominant one.

Please, as a user, if your app is creating new files, do NOT use
bytes!  You have no excuse for creating garbage, and garbage doesn't
help the user any.  Get the encoding right, use the unicode APIs,
and don't pass the buck on to everything else.

The fact that unicode is easier is a bonus for doing the right thing.

-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-11 Thread Toshio Kuratomi
Adam Olsen wrote:
 On Thu, Dec 11, 2008 at 6:55 PM, Stephen J. Turnbull step...@xemacs.org 
 wrote:
 Unfortunately, even programmers experienced in I18N like Martin, and
 those with intuition-that-has-the-force-of-law <wink> like Guido,
 express deliberate disbelief on this point.  They say that filesystem
 names and environment variable values are text, which is true from the
 semantic viewpoint but can't be fully supported by any implementation.
 
 With all the focus on backup tools and file managers I think we've
 lost perspective.  They're an important use case, but hardly the
 dominant one.
 
 Please, as a user, if your app is creating new files, do NOT use
 bytes!  You have no excuse for creating garbage, and garbage doesn't
 help the user any.  Getting the encoding right, use the unicode APIs,
 and don't pass the buck on to everything else.
 
Uhmmm... That's good advice but doesn't solve any problems :-(.  No
matter what I create, the filenames will be bytes when the next person
reads them in.  If my locale is Shift-JIS and the person I'm sharing the
file with uses utf-8, things won't work.  Even if my locale is utf-8
(since I come from a European nation) and their locale is utf-16
(because they're from an Asian nation) the Unicode API won't work.
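
The first failure mode is easy to reproduce without touching a filesystem,
since it is purely a matter of which codec interprets the bytes. A small
sketch; the sample name and the helper are made up.

```python
def try_decode(raw: bytes, encoding: str):
    """Return the decoded name, or None if `raw` is invalid in `encoding`."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# A three-kanji name as a process under a Shift-JIS locale would write it.
original = "\u65e5\u672c\u8a9e.txt"
raw_name = original.encode("shift_jis")
```

try_decode(raw_name, "shift_jis") gives the name back, while
try_decode(raw_name, "utf-8") returns None: the reader's locale cannot
make text out of the writer's bytes.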

-Toshio





Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-11 Thread Adam Olsen
On Thu, Dec 11, 2008 at 10:41 PM, Toshio Kuratomi a.bad...@gmail.com wrote:
 Adam Olsen wrote:
 On Thu, Dec 11, 2008 at 6:55 PM, Stephen J. Turnbull step...@xemacs.org 
 wrote:
 Unfortunately, even programmers experienced in I18N like Martin, and
 those with intuition-that-has-the-force-of-law <wink> like Guido,
 express deliberate disbelief on this point.  They say that filesystem
 names and environment variable values are text, which is true from the
 semantic viewpoint but can't be fully supported by any implementation.

 With all the focus on backup tools and file managers I think we've
 lost perspective.  They're an important use case, but hardly the
 dominant one.

 Please, as a user, if your app is creating new files, do NOT use
 bytes!  You have no excuse for creating garbage, and garbage doesn't
 help the user any.  Getting the encoding right, use the unicode APIs,
 and don't pass the buck on to everything else.

 Uhmmm That's good advice but doesn't solve any problems :-(.  No
 matter what I create, the filenames will be bytes when the next person
 reads them in.  If my locale is shift-js and the person I'm sharing the
 file with uses utf-8 things won't work.  Even if my locale is utf-8
 (since I come from a European nation) and their locale is utf-16
 (because they're from an Asian nation) the Unicode API won't work.

So you'll open up the dir and find this collection:

??.txt
.png
???.html
.html
???.png
??.txt
??.txt
??.txt

A half-broken setup is still a broken setup.  Eventually you have to
tell people to stop screwing around and pick one encoding.

I doubt that UTF-16 is used very much (other than on windows).  I
haven't found any statistics on what distros use, but did find this
one of the web itself:
http://googleblog.blogspot.com/2008/05/moving-to-unicode-51.html

I can't wait for next year's statistics.

-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-11 Thread Adam Olsen
On Thu, Dec 11, 2008 at 11:25 PM, Curt Hagenlocher c...@hagenlocher.org wrote:
 On Thu, Dec 11, 2008 at 10:19 PM, Adam Olsen rha...@gmail.com wrote:

 I doubt that UTF-16 is used very much (other than on windows).

 There's this other obscure platform called Java... ;)

Sorry, I should have said for interchange. :)

(CPython doesn't use UTF-8 internally either.  It uses UTF-16 or UTF-32.)


-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-11 Thread Toshio Kuratomi
Adam Olsen wrote:

 A half-broken setup is still a broken setup.  Eventually you have to
 tell people to stop screwing around and pick one encoding.
 
But it's not a broken setup.  It's the way the world is because people
share things with each other.

 I doubt that UTF-16 is used very much (other than on windows).  I
 haven't found any statistics on what distros use, but did find this
 one of the web itself:
 http://googleblog.blogspot.com/2008/05/moving-to-unicode-51.html
 
UTF-16 is popular in Asian locales for the same reason that Shift-JIS and
Big5 are hanging in there.  utf-8 takes many more bytes to encode Asian
Unicode characters than utf-16.
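
The size difference can be checked directly. For characters in the Basic
Multilingual Plane, which covers most CJK text, UTF-8 needs three bytes
per character where UTF-16 needs two; the sample string is arbitrary and
the helper name is illustrative.

```python
def encoded_sizes(text: str) -> dict:
    """Byte length of `text` in UTF-8 and in UTF-16."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),  # -le: no byte order mark
    }


sample = "\u65e5\u672c\u8a9e"  # three kanji
```

encoded_sizes(sample) reports 9 bytes for UTF-8 against 6 for UTF-16,
whereas for the ASCII string "abc" the relation is reversed (3 against 6).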

-Toshio





Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-11 Thread Toshio Kuratomi
Adam Olsen wrote:
 As a data point, firefox (when pointed at my home dir) DOES skip over
 garbage files.
 
 
That's not true.  However, it looks like Firefox is actually broken.
Take a look at this screenshot:
  firefox.png

That shows a directory with a folder that's not decodable in my utf-8
locale.  What's interesting to note is that I actually have two
nondecodable folders there but only one of them showed up.  So firefox
is inconsistent with its treatment, rendering some non-decodable files
and ignoring others.

Also interesting, if you point your browser at:
  http://toshio.fedorapeople.org/u/

You should see two other test files.  They're both
(one-half)(enyei).html but one's encoded in utf-8 and the other in
latin-1.  Firefox has some bugs in it related to this.  For instance, if
you mouseover the two links you'll see that firefox displays the same
symbolic names for each of the files (even though they're in two
different encodings).  Sometimes firefox is able to load both files and
sometimes it only loads one of them.  Firefox seems to be translating
the characters from ASCII percent encoding of bytes into their unicode
symbols and back to utf-8 in some circumstances related to whether it
has the pages in its cache or not.  In this case, it should be leaving
things as percent encoded bytes as it's the only way that apache is
going to know what to retrieve.
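
Why the bytes matter can be shown with urllib.parse, under the assumption
that the test name consists of the two characters U+00BD (one-half) and
U+00F1 (enye) followed by .html. The same text yields two different
percent-encoded paths depending on the encoding, and only the bytes tell
the server which file is meant. The helper name is illustrative.

```python
from urllib.parse import quote, unquote_to_bytes

NAME = "\u00bd\u00f1.html"  # assumed spelling of the test file name


def percent_encode(name: str, encoding: str) -> str:
    """Percent-encode the bytes of `name` in the given encoding."""
    return quote(name.encode(encoding))
```

percent_encode(NAME, "utf-8") gives "%C2%BD%C3%B1.html" and
percent_encode(NAME, "latin-1") gives "%BD%F1.html"; unquote_to_bytes()
turns either one back into the exact bytes that were on disk.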

-Toshio


