Re: [Python-Dev] Python-3.0, unicode, and os.environ
On Wednesday 10 December 2008, Adam Olsen wrote:
> On Wed, Dec 10, 2008 at 3:39 AM, Ulrich Eckhardt [EMAIL PROTECTED] wrote:
> > On Tuesday 09 December 2008, Adam Olsen wrote:
> > > The only thing separating this from a bikeshed discussion is that a bikeshed has many equally good solutions, while we have no good solutions. Instead we're trying to find the least-bad one. The unicode/bytes separation is pretty close to that. Adding a warning gets even closer. Adding magic makes it worse.
> >
> > Well, I see two cases:
> > 1. Converting from an uncertain representation to a known one.
> > 2. Converting from a known representation to a known one.
>
> Not quite:
> 1. Using a garbage file name locally (within a single process, not talking to any libs)
> 2. Using a unicode filename everywhere (libs, saved to config files, displayed to the user, etc.)

I think there is some misunderstanding. I was referring to conversions and whether it is good to perform them implicitly. For that, I saw the above two cases.

> On Linux the bytes/unicode separation is perfect for this. You decide which approach you're using and use it consistently. If you mess up (mixing bytes and unicode) you'll consistently get an error.
>
> We currently don't follow this model on Windows, so a garbage file name gets passed around as if it was unicode, but fails when passed to a lib, saved to a config file, is displayed to a user, etc.

I'm not sure I agree with this. Facts I know are:
1. On POSIX systems, there is no reliable encoding for filenames, while the system APIs use char/byte strings.
2. On MS Windows, the encoding for filenames is Unicode/UTF-16.

Returning Unicode strings from readdir() is wrong because it can't handle case 1 above. Returning byte strings is wrong because it can't handle case 2 above: it gives you useless roundtrips from UTF-16 to either UTF-8 or, worst case, to the locale-dependent MBCS.

Returning something different depending on the system is also broken, because that would make Python code that uses this function and assumes a certain type unportable. Note that this doesn't get much better if you provide a separate readdirb() API or one that simply returns a byte string or Unicode string depending on its argument. It just shifts the brokenness from readdir() to the code that uses it, unless this code makes a distinction between the target systems. Since way too many programmers are not aware of the problem, they will not handle these systems differently, so code will become non-portable.

What I'd just like some feedback on is the approach of returning a distinct type (neither a byte string nor a Unicode string) from readdir(). In order to use this, a programmer will have to convert it explicitly, otherwise e.g. printing it will just produce <env_string at 0x01234567>. This will immediately make every programmer run head-first into the issue of unknown encodings, and they will have to make the application-specific choice whether an approximation of the filename, an exception or ignoring the file is the right choice. Also, it presents the options for doing this conversion in a single class, which I personally find much better than providing overloads for hundreds of functions.

Sorry for ranting, but I'm a bit confused and desperate, because either I'm unable to explain what I mean or I'm really not understanding something that everybody else here seems to agree upon. I just know that using a distinct path type has helped me in C++ in the past, and I don't see why it shouldn't in Python.

Uli

--
Sator Laser GmbH
Managing Director: Thorsten Föcking, Amtsgericht Hamburg HR B62 932
** Visit our website at http://www.satorlaser.de/ **
This e-mail, including all attachments, is intended only for the addressee and may contain confidential information. Please notify the sender immediately if you are not the intended recipient. In that case the e-mail must be deleted and may not be read, forwarded, published or otherwise used. E-mails can be read by third parties and may contain viruses and unauthorized changes. Sator Laser GmbH is not responsible for these consequences.
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Trap SIGSEGV and SIGFPE
On Wednesday 10 December 2008 20:04:00, Terry Reedy wrote:
> > > Recovering after a segfault is dangerous, but my first goal was to get the Python backtrace instead of just one line: "Segmentation fault". It helps a lot for debugging!
> >
> > Exactly! That's why it doesn't belong in the Python core. We can't guarantee anything about its effects or encourage it.
>
> Would it be safe to catch SIGSEGV, output a trace, and then exit? IE, make the 'first goal' the only goal?

Oh yeah, good idea :-)

Does it mean that the Python interpreter can't be used to display the trace? It would be nice to -at least- use the Python stderr (which is written in pure Python for Python3). It would be better if the user could set up a callback, like sys.excepthook. But if -as many people wrote- Python is totally broken after a segfault, it is maybe not a good idea :-)

I guess that the sigsetjmp() and siglongjmp() hack can be avoided in PyEval_EvalFrameEx(), so ceval.c could be left unchanged. New pseudocode:

    set checkpoint
    if error:
        get the backtrace
        display the backtrace
        fast exit (e.g. don't call atexit, don't free memory, ...)
    else:
        normal execution

--
Victor Stinner aka haypo
http://www.haypocalc.com/blog/
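Editor's sketch: the "get the backtrace, display the backtrace" steps of the pseudocode above can be tried out with the standard-library faulthandler module. That module was added to Python (3.3) well after this thread, so using it here is an assumption for illustration, not the patch under discussion. The helper name write_backtrace and the temporary file are hypothetical. No crash is triggered; the sketch only shows the dump step, which writes to the file's descriptor rather than through Python-level stream objects.

```python
import faulthandler
import tempfile


def write_backtrace(out_file):
    """Dump the calling thread's Python stack to out_file's file descriptor."""
    # all_threads=False limits the dump to the current thread.
    faulthandler.dump_traceback(file=out_file, all_threads=False)


def inner():
    # Stand-in for "the point where something went wrong" (illustrative only).
    with tempfile.TemporaryFile() as out_file:
        write_backtrace(out_file)
        out_file.seek(0)
        return out_file.read().decode("utf-8", "replace")


report = inner()
print(report)
```

The "fast exit" step of the pseudocode is deliberately left out here, since exiting would end the example before the report could be read back.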
Re: [Python-Dev] Trap SIGSEGV and SIGFPE
> If we could calculate how much stack is left we'd have a much more robust way of doing recursion limits. I suppose this could be done by reading a byte from each page with a temporary SIGSEGV handler installed, but I'm not convinced you can't ask the platform directly somehow. I'd also be concerned about thread-safety.

It's only as hard as taking the address of a local variable at the beginning of the program and again at any arbitrary point. Of course, "how much is left" means additional arithmetic.

Cheers,
fijal
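Editor's sketch: the message above is about C stack addresses, which Python code cannot portably observe. The same "measure at one point, measure at another, subtract" arithmetic can at least be shown at the Python level by counting frames. frame_depth and depth_after are hypothetical helper names, and sys._getframe is a CPython implementation detail, so treat this as an analogy and not as the technique fijal describes.

```python
import sys


def frame_depth():
    """Count Python frames on the current call stack, excluding this function."""
    frame = sys._getframe(1)  # start at the caller's frame
    depth = 0
    while frame is not None:
        depth += 1
        frame = frame.f_back
    return depth


def depth_after(levels):
    """Recurse `levels` times, then report the frame depth reached."""
    if levels == 0:
        return frame_depth()
    return depth_after(levels - 1)


# "How much is left" is then plain arithmetic against the configured limit.
headroom = sys.getrecursionlimit() - frame_depth()
print(headroom)
```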
Re: [Python-Dev] Python-3.0, unicode, and os.environ
Ulrich Eckhardt wrote:
> [...]
> What I'd just like some feedback on is the approach of returning a distinct type (neither a byte string nor a Unicode string) from readdir(). In order to use this, a programmer will have to convert it explicitly, otherwise e.g. printing it will just produce <env_string at 0x01234567>. This will immediately make every programmer run head-first into the issue of unknown encodings, and they will have to make the application-specific choice whether an approximation of the filename, an exception or ignoring the file is the right choice. Also, it presents the options for doing this conversion in a single class, which I personally find much better than providing overloads for hundreds of functions.
>
> Sorry for ranting, but I'm a bit confused and desperate, because either I'm unable to explain what I mean or I'm really not understanding something that everybody else here seems to agree upon. I just know that using a distinct path type has helped me in C++ in the past, and I don't see why it shouldn't in Python.

Seems to me this just threatens to add to the confusion. If you know what your filesystem produces, you can take the appropriate action to convert it into a type that makes sense to the user. If you don't, then at least if you have the string in its bytes form you can re-present it to the filesystem to manipulate the file. What are we supposed to do with the special type?

regards
 Steve
--
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC      http://www.holdenweb.com/
Re: [Python-Dev] Merging flow
Martin v. Löwis wrote:
> Jeffrey Yasskin wrote:
> > Was there ever a conclusion to this? I need to merge the patches associated with issue 4597 from trunk to all the maintenance branches, and I'd like to avoid messing anyone up if possible. If I don't hear back, I'll plan to svnmerge directly from trunk to each of the branches, and then block my merge to py3k from being merged again to release30-maint.
>
> No - you should merge from the py3k branch to the release30-maint branch.

I believe that's difficult when you previously merged from the trunk to the py3k branch - the merged change to the svnmerge-related properties on the root directory gets in the way when svnmerge attempts to update them on the maintenance branch.

That's what started this thread, and so far nobody has come up with a workaround. It seems to me that svnmerge.py should just be able to do an svn revert on the affected properties in the maintenance branch before it attempts to modify them, but my svn-fu isn't strong enough for me to say that for sure.

Cheers,
Nick.

--
Nick Coghlan | [EMAIL PROTECTED] | Brisbane, Australia
Re: [Python-Dev] Trap SIGSEGV and SIGFPE
    Simon> Some indication of what Python was executing when the segfault
    Simon> occurred would help narrow down the possibilities rapidly.

The Python distribution comes with a Misc/gdbinit file (you can grab it from the Subversion source tree via the web as well) that defines a pystack command. It will work with core files as well as running processes and should give you a very good idea where your Python code was executing when the segfault occurred.

--
Skip Montanaro - [EMAIL PROTECTED] - http://smontanaro.dyndns.org/
Re: [Python-Dev] Trap SIGSEGV and SIGFPE
skip at pobox.com writes:
> The Python distribution comes with a Misc/gdbinit file (you can grab it from the Subversion source tree via the web as well) that defines a pystack command. It will work with core files as well as running processes and should give you a very good idea where your Python code was executing when the segfault occurred.

Still, it would be much better if the stack trace could be printed by Python itself rather than having to resort to gdb wizardry. Especially if the problem is reported by one of your non-developer users.
Re: [Python-Dev] Trap SIGSEGV and SIGFPE
    Antoine> Still, it would be much better if the stack trace could be
    Antoine> printed by Python itself rather than having to resort to gdb
    Antoine> wizardry. Especially if the problem is reported by one of your
    Antoine> non-developer users.

I understand. The guy has a problem today for which there is a solution that I posted. If he's been meaning to look into the problem and he's posting to python-dev, I presume he knows at least a little about running gdb if he's operating in a Unix environment. These two gdb commands

    source .gdbinit
    pystack

shouldn't be too much of a barrier.

Skip
Re: [Python-Dev] Trap SIGSEGV and SIGFPE
skip at pobox.com writes:
> I understand. The guy has a problem today for which there is a solution that I posted. If he's been meaning to look into the problem and he's posting to python-dev, I presume he knows at least a little about running gdb if he's operating in a Unix environment. These two gdb commands
>
>     source .gdbinit
>     pystack
>
> shouldn't be too much of a barrier.

Well, but sometimes you don't have a core file (because you didn't run ulimit before launching Python and the crash wasn't expected; if the crash is very erratic, by the time you've fixed the system limits, you don't manage to reproduce it anymore, or it takes hours because it's at the end of a very long workload). Sometimes you don't have the gdbinit file around (for example, Mandriva doesn't ship it with any Python-related package). Sometimes you are under Windows. etc. :-)
Re: [Python-Dev] Python-3.0, unicode, and os.environ
On Thursday 11 December 2008, Steve Holden wrote:
> Ulrich Eckhardt wrote:
> > What I'd just like some feedback on is the approach of returning a distinct type (neither a byte string nor a Unicode string) from readdir(). In order to use this, a programmer will have to convert it explicitly, otherwise e.g. printing it will just produce <env_string at 0x01234567>. This will immediately make every programmer run head-first into the issue of unknown encodings, and they will have to make the application-specific choice whether an approximation of the filename, an exception or ignoring the file is the right choice. Also, it presents the options for doing this conversion in a single class, which I personally find much better than providing overloads for hundreds of functions.
> [...]
> Seems to me this just threatens to add to the confusion. If you know what your filesystem produces, you can take the appropriate action to convert it into a type that makes sense to the user. If you don't, then at least if you have the string in its
> bytes form you can
  ^^^^^

There are operating systems that don't use bytes to represent a file path, namely all the MS Windows variants. Even worse, when you use a byte string there, it typically means that you want to use the obsolete encoding that is based on codepages.

Why can we not preserve the representation of a path as it is? Why do we _have_ to convert it to anything at all, without even knowing if this conversion is needed? I just want to do something to a file's content, so why does its path have to be converted to something and then be converted back in order for the system to digest it?

> re-present it to the filesystem to manipulate the file. What are we supposed to do with the special type?

You receive it from readdir() and pass it to stat(), simple as that. No conversions from the native representation needed. If you need a textual representation, then you have to convert it, and you have to do so explicitly according to whatever logic your application requires.

If readdir() returned Unicode text, people would start taking that for granted. If it returned bytes, just the same. Returning a completely unrelated type will give them enough of a hint that for this thing they have to rethink their assumptions. This runs along the lines of "In the face of ambiguity, refuse the temptation to guess.", as it makes guessing rather impossible.

I just don't see a case where using a separate path class would break things. Further, the special handling that is required would be made even clearer by using such a class.

Uli

--
Sator Laser GmbH
Managing Director: Thorsten Föcking, Amtsgericht Hamburg HR B62 932
** Visit our website at http://www.satorlaser.de/ **
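Editor's sketch of the idea argued for above: an opaque path type that is passed back to the OS untouched and only becomes text through an explicit, caller-chosen conversion. Everything here is hypothetical: the class name NativePath, its methods, and list_native are illustrative and not any proposed or existing Python API. For simplicity the sketch stores bytes, which matches POSIX only; a Windows variant would presumably hold the native UTF-16 form instead.

```python
import os


class NativePath:
    """Opaque wrapper around a filename exactly as the OS returned it."""

    def __init__(self, raw):
        self._raw = raw  # bytes in this sketch (POSIX-style)

    def __repr__(self):
        # Deliberately uninformative, so printing forces an explicit choice.
        return "<NativePath at 0x%x>" % id(self)

    def to_native(self):
        """Hand the untouched representation back to OS-level calls."""
        return self._raw

    def to_text(self, encoding="utf-8", errors="strict"):
        """Explicit conversion; the caller picks strict, replace, ignore, ..."""
        return self._raw.decode(encoding, errors)


def list_native(directory):
    """List a directory without decoding any of the names."""
    return [NativePath(name) for name in os.listdir(os.fsencode(directory))]


odd = NativePath(b"caf\xe9.txt")          # not valid UTF-8
approx = odd.to_text(errors="replace")    # application chose "approximate"
for entry in list_native("."):
    os.stat(os.path.join(b".", entry.to_native()))  # no conversion needed
```

The application-specific choice the message talks about shows up as the errors argument: "strict" raises, "replace" approximates, and skipping the entry is left to the caller.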
Re: [Python-Dev] Trap SIGSEGV and SIGFPE
Hi Martin,

On Dec 11, 2008, at 12:12 AM, Martin v. Löwis wrote:
> Several people already said (essentially) that: -1. I don't think such code should be added to the Python core, no matter how smart or correct it is.

Does your -1 apply only to attempts to resume execution after SIGSEGV, or also to the idea of dumping the stack and immediately exiting? The former strikes me as crazy talk, while the latter is genuinely useful.

Cheers,

--
Ivan Krstić [EMAIL PROTECTED] | http://radian.org
Re: [Python-Dev] Trap SIGSEGV and SIGFPE
On Thursday 11 December 2008 13:57:03, [EMAIL PROTECTED] wrote:
>     Simon> Some indication of what Python was executing when the segfault
>     Simon> occurred would help narrow down the possibilities rapidly.
>
> The Python distribution comes with a Misc/gdbinit file

Hum, do you really run *all* programs in gdb? Most of the time, you don't expect a crash (because you trust your software). You will have to try to reproduce the crash, but sometimes that's very hard (e.g. Heisenbugs!).

My new proposal is to display the backtrace instead of just the message "segmentation fault". It's not a problem if displaying the backtrace produces a new fault, because that's still better than just the message "segmentation fault". Even with my SIGSEGV handler, you can still use gdb, because gdb catches the signal before the program does.

--
Victor Stinner aka haypo
http://www.haypocalc.com/blog/
Re: [Python-Dev] Python-3.0, unicode, and os.environ
On Thu, 11 Dec 2008, Ulrich Eckhardt wrote:
> On Thursday 11 December 2008, Steve Holden wrote:
> > Seems to me this just threatens to add to the confusion. If you know what your filesystem produces, you can take the appropriate action to convert it into a type that makes sense to the user. If you don't, then at least if you have the string in its
> > bytes form you can
>   ^^^^^
>
> There are operating systems that don't use bytes to represent a file path, namely all the MS Windows variants. Even worse, when you use a byte string there, it typically means that you want to use the obsolete encoding that is based on codepages.
>
> Why can we not preserve the representation of a path as it is? Why do we _have_ to convert it to anything at all, without even knowing if this conversion is needed? I just want to do something to a file's content, so why does its path have to be converted to something and then be converted back in order for the system to digest it?
>
> > re-present it to the filesystem to manipulate the file. What are we supposed to do with the special type?
>
> You receive it from readdir() and pass it to stat(), simple as that. No conversions from the native representation needed. If you need a textual representation, then you have to convert it, and you have to do so explicitly according to whatever logic your application requires.

Not only would this address the issue with the local filesystem, it would also provide a principled way to deal with remote filesystems. For example, an FTP interface library for Python could use this type to return paths of the sort actually supported by the raw FTP protocol.

Thinking of *the* filesystem is actually a misconception - always referring to *a* filesystem opens up all sorts of possibilities. There is a lot of coding to do to allow this, but allowing programs to work with paths and files in the local filesystem, remote filesystems, and filesystems constructed from others (e.g., by expanding symlinks, changing the root similar to chroot, or encoding/unencoding pathnames) would open up lots of possibilities, including better test environments.

This is an interesting case of separating byte strings from character strings. As long as the two are conflated, everything appears simple. But when they are separated, not only are there two types where before there was only one, it turns out that which type is correct in some circumstances depends on the platform. Also, many objects which are byte strings at the protocol level are usually or always meant to be character strings of some sort, but how to translate them simply cannot be nailed down once and for all.

Isaac Morland           CSCF Web Guru
DC 2554C, x36650        WWW Software Specialist
Re: [Python-Dev] Trap SIGSEGV and SIGFPE
    >> The Python distribution comes with a Misc/gdbinit file

    Victor> Hum, do you really run *all* programs in gdb? Most of the time,
    Victor> you don't expect a crash (because you trust your software). You
    Victor> will have to try to reproduce the crash, but sometimes that's
    Victor> very hard (e.g. Heisenbugs!).

Please folks! Get real. I was trying to help out a guy who responded to this thread saying that he gets intermittent segfaults in his PyGTK programs. I don't presume that he runs his app in gdb. If he has a core file this will work.

I apologize profusely for any implication that a set of gdb commands is in any way superior to your patch. OTOH, it works today if you have a core file and are running Python at least as far back as 2.4. It doesn't require any changes to the interpreter. I use it frequently at work (a couple of times a month anyway). We get notifications of all core files dropped each day. I make at least a cursory check of all core files dumped by Python. For that I use the pystack command defined in Misc/gdbinit.

    Victor> My new proposal is to display the backtrace instead of just
    Victor> the message "segmentation fault". It's not a problem if
    Victor> displaying the backtrace produces a new fault, because that's
    Victor> still better than just the message "segmentation fault". Even
    Victor> with my SIGSEGV handler, you can still use gdb, because gdb
    Victor> catches the signal before the program does.

Again, I meant no disrespect to your proposal. I was *simply trying to help the guy out*.

Skip
Re: [Python-Dev] Trap SIGSEGV and SIGFPE
On Thu, Dec 11, 2008 at 1:34 AM, Victor Stinner [EMAIL PROTECTED] wrote:
> But if -as many people wrote- Python is totally broken after a segfault, it is maybe not a good idea :-)

While it's true that after a segfault or unexpected longjmp, there are no guarantees whatsoever about the state of the Python program, the program will often just happen to work, and there are at least some programs I've worked on that would rather take the risk in order to try to shut down gracefully. For example, an interactive app may want to give the user a chance to save her (not necessarily corrupted) work into a new file rather than unconditionally losing it. Or a webserver might want to catch the segfault, finish replying to the other requests that were in progress at the time, maybe reply to the request that caused the segfault, and then restart. Yes, there's a possibility that the events around the segfault exposed some secret internal data (and they may do so even without segfaulting), but when the alternative is not replying to the users at all, this may be a risk the app wants to take.

It would be nice for Python to at least expose the option so that developers (who are consenting adults, remember) can make their own decisions. It should _not_ be on by default, but something like sys.dangerous_turn_C_crashes_into_exceptions() would be useful.

Jeffrey
Re: [Python-Dev] Merging flow
On Thu, Dec 11, 2008 at 4:18 AM, Nick Coghlan [EMAIL PROTECTED] wrote:
> Martin v. Löwis wrote:
> > Jeffrey Yasskin wrote:
> > > Was there ever a conclusion to this? I need to merge the patches associated with issue 4597 from trunk to all the maintenance branches, and I'd like to avoid messing anyone up if possible. If I don't hear back, I'll plan to svnmerge directly from trunk to each of the branches, and then block my merge to py3k from being merged again to release30-maint.
> >
> > No - you should merge from the py3k branch to the release30-maint branch.
>
> I believe that's difficult when you previously merged from the trunk to the py3k branch - the merged change to the svnmerge-related properties on the root directory gets in the way when svnmerge attempts to update them on the maintenance branch.
>
> That's what started this thread, and so far nobody has come up with a workaround. It seems to me that svnmerge.py should just be able to do an svn revert on the affected properties in the maintenance branch before it attempts to modify them, but my svn-fu isn't strong enough for me to say that for sure.

Yeah, that's why I asked. I tried what Martin suggested with r67698 by just saying I'd resolved the conflict, which added the single revision I was merging from to the svnmerge-integrated property. It didn't add the two original revisions. I don't know enough about how svnmerge works to know if that's the right outcome or who it's going to cause trouble for.

Jeffrey
Re: [Python-Dev] Trap SIGSEGV and SIGFPE
On Dec 11, 2008, at 11:08 AM, Jeffrey Yasskin wrote:
> On Thu, Dec 11, 2008 at 1:34 AM, Victor Stinner [EMAIL PROTECTED] wrote:
> > But if -as many people wrote- Python is totally broken after a segfault, it is maybe not a good idea :-)
>
> While it's true that after a segfault or unexpected longjmp, there are no guarantees whatsoever about the state of the Python program, the program will often just happen to work, and there are at least some programs I've worked on that would rather take the risk in order to try to shut down gracefully.

I ran an interactive game for years (written in C, mind you, not Python), where the SIGSEGV handler simply recursively reinvoked the main loop, after disabling the command that caused a SEGV if it had caused a SEGV twice already. It almost always worked and continued running without issue. YMMV, of course. :)

James
Re: [Python-Dev] Trap SIGSEGV and SIGFPE
On Thu, Dec 11, 2008 at 10:08 AM, Jeffrey Yasskin jyass...@gmail.com wrote:
> On Thu, Dec 11, 2008 at 1:34 AM, Victor Stinner victor.stin...@haypocalc.com wrote:
> > But if -as many people wrote- Python is totally broken after a segfault, it is maybe not a good idea :-)
>
> While it's true that after a segfault or unexpected longjmp, there are no guarantees whatsoever about the state of the Python program, the program will often just happen to work, and there are at least some programs I've worked on that would rather take the risk in order to try to shut down gracefully. For example, an interactive app may want to give the user a chance to save her (not necessarily corrupted) work into a new file rather than unconditionally losing it. Or a webserver might want to catch the segfault, finish replying to the other requests that were in progress at the time, maybe reply to the request that caused the segfault, and then restart. Yes, there's a possibility that the events around the segfault exposed some secret internal data (and they may do so even without segfaulting), but when the alternative is not replying to the users at all, this may be a risk the app wants to take.
>
> It would be nice for Python to at least expose the option so that developers (who are consenting adults, remember) can make their own decisions. It should _not_ be on by default, but something like sys.dangerous_turn_C_crashes_into_exceptions() would be useful.

Trying to recover (or save work etc.) is incredibly unpredictable, though. It could very well end up making the situation worse! I'm -1 on putting this in the core.

--
Cheers,
Benjamin Peterson
"There's nothing quite as beautiful as an oboe... except a chicken stuck in a vacuum cleaner."
Re: [Python-Dev] Python-3.0, unicode, and os.environ
On Thu, Dec 11, 2008 at 6:41 AM, Ulrich Eckhardt eckha...@satorlaser.com wrote:
> On Thursday 11 December 2008, Steve Holden wrote:
> > re-present it to the filesystem to manipulate the file. What are we supposed to do with the special type?
>
> You receive it from readdir() and pass it to stat(), simple as that. No conversions from the native representation needed. If you need a textual representation, then you have to convert it, and you have to do so explicitly according to whatever logic your application requires.

The simplest solution there is to have Windows bytes APIs that return raw UTF-16 bytes (note that Windows does NOT guarantee that filenames are valid Unicode, even though valid names are much more likely there than on Linux). The only real issue I see is that UTF-16 isn't an ASCII superset, so it won't print nicely.

In other words, bytes can be your special type.

--
Adam Olsen, aka Rhamphoryncus
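Editor's sketch of the two caveats in the message above, using only Python's built-in codecs: raw UTF-16 bytes are not an ASCII superset (even a plain ASCII name is full of NUL bytes), and a byte sequence that looks like UTF-16 need not be valid Unicode (here, an unpaired surrogate). The sample names are made up, and the "surrogatepass" error handler is just one way to round-trip such data; nothing here is a proposed Windows API.

```python
# An ASCII-only name, encoded the way a raw UTF-16 bytes API would hand it over.
name = "readme"
raw = name.encode("utf-16-le")
has_nul = b"\x00" in raw  # every ASCII character is followed by a NUL byte

# U+D800 (an unpaired high surrogate) followed by "a": well-formed 16-bit
# units, but not valid Unicode text.
lone = b"\x00\xd8" + "a".encode("utf-16-le")

try:
    lone.decode("utf-16-le")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# "surrogatepass" lets the unpaired surrogate through in both directions,
# so the original bytes can be recovered exactly.
lenient = lone.decode("utf-16-le", "surrogatepass")
round_trip = lenient.encode("utf-16-le", "surrogatepass")

print(raw, has_nul, strict_ok, round_trip == lone)
```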
Re: [Python-Dev] Trap SIGSEGV and SIGFPE
On Thu, Dec 11, 2008 at 2:34 AM, Victor Stinner victor.stin...@haypocalc.com wrote: On Wednesday 10 December 2008 20:04:00, Terry Reedy wrote:

Recovering after a segfault is dangerous, but my first goal was to get the Python backtrace instead of just one line: Segmentation fault. It helps a lot for debugging!

Exactly! That's why it doesn't belong in the Python core. We can't guarantee anything about its effects or encourage it. Would it be safe to catch SIGSEGV, output a trace, and then exit? IE, make the 'first goal' the only goal?

Oh yeah, good idea :-) Does it mean that Python interpreter can't be used to display the trace? It would be nice to -at least- use the Python stderr (which is written in pure Python for Python3). It would be better if the user can set up a callback, like sys.excepthook. But if -as many people wrote- Python is totally broken after a segfault, it is maybe not a good idea :-)

You have to use the low-level stderr, nothing that invokes Python. I'd hate to get a second segfault while printing the first. Just think about how indirect refcounting bugs tend to be. Another example is messing up GIL handling. There's heaps of things for which we'd want good stack traces, which can't be done from Python.

-- Adam Olsen, aka Rhamphoryncus
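The "first goal only" variant discussed above (write the Python traceback on a fatal signal through a raw file descriptor, then let the process die) is what the standard-library `faulthandler` module provides in later Python versions. A minimal sketch, assuming Python 3.5 or newer (3.3 added the module, 3.5 added support for passing an integer file descriptor):

```python
import faulthandler

# Install handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL.
# On a fault, the traceback is written straight to file descriptor 2
# (the low-level stderr), not through Python-level file objects; the
# process still terminates afterwards.
faulthandler.enable(file=2)

handler_installed = faulthandler.is_enabled()
print(handler_installed)
```

The same behaviour can be switched on without code changes by running `python -X faulthandler` or setting the `PYTHONFAULTHANDLER` environment variable.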
Re: [Python-Dev] Trap SIGSEGV and SIGFPE
On Thu, Dec 11, 2008 at 12:15 PM, Adam Olsen rha...@gmail.com wrote: You have to use the low-level stderr, nothing that invokes Python. I'd hate to get a second segfault while printing the first. Just think about how indirect refcounting bugs tend to be. Another example is messing up GIL handling. There's heaps of things for which we'd want good stack traces, which can't be done from Python.

+1 on functionality to print a stack trace on a fault
-1 on translating the fault into an exception

I suggest exposing some functions to control the functionality. Here are some things the user may wish to control:

1. Disable/enable the functionality altogether
2. Set the file descriptor that the stack trace should be written to
3. Set a file name that should be created and written to instead
4. Specify whether a core dump should be generated
5. Specify a program to run after the stack trace has been printed

#3 combined with #5 would be very useful for automated bug reporting.

For what it's worth, the functionality could be implemented under Windows using Structured Exception Handling.

-- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC http://stutzbachenterprises.com
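Items 1 to 3 of the list above map fairly directly onto the `faulthandler` module that later shipped with Python 3.3+. The following is only a sketch under that assumption: the log path is a made-up temporary file, and items 4 and 5 (core dump control, running a follow-up program) have no counterpart in this example.

```python
import faulthandler
import os
import tempfile

# Items 2/3: choose where the trace goes. The path is hypothetical.
log_path = os.path.join(tempfile.mkdtemp(), "crash.log")
log_file = open(log_path, "w")

# Item 1: enable the fault handler, directed at our file.
# The file must stay open for as long as the handler is enabled.
faulthandler.enable(file=log_file)
enabled_after_enable = faulthandler.is_enabled()

# The same dumping machinery can be exercised without crashing.
faulthandler.dump_traceback(file=log_file, all_threads=False)

# Item 1 again: disable the handler before closing the file.
faulthandler.disable()
enabled_after_disable = faulthandler.is_enabled()
log_file.close()

with open(log_path) as f:
    trace_text = f.read()
print(trace_text)
```

Because the trace lands in an ordinary file, a separate reporting tool could pick it up after the crash, which is roughly the "#3 combined with #5" workflow.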
Re: [Python-Dev] Merging flow
I believe that's difficult when you previously merged from the trunk to the py3k branch - the merged change to the svnmerge related properties on the root directory gets in the way when svnmerge attempts to update them on the maintenance branch. That's what started this thread, and so far nobody has come up with a workaround.

The work-around is fairly straight-forward:

- inspect the conflict file (I forgot its name - something like dir-props), and verify that the only conflict is in the missing merge info from trunk to py3k
- svn resolved .

It seems to me that svnmerge.py should just be able to do a svn revert on the affected properties in the maintenance branch before it attempts to modify them, but my svn-fu isn't strong enough for me to say that for sure.

See above. svnmerge overwrites the property after it has conflicted, so the only additional action to take is to declare that a resolution.

Regards, Martin
Re: [Python-Dev] Trap SIGSEGV and SIGFPE
On Dec 11, 2008, at 12:12 AM, Martin v. Löwis wrote: Several people already said (essentially) that: -1. I don't think such code should be added to the Python core, no matter how smart or correct it is.

does your -1 apply only to attempts to resume execution after SIGSEGV, or also to the idea of dumping the stack and immediately exiting? The former strikes me as crazy talk, while the latter is genuinely useful.

Only to the former. If it is actually possible to print a stack trace, that could be useful indeed. I'm then skeptical that this is possible in the general case (i.e. displaying the full C stack), but displaying (parts of) the Python stack might be possible. I think it should still proceed to dump core, so that you can then inspect the core with a proper debugger.

Regards, Martin
Re: [Python-Dev] Merging flow
Yeah, that's why I asked. I tried what Martin suggested with r67698 by just saying I'd resolved the conflict, which added the single revision I was merging from to the svnmerge-integrated property. It didn't add the two original revisions.

Can you elaborate? What are the two original revisions it didn't add? If you are referring to the trunk revisions - that's fine. As far as svnmerge is concerned, we merge revisions from the 3k branch to the 3.0 maintenance branch. The original revisions don't exist on the 3k branch (they have an empty changeset), so it's not a problem that they didn't get recorded as merged.

Regards, Martin
Re: [Python-Dev] Merging flow
Nick Coghlan wrote: Martin v. Löwis wrote: I believe that's difficult when you previously merged from the trunk to the py3k branch - the merged change to the svnmerge related properties on the root directory gets in the way when svnmerge attempts to update them on the maintenance branch. That's what started this thread, and so far nobody has come up with a workaround.

The work-around is fairly straight-forward:

- inspect the conflict file (I forgot its name - something like dir-props), and verify that the only conflict is in the missing merge info from trunk to py3k
- svn resolved .

Ah, that's the missing piece of info - thanks :) This should probably go in the dev FAQ somewhere though.

Indeed! Preferably with an example, if someone who understands it has the time. I have some changes I've been holding off on checking in until I see how someone else handles this.

Eric.
Re: [Python-Dev] Trap SIGSEGV and SIGFPE
The Python distribution comes with a Misc/gdbinit file

Hum, do you really run *all* programs in gdb? Most of the time, you don't expect a crash (because you trust your software). You will have to try to reproduce the crash, but sometimes it's very hard (e.g. Heisenbugs!).

You don't have to run the program in gdb. You can also use the core dump that the operating system will generate, and study the crash after it happened.

Regards, Martin
Re: [Python-Dev] Python-3.0, unicode, and os.environ
Steve Holden writes: Ulrich Eckhardt writes: What I'd just like some feedback on is the approach to return a distinct type (neither a byte string nor a Unicode string) from readdir().

This is presumably unacceptable on the grounds that it will break existing code that does something more or less useful more or less some of the time. <wink> If you know what your filesystem produces, you can take the appropriate action to convert it into a type that makes sense to the user.

Unfortunately, even programmers experienced in I18N like Martin, and those with intuition-that-has-the-force-of-law <wink> like Guido, express deliberate disbelief on this point. They say that filesystem names and environment variable values are text, which is true from the semantic viewpoint but can't be fully supported by any implementation. The implementation issue is why you want bytes, but I don't think it is going to overcome the tide of (semantically-oriented) pragmatism.

If you don't, then at least if you have the string in its bytes form you can re-present it to the filesystem to manipulate the file. What are we supposed to do with the special type?

Trivially convert it back to bytes and re-present it to the filesystem, of course. I gather that the BDFL's line on this thread of discussion is that forcing programmers to think about encodings every time they call out to the OS is unacceptable when most programs will work acceptably almost all of the time with a rather naive approach. This means that almost all Python programs will be technically broken for the foreseeable future, sorry, Ulrich. And for the same pragmatic reasons, these functions are going to return strings (i.e., Unicode), not bytes, I expect. Sorry, Steve. What needs to be determined here is the best way to provide reliability to those who will go to the effort of asking for it if it's available. I don't think "just return bytes" fits the bill for the reason above.
What I would like to see is a type that is derived from string (so if you present it to an API expecting string, it is silently treated as string), but from which the original bytes can always be extracted on request. If the original bytes cannot be sensibly decoded to a string, then the string field in the object would either contain something that should normally cause an error in a string API, or some made-up string (presumably it would attempt to be a more or less faithful representation of the bytes) at the caller's option. Probably they'd also contain some metadata useful in guessing encodings (the read time locale in particular).

These objects probably shouldn't support string-like operations in a general way (i.e., maintaining both the string representation and the bytes correctly). Rather, using proper string operations on them would use the string content and produce strings. People who really want to handle mixed-encoding pathnames and the like would have to keep collections of these objects and handle them in an ad-hoc way.

Unfortunately, implementing this is way beyond my skills and time availability.
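The type proposed above can be sketched in a few lines of Python 3. Everything here is hypothetical: the class name `OSString`, the `raw` and `encoding` attributes, and the choice of U+FFFD as the "made-up string" for undecodable bytes are illustration only, not an API anyone in the thread specified.

```python
class OSString(str):
    """Hypothetical str subclass that remembers the bytes it came from."""

    def __new__(cls, raw, encoding="utf-8"):
        # Undecodable bytes become U+FFFD in the text view; the caller
        # can always fall back to the untouched bytes in .raw.
        text = raw.decode(encoding, errors="replace")
        self = super().__new__(cls, text)
        self.raw = raw            # original bytes, never modified
        self.encoding = encoding  # metadata: the encoding assumed at read time
        return self


good = OSString(b"caf\xc3\xa9.txt")  # valid UTF-8
bad = OSString(b"caf\xe9.txt")       # Latin-1 bytes read under a UTF-8 assumption

print(good, good.raw)
print(bad, bad.raw)

# Ordinary string operations use the text view and return plain str,
# dropping the byte payload, as the proposal suggests they should.
upper_type = type(bad.upper())
```

Python itself later took a different route for the same problem: PEP 383's `surrogateescape` error handler, exposed through `os.fsencode()` and `os.fsdecode()`, smuggles undecodable bytes through ordinary `str` objects instead of using a subclass.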
[Python-Dev] The endless GIL debate: why not remove thread support instead?
Last month there was a discussion on Python-Dev regarding removal of reference counting to remove the GIL. I hope you forgive me for continuing the debate.

I think reference counting is a good feature. It prevents huge piles of garbage from building up. It makes the interpreter run more smoothly. It is important not just for games and multimedia applications, but also for servers under high load. Python does not pause to look for garbage like Java or .NET. It only pauses to look for dead reference cycles. This can be safely turned off temporarily; it can be turned off completely if you do not create reference cycles. With Java and .NET, no garbage is ever reclaimed except by the intermittent garbage collection. Python always reclaims an object when the reference count drops to zero whether the GC is enabled or not. This makes Python programs well-behaved. For this reason, I think removing reference counting is a genuinely bad idea. Even if the GIL is evil, this remedy is even worse.

I am not a Python core developer; I am a research scientist who uses Python because Matlab is (or used to be) a bad programming language, albeit a good computing environment. As most people who have worked with scientific computing know, there are better paradigms for concurrency than threads. In particular, there are message-passing systems like MPI and Erlang, and there are autovectorizing compilers for OpenMP and Fortran 90/95. There are special LAPACK, BLAS and FFT libraries for parallel computer architectures. There are fork-join systems like cilk and java.util.concurrent. Threads seem to be used only because mediocre programmers don't know what else to use.

I genuinely think the use of threads should be discouraged. It leads to code that is full of bugs and difficult to maintain - race conditions, deadlocks, and livelocks are common pitfalls. Very few developers are capable of implementing efficient load-balancing by hand.
Multi-threaded programs tend to scale badly because they are badly written. If the GIL discourages the abuse of threads, it serves a purpose, despite being evil like the Linux kernel's BKL.

Python could be better off doing what Tcl does. Allow each process to embed multiple interpreters; run each interpreter in its own thread. Implement a fast message-passing system between the interpreters (e.g. copy-on-write by making communicated objects immutable), and Python would be closer to Erlang than Java.

I thus think the main offender is the thread and threading modules - not the GIL. Without thread support in the interpreter, there would be no threads. Without threads, there would be no need for a GIL. Both sources of evil can be removed by just removing thread support from the Python interpreter. In addition, it would make Python faster at executing linear code. Just copy the concurrency model of Erlang instead of Java and get rid of those nasty threads. In the meantime, I'll continue to experiment with multiprocessing.

Removing reference counting to encourage the use of threads is like shooting ourselves in the leg twice. That's my two cents on this issue.

There is another issue to note as well: If you can endure a 200x loss of efficiency by using Python instead of Fortran, scalability on dual or quad-core processors may not be that important. Just move the bottlenecks out of Python and you are much better off.

Regards, Sturla Molden
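The claims in this message about reference counting (objects are reclaimed as soon as the count reaches zero, and the cycle detector can be switched off independently) can be checked with a short sketch. It assumes CPython; other implementations such as PyPy or Jython do not use reference counting and will behave differently. The `Node` class is a made-up placeholder.

```python
import gc
import weakref


class Node:
    """Trivial object used only to observe when reclamation happens."""


gc.disable()  # turns off only the cyclic collector, not refcounting

# Case 1: no cycle. The object dies the moment its last reference goes.
obj = Node()
probe = weakref.ref(obj)
del obj
reclaimed_without_gc = probe() is None

# Case 2: a reference cycle. Refcounts never reach zero on their own.
a, b = Node(), Node()
a.other, b.other = b, a
cycle_probe = weakref.ref(a)
del a, b
cycle_alive_before_collect = cycle_probe() is not None

# An explicit collection still works while automatic collection is off.
gc.collect()
cycle_alive_after_collect = cycle_probe() is not None

gc.enable()
```

This matches the message's description: with the collector disabled, only objects caught in reference cycles linger until a collection is requested.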
Re: [Python-Dev] Python-3.0, unicode, and os.environ
On Thu, Dec 11, 2008 at 6:55 PM, Stephen J. Turnbull step...@xemacs.org wrote: Unfortunately, even programmers experienced in I18N like Martin, and those with intuition-that-has-the-force-of-law <wink> like Guido, express deliberate disbelief on this point. They say that filesystem names and environment variable values are text, which is true from the semantic viewpoint but can't be fully supported by any implementation.

With all the focus on backup tools and file managers I think we've lost perspective. They're an important use case, but hardly the dominant one. Please, as a user, if your app is creating new files, do NOT use bytes! You have no excuse for creating garbage, and garbage doesn't help the user any. Get the encoding right, use the unicode APIs, and don't pass the buck on to everything else. The fact that the unicode is easier is a bonus for doing the right thing.

-- Adam Olsen, aka Rhamphoryncus
Re: [Python-Dev] Python-3.0, unicode, and os.environ
Adam Olsen wrote: On Thu, Dec 11, 2008 at 6:55 PM, Stephen J. Turnbull step...@xemacs.org wrote: Unfortunately, even programmers experienced in I18N like Martin, and those with intuition-that-has-the-force-of-law <wink> like Guido, express deliberate disbelief on this point. They say that filesystem names and environment variable values are text, which is true from the semantic viewpoint but can't be fully supported by any implementation.

With all the focus on backup tools and file managers I think we've lost perspective. They're an important use case, but hardly the dominant one. Please, as a user, if your app is creating new files, do NOT use bytes! You have no excuse for creating garbage, and garbage doesn't help the user any. Get the encoding right, use the unicode APIs, and don't pass the buck on to everything else.

Uhmmm... That's good advice but doesn't solve any problems :-(. No matter what I create, the filenames will be bytes when the next person reads them in. If my locale is shift-jis and the person I'm sharing the file with uses utf-8 things won't work. Even if my locale is utf-8 (since I come from a European nation) and their locale is utf-16 (because they're from an Asian nation) the Unicode API won't work.

-Toshio
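The shift-jis versus utf-8 scenario described above is easy to reproduce in Python 3. This is only a sketch of the failure mode: the filename is made up, and the two "locales" are simulated by choosing codecs by hand rather than by changing the process locale.

```python
# Hypothetical filename created by a user whose locale is Shift-JIS.
name = "日本語.txt"
on_disk = name.encode("shift_jis")  # the bytes the filesystem stores

# A reader whose locale is UTF-8 tries to interpret the same bytes.
try:
    on_disk.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

# The bytes themselves are intact: with the right codec the name is
# recovered exactly, so only the interpretation differs.
recovered = on_disk.decode("shift_jis")

print(on_disk)
print(decoded_ok)
```

The bytes survive the trip between users; what is lost is the knowledge of which encoding produced them, which is the gap neither a pure-bytes nor a pure-Unicode API closes by itself.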
Re: [Python-Dev] Python-3.0, unicode, and os.environ
On Thu, Dec 11, 2008 at 10:41 PM, Toshio Kuratomi a.bad...@gmail.com wrote: Adam Olsen wrote: On Thu, Dec 11, 2008 at 6:55 PM, Stephen J. Turnbull step...@xemacs.org wrote: Unfortunately, even programmers experienced in I18N like Martin, and those with intuition-that-has-the-force-of-law <wink> like Guido, express deliberate disbelief on this point. They say that filesystem names and environment variable values are text, which is true from the semantic viewpoint but can't be fully supported by any implementation.

With all the focus on backup tools and file managers I think we've lost perspective. They're an important use case, but hardly the dominant one. Please, as a user, if your app is creating new files, do NOT use bytes! You have no excuse for creating garbage, and garbage doesn't help the user any. Get the encoding right, use the unicode APIs, and don't pass the buck on to everything else.

Uhmmm... That's good advice but doesn't solve any problems :-(. No matter what I create, the filenames will be bytes when the next person reads them in. If my locale is shift-jis and the person I'm sharing the file with uses utf-8 things won't work. Even if my locale is utf-8 (since I come from a European nation) and their locale is utf-16 (because they're from an Asian nation) the Unicode API won't work.

So you'll open up the dir and find this collection: ??.txt .png ???.html .html ???.png ??.txt ??.txt ??.txt

A half-broken setup is still a broken setup. Eventually you have to tell people to stop screwing around and pick one encoding.

I doubt that UTF-16 is used very much (other than on Windows). I haven't found any statistics on what distros use, but did find this one of the web itself: http://googleblog.blogspot.com/2008/05/moving-to-unicode-51.html I can't wait for next year's statistics.
-- Adam Olsen, aka Rhamphoryncus
Re: [Python-Dev] Python-3.0, unicode, and os.environ
On Thu, Dec 11, 2008 at 11:25 PM, Curt Hagenlocher c...@hagenlocher.org wrote: On Thu, Dec 11, 2008 at 10:19 PM, Adam Olsen rha...@gmail.com wrote: I doubt that UTF-16 is used very much (other than on windows). There's this other obscure platform called Java... ;) Sorry, I should have said for interchange. :) (CPython doesn't use UTF-8 internally either. It uses UTF-16 or UTF-32.) -- Adam Olsen, aka Rhamphoryncus
Re: [Python-Dev] Python-3.0, unicode, and os.environ
Adam Olsen wrote: A half-broken setup is still a broken setup. Eventually you have to tell people to stop screwing around and pick one encoding.

But it's not a broken setup. It's the way the world is because people share things with each other.

I doubt that UTF-16 is used very much (other than on windows). I haven't found any statistics on what distros use, but did find this one of the web itself: http://googleblog.blogspot.com/2008/05/moving-to-unicode-51.html

UTF-16 is popular in Asian locales for the same reason that shift-jis and big-5 are hanging in there. utf-8 takes many more bytes to encode Asian Unicode characters than utf-16.

-Toshio
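The size argument in the last paragraph can be made concrete. The sketch below compares encoded lengths for a short CJK string and a short ASCII string; both sample strings are made up, and the BOM-less `utf-16-le` codec is used so that only the character payload is counted.

```python
cjk_text = "日本語"      # three characters from the Basic Multilingual Plane
ascii_text = "readme"   # six ASCII characters

# UTF-8 needs 3 bytes for each of these CJK characters, UTF-16 needs 2.
cjk_utf8_len = len(cjk_text.encode("utf-8"))
cjk_utf16_len = len(cjk_text.encode("utf-16-le"))

# For ASCII the relationship flips: 1 byte in UTF-8, 2 in UTF-16.
ascii_utf8_len = len(ascii_text.encode("utf-8"))
ascii_utf16_len = len(ascii_text.encode("utf-16-le"))

print(cjk_utf8_len, cjk_utf16_len)
print(ascii_utf8_len, ascii_utf16_len)
```

So for text in this range UTF-8 is 50% larger than UTF-16, while for ASCII-heavy content such as file extensions and markup UTF-16 is twice the size of UTF-8.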
Re: [Python-Dev] Python-3.0, unicode, and os.environ
Adam Olsen wrote: As a data point, firefox (when pointed at my home dir) DOES skip over garbage files.

That's not true. However, it looks like Firefox is actually broken. Take a look at this screenshot: firefox.png That shows a directory with a folder that's not decodable in my utf-8 locale. What's interesting to note is that I actually have two nondecodable folders there but only one of them showed up. So firefox is inconsistent with its treatment, rendering some non-decodable files and ignoring others.

Also interesting, if you point your browser at: http://toshio.fedorapeople.org/u/ You should see two other test files. They're both (one-half)(enyei).html but one's encoded in utf-8 and the other in latin-1. Firefox has some bugs in it related to this. For instance, if you mouseover the two links you'll see that firefox displays the same symbolic names for each of the files (even though they're in two different encodings). Sometimes firefox is able to load both files and sometimes it only loads one of them. Firefox seems to be translating the characters from ASCII percent encoding of bytes into their unicode symbols and back to utf-8 in some circumstances related to whether it has the pages in its cache or not. In this case, it should be leaving things as percent encoded bytes as it's the only way that apache is going to know what to retrieve.

-Toshio
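The point about percent-encoded bytes can be illustrated with `urllib.parse` from the Python 3 standard library. The sketch assumes that "(one-half)(enyei)" stands for the two characters ½ and ñ; that reading, and therefore the exact filename, is a guess for illustration.

```python
from urllib.parse import quote, unquote_to_bytes

# Assumed reading of "(one-half)(enyei).html".
name = "½ñ.html"

# The same visible name yields two different byte strings on disk ...
utf8_bytes = name.encode("utf-8")
latin1_bytes = name.encode("latin-1")

# ... and therefore two different URLs once the bytes are percent-encoded.
utf8_url = quote(utf8_bytes)
latin1_url = quote(latin1_bytes)
print(utf8_url)
print(latin1_url)

# Decoding the percent escapes back to raw bytes is lossless; turning
# them into characters and re-encoding as UTF-8 is not.
latin1_roundtrip = unquote_to_bytes(latin1_url)
reencoded = latin1_roundtrip.decode("latin-1").encode("utf-8")
```

If a client rewrites the Latin-1 URL by decoding it to characters and re-encoding as UTF-8, it ends up requesting the other file, which is consistent with the behaviour described in the message.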