Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
On Wed, Apr 22, 2009 at 8:50 AM, "Martin v. Löwis" wrote:
> For Python 3, one proposed solution is to provide two sets of APIs: a
> byte-oriented one, and a character-oriented one, where the
> character-oriented one would be limited to not being able to represent
> all data accurately. Unfortunately, for Windows, the situation would
> be exactly the opposite: the byte-oriented interface cannot represent
> all data; only the character-oriented API can. As a consequence,
> libraries and applications that want to support all user data in a
> cross-platform manner have to accept mish-mash of bytes and characters
> exactly in the way that caused endless troubles for Python 2.x.
Is the second part of this actually true? My understanding may be
flawed, but surely all Unicode data can be converted to and from bytes
using UTF-8? Obviously not all byte sequences are valid UTF-8, but
this doesn't prevent one from creating an arbitrary Unicode string
using "utf-8 bytes".decode("utf-8"). Given this, can't people who
must have access to all files / environment data just use the bytes
interface?
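Simon's premise can be checked in a few lines of present-day Python 3. This is a minimal sketch; the helper names `decodes_as_utf8` and `encodes_as_utf8` are illustrative, not part of any library. It shows that well-formed text round-trips through UTF-8 and that arbitrary bytes do not always decode. It also shows a caveat relevant to James Knight's later point about Windows: a str holding a lone surrogate cannot be strictly encoded to UTF-8 either.

```python
# Any well-formed Unicode string survives a UTF-8 round trip...
name = "Ünïcödé-名前.txt"
assert name.encode("utf-8").decode("utf-8") == name


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if data is valid UTF-8 (strict decoding succeeds)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def encodes_as_utf8(text: str) -> bool:
    """Return True if text can be strictly encoded as UTF-8."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


print(decodes_as_utf8(b"caf\xc3\xa9"))  # True: UTF-8 for "café"
print(decodes_as_utf8(b"caf\xe9"))      # False: Latin-1 bytes, not UTF-8
print(encodes_as_utf8("caf\u00e9"))     # True
print(encodes_as_utf8("bad\ud800"))     # False: lone surrogate
```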
Disclosure: My gut reaction is that the solution described in the PEP
is a hack, but I'm hardly a character encoding expert. My feeling is
that the correct solution is to either standardise on the bytes
interface as the lowest common denominator, or to add a Path type (and
I guess an EnvironmentalData type) and use the new type to attempt to
hide the differences.
Schiavo
Simon
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
On Fri, Apr 24, 2009 at 12:04 PM, Glenn Linderman wrote:
> The goal of Unicode users everywhere is to use Unicode for everything, no?
> After all, all "real" file should have Unicode based names, and the only
> proper byte sequences that should exist are UTF-8 encoding Unicode bytes.
> (Cheek to tongue: Get out of here!)
Humour aside :), the expectation that filenames are Unicode data
simply doesn't agree with the reality of POSIX file systems.
I think an approach similar to that adopted by glib [1] could work --
i.e. use the bytes API and provide some tools to assist application
developers in converting them to and from Unicode strings (these tools
are then where all the guess work about what encoding to use can live).
[1] http://library.gnome.org/devel/glib/stable/glib-Character-Set-Conversion.html
Schiavo
Simon
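A minimal sketch of the glib-style idea Simon describes, loosely in the spirit of glib's `g_filename_display_name()`. The raw bytes remain the file's identity, and a separate helper produces a str only for display. `display_name` and its `candidates` parameter are hypothetical names chosen for illustration. The order of candidate encodings is exactly the "guess work" Simon mentions.

```python
from typing import Sequence


def display_name(raw: bytes, candidates: Sequence[str] = ("utf-8", "latin-1")) -> str:
    """Hypothetical helper: turn a raw (bytes) file name into a str for display.

    The bytes are what should be passed back to the OS; this str is only for
    showing to a user, so a lossy result is acceptable.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Reached only if no candidate fits (latin-1 never fails, so this happens
    # when latin-1 is not among the candidates): readable but lossy.
    return raw.decode("utf-8", errors="replace")


print(display_name(b"caf\xc3\xa9.txt"))          # café.txt (valid UTF-8)
print(display_name(b"caf\xe9.txt"))              # café.txt (fell back to latin-1)
print(repr(display_name(b"caf\xe9.txt", ("utf-8",))))  # 'caf\ufffd.txt'
```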
Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
On Fri, Apr 24, 2009 at 11:22 AM, Glenn Linderman wrote:
> Oh clearly it is a hack. The right solution of a Path type (and friends)
> was discarded in earlier discussion, because it would impact too much
> existing code. The use of bytes would be annoying in the context of py3,
> where things that you want to display are in str (Unicode). So there is no
> solution that allows the use of str, and the robustness of bytes, and is
> 100% compatible with existing practice. Hence the desire is to find a hack
> that is "good enough". At least, that is my understanding and synopsis.
What about keeping the bytes interface (utf-8 encoded Unicode on
Windows) and adding a Path type (and friends) interface that mirrors it?
> (Sorry Simon, but it is still the same thread, anyway.)
Python discussions do seem to womble through a rather large set of
mailing lists and news groups. :)
Schiavo
Simon
Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
2009/4/24 Simon Cross :
> On Fri, Apr 24, 2009 at 12:04 PM, Glenn Linderman wrote:
>> The goal of Unicode users everywhere is to use Unicode for everything, no?
>> After all, all "real" file should have Unicode based names, and the only
>> proper byte sequences that should exist are UTF-8 encoding Unicode bytes.
>> (Cheek to tongue: Get out of here!)
>
> Humour aside :), the expectation that filenames are Unicode data
> simply doesn't agree with the reality of POSIX file systems.
However, it *does* agree with the reality of Windows file systems. The
fundamental problem here is that there is a strong OS disparity - for
Windows, the OS uses Unicode, for POSIX, the OS uses bytes.
Traditionally, Python has been happy to expose OS differences, and let
application code address platform portability issues. But this is such
a fundamental area, that doing so is problematic - it could easily
result in *more* code being OS-specific (in subtle,
only-affects-non-Latin-alphabet-using-users manners) rather than less.
That is why it makes sense to have *some* means of normalising things
in a way that does the best it can. The raw bytes interfaces should be
available for POSIX users writing low-level code that *must* handle all
possible nightmare scenarios[1], but Martin's proposal is designed to
handle "the majority of cases" in a platform-independent way. To that
end, a string-based interface makes sense, as frankly that's how
"normal" users think of filenames. The rest of Martin's proposal seems
to follow the same sort of practical approach.
Paul.
[1] Maybe there's a need for a Unicode interface on Windows that
doesn't do *any* encoding, even in the face of garbled Unicode - I
don't know low-level details well enough to be sure here. But the same
principle applies, that "get the raw data, regardless" is a low-level
OS-specific operation, and should not be the one used in day-to-day
programming.
Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
On approximately 4/24/2009 12:59 AM, came the following characters from
the keyboard of Simon Cross:
On Wed, Apr 22, 2009 at 8:50 AM, "Martin v. Löwis" wrote:
For Python 3, one proposed solution is to provide two sets of APIs: a
byte-oriented one, and a character-oriented one, where the
character-oriented one would be limited to not being able to represent
all data accurately. Unfortunately, for Windows, the situation would
be exactly the opposite: the byte-oriented interface cannot represent
all data; only the character-oriented API can. As a consequence,
libraries and applications that want to support all user data in a
cross-platform manner have to accept mish-mash of bytes and characters
exactly in the way that caused endless troubles for Python 2.x.
Is the second part of this actually true? My understanding may be
flawed, but surely all Unicode data can be converted to and from bytes
using UTF-8? Obviously not all byte sequences are valid UTF-8, but
this doesn't prevent one from creating an arbitrary Unicode string
using "utf-8 bytes".decode("utf-8"). Given this, can't people who
must have access to all files / environment data just use the bytes
interface?
Disclosure: My gut reaction is that the solution described in the PEP
is a hack, but I'm hardly a character encoding expert. My feeling is
that the correct solution is to either standardise on the bytes
interface as the lowest common denominator, or to add a Path type (and
I guess an EnvironmentalData type) and use the new type to attempt to
hide the differences.
Oh clearly it is a hack. The right solution of a Path type (and
friends) was discarded in earlier discussion, because it would impact
too much existing code. The use of bytes would be annoying in the
context of py3, where things that you want to display are in str
(Unicode). So there is no solution that allows the use of str, and the
robustness of bytes, and is 100% compatible with existing practice.
Hence the desire is to find a hack that is "good enough". At least,
that is my understanding and synopsis.
I never saw MvL's original message with the PEP delivered to my mailbox,
but some of the replies came there, so I found and extensively replied
to it using the Google group / usenet. My reply never showed up here
and no one has commented on it either... Should I repost via the mailing
list? I think so... I'll just paste it in here, with one tweak I
noticed after I sent it fixed... (Sorry Simon, but it is still the same
thread, anyway.) (Sorry to others, if my original reply was seen, and
just wasn't worth replying to.)
On Apr 21, 11:50 pm, "Martin v. Löwis" wrote:
> I'm proposing the following PEP for inclusion into Python 3.1.
> Please comment.
Basically the scheme doesn't work. Aside from that, it is very close.
There are tons of encoding schemes that could work... they don't have
to include half-surrogates or bytes. What they have to do, is make
sure that they are uniformly applied to all appropriate strings.
The problem with this, and other preceding schemes that have been
discussed here, is that there is no means of ascertaining whether a
particular file name str was obtained from a str API, or was funny-
decoded from a bytes API... and thus, there is no means of reliably
ascertaining whether a particular filename str should be passed to a
str API, or funny-encoded back to bytes.
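Glenn's "data pun" objection can be illustrated with the error handler that PEP 383 eventually shipped in Python 3.1 under the name "surrogateescape" (the draft under discussion here calls it "python-escape"). This is a minimal sketch. A name funny-decoded from the bytes interface and a str that some other code built deliberately with the same code points compare equal. Nothing in the str records which API it came from.

```python
# A name read through the bytes interface that is not valid UTF-8:
# the undecodable byte 0xFF becomes the lone surrogate U+DCFF.
from_bytes_api = b"\xff.txt".decode("utf-8", errors="surrogateescape")

# A str built directly with the same code points (for example, a name that
# really came from a str API and merely happens to contain U+DCFF).
from_str_api = "\udcff.txt"

print(from_bytes_api == from_str_api)  # True: the two are indistinguishable

# Both therefore "funny-encode" back to the same bytes, whatever was intended.
print(from_str_api.encode("utf-8", errors="surrogateescape"))  # b'\xff.txt'
```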
The assumption in the 2nd Discussion paragraph may hold for a large
percentage of cases, maybe even including some number of 9s, but it is
not guaranteed, and cannot be enforced, therefore there are cases that
could fail. Whether those failure cases are a concern or not is an
open question. Picking a character (I don't find U+F01xx in the
Unicode standard, so I don't know what it is) that is obscure, and
unlikely to be used in "real" file names, might help the heuristic
nature of the encoding and decoding avoid most conflicts, but provides
no guarantee that data puns will not occur in practice. Today's
obscure character is tomorrow's commonly used character, perhaps.
Someone not on this list may be happily using that character for their
own nefarious, incompatible purpose.
As I realized in the email-sig, in talking about decoding corrupted
headers, there is only one way to guarantee this... to encode _all_
character sequences, from _all_ interfaces. Basically it requires
reserving an escape character (I'll use ? in these examples -- yes, an
ASCII question mark -- happens to be illegal in Windows filenames so
all the better on that platform, but the specific character doesn't
matter... avoiding / \ and . is probably good, though).
So the rules would be, when obtaining a file name from the bytes OS
interface, that doesn't properly decode according to UTF-8, decode it
by placing a ? at the beginning, then for each decodable UTF-8
sequence, add a Unicode character -- unless the character is ?, in
which case you add two ??, and for each non-decodable byte sequenc
[Python-Dev] version for blender Vista
Can you tell me which installer of Python I need to work with Blender and Windows Vista Home Premium?
Thanks!
Yuma Scott
Re: [Python-Dev] version for blender Vista
From: http://mail.python.org/mailman/listinfo/python-dev
About Python-Dev
***Do not post general Python questions to this list. For help with Python please see the Python help page.***
On this list the key Python developers discuss the future of the language and its implementation. Topics include Python design issues, release mechanics, and maintenance of existing releases.
On Fri, Apr 24, 2009 at 7:04 PM, Yuma Scott wrote:
>
> Can you tell me which installer of Python I need to work with
> Blender and Windows Vista Home Premium?
> Thanks!
> Yuma Scott
--
Senthil
Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
On Apr 24, 2009, at 8:00 AM, Paul Moore wrote:
> However, it *does* agree with the reality of Windows file systems. The
> fundamental problem here is that there is a strong OS disparity - for
> Windows, the OS uses Unicode, for POSIX, the OS uses bytes.
It's unfortunately the case that this isn't *precisely* true. Windows
uses arbitrary 16-bit sequences, just as unix uses arbitrary 8-bit
sequences. Neither one is required by the operating system to be a
proper unicode encoding. The main difference is that there is already
a widely accepted way to decode an improperly-encoded 16-bit sequence
with the utf-16 codec: simply leave the lone surrogates in place.
James
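James's "leave the lone surrogates in place" decoding can be sketched with Python's "surrogatepass" error handler. That handler works with the UTF-16 codecs from Python 3.4 onward, so this reflects present-day Python rather than the 3.1 under discussion. The byte string is a made-up example of a Windows-style name stored as little-endian 16-bit units.

```python
# A Windows file name is stored as 16-bit code units. Here the middle unit
# is a lone high surrogate (0xD800) that no low surrogate follows.
raw_utf16le = b"a\x00\x00\xd8b\x00"

# Strict UTF-16 decoding rejects the lone surrogate...
try:
    raw_utf16le.decode("utf-16-le")
except UnicodeDecodeError as exc:
    print("strict decode failed:", exc.reason)

# ...while "surrogatepass" simply leaves it in the resulting str.
name = raw_utf16le.decode("utf-16-le", errors="surrogatepass")
print(repr(name))  # 'a\ud800b'

# Encoding with the same handler restores the original code units exactly.
assert name.encode("utf-16-le", errors="surrogatepass") == raw_utf16le
```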
Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
On Fri, Apr 24, 2009, Paul Moore wrote:
> 2009/4/24 Simon Cross :
>>
>> Humour aside :), the expectation that filenames are Unicode data
>> simply doesn't agree with the reality of POSIX file systems.
>
> However, it *does* agree with the reality of Windows file systems. The
> fundamental problem here is that there is a strong OS disparity - for
> Windows, the OS uses Unicode, for POSIX, the OS uses bytes.
> Traditionally, Python has been happy to expose OS differences, and let
> application code address platform portability issues. But this is such
> a fundamental area, that doing so is problematic - it could easily
> result in *more* code being OS-specific (in subtle,
> only-affects-non-Latin-alphabet-using-users manners) rather than less.
The part that I haven't seen clearly addressed so far is what happens
when disks get mounted across OSes (e.g. NFS).
While I agree that there should be a layer on top that can handle
"most" situations, it also seems clear that the raw layer needs to be
readily accessible.
--
Aahz ([email protected]) <*> http://www.pythoncraft.com/
"If you think it's expensive to hire a professional to do the job, wait
until you hire an amateur." --Red Adair
Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
Aahz pythoncraft.com> writes:
>
> The part that I haven't seen clearly addressed so far is what happens
> when disks get mounted across OSes (e.g. NFS).
Unless there's some kind of native NFS API for file access, it is
hopelessly out of scope for Python. We use whatever the C library
exports to us, and don't have any control over filesystem details.
[Python-Dev] Dates in python-dev
Hi,
I've recently subscribed to this list and received my first "Summary of
Python tracker Issues". What I find annoying are the dates, for example:
ACTIVITY SUMMARY (04/17/09 - 04/24/09)
3 x double-digits (have we learned nothing from Y2K? :-)) with the
_middle_ ones changing fastest!
I know it's the US standard, but Python is global. Could we have an
'international' style instead, say, year-month-day:
ACTIVITY SUMMARY (2009-04-17 - 2009-04-24)
Thank you for your attention, etc.
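The difference MRAB is asking for is just a different format string. This is a minimal sketch with the standard `datetime` module; it does not reflect how the tracker's report generator is actually written, which is not shown here. A side benefit of ISO 8601 (year-month-day) dates is that their text order matches their chronological order.

```python
import datetime

start = datetime.date(2009, 4, 17)
end = datetime.date(2009, 4, 24)

us_style = f"ACTIVITY SUMMARY ({start:%m/%d/%y} - {end:%m/%d/%y})"
iso_style = f"ACTIVITY SUMMARY ({start.isoformat()} - {end.isoformat()})"
print(us_style)   # ACTIVITY SUMMARY (04/17/09 - 04/24/09)
print(iso_style)  # ACTIVITY SUMMARY (2009-04-17 - 2009-04-24)

# ISO strings sort like the dates they represent; mm/dd/yy strings do not.
new_years_eve, new_years_day = datetime.date(2008, 12, 31), datetime.date(2009, 1, 1)
assert new_years_eve.isoformat() < new_years_day.isoformat()
assert new_years_eve.strftime("%m/%d/%y") > new_years_day.strftime("%m/%d/%y")
```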
Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
2009/4/24 Antoine Pitrou :
> Aahz pythoncraft.com> writes:
>>
>> The part that I haven't seen clearly addressed so far is what happens
>> when disks get mounted across OSes (e.g. NFS).
>
> Unless there's some kind of native NFS API for file access, it is hopelessly
> out of scope for Python. We use whatever the C library exports to us, and
> don't have any control over filesystem details.
For "raw" level stuff (bytes on Unix, Unicode-nearly (:-)) on Windows)
that's right. Resist the temptation to guess and all that. For the
level Martin is (as far as I can tell) aiming at [1], we need some
defined rules on how to behave (relatively) sanely. Windows is fairly
easy - "nearly-Unicode" to Unicode isn't too bad. But on Unix, you're
dealing with bytes-to-Unicode in the absence of a clearly stated
encoding - which is a known can of worms...
In my view: The pros for Martin's proposal are a uniform cross-platform
interface, and a user-friendly API for the common case. The cons are
subtle and complex corner cases, and lack of agreement on the validity
of the proposed encoding in those cases. The fact that the bytes APIs
won't go away probably mitigates the cons to a large extent (again, in
my view...)
Paul.
[1] Actually, all the PEP says is "With this PEP, a uniform treatment
of these data as characters becomes possible." An argument as to why
this is a good thing would be a useful addition to the PEP. At the
moment it's more or less treated as self-evident - which I agree with,
but which clearly the Unix people here are not as certain of.
[Python-Dev] Summary of Python tracker Issues
ACTIVITY SUMMARY (04/17/09 - 04/24/09)
Python tracker at http://bugs.python.org/
To view or respond to any of the issues listed below, click on the issue
number. Do NOT respond to this message.
 2227 open (+32) / 15427 closed (+17) / 17654 total (+49)
Open issues with patches:   865
Average duration of open issues: 641 days.
Median duration of open issues: 395 days.
Open Issues Breakdown
   open  2175 (+31)
pending    52 ( +1)
Issues Created Or Reopened (51)
___
Builtin round function is sometimes inaccurate for floats        04/18/09
CLOSED  http://bugs.python.org/issue1869    reopened  marketdickinson
        patch

logging to file + encoding                                       04/20/09
CLOSED  http://bugs.python.org/issue5170    reopened  shamilbi

IDLE cannot find windows chm file                                04/17/09
        http://bugs.python.org/issue5783    created   rhettinger
        patch

raw deflate format and zlib module                               04/17/09
        http://bugs.python.org/issue5784    created   phr

Condition.wait() does not respect its timeout                    04/18/09
CLOSED  http://bugs.python.org/issue5785    created   Kjir

len(reversed([1,2,3])) does not work anymore in 2.6.2            04/19/09
        http://bugs.python.org/issue5786    reopened  rhettinger

object.__getattribute__(super, '__bases__') crashes the interpre 04/19/09
CLOSED  http://bugs.python.org/issue5787    reopened  alexer

datetime.timedelta is inconvenient to use...                     04/18/09
        http://bugs.python.org/issue5788    created   bquinlan
        patch

powerset recipe listed twice in itertools docs                   04/19/09
CLOSED  http://bugs.python.org/issue5789    created   stevenjd
        easy

itertools.izip python code has a typo                            04/19/09
CLOSED  http://bugs.python.org/issue5790    created   stevenjd

title information of unicodedata is wrong in some cases          04/19/09
CLOSED  http://bugs.python.org/issue5791    created   cfbolz

Enable short float repr() on Solaris/x86                         04/19/09
        http://bugs.python.org/issue5792    created   marketdickinson
        easy

Rationalize isdigit / isalpha / tolower / ... uses throughout Py 04/19/09
        http://bugs.python.org/issue5793    created   marketdickinson
        easy

pickle/cPickle of recursive tuples create pickles that cPickle c 04/19/09
        http://bugs.python.org/issue5794    created   cwitty

test_distutils failure on the ppc Debian buildbot                04/19/09
CLOSED  http://bugs.python.org/issue5795    created   pitrou

test_posix, test_pty crash under Windows                         04/19/09
CLOSED  http://bugs.python.org/issue5796    created   pitrou
        patch

there is en exception om Create User page                        04/20/09
        http://bugs.python.org/issue5797    created   nabeel

test_asynchat fails on Mac OSX                                   04/20/09
        http://bugs.python.org/issue5798    created   cartman

Change ntpath functions to implicitly support UNC paths          04/20/09
        http://bugs.python.org/issue5799    created   larry
        patch
Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
James Y Knight writes:
> It's unfortunately the case that this isn't *precisely* true. Windows
> uses arbitrary 16-bit sequences, just as unix uses arbitrary 8-bit
> sequences.
Including U+FFFE and U+ "not a character nowhere nohow"? Just when I
was thinking Microsoft would actually nail one
Re: [Python-Dev] Dates in python-dev
2009-04-24 18:29:29 MRAB wrote:
> Hi,
>
> I've recently subscribed to this list and received my first "Summary of
> Python tracker Issues". What I find annoying are the dates, for example:
>
> ACTIVITY SUMMARY (04/17/09 - 04/24/09)
>
> 3 x double-digits (have we learned nothing from Y2K? :-)) with the
> _middle_ ones changing fastest!
>
> I know it's the US standard, but Python is global. Could we have an
> 'international' style instead, say, year-month-day:
>
> ACTIVITY SUMMARY (2009-04-17 - 2009-04-24)
+1.
ISO 8601 should be mandatory.
--
Arfrever Frehtes Taifersar Arahesis
Re: [Python-Dev] Dates in python-dev
On Fri, Apr 24, 2009 at 05:29:29PM +0100, MRAB wrote:
> I've recently subscribed to this list and received my first "Summary of
> Python tracker Issues". What I find annoying are the dates, for example:
>
> ACTIVITY SUMMARY (04/17/09 - 04/24/09)
>
> 3 x double-digits (have we learned nothing from Y2K? :-)) with the
> _middle_ ones changing fastest!
>
> I know it's the US standard, but Python is global. Could we have an
> 'international' style instead, say, year-month-day:
>
> ACTIVITY SUMMARY (2009-04-17 - 2009-04-24)
+1000 from me!
Oleg.
--
Oleg Broytmann    http://phd.pp.ru/    [email protected]
Programmers don't die, they just GOSUB without RETURN.
Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
Paul Moore writes:
> The pros for Martin's proposal are a uniform cross-platform interface,
> and a user-friendly API for the common case.
A more accurate phrasing would be "... a user-friendly API for those
who feel very lucky today." Which is the common case, of course, but
spins a little differently.
> [1] Actually, all the PEP says is "With this PEP, a uniform
> treatment of these data as characters becomes possible." An
> argument as to why this is a good thing would be a useful addition
> to the PEP. At the moment it's more or less treated as self-evident
> - which I agree with, but which clearly the Unix people here are
> not as certain of.
Well, the problem is that both parts are false. If you didn't start
with a valid string in a known encoding, you shouldn't treat it as
characters because it's not. Hand it to a careful API, and you'll get
an Exception raised in your face. And that's precisely why it's not
obviously a good thing. Careful clients will have to treat it as
"transcoded bytes", and so the people who develop those clients get no
benefit. OTOH, at least some of those who feel lucky and use it
naively are going to turn out to be wrong.
That said, I'm +0 on the PEP as is. It's a little bit better than the
current situation in that developers who would otherwise just punt on
dealing with the other world (ie, Windows for Unix hackers, and Unix
for Windows coders) will have a unified interface so it'll maybe work
automagically (when you're lucky :-) in that other world, too. And if
somebody comes up with an idea of true genius for handling the
underlying problem, or even just a slight practical improvement, then
everybody who uses this API can benefit simply by upgrading Python.
Re: [Python-Dev] Dates in python-dev
Followups directed to Tracker-Discuss, where the people who can do
something about it are hanging out. (They're here too, but I'm pretty
sure they'd rather discuss this issue on that list.)
Arfrever Frehtes Taifersar Arahesis writes:
> 2009-04-24 18:29:29 MRAB wrote:
> > Hi,
> >
> > I've recently subscribed to this list and received my first "Summary of
> > Python tracker Issues". What I find annoying are the dates, for example:
> >
> > ACTIVITY SUMMARY (04/17/09 - 04/24/09)
> >
> > 3 x double-digits (have we learned nothing from Y2K? :-)) with the
> > _middle_ ones changing fastest!
> >
> > I know it's the US standard, but Python is global. Could we have an
> > 'international' style instead, say, year-month-day:
> >
> > ACTIVITY SUMMARY (2009-04-17 - 2009-04-24)
>
> +1.
> ISO 8601 should be mandatory.
Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
Stephen J. Turnbull xemacs.org> writes:
>
> Well, the problem is that both parts are false. If you didn't start
> with a valid string in a known encoding, you shouldn't treat it as
> characters because it's not. Hand it to a careful API, and you'll get
> an Exception raised in your face.
Which "careful API" are you talking about?
> OTOH, at least some of those who feel lucky and use it
> naively are going to turn out to be wrong.
Why will they turn out to be wrong?
[Python-Dev] PyEval_Call* convenience functions
Is there a reason that the PyEval_CallFunction() and PyEval_CallMethod()
convenience functions remain undocumented? (i.e., would a
doc-and-test patch to correct this be rejected?) I didn't see any
mention of this coming up in python-dev before.
Also, despite its name, PyEval_CallMethod() is quite useful for calling
module-level functions or classes (given that it's just a
PyObject_GetAttrString plus the implementation of PyEval_CallFunction).
Is there any reason (beyond its undocumented status) to believe this
use case would ever be deprecated?
Thanks.
--
Tim Lesher
Re: [Python-Dev] [Tracker-discuss] Dates in python-dev
http://psf.upfronthosting.co.za/roundup/meta/issue274
[Python-Dev] Tuples and underorderable types
Does anyone have any ideas about what to do with issue 5830 and
handling the problem in a general way (not just for sched)?
The basic problem is that decorate/compare/undecorate patterns no
longer work when the primary sort keys are equal and the secondary
keys are unorderable (which is now the case for many callables).
    >>> tasks = [(10, lambda: 0), (20, lambda: 1), (10, lambda: 2)]
    >>> tasks.sort()
    Traceback (most recent call last):
      ...
    TypeError: unorderable types: function() < function()
Would it make sense to provide a default ordering whenever the types
are the same?
    def object.__lt__(self, other):
        if type(self) == type(other):
            return id(self) < id(other)
        raise TypeError
Raymond
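Raymond's failure mode can be reproduced in a few lines. This is a minimal sketch; the exact TypeError message differs between Python versions, so it is printed rather than assumed. It also shows why the problem "may not readily surface during testing": with distinct primary keys, the callables are never compared and the sort succeeds.

```python
tasks = [(10, lambda: 0), (20, lambda: 1), (10, lambda: 2)]

try:
    sorted(tasks)
except TypeError as exc:
    # Two tuples share the primary key 10, so the comparison falls through
    # to the lambdas, which define no ordering in Python 3.
    print("TypeError:", exc)

# With distinct primary keys the same pattern works, so a test suite that
# never produces ties will not notice the problem.
distinct = [(10, lambda: 0), (20, lambda: 1), (30, lambda: 2)]
print([priority for priority, _ in sorted(distinct)])  # [10, 20, 30]
```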
Re: [Python-Dev] Tuples and underorderable types
Raymond Hettinger rcn.com> writes:
>
> Would it make sense to provide a default ordering whenever the types are
> the same?
This doesn't work when they are not the same :-)
Instead, you could make the decorating a bit more sophisticated:
    decorated = [(key, id(value), value) for key, value in blah(values)]
or even:
    decorated = [(key, n, value) for n, (key, value) in enumerate(blah(values))]
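Antoine's counter-based decoration can be sketched with a heap, which is the setting (`sched`, `heapq`) that raised the issue. This is a minimal example; the task list is Raymond's. Because the enumerate index is unique, tuple comparison never reaches the callables, and entries with equal keys come out in insertion order.

```python
import heapq

tasks = [(10, lambda: 0), (20, lambda: 1), (10, lambda: 2)]

# Put a unique, increasing counter between the key and the value.
decorated = [(key, n, action) for n, (key, action) in enumerate(tasks)]
heapq.heapify(decorated)

order = []
while decorated:
    key, _, action = heapq.heappop(decorated)
    order.append((key, action()))
print(order)  # [(10, 0), (10, 2), (20, 1)]
```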
Re: [Python-Dev] Tuples and underorderable types
> > Would it make sense to provide a default ordering whenever the types are
> > the same?
>
> This doesn't work when they are not the same :-)
_ ~ @ @ \_/
> Instead, you could make the decorating a bit more sophisticated:
>
>     decorated = [(key, id(value), value) for key, value in blah(values)]
>
> or even:
>
>     decorated = [(key, n, value) for n, (key, value) in enumerate(blah(values))]
I already do something along those lines in heapq.nsmallest() and
nlargest() to preserve sort stability.
The real issue isn't how to fix one particular module. The problem is
that a basic python pattern is now broken in a way that may not readily
surface during testing.
I'm wondering if there is something we can do to mitigate the issue in
a general way. It bites that the venerable technique of tuple sorting
has lost some of its mojo. This may be an unintended consequence of
eliminating default comparisons.
Raymond
Re: [Python-Dev] Tuples and underorderable types
Raymond Hettinger wrote:
> Would it make sense to provide a default ordering whenever the types are
> the same?
>
>    def object.__lt__(self, other):
>        if type(self) == type(other):
>            return id(self) < id(other)
>        raise TypeError
No. This only makes it more difficult for someone wanting to behave
smartly with incomparable types. I can easily imagine someone wanting
incomparable objects to be treated as equal wrt. sorting. I am thinking
especially with respect to keeping the sort stable. I think many
developers would be surprised to find that
    >>> tasks = [(10, lambda: 0), (20, lambda: 1), (10, lambda: 2)]
    >>> tasks.sort()
    >>> assert tasks[0][1]() == 0
is not guaranteed.
Moreover, I fail to see your point in general as a bug if you accept
that not all objects can be totally ordered. We shouldn't be patching
the object base class because of legacy code that relied on sorting
tuples; this code should be updated to use a key function.
-Scott
--
Scott Dial
[email protected]
[email protected]
Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
Antoine Pitrou writes:
> Stephen J. Turnbull writes:
> >
> > Well, the problem is that both parts are false. If you didn't start with a valid string in a known encoding, you shouldn't treat it as characters because it's not. Hand it to a careful API, and you'll get an Exception raised in your face.
>
> Which "careful API" are you talking about?
>
> > OTOH, at least some of those who feel lucky and use it naively are going to turn out to be wrong.
>
> Why will they turn out to be wrong?

To quote the PEP:

"""
While providing a uniform API to non-decodable bytes, this interface has the limitation that chosen representation only "works" if the data get converted back to bytes with the python-escape error handler also. Encoding the data with the locale's encoding and the (default) strict error handler will raise an exception, encoding them with UTF-8 will produce non-sensical data.

For most applications, we assume that they eventually pass data received from a system interface back into the same system interfaces.
"""

But you can't know that. These are now "just strings", which could end up in pickles and other persistent objects, be passed across network interfaces (remote copy, for example), etc, etc, and there is no way to guarantee that the recipient will understand the rules, unless the application encapsulates them in some kind of representation that says "I look like a Unicode but I'm really just encoded bytes." But the whole point is to turn them into plain old strings so people *don't have to bother* keeping track.

As I already said, this is no worse than the current situation, but it gives the impression that Python has a standard "solution". (Yes, I know Martin doesn't claim it's a solution to any of those problems. The point is user perception.) I have to wonder whether having a standard way of not solving any problems is better than having no standard way of not solving any problems. It may be, and it probably can't hurt, which is why I'm +0.
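The PEP's caveat can be seen with the error handler as it was eventually released in Python 3.1, where the draft name "python-escape" became "surrogateescape". A minimal sketch, assuming UTF-8 as the file system encoding: undecodable bytes survive only if the same handler is used on the way back, while a strict ("careful") encode raises.

```python
# Sketch of PEP 383 round-trip behaviour, using the released handler name
# "surrogateescape" (the draft PEP called it "python-escape").
raw = b"caf\xe9.txt"  # Latin-1 bytes; not valid UTF-8

# Decoding with the escape handler never fails.
name = raw.decode("utf-8", "surrogateescape")
print(repr(name))  # 'caf\udce9.txt'

# Encoding back with the same handler restores the original bytes.
assert name.encode("utf-8", "surrogateescape") == raw

# A strict encode refuses the lone surrogate that stands in for 0xE9.
try:
    name.encode("utf-8")
    strict_failed = False
except UnicodeEncodeError as exc:
    strict_failed = True
    print("strict encode failed:", exc.reason)
```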
Re: [Python-Dev] Tuples and underorderable types
> I'm wondering if there is something we can do to mitigate the issue in a general way. It bites that the venerable technique of tuple sorting has lost some of its mojo. This may be an unintended consequence of eliminating default comparisons.

I would discourage use of the decorate/sort/undecorate pattern, and encourage use of the key= argument. Or, if you really need to decorate into a tuple, still pass a key= argument.

Regards,
Martin
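A minimal sketch of Martin's suggestion, reusing the task list from this thread: sort on the priority alone via key=, so the unorderable callables are never compared. Because list.sort() is stable, tasks with equal priority keep their insertion order.

```python
from operator import itemgetter

# (priority, action) pairs; functions cannot be ordered in Python 3.
tasks = [(10, lambda: 0), (20, lambda: 1), (10, lambda: 2)]

# Compare only the priority; ties keep their original relative order
# because list.sort() is stable.
tasks.sort(key=itemgetter(0))

print([(prio, action()) for prio, action in tasks])
# [(10, 0), (10, 2), (20, 1)]
```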
Re: [Python-Dev] Tuples and underorderable types
Raymond Hettinger wrote:
> The problem is that a basic python pattern is now broken
> in a way that may not readily surface during testing.
>
> I'm wondering if there is something we can do to mitigate
> the issue in a general way. It bites that the venerable technique
> of tuple sorting has lost some of its mojo. This may be
> an unintended consequence of eliminating default comparisons.
There could be a high performance, non-lame version of the mapping
pattern below available in the stdlib (or at least in the docs):
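# Map types that cannot be ordered to a key function; here functions sort by id().
keymap = {type(lambda: 1): id}
def decorate_helper(tup):
    # Replace each element whose type is in keymap by its key; keep the rest.
    return tuple(keymap[type(i)](i) if type(i) in keymap else i for i in tup)
tasks = [(10, lambda: 0), (20, lambda: 1), (10, lambda: 2)]
tasks.sort(key=decorate_helper)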
This works when comparing different types too, but then some care must
be taken to avoid bad surprises:
keymap[type(1j)] = abs
imaginary_tasks = [(10j, lambda: 0), (20, lambda: 1), (10+1j, lambda: 2)]
imaginary_tasks.sort(key=decorate_helper) # not so bad if intended
mixed_tasks = [(lambda: 0,), (0.0,), (2**32,)]
mixed_tasks.sort(key=decorate_helper) # oops, not the same order as in 2.x
Regards,
Daniel
Re: [Python-Dev] PyEval_Call* convenience functions
Tim Lesher wrote:
> Is there a reason that the PyEval_CallFunction() and PyEval_CallMethod() convenience functions remain undocumented? (i.e., would a doc-and-test patch to correct this be rejected?)
>
> I didn't see any mention of this coming up in python-dev before.
>
> Also, despite its name, PyEval_CallMethod() is quite useful for calling module-level functions or classes (given that it's just a PyObject_GetAttrString plus the implementation of PyEval_CallFunction). Is there any reason (beyond its undocumented status) to believe this use case would ever be deprecated?

FWIW, there's also PyObject_CallMethod(); all PyObject_Call* variants are documented, but none of the PyEval_Call* functions are.

I actually don't know why we have two sets of these, with partially conflicting definitions; perhaps someone else can shed some light?

Georg

--
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.
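As a rough Python-level analogue of what Tim describes (look up an attribute, then call it), the sketch below shows why such a helper works just as well on a module as on an instance. The name call_method is hypothetical and only illustrates the idea; it does not wrap either C function.

```python
import math

def call_method(obj, name, *args):
    """Hypothetical helper: look up obj.<name> and call it with args."""
    return getattr(obj, name)(*args)

# Works on a module-level function...
print(call_method(math, "sqrt", 16.0))     # 4.0
# ...and on an ordinary bound method.
print(call_method("a,b,c", "split", ","))  # ['a', 'b', 'c']
```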
Re: [Python-Dev] Tuples and underorderable types
On Fri, Apr 24, 2009, Raymond Hettinger wrote:
>
> I'm wondering if there is something we can do to mitigate the issue in a general way. It bites that the venerable technique of tuple sorting has lost some of its mojo. This may be an unintended consequence of eliminating default comparisons.

My understanding was that this was entirely an *intended* consequence of eliminating default comparisons. Not so much in the sense that it was desired by itself, but that the whole discussion of whether to keep moving forward in stripping out default comparisons explicitly revolved around whether this kind of difficulty warranted the overall simplification we now have (I don't remember off-hand whether this specific case was discussed, though).

I think that anyone who wants to suggest reverting to some kind of default comparison behavior needs to write up a PEP and clearly summarize all previous discussion prior to 3.0 release, then go through the usual grind of starting with python-ideas before coming back to python-dev.
--
Aahz ([email protected]) <*> http://www.pythoncraft.com/

"If you think it's expensive to hire a professional to do the job, wait until you hire an amateur." --Red Adair
Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
2009/4/22 "Martin v. Löwis" wrote:
> To convert non-decodable bytes, a new error handler "python-escape" is introduced, which decodes non-decodable bytes using into a private-use character U+F01xx, which is believed to not conflict with private-use characters that currently exist in Python codecs.

Why not use U+DCxx for non-UTF-8 encodings too?

Overall I like the PEP: I think it's the best proposal so far that doesn't put a heavy burden on applications that only want to do simple things with the API.

--
Lino Mastrodomenico
Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
On approximately 4/24/2009 11:40 AM, came the following characters from the keyboard of Stephen J. Turnbull:
> Antoine Pitrou writes:
> > Stephen J. Turnbull writes:
> > >
> > > Well, the problem is that both parts are false. If you didn't start with a valid string in a known encoding, you shouldn't treat it as characters because it's not. Hand it to a careful API, and you'll get an Exception raised in your face.
> >
> > Which "careful API" are you talking about?
> >
> > > OTOH, at least some of those who feel lucky and use it naively are going to turn out to be wrong.
> >
> > Why will they turn out to be wrong?

Because the encoding is not reliably reversible. That is why I proposed one that is.

> To quote the PEP:
>
> """
> While providing a uniform API to non-decodable bytes, this interface has the limitation that chosen representation only "works" if the data get converted back to bytes with the python-escape error handler also. Encoding the data with the locale's encoding and the (default) strict error handler will raise an exception, encoding them with UTF-8 will produce non-sensical data.
>
> For most applications, we assume that they eventually pass data received from a system interface back into the same system interfaces.
> """

And so my encoding (1) doesn't alter the data stream for any valid Windows file name, and where the naivest of users reside (2) doesn't alter the data stream for any Posix file name that was encoded as UTF-8 sequences and doesn't contain ? characters in the file name [I perceive the use of ? in file names to be rare on Posix, because of experience, and because of the other problems caused by such use] (3) doesn't introduce data puns within applications that are correctly coded to know the encoding occurs. The encoding technique in the PEP not only can produce data puns, thus not being reversible, it provides no reliable mechanism to know that this has occurred.

> But you can't know that. These are now "just strings", which could end up in pickles and other persistent objects, be passed across network interfaces (remote copy, for example), etc, etc, and there is no way to guarantee that the recipient will understand the rules, unless the application encapsulates them in some kind of representation that says "I look like a Unicode but I'm really just encoded bytes."

This could happen. Well-formed programs need to use the encoding at the boundaries. Python could encapsulate its interfaces to the file system, but cannot encapsulate other interfaces. Fortunately, something that is pickled would probably be unpickled by Python, and therefore all would be well. But any interface that expects a file name, and is not encapsulated by Python, must be encapsulated by the application.

> But the whole point is to turn them into plain old strings so people *don't have to bother* keeping track.

And if that is the point, it isn't worth doing. If the point is to minimize the amount of existing file name manipulation code that uses string operations and would otherwise have to be reworked to be functional during a 2to3 migration, then it can be worth doing. But I think it should be done with an encoding that doesn't introduce undetectable data puns, whether mine or some different encoding with that characteristic, but not the one presently in the PEP, because it does introduce undetectable data puns.

> As I already said, this is no worse than the current situation, but it gives the impression that Python has a standard "solution". (Yes, I know Martin doesn't claim it's a solution to any of those problems. The point is user perception.) I have to wonder whether having a standard way of not solving any problems is better than having no standard way of not solving any problems. It may be, and it probably can't hurt, which is why I'm +0.

Interesting phraseology there, Stephen!

I'm +1 on the concept, -1 on the PEP, due solely to the lack of a reversible encoding.

--
Glenn -- http://nevcal.com/
===
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking
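Glenn's "data pun" concern can be illustrated with the handler as it was later released (as surrogateescape, using U+DC80..U+DCFF rather than the private-use range in the draft quoted here). A small sketch, assuming UTF-8 as the file system encoding: bytes to str to bytes is always lossless, but two different strings can encode to the same bytes when one of them contains escape surrogates that did not come from decoding.

```python
# bytes -> str -> bytes is lossless, even for bytes that are not valid
# UTF-8 (each bad byte becomes a lone surrogate in U+DC80..U+DCFF).
for raw in (b"plain.txt", b"caf\xe9", b"\xed\xb3\xbf", bytes(range(256))):
    text = raw.decode("utf-8", "surrogateescape")
    assert text.encode("utf-8", "surrogateescape") == raw

# str -> bytes is not one-to-one: a genuine U+00FF and a string made of
# two escape surrogates produce identical bytes, i.e. a data pun.
genuine = "\u00ff"        # LATIN SMALL LETTER Y WITH DIAERESIS, UTF-8 C3 BF
forged = "\udcc3\udcbf"   # escapes standing for the raw bytes C3 and BF
assert genuine.encode("utf-8") == b"\xc3\xbf"
assert forged.encode("utf-8", "surrogateescape") == b"\xc3\xbf"

# Decoding can only return one of them, so the forged string is lost.
assert b"\xc3\xbf".decode("utf-8", "surrogateescape") == genuine
```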
Re: [Python-Dev] Dates in python-dev
On approximately 4/24/2009 10:06 AM, came the following characters from the keyboard of Oleg Broytmann:
> On Fri, Apr 24, 2009 at 05:29:29PM +0100, MRAB wrote:
> > I've recently subscribed to this list and received my first "Summary of Python tracker Issues". What I find annoying are the dates, for example:
> >
> > ACTIVITY SUMMARY (04/17/09 - 04/24/09)
> >
> > 3 x double-digits (have we learned nothing from Y2K? :-)) with the _middle_ ones changing fastest!
> >
> > I know it's the US standard, but Python is global. Could we have an 'international' style instead, say, year-month-day:
> >
> > ACTIVITY SUMMARY (2009-04-17 - 2009-04-24)
>
> +1000 from me!
>
> Oleg.

You missed a prime opportunity, Oleg...

+2000 from me!

--
Glenn -- http://nevcal.com/
===
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking
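For reference, both header styles are one call away with the standard datetime module; the dates are the ones from MRAB's example. This is only an illustration and makes no claim about how the tracker's summary script is actually written.

```python
from datetime import date

start, end = date(2009, 4, 17), date(2009, 4, 24)

# US-style two-digit fields, as in the current summary header.
us = "ACTIVITY SUMMARY (%s - %s)" % (start.strftime("%m/%d/%y"),
                                     end.strftime("%m/%d/%y"))
# ISO 8601 year-month-day, as proposed.
iso = "ACTIVITY SUMMARY (%s - %s)" % (start.isoformat(), end.isoformat())

print(us)   # ACTIVITY SUMMARY (04/17/09 - 04/24/09)
print(iso)  # ACTIVITY SUMMARY (2009-04-17 - 2009-04-24)
```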
Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
> Why not use U+DCxx for non-UTF-8 encodings too?

I thought of that, and was tricked into believing that only U+DC8x is a half surrogate. Now I see that you are right, and have fixed the PEP accordingly.

Regards,
Martin
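With U+DCxx used for every codec, an undecodable byte 0xXX (the handler only escapes bytes 0x80 to 0xFF) becomes the lone surrogate U+DCXX whatever the encoding. A short sketch using the released handler name, surrogateescape, with both an ASCII and a UTF-8 decode of the same Latin-1 bytes:

```python
raw = b"r\xe9sum\xe9"  # Latin-1 bytes for "resume" with two e-acute

for codec in ("ascii", "utf-8"):
    text = raw.decode(codec, "surrogateescape")
    # Each undecodable byte b becomes chr(0xDC00 + b): 0xE9 -> U+DCE9.
    escaped = [hex(ord(c)) for c in text if 0xDC80 <= ord(c) <= 0xDCFF]
    print(codec, escaped)
    assert text == "r\udce9sum\udce9"
```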
Re: [Python-Dev] Summary of Python tracker Issues
Python tracker wrote:
[snip]

In going through this, I notice a lot of effort by Mark Dickinson and others to get some details of numbers computation and display right in time for 3.1. As a certain-to-be beneficiary, I want to thank all who contributed.

Terry Jan Reedy
Re: [Python-Dev] Summary of Python tracker Issues
On Fri, Apr 24, 2009 at 9:25 PM, Terry Reedy wrote:
> In going through this, I notice a lot of effort by Mark Dickinson and others

Many others, but Eric Smith's name needs to be in big lights here. There's no way the short float repr would have been ready for 3.1 if Eric hadn't shown an interest in this at PyCon, and then taken on the major internal replumbing job this entailed for all of Python's string formatting.

> 3.1. As a certain-to-be beneficiary, I want to thank all who contributed.

Glad you like it!

Mark
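The short float repr is easy to see from the interpreter: since Python 3.1, repr() of a float returns the shortest string that converts back to exactly the same value, rather than always using 17 significant digits. A quick check:

```python
import random

print(repr(0.1))        # 0.1  (earlier versions printed 0.10000000000000001)
print(repr(0.1 + 0.2))  # 0.30000000000000004

# The shorter repr still round-trips exactly.
for _ in range(1000):
    x = random.random()
    assert float(repr(x)) == x
```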
Re: [Python-Dev] Summary of Python tracker Issues
Mark Dickinson wrote:
> On Fri, Apr 24, 2009 at 9:25 PM, Terry Reedy wrote:
>> In going through this, I notice a lot of effort by Mark Dickinson and others
>
> Many others, but Eric Smith's name needs to be in big lights here. There's no way the short float repr would have been ready for 3.1 if Eric hadn't shown an interest in this at PyCon, and then taken on the major internal replumbing job this entailed for all of Python's string formatting.

Not to get too much into a mutual admiration mode, but Mark did the parts involving hard thinking.

>> 3.1. As a certain-to-be beneficiary, I want to thank all who contributed.
>
> Glad you like it!

Me, too. I think it's going to be great once we get it all straightened out. And I think we're close!

Eric.
Re: [Python-Dev] Deprecating PyOS_ascii_formatd
Eric Smith wrote:
> Assuming that Mark's and my changes in the py3k-short-float-repr branch get checked in shortly, I'd like to deprecate PyOS_ascii_formatd. Its functionality is largely being replaced by PyOS_double_to_string, which we're introducing on our branch.

We've checked the changes in, and everything looks good as far as I can tell.

> My proposal is to deprecate PyOS_ascii_formatd in 3.1 and remove it in 3.2.

Having heard no dissent, I'd like to go ahead and deprecate this API. What are the mechanics of deprecating this? Just documentation, or is there something I should do in the code to generate a warning? Any pointers to examples would be great.

> The 2.7 situation is trickier, because we're not planning on backporting the short-float-repr work back to 2.7. In 2.7 I guess we'll leave PyOS_ascii_formatd around, unfortunately.

I backported the new API to 2.7, so I'll also deprecate PyOS_ascii_formatd there.

Eric.
Re: [Python-Dev] Deprecating PyOS_ascii_formatd
2009/4/24 Eric Smith wrote:
>> My proposal is to deprecate PyOS_ascii_formatd in 3.1 and remove it in 3.2.
>
> Having heard no dissent, I'd like to go ahead and deprecate this API. What are the mechanics of deprecating this? Just documentation, or is there something I should do in the code to generate a warning? Any pointers to examples would be great.

You can use PyErr_WarnEx().

--
Regards,
Benjamin
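In C, PyErr_WarnEx() takes a warning category, a message and a stack level, for example PyErr_WarnEx(PyExc_DeprecationWarning, "...", 1), and returns -1 if the warning was turned into an exception. The Python-level counterpart is warnings.warn(); the sketch below uses hypothetical function names (old_format, new_format) to show the usual warn-then-delegate pattern.

```python
import warnings

def new_format(x, precision=2):
    """Hypothetical replacement API."""
    return format(x, ".%df" % precision)

def old_format(x, precision=2):
    """Hypothetical deprecated API kept for compatibility."""
    # stacklevel=2 attributes the warning to the caller, not to this shim.
    warnings.warn("old_format() is deprecated; use new_format()",
                  DeprecationWarning, stacklevel=2)
    return new_format(x, precision)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # DeprecationWarning is often hidden
    result = old_format(3.14159)

print(result)                        # 3.14
print(caught[0].category.__name__)   # DeprecationWarning
```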
Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
Glenn Linderman wrote:
> On approximately 4/24/2009 11:40 AM, came the following characters from
> And so my encoding (1) doesn't alter the data stream for any valid Windows file name, and where the naivest of users reside (2) doesn't alter the data stream for any Posix file name that was encoded as UTF-8 sequences and doesn't contain ? characters in the file name [I perceive the use of ? in file names to be rare on Posix, because of experience, and because of the other problems caused by such use] (3) doesn't introduce data puns within applications that are correctly coded to know the encoding occurs. The encoding technique in the PEP not only can produce data puns, thus not being reversible, it provides no reliable mechanism to know that this has occurred.
>

Uhm... Not arguing with your goals, but '?' is unfortunately reasonably easy to get into a filename. For instance, I've had to download a lot of scratch-built packages from our buildsystem recently. Scratch builds have URLs with query strings in them, so::

  wget 'http://koji.fedoraproject.org/koji/getfile?taskID=1318059&name=monodevelop-debugger-gdb-2.0-1.1.i586.rpm'

Which results in the filename:

  getfile?taskID=1318059&name=monodevelop-debugger-gdb-2.0-1.1.i586.rpm

-Toshio
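Toshio's point is easy to confirm: POSIX excludes only '/' and NUL from a file name component, so a name copied straight from a URL query string is legal. A minimal sketch, assuming a POSIX file system (Windows rejects '?', so the check is skipped there):

```python
import os
import tempfile

name = "getfile?taskID=1318059&name=monodevelop-debugger-gdb-2.0-1.1.i586.rpm"
listing = []

if os.name == "posix":  # '?' is not allowed in Windows file names
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, name), "wb"):
            pass  # create an empty file with the query-string name
        listing = os.listdir(tmp)
    print(name in listing)  # True on POSIX systems
```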
Re: [Python-Dev] Tuples and underorderable types
Raymond Hettinger wrote:
> Does anyone have any ideas about what to do with issue 5830 and handling the problem in a general way (not just for sched)?
>
> The basic problem is that decorate/compare/undecorate patterns no longer work when the primary sort keys are equal and the secondary keys are unorderable (which is now the case for many callables).
>
> >>> tasks = [(10, lambda: 0), (20, lambda: 1), (10, lambda: 2)]
> >>> tasks.sort()
> Traceback (most recent call last):
>   ...
> TypeError: unorderable types: function() < function()
>
> Would it make sense to provide a default ordering whenever the types are the same?
>
>     def object.__lt__(self, other):
>         if type(self) == type(other):
>             return id(self) < id(other)
>         raise TypeError

The immediate problem with this is that 'same type', or not, is sometimes a somewhat arbitrary implementation detail. In 2.x, 2**40 could be int or long, depending on the build. In 3.0, that difference disappeared. User-defined and builtin functions are different classes for implementation, not conceptual, reasons. (This could potentially bite what I understand to be your r71844/5 fix.)

Unbound methods used to be the same class as bound methods (as I remember). In 3.0, the wrapping disappeared and they are the same thing as the underlying function.

In 2.x, ASCII text and binary data might both be str. Now they might be str and bytes.

Universal ordering and default ordering by id was broken (and doomed) when Guido decided that complex numbers should not be comparable either lexicographically or by id. Your proposed object.__lt__ would reverse that decision, unless, of course, complex was special-cased (again) to override it, but then we would be back to the 2.x situation of mixed rules and exceptions.

Terry Jan Reedy
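Terry's objection can be made concrete. The function below is a hypothetical stand-in for the proposed object.__lt__ fallback (Python provides nothing like it): it orders objects of identical type by id() and refuses everything else. Because user-defined and built-in functions are different types, it would order two lambdas but still reject a lambda next to len; complex numbers, meanwhile, are deliberately unorderable.

```python
def same_type_lt(a, b):
    """Hypothetical fallback: order same-typed objects by id(), else fail."""
    if type(a) is type(b):
        return id(a) < id(b)
    raise TypeError("unorderable types: %s() < %s()"
                    % (type(a).__name__, type(b).__name__))

f, g = (lambda: 0), (lambda: 1)
print(isinstance(same_type_lt(f, g), bool))  # True: both are 'function'
print(type(f).__name__, type(len).__name__)  # function builtin_function_or_method

try:
    same_type_lt(f, len)
    mixed_ok = True
except TypeError as exc:
    mixed_ok = False
    print(exc)

try:
    1j < 2j  # complex numbers have no ordering in Python 3
    complex_ok = True
except TypeError:
    complex_ok = False
```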
Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
2009/4/24 Stephen J. Turnbull wrote:
> Paul Moore writes:
>
> > The pros for Martin's proposal are a uniform cross-platform interface, and a user-friendly API for the common case.
>
> A more accurate phrasing would be "... a user-friendly API for those who feel very lucky today." Which is the common case, of course, but spins a little differently.

Sorry, but I think you're misrepresenting things. I'd have probably let you off if you'd missed out the "very" - but I do think that it's the common case. Consider:

- Windows systems where broken Unicode (lone surrogates or whatever) isn't involved
- Unix systems where the user's stated filesystem encoding is correct

Can you honestly say that this isn't the vast majority of real-world environments? (IIRC, you are based in Japan, so it may well be true that the likelihood of problems is a lot higher where you are than where I am - the UK - but I suspect that averaging out, things are generally as above).

> > [1] Actually, all the PEP says is "With this PEP, a uniform treatment of these data as characters becomes possible." An argument as to why this is a good thing would be a useful addition to the PEP. At the moment it's more or less treated as self-evident - which I agree with, but which clearly the Unix people here are not as certain of.
>
> Well, the problem is that both parts are false.

I can't work out which "parts" you are referring to here.

> If you didn't start with a valid string in a known encoding, you shouldn't treat it as characters because it's not.

Again, that's the purist argument. If you have a string (of bytes, I guess) and a 99% certain guess as to the correct encoding, then I'd argue that, as long as (a) it's not mission-critical (lives or backups depend on it) and (b) you have a means of failing relatively gracefully, you have every reason to make the assumption about encoding. After all, what's the alternative? Ultimately, you have a byte string and no encoding. You make some assumption, or you can do hardly anything. What use is "Processing file \x66\x6f\x6f" as a progress indicator for a program that scans a directory? (That was "foo" for people who can't read latin-1 written in hex :-))

> Hand it to a careful API, and you'll get an Exception raised in your face. And that's precisely why it's not obviously a good thing. Careful clients will have to treat it as "transcoded bytes", and so the people who develop those clients get no benefit. OTOH, at least some of those who feel lucky and use it naively are going to turn out to be wrong.

But 99% of the time, "it" is a perfectly acceptable string. (Percentage invented out of thin air, admitted :-)) Remember, only when the system encounters an undecodable byte sequence would a technically invalid string be generated - and as far as I can tell, the main case when that would happen is on Unix, if the user specifies UTF-8 as the encoding, and the actual filesystem uses something else, *and* there's a file with a name whose byte sequence is invalid UTF-8. I'm *really* struggling to see that as a common scenario.

Admittedly, there are other, possibly more common, cases where the string translation is valid, but semantically not what the user expects - user says CP1252, but filesystem is CP850, say. As a UK Windows user, I'm used to seeing CP850 vs CP1252 confusions like this - "£" replaced with ú is the common case. It happens occasionally, and occasionally causes code to behave unexpectedly. But it doesn't reformat my hard drive, and the alternative (having to be extra-careful to tell every program precisely which encoding I'm using in every situation) would make programs effectively unusable.

> That said, I'm +0 on the PEP as is.

So I'm largely preaching to the converted here. After all, lukewarm acceptance from someone with experience of Asian encoding issues is pretty much the equivalent of resounding support from someone who only ever works in English! :-)

Paul.
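Both of Paul's examples can be reproduced with the standard codecs. A sketch, assuming the Windows ANSI code page involved is CP1252 (Western European): there '£' is byte 0xA3, which the OEM console code page CP850 displays as 'ú'. In CP1251, which is Cyrillic, 0xA3 is a Cyrillic letter instead.

```python
# The progress message: the same bytes, decoded with a guessed encoding.
raw_name = b"\x66\x6f\x6f"
decoded = raw_name.decode("latin-1")
print("Processing file", decoded)   # Processing file foo

# Mojibake: a pound sign stored as CP1252 but read back as CP850.
stored = "\u00a3".encode("cp1252")  # POUND SIGN -> b'\xa3'
shown = stored.decode("cp850")      # b'\xa3' -> U+00FA (u with acute)
assert stored == b"\xa3"
assert shown == "\u00fa"
```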
Re: [Python-Dev] Tuples and underorderable types
> I would discourage use of the decorate/sort/undecorate pattern, and encourage use of the key= argument. Or, if you really need to decorate into a tuple, still pass a key= argument.

The bug report was actually about the sched module, which used heapq to prioritize tuples consisting of times, priorities, and actions. I fixed and closed the original bug a few hours ago but had a thought that the pattern itself may be ubiquitous (especially with heapq). ISTM that other bugs like this are lurking about.

But all of you guys seem to think the status quo is fine, so that's the end of it.

Cheers,

Raymond
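The sched-style failure Raymond describes appears as soon as two heap entries tie on every field before the callable. A minimal sketch with made-up actions, showing the failure and the tie-breaking counter recommended in the priority-queue notes of the heapq documentation:

```python
import heapq
import itertools

def action_a():
    return "a"

def action_b():
    return "b"

# (time, priority, action): equal time and priority force a function comparison.
heap = []
heapq.heappush(heap, (10, 1, action_a))
try:
    heapq.heappush(heap, (10, 1, action_b))
    push_failed = False
except TypeError as exc:
    push_failed = True
    print("heappush failed:", exc)

# Fix: put a monotonically increasing counter before the action, so ties are
# resolved by insertion order and the functions are never compared.
counter = itertools.count()
heap = []
for action in (action_a, action_b):
    heapq.heappush(heap, (10, 1, next(counter), action))

order = [heapq.heappop(heap)[-1]() for _ in range(len(heap))]
print(order)  # ['a', 'b']
```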
[Python-Dev] "Length of str " changes after passed in Python 2.5
I have the following code:

# len(all_svs) = 10

# then I call a function with 2 list parameters
def proc_line(line, all_svs):

    # inside the function the length of the list "all_svs" is 1 more -> 11
    # I had to work around it
    for i in range(len(all_svs) - 1):  # somehow the length of all_svs is incremented !!!

Is this a compiler bug?? Or is it because of my first try of Python?

Thanks
Re: [Python-Dev] "Length of str " changes after passed in Python 2.5
On Fri, Apr 24, 2009, [email protected] wrote:
>
> I have the following code:
> # len(all_svs) = 10
>
> # the I call a function with 2 list parameters
> def proc_line(line,all_svs) :
>
> # inside the function the length of the list "all_svs" is 1 more -> 11
> # I had to workaround it

This sounds like a usage question. Please use comp.lang.python (or possibly the tutor mailing list).
--
Aahz ([email protected]) <*> http://www.pythoncraft.com/

"If you think it's expensive to hire a professional to do the job, wait until you hire an amateur." --Red Adair
Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
On Apr 24, 2009, at 6:05 PM, Paul Moore wrote:
> - Windows systems where broken Unicode (lone surrogates or whatever) isn't involved
> - Unix systems where the user's stated filesystem encoding is correct
>
> Can you honestly say that this isn't the vast majority of real-world environments? (IIRC, you are based in Japan, so it may well be true that the likelihood of problems is a lot higher where you are than where I am - the UK - but I suspect that averaging out, things are generally as above).

In my experience, it is normal on most unix systems that some programs (mostly daemons) are running in default "POSIX" locale, others (most user programs) are running in the "en_US.utf-8" locale, and some luddite users have set themselves to "en_US.8859-1". All running on the same system.

James
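To see why mixed locales matter, decode the same file-name bytes the way each of James's three processes might. The locale-to-codec mapping below is a simplifying assumption: the "POSIX" locale is treated as ASCII with PEP 383's surrogateescape handler for the undecodable bytes.

```python
raw = "caf\u00e9".encode("utf-8")  # b'caf\xc3\xa9', created under en_US.utf-8

# Assumed mapping of each locale to the codec its processes would use.
views = {
    "POSIX (ASCII)": raw.decode("ascii", "surrogateescape"),
    "en_US.utf-8": raw.decode("utf-8"),
    "en_US.8859-1": raw.decode("latin-1"),
}
for locale_name, text in views.items():
    # ascii() keeps the output printable on any terminal.
    print(locale_name, ascii(text))
```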
Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
Toshio Kuratomi wrote:
> Glenn Linderman wrote:
>> On approximately 4/24/2009 11:40 AM, came the following characters from
>> And so my encoding (1) doesn't alter the data stream for any valid Windows file name, and where the naivest of users reside (2) doesn't alter the data stream for any Posix file name that was encoded as UTF-8 sequences and doesn't contain ? characters in the file name [I perceive the use of ? in file names to be rare on Posix, because of experience, and because of the other problems caused by such use] (3) doesn't introduce data puns within applications that are correctly coded to know the encoding occurs. The encoding technique in the PEP not only can produce data puns, thus not being reversible, it provides no reliable mechanism to know that this has occurred.
>
> Uhm... Not arguing with your goals, but '?' is unfortunately reasonably easy to get into a filename. For instance, I've had to download a lot of scratch-built packages from our buildsystem recently. Scratch builds have URLs with query strings in them, so::

Is NUL \0 allowed in POSIX file names? If not, could that be used as an escape char? If it is not legal, then custom translated strings that escape in the wild would raise a red flag as soon as something else tried to use them.
Re: [Python-Dev] Tuples and underorderable types
Raymond Hettinger wrote:
>> I would discourage use of the decorate/sort/undecorate pattern, and encourage use of the key= argument. Or, if you really need to decorate into a tuple, still pass a key= argument.
>
> The bug report was actually about the sched module, which used heapq to prioritize tuples consisting of times, priorities, and actions. I fixed and closed the original bug a few hours ago but had a thought that the pattern itself may be ubiquitous (especially with heapq). ISTM that other bugs like this are lurking about.
>
> But all of you guys seem to think the status quo is fine, so that's the end of it.

If you define the bug as the sched module not being updated to the 3.0 order, then there are possibly more.

I notice that most of the heapq functions do not take a key function argument. Has this changed, or will it change, in the future? Or is making key-decorated tuples the responsibility of the user? (I can see that a key func would work better with a PriQueue class where the key func is passed just once.)

tjr
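On Terry's question: heapq.nsmallest() and heapq.nlargest() already accept key=, but heappush() and heappop() do not. A sketch of the PriQueue idea, where the key function is supplied once; the class name KeyedHeap is hypothetical and not part of the standard library.

```python
import heapq
import itertools

class KeyedHeap:
    """Hypothetical priority queue that applies a key function on push."""

    def __init__(self, key):
        self._key = key
        self._heap = []
        self._counter = itertools.count()  # tie-breaker: FIFO among equal keys

    def push(self, item):
        heapq.heappush(self._heap, (self._key(item), next(self._counter), item))

    def pop(self):
        return heapq.heappop(self._heap)[-1]

    def __len__(self):
        return len(self._heap)

tasks = [(10, lambda: 0), (20, lambda: 1), (10, lambda: 2)]

q = KeyedHeap(key=lambda task: task[0])
for task in tasks:
    q.push(task)
popped = [q.pop()[1]() for _ in range(len(q))]
print(popped)    # [0, 2, 1]

# nsmallest() takes key= directly (equivalent to sorted(...)[:n]).
smallest = [t[1]() for t in heapq.nsmallest(2, tasks, key=lambda t: t[0])]
print(smallest)  # [0, 2]
```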
Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
Terry Reedy wrote:
> Is NUL \0 allowed in POSIX file names? If not, could that be used as an escape char. If it is not legal, then custom translated strings that escape in the wild would raise a red flag as soon as something else tried to use them.
>

AFAIK NUL should be okay but I haven't read a specification to reach that conclusion. Is that a proposal? Should I go find someone who has read the relevant standards to find out?

-Toshio
Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
On 24Apr2009 18:20, Toshio Kuratomi wrote:
| Terry Reedy wrote:
| > Is NUL \0 allowed in POSIX file names? If not, could that be used as an
| > escape char. If it is not legal, then custom translated strings that
| > escape in the wild would raise a red flag as soon as something else
| > tried to use them.
| >
| AFAIK NUL should be okay but I haven't read a specification to reach
| that conclusion. Is that a proposal? Should I go find someone who has
| read the relevant standards to find out?

NUL cannot occur in a POSIX file path, if for no other reason than that the API uses C strings, which are NUL terminated.

So, yes, you could use NUL as an escape character if you're sure you're never dealing with _non_POSIX pathnames :-)

Cheers,
--
Cameron Simpson DoD#743
http://www.cskk.ezoshosting.com/cs/

| I'm the female partner of a climber (I don't climb) and until now, I was
| under the impression that climbers are cool people, but alas, you had to
| ruin it for me.
*REAL* climbers are crude, impolite, solitary, abrupt, arrogant.
Sport climbers are cool.
- Rene Tio in rec.climbing
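Python's own OS layer enforces the rule Cameron describes: a path containing NUL is rejected before any system call is made, since it could not be passed as a C string. A short sketch with a hypothetical helper; current Python 3 raises ValueError here, while some older 3.x releases raised TypeError, so both are caught.

```python
import os

def nul_rejected(path):
    """Hypothetical helper: True if Python refuses the path before the OS sees it."""
    try:
        os.stat(path)
    except (ValueError, TypeError):
        return True   # embedded NUL: rejected before reaching the C API
    except OSError:
        return False  # the system call ran (e.g. file not found)
    return False      # the path exists

print(nul_rejected("escape\x00here"))    # True
print(nul_rejected("no-such-file-xyz"))  # False
```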
