2009/4/25 James Y Knight:
> On Apr 24, 2009, at 6:05 PM, Paul Moore wrote:
>>
>> - Windows systems where broken Unicode (lone surrogates or whatever)
>> isn't involved
>> - Unix systems where the user's stated filesystem encoding is correct
>>
>> Can you honestly say that this isn't the vast major
Benjamin Peterson wrote:
2009/4/24 Eric Smith:
My proposal is to deprecate PyOS_ascii_formatd in 3.1 and remove it in
3.2.
Having heard no dissent, I'd like to go ahead and deprecate this API. What
are the mechanics of deprecating this? Just documentation, or is there
something I should do in
Cameron Simpson wrote:
> On 22Apr2009 08:50, Martin v. Löwis wrote:
> | File names, environment variables, and command line arguments are
> | defined as being character data in POSIX;
>
> Specific citation please? I'd like to check the specifics of this.
For example, on environment variables:
h
> | 2. Even if they were taken away (which the PEP does not propose to do),
> |it would be easy to emulate them for applications that want them.
> |For example, listdir could be wrapped as
> |
> |def listdir_b(bytestring):
> |fse = sys.getfilesystemencoding()
>
> Alas, no
No,
Simon Cross wrote:
>> Unfortunately, for Windows, the situation would
>> be exactly the opposite: the byte-oriented interface cannot represent
>> all data; only the character-oriented API can.
>
> Is the second part of this actually true? My understanding may be
> flawed, but surely all Unicode da
> The problem with this, and other preceding schemes that have been
> discussed here, is that there is no means of ascertaining whether a
> particular file name str was obtained from a str API, or was funny-
> decoded from a bytes API... and thus, there is no means of reliably
> ascertaining whethe
> Humour aside :), the expectation that filenames are Unicode data
> simply doesn't agree with the reality of POSIX file systems. I think
> an approach similar to that adopted by glib [1] could work
Are you saying that the approach presented in the PEP will not work?
I believe it would work no ma
> The part that I haven't seen clearly addressed so far is what happens
> when disks get mounted across OSes (e.g. NFS).
>
> While I agree that there should be a layer on top that can handle "most"
> situations, it also seems clear that the raw layer needs to be readily
> accessible.
Indeed, with
> [1] Actually, all the PEP says is "With this PEP, a uniform treatment
> of these data as characters becomes
> possible." An argument as to why this is a good thing would be a
> useful addition to the PEP. At the moment it's more or less treated as
> self-evident - which I agree with, but which cl
> Because the encoding is not reliably reversible.
Why do you say that? The encoding is completely reversible
(unless we disagree on what "reversible" means).
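The round trip claimed here can be checked with a minimal sketch. It assumes the error handler name "surrogateescape", which is the name under which this scheme shipped in later Python 3 releases; that name is not taken from this thread, and the helper roundtrip() is hypothetical.

```python
def roundtrip(raw, encoding="utf-8"):
    """Decode bytes to str and encode them back (hypothetical helper).

    Assumes the "surrogateescape" error handler: each undecodable byte
    0xNN is mapped to the lone surrogate U+DCNN on decoding, and mapped
    back to the original byte on encoding.
    """
    name = raw.decode(encoding, errors="surrogateescape")
    restored = name.encode(encoding, errors="surrogateescape")
    return name, restored


# b"\xe9" is Latin-1 for "e acute" and is not valid UTF-8 on its own.
name, restored = roundtrip(b"caf\xe9.txt")
print(ascii(name))      # shows the escaped lone surrogate
print(restored)         # the original bytes
```

Encoding the same string with the default strict handler raises UnicodeEncodeError, so the lone surrogates cannot leak into ordinary UTF-8 output unnoticed.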
> I'm +1 on the concept, -1 on the PEP, due solely to the lack of a
> reversible encoding.
Then please provide an example for a setup whe
> Following on from that, would this (under Martin's proposal) result in
> programs receiving encoded strings, or just semantically-incorrect
> ones?
Not sure I understand the question - what is an "encoded string"?
As you analyse below, sometimes, the current (2.x) file system encoding
will do t
> If the bytes are mapped to single half surrogate codes instead of the
> normal pairs (low+high), then I can see that decoding could never be
> ambiguous and encoding could produce the original bytes.
I was confused by Markus Kuhn's original UTF-8b specification. I have
now changed the PEP to avo
Martin v. Löwis wrote:
If the bytes are mapped to single half surrogate codes instead of the
normal pairs (low+high), then I can see that decoding could never be
ambiguous and encoding could produce the original bytes.
I was confused by Markus Kuhn's original UTF-8b specification. I have
now ch
2009/4/25 "Martin v. Löwis":
>> Following on from that, would this (under Martin's proposal) result in
>> programs receiving encoded strings, or just semantically-incorrect
>> ones?
>
> Not sure I understand the question - what is an "encoded string"?
Sorry. I was struggling to come up with termi
> OK, looks like my analysis matches yours, except that I wasn't sure if
> the third case (a string that "likely wasn't intended") could result
> in exceptions. From what you're saying, it sounds like it would
> actually be similar to the second case - I'm not clear on how
> surrogates work, though
> The only drawback I can see is if the UTF-8 bytes actually decode to a
> half surrogate. However, half surrogates should really only occur in
> UTF-16 (as I understand it), so they shouldn't be encoded in UTF-8
> anyway!
Right: that's the rationale for UTF-8b. Encoding half surrogates
violates p
Thanks for writing this PEP 383, MvL. I recently ran into this
problem in Python 2.x in the Tahoe project [1]. The Tahoe project
should be considered a good use case showing what some people need.
For example, the assumption that a file will later be written back
into the same local file
On Sat, Apr 25, 2009 at 05:00:17PM +0200, "Martin v. Löwis" wrote:
> I recognize that for other languages (without trivial transliterations)
> the problem is more severe, and people are more likely to create
> files with Cyrillic, or Japanese, names (say) if the system accepts
> them at all.
I
On Sat, Apr 25, 2009 at 10:00, "Martin v. Löwis" wrote:
> On decoding, there is a guarantee that it decodes successfully. There is
> also a guarantee that the result will re-encode successfully, and yield
> the same byte string.
>
> If you pass a different string into encoding, you still may get
>
> I see two main user-oriented use cases for the resulting Unicode
> strings this PEP will produce on all systems: displaying a list of
> filenames for the user to select from (an open file dialog), and
> allowing a user to edit or supply a filename (a save dialog or a
> rename control).
There are
Paul Moore writes:
> But those
> people are also the *least* likely people to contribute on an
> English-speaking list, I guess (Sincere apologies if everyone but
> me on this list happens to actually be fluent English-speaking
> Russians )
Actually, we're all Finnish.
Regards,
Ånto
On Sat, Apr 25, 2009 at 11:33, "Martin v. Löwis" wrote:
> If the user has the locale set up in a way that matches his keyboard,
> it should work all fine - and will already, even without the PEP.
> If the user enters a character that doesn't directly map to a
> good file name, you get an exception, a
Martin v. Löwis wrote:
I see two main user-oriented use cases for the resulting Unicode
strings this PEP will produce on all systems: displaying a list of
filenames for the user to select from (an open file dialog), and
allowing a user to edit or supply a filename (a save dialog or a
rename contr
-On [20090425 11:01], Paul Moore (p.f.mo...@gmail.com) wrote:
>PS Unfortunately, I suspect that the biggest group of people likely to
>be hit badly by this is people using non-latin scripts. And arguing
>probabilities without real data is optimistic at best. But those
>people are als
You might want to note in the PEP that the problem that's being solved
is known as the "loop and a half" problem.
http://www.cs.duke.edu/~ola/patterns/plopd/loops.html#loop-and-a-half
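For readers unfamiliar with the term, here is a minimal sketch of the "loop and a half" shape in Python. The PEP under discussion is not identified in this excerpt, so the example is generic; the function read_until_blank() and its input are hypothetical.

```python
import io


def read_until_blank(stream):
    """Collect lines up to the first blank line or end of input.

    The exit test sits in the middle of the loop body: the first half
    fetches a value, the second half processes it.
    """
    lines = []
    while True:
        line = stream.readline()          # first half: fetch
        if not line.strip():              # exit test in the middle
            break
        lines.append(line.rstrip("\n"))   # second half: process
    return lines


print(read_until_blank(io.StringIO("a\nb\n\nc\n")))
```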
raymond.hettinger wrote:
Author: raymond.hettinger
Date: Sun Apr 26 02:34:36 2009
New Revision: 71946
Log:
Re
On 25Apr2009 14:07, "Martin v. Löwis" wrote:
| Cameron Simpson wrote:
| > On 22Apr2009 08:50, Martin v. Löwis wrote:
| > | File names, environment variables, and command line arguments are
| > | defined as being character data in POSIX;
| >
| > Specific citation please? I'd like to check the spe