> msvcrt ships with the operating system - I'd call that a conforming
> implementation.
Yes, but it's not part of the operating system interface; Microsoft
documents it as "for future use only by system-level components".
> I still regard handling argv as anything other than the raw bytes that come
>
On 9/28/07, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> Nicholas Bastin wrote:
> > On 9/22/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> >> argc/argv does not exist on Windows (that you seem to see it
> >> anyway is an illusion), and if it did exist, it would be characters,
> >> not bytes
On 9/27/07, Nicholas Bastin <[EMAIL PROTECTED]> wrote:
>
> On 9/22/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> > argc/argv does not exist on Windows (that you seem to see it
> > anyway is an illusion), and if it did exist, it would be characters,
> > not bytes.
>
> Of course it exists on Win
Nicholas Bastin wrote:
> On 9/22/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>> argc/argv does not exist on Windows (that you seem to see it
>> anyway is an illusion), and if it did exist, it would be characters,
>> not bytes.
>
> Of course it exists on Windows. argc/argv are defined by th
On 9/22/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> argc/argv does not exist on Windows (that you seem to see it
> anyway is an illusion), and if it did exist, it would be characters,
> not bytes.
Of course it exists on Windows. argc/argv are defined by the C
standard, and say what you wil
> The filesystem is unrelated to sys.argv, except for the need to pass
> filenames through argv. If the filesystem is using bytes rather than
> characters, then sys.argv must offer the same option, or else certain
> scripts will (under some rare circumstances) fail.
The same holds for file names
On 9/22/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Quoting Jim Jewett <[EMAIL PROTECTED]>:
>
> > On 9/21/07, Paul Moore <[EMAIL PROTECTED]> wrote:
> >> On 21/09/2007, Jim Jewett <[EMAIL PROTECTED]> wrote:
[The original context, expressed with some detail by Michael Urman in
http://mail.p
On Fri, 21-09-2007 at 10:00 -0400, Jim Jewett wrote:
> Is it reasonable to expose sys.argv.buffer?
> (Since this would be bytes rather than text, I assume this would be a
> single array, rather than a list of already separated arguments.)
On Unix the arguments are already separated
Quoting Jim Jewett <[EMAIL PROTECTED]>:
> On 9/21/07, Paul Moore <[EMAIL PROTECTED]> wrote:
>> On 21/09/2007, Jim Jewett <[EMAIL PROTECTED]> wrote:
>> > (Outside ASCII), if you treat sys.argv as text, that is probably
>> > impossible without filesystem support. Before python even sees the
>> >
"Michael Urman" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
| If there's not something straightforward to put in the ... below that
| would allow simple iteration and processing of all files passed on the
| command line, preferably interchangeably on both unix (where filenames
|
On 21/09/2007, Jim Jewett <[EMAIL PROTECTED]> wrote:
> If you are using text (as opposed to bytes), then À can be either
> U+00C0 or U+0041 U+0300. If the file system makes a distinction,
> then it is using bytes, and any program interacting with it needs* to
> use bytes too.
OK. I don't know enough about Uni
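A minimal sketch of the canonical-equivalence point quoted above, assuming Python 3 and its standard unicodedata module. The variable names are illustrative only. The two spellings of À render the same, but they compare unequal as text and encode to different bytes unless they are normalized first.

```python
import unicodedata

# Two canonically equivalent spellings of the same visible character.
composed = "\u00c0"      # LATIN CAPITAL LETTER A WITH GRAVE
decomposed = "A\u0300"   # LATIN CAPITAL LETTER A + COMBINING GRAVE ACCENT

# Without normalization they differ, both as strings and as UTF-8 bytes.
print(composed == decomposed)
print(composed.encode("utf-8"), decomposed.encode("utf-8"))

# Normalizing both sides to the same form (NFC here) makes them equal.
same_after_nfc = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)
print(same_after_nfc)
```

A file system that stores names as raw bytes would treat these as two different names, which is the distinction the quoted paragraph is drawing.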
On 9/21/07, Jim Jewett <[EMAIL PROTECTED]> wrote:
> (Outside ASCII), if you treat sys.argv as text, that is probably
> impossible without filesystem support. Before python even sees the
> data, the terminal itself is allowed to change between canonical
> equivalents, which have different binary re
Jean-Paul Calderone wrote:
> On Fri, 21 Sep 2007 10:00:38 -0400, Jim Jewett <[EMAIL PROTECTED]> wrote:
>> [snip]
>>
>>It does sound like we need a way to get to the original bytes, similar
>>to sys.stdin.buffer. Is it reasonable to expose sys.argv.buffer?
>>(Since this would be bytes rather than
On 9/21/07, Paul Moore <[EMAIL PROTECTED]> wrote:
> On 21/09/2007, Jim Jewett <[EMAIL PROTECTED]> wrote:
> > (Outside ASCII), if you treat sys.argv as text, that is probably
> > impossible without filesystem support. Before python even sees the
> > data, the terminal itself is allowed to change be
On Fri, 21 Sep 2007 10:00:38 -0400, Jim Jewett <[EMAIL PROTECTED]> wrote:
> [snip]
>
>It does sound like we need a way to get to the original bytes, similar
>to sys.stdin.buffer. Is it reasonable to expose sys.argv.buffer?
>(Since this would be bytes rather than text, I assume this would be a
>sin
On 9/18/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> On 9/18/07, Jim Jewett <[EMAIL PROTECTED]> wrote:
> > ... given that defenc is now always UTF-8, won't exposing
> > it in the public typedef then just be an attractive nuisance?
> *ALL* fields of the struct def are strictly internal.
Is t
On 21/09/2007, Jim Jewett <[EMAIL PROTECTED]> wrote:
> (Outside ASCII), if you treat sys.argv as text, that is probably
> impossible without filesystem support. Before python even sees the
> data, the terminal itself is allowed to change between canonical
> equivalents, which have different binary
On 9/18/07, James Y Knight <[EMAIL PROTECTED]> wrote:
> On Sep 18, 2007, at 11:11 AM, Guido van Rossum wrote:
> One of the more common things to do with command line arguments is
> open them. So, it'd really be nice if:
> python -c 'import sys; open(sys.argv[1])' [some filename]
> would always w
> On Linux, filenames are *byte* string and not *character* string.
That's not true, although this is a wide-spread misunderstanding.
The POSIX standard defines that the file names must be a superset
of the portable character set, which includes things such as '/',
which is the path separator.
>
Victor Stinner writes:
> On Thursday 13 September 2007 18:22:12 Marcin 'Qrczak' Kowalczyk wrote:
> > What should happen when a command line argument or an environment
> > variable is not decodable using the system encoding (on Unix where
> > from the OS point of view it is an array of bytes)?
Hi,
On Thursday 13 September 2007 18:22:12 Marcin 'Qrczak' Kowalczyk wrote:
> What should happen when a command line argument or an environment
> variable is not decodable using the system encoding (on Unix where
> from the OS point of view it is an array of bytes)?
On Linux, filenames are *byte*
James Y Knight writes:
> iso-2022 or some other abomination. This has upsides (simple, doesn't
> trample on PUA codepoints, only needs one new codec, never throws
> exception in the above example, and really is correct much of the
> time), and downsides (if the system locale is iso-2022,
On 9/18/07, James Y Knight <[EMAIL PROTECTED]> wrote:
>
> On Sep 18, 2007, at 11:11 AM, Guido van Rossum wrote:
> > If they contain
> > non-ASCII bytes I am currently in favor of doing a best-effort
> > decoding using the default locale encoding, replacing errors with '?'
> > rather than throwing e
On Sep 18, 2007, at 11:11 AM, Guido van Rossum wrote:
> If they contain
> non-ASCII bytes I am currently in favor of doing a best-effort
> decoding using the default locale encoding, replacing errors with '?'
> rather than throwing exception.
One of the more common things to do with command line
On 9/18/07, Jim Jewett <[EMAIL PROTECTED]> wrote:
> On 9/18/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> > On 9/18/07, Jim Jewett <[EMAIL PROTECTED]> wrote:
> > > On 9/18/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:
>
> > > > There's no UTF-8 in Python's internal string encoding.
>
> > >
On 9/18/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> On 9/18/07, Jim Jewett <[EMAIL PROTECTED]> wrote:
> > On 9/18/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:
> > > There's no UTF-8 in Python's internal string encoding.
> > (At least as of a few days ago)
> > In Python 3 there is; st
On 9/18/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:
> Guido has stated that the
> internal representation used by Python strings is a sequence of
> Unicode code units, not characters. I don't think that's reached the
> status of "pronouncement" yet, but you will probably need a PEP to get
>
On 9/18/07, Jim Jewett <[EMAIL PROTECTED]> wrote:
> On 9/18/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:
>
> > There's no UTF-8 in Python's internal string encoding. What are you
> > talking about?
>
> (At least as of a few days ago)
>
> In Python 3 there is; strings are unicode. A PyUnicod
On 9/18/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:
> There's no UTF-8 in Python's internal string encoding. What are you
> talking about?
(At least as of a few days ago)
In Python 3 there is; strings are unicode. A PyUnicodeObject object
has two encodings that you can grab from a point
> "Marcin 'Qrczak' Kowalczyk" <[EMAIL PROTECTED]> writes:
>> > This is wrong: UTF-8 is specified for PUA. PUA is no special from the
>> > point of view of UTF-8.
>
>> It is from the point of view of the Unicode standard, specifically v5.
>> Please see section 16.5, especially about the
On 9/17/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:
> Note that some people are currently arguing that sys.argv should be an
> array of bytes objects, and Guido has not yet said "no".
Then let me say "no" now. I'd be happy to support a lower-level API
for getting at the actual bytes in the
On Tue, 18-09-2007 at 13:08 +0900, Stephen J. Turnbull wrote:
> > This is wrong: UTF-8 is specified for PUA. PUA is no special from the
> > point of view of UTF-8.
>
> It is from the point of view of the Unicode standard, specifically v5.
> Please see section 16.5, especially abou
> "Marcin 'Qrczak' Kowalczyk" <[EMAIL PROTECTED]> writes:
>> When a codec encounters something it can't handle, whether it's a
>> valid character in a legacy encoding, a private use character in a
>> UTF, or an invalid sequence of code units, it throws an exception
>> specifying the charac
> "Marcin 'Qrczak' Kowalczyk" <[EMAIL PROTECTED]> writes:
>> > Well, for any scheme which attempts to modify UTF-8 by accepting
>> > arbitrary byte strings is used, *something* must be interpreted
>> > differently than in real UTF-8.
>> Wrong. In my scheme everything ends up in the PUA
On 16-Sep-07, at 4:03 PM, Greg Ewing wrote:
> Paul Moore wrote:
>> On 15/09/2007, Gregory P. Smith <[EMAIL PROTECTED]> wrote:
>>
>>> similarly for the environment. os.environ dict
>>> should be bytes object keys and values
>>
>> You can't have bytes as keys - the type isn't hashable...
>
> Has th
On Sun, 16-09-2007 at 16:13 +0900, Stephen J. Turnbull wrote:
> When a codec encounters something it can't handle, whether it's a
> valid character in a legacy encoding, a private use character in a
> UTF, or an invalid sequence of code units, it throws an exception
> specifying the c
> Yes. I'm recovering from moving from Japan to California, and will be
> busy until the beginning of October, I'll get started on it then. For
> this kind of thing, what is the deadline for submission of a patch?
> Before the alpha, early beta?
Either would work fine, unless somebody else does
On Sat, 15-09-2007 at 09:13 +0900, Stephen J. Turnbull wrote:
> > Well, for any scheme which attempts to modify UTF-8 by accepting
> > arbitrary byte strings is used, *something* must be interpreted
> > differently than in real UTF-8.
>
> Wrong. In my scheme everything ends up i
"Martin v. Löwis" writes:
> > The basic idea is to allocate code points in private space as-needed.
>
> Ok, thanks. Would you be interested in implementing that scheme?
Yes. I'm recovering from moving from Japan to California, and will be
busy until the beginning of October, I'll get started
Paul Moore wrote:
> On 15/09/2007, Gregory P. Smith <[EMAIL PROTECTED]> wrote:
>
>>similarly for the environment. os.environ dict
>>should be bytes object keys and values
>
> You can't have bytes as keys - the type isn't hashable...
Has there been any consensus reached yet on whether
there will
Gregory P. Smith wrote:
> argv is the C/C++ name for bytes, let's not
> confuse people.
C has never made a clear distinction between characters
and bytes, using the type 'char' for both. It got away
with it for the same reason that Python did until
unicode came along. I'm pretty sure most people us
On 9/16/07, Paul Moore <[EMAIL PROTECTED]> wrote:
> On 16/09/2007, Fred Drake <[EMAIL PROTECTED]> wrote:
> > On Sep 15, 2007, at 10:00 PM, Nicholas Bastin wrote:
> > > Then let's stop beating around the bush and implement an immutable
> > > bytes type. Why put ourselves through contortions trying t
On 16/09/2007, Fred Drake <[EMAIL PROTECTED]> wrote:
> On Sep 15, 2007, at 10:00 PM, Nicholas Bastin wrote:
> > Then let's stop beating around the bush and implement an immutable
> > bytes type. Why put ourselves through contortions trying to jam a
> > square peg into a round hole and not just deci
> The basic idea is to allocate code points in private space as-needed.
Ok, thanks. Would you be interested in implementing that scheme?
Regards,
Martin
___
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3
"Martin v. Löwis" writes:
> > What I'm suggesting is to provide a way for processes to record and
> > communicate that information without needing to provide a "source
> > encoding" slot for strings, and which is able to handle strings
> > containing unrecognized (including corrupt) characters
On Sep 15, 2007, at 10:00 PM, Nicholas Bastin wrote:
> Then let's stop beating around the bush and implement an immutable
> bytes type. Why put ourselves through contortions trying to jam a
> square peg into a round hole and not just decide to make a round peg?
+42
-Fred
--
Fred Drake
> > You can't have bytes as keys - the type isn't hashable...
>
> That's why people keep arguing for an immutable bytes type. I keep
> seeing long discussions that end up with a tortured mechanism for making
> the keys unicode. Why don't we just bite the bullet and make things
> easier and have
On 9/15/07, Paul Moore <[EMAIL PROTECTED]> wrote:
> On 15/09/2007, Gregory P. Smith <[EMAIL PROTECTED]> wrote:
> > similarly for the environment. os.environ dict
> > should be bytes object keys and values
>
> You can't have bytes as keys - the type isn't hashable...
Then let's stop beating around
On Sat, Sep 15, 2007, Paul Moore wrote:
> On 15/09/2007, Gregory P. Smith <[EMAIL PROTECTED]> wrote:
>>
>> similarly for the environment. os.environ dict
>> should be bytes object keys and values
>
> You can't have bytes as keys - the type isn't hashable...
That's why people keep arguing for an
On Fri, Sep 14, 2007, "Martin v. Löwis" wrote:
>Hagen:
>>
>> And what if we skillfully conserve unknown bytes in a private use or
>> surrogate area and the application author actually knows the encoding
>> and wants correctly decoded strings?
>
> They can easily roundtrip that then to the encodi
On 9/15/07, Paul Moore <[EMAIL PROTECTED]> wrote:
> On 15/09/2007, Gregory P. Smith <[EMAIL PROTECTED]> wrote:
> > similarly for the environment. os.environ dict
> > should be bytes object keys and values
>
> You can't have bytes as keys - the type isn't hashable...
ugh, yeah. as much as i hate
On 15/09/2007, Gregory P. Smith <[EMAIL PROTECTED]> wrote:
> similarly for the environment. os.environ dict
> should be bytes object keys and values
You can't have bytes as keys - the type isn't hashable...
Paul
On 9/14/07, Greg Ewing <[EMAIL PROTECTED]> wrote:
> Hagen Fürstenau wrote:
> > sys.argv could be of type bytes and sys.arguments (or whatever) could be
> > a function taking an encoding parameter (which defaults to UTF-8) and
> > returning strings.
> >
> > Of course that's backwards incompatible an
>> sys.argv could be of type bytes and sys.arguments (or whatever) could be
>> a function taking an encoding parameter (which defaults to UTF-8) and
>> returning strings.
>>
> It would be pretty disruptive to ask everyone to change
> their habit of thinking of sys.argv as a list of strings.
The
> What I'm suggesting is to provide a way for processes to record and
> communicate that information without needing to provide a "source
> encoding" slot for strings, and which is able to handle strings
> containing unrecognized (including corrupt) characters from multiple
> source encodings.
Can
Greg Ewing writes:
> Stephen J. Turnbull wrote:
> > You chose the context of round-tripping *across
> > encodings*, not me. Please stick with your context.
>
> Maybe we have different ideas of what the problem is. I thought
> the problem is to take arbitrary byte sequences coming in as
>
Hagen Fürstenau writes:
> And what if we skillfully conserve unknown bytes in a private use or
> surrogate area and the application author actually knows the encoding
> and wants correctly decoded strings?
This is what my proposal would do, but my proposal would return
a string, not by
"Marcin 'Qrczak' Kowalczyk" <[EMAIL PROTECTED]> writes:
>> And it *is* needed, because these characters by assumption
>> are not present in Unicode at all. (More precisely, they may be
>> present, but the tables we happen to have don't have mappings for
>> them.)
> They are present! For UTF
On 9/14/07, Greg Ewing <[EMAIL PROTECTED]> wrote:
> Guido van Rossum wrote:
> > Great idea, but sys.argv doesn't need to be magic for this approach to work.
>
> Are you sure? I thought part of the problem was that
> if an argv entry couldn't be decoded, you got an error
> too soon to do anything ab
Guido van Rossum wrote:
> Great idea, but sys.argv doesn't need to be magic for this approach to work.
Are you sure? I thought part of the problem was that
if an argv entry couldn't be decoded, you got an error
too soon to do anything about it. Making sys.argv lazy
would avoid that.
--
Greg
On 9/14/07, Greg Ewing <[EMAIL PROTECTED]> wrote:
> It would be pretty disruptive to ask everyone to change
> their habit of thinking of sys.argv as a list of strings.
Indeed.
> I would suggest doing it the other way around -- have
> sys.argv be an object that automatically converts to
> unicode
Hagen Fürstenau wrote:
> sys.argv could be of type bytes and sys.arguments (or whatever) could be
> a function taking an encoding parameter (which defaults to UTF-8) and
> returning strings.
>
> Of course that's backwards incompatible and I'm not sure if it's too
> late for something like this
Stephen J. Turnbull wrote:
> You chose the context of round-tripping *across
> encodings*, not me. Please stick with your context.
Maybe we have different ideas of what the problem is.
I thought the problem is to take arbitrary byte sequences
coming in as command-line args and represent them as
u
On 9/14/07, Hagen Fürstenau <[EMAIL PROTECTED]> wrote:
> Is it too unreasonable to keep the byte strings we get from the OS as
> byte strings in Python (since we're not sure about their encoding) and
> offer functions for getting strings?
> sys.argv could be of type bytes and sys.arguments (or wha
> They can easily roundtrip that then to the encoding that it should have:
>
> good_string = sys.argv[bad_string_index].\
>encode(sys.argv_encoding, "pua-replace").decode(real_encoding)
To me this doesn't look easier than sys.arguments() in the standard case
and sys.arguments(encoding="whate
> Are you sure that "strings in an unknown encoding" are conceptually
> strings and not rather bytes?
For file names, most definitely. For command line arguments, I am
fairly sure: the argc/argv calling convention does not allow for
arbitrary bytes.
> And what if we skillfully conserve unknown by
> That is not a concern. However, it is fundamentally the wrong thing to
> do. Most people rightfully view command line arguments and file names
> as strings, as they use the keyboard to enter them, and the computer
> uses letters from a font to display them. They are not bytes
> conceptually - the
> Is it too unreasonable to keep the byte strings we get from the OS as
> byte strings in Python (since we're not sure about their encoding) and
> offer functions for getting strings?
I think people will complain if command line arguments aren't strings,
and they will complain even more so if fi
On Sep 14, 2007, at 5:15 AM, Hagen Fürstenau wrote:
> Is it too unreasonable to keep the byte strings we get from the OS as
> byte strings in Python (since we're not sure about their encoding) and
> offer functions for getting strings?
>
> sys.argv co
On Sep 14, 2007, at 1:08 AM, Greg Ewing wrote:
> Stephen J. Turnbull wrote:
>> You can't win that, because Unicode is the only encoding that
>> attempts
>> to guarantee even the possibility of round-tripping.
>
> Rubbish -- I can do print [ord(c) fo
Is it too unreasonable to keep the byte strings we get from the OS as
byte strings in Python (since we're not sure about their encoding) and
offer functions for getting strings?
sys.argv could be of type bytes and sys.arguments (or whatever) could be
a function taking an encoding parameter (whi
Greg Ewing writes:
> Stephen J. Turnbull wrote:
> > You can't win that, because Unicode is the only encoding that attempts
> > to guarantee even the possibility of round-tripping.
>
> Rubbish -- I can do print [ord(c) for c in my_unicode_string]
> and get perfect round-trippability if I wan
On Thu, 13-09-2007 at 23:41 -0400, James Y Knight wrote:
> Here's a suggestion I made on the SBCL dev list a while back, in
> response to the same issues.
After a second thought, this (escaping undecodable UTF-8 bytes by
unpaired low surrogates) might be a good idea.
(I don't rem
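The idea of escaping undecodable bytes as unpaired low surrogates can be sketched with the "surrogateescape" error handler available in current Python 3. Whether that handler matches the scheme discussed in this message in every detail is an assumption, and the file name below is a made-up example.

```python
# A byte string that is valid Latin-1 but not valid UTF-8 (0xE9 is a lone byte).
raw = b"caf\xe9.txt"

# Decoding with "surrogateescape" maps each undecodable byte 0xNN to the
# unpaired low surrogate U+DCNN instead of raising UnicodeDecodeError.
text = raw.decode("utf-8", errors="surrogateescape")
print(ascii(text))

# Encoding with the same handler turns the surrogate back into the original
# byte, so the round trip is lossless.
back = text.encode("utf-8", errors="surrogateescape")
print(back == raw)
```

The design point is that the escaped string stays an ordinary str, so code that only passes it through (for example to open a file) never needs to know the real encoding.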
On Fri, 14-09-2007 at 15:02 +0900, Stephen J. Turnbull wrote:
> > PUA already has a representation in UTF-8, so this is more incompatible
> > with UTF-8 than needed,
>
> Hm? It's not incompatible at all, and we're not interested in a
> representation in UTF-8, but rather in UTF-1
"Marcin 'Qrczak' Kowalczyk" <[EMAIL PROTECTED]> writes:
>> This means that a way of handling such points is very useful, and
>> as long as there's enough PUA space, the approach I suggested can
>> handle all of these various issues.
> PUA already has a representation in UTF-8, so this is more
Stephen J. Turnbull wrote:
> You can't win that, because Unicode is the only encoding that attempts
> to guarantee even the possibility of round-tripping.
Rubbish -- I can do print [ord(c) for c in my_unicode_string]
and get perfect round-trippability if I want.
You can ask people to use pre-exis
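The ord() round trip mentioned above is written in Python 2 syntax. A rough Python 3 equivalent might look like the following sketch. The helper names to_code_points and from_code_points are hypothetical and only serve the illustration.

```python
def to_code_points(text):
    """Return the list of integer code points for a str."""
    return [ord(ch) for ch in text]


def from_code_points(code_points):
    """Rebuild a str from a list of integer code points."""
    return "".join(chr(cp) for cp in code_points)


sample = "na\u00efve \u20ac"
code_points = to_code_points(sample)
print(code_points)
print(from_code_points(code_points) == sample)
```

This round-trips code points within one process. It says nothing about round-tripping through other encodings, which is the part the two posters disagree on.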
Greg Ewing writes:
> Stephen J. Turnbull wrote:
> > What should happen internally is that all undecodable characters
> > (which PUA characters are by definition for standard codecs) are
> > mapped to unused codepoints in the PUA, chosen by Python.
>
> You mean chosen dynamically?
Yes.
>
Stephen J. Turnbull wrote:
> What should
> happen internally is that all undecodable characters (which PUA
> characters are by definition for standard codecs) are mapped to unused
> codepoints in the PUA, chosen by Python.
You mean chosen dynamically? What happens if these PUA
characters get encod
On Sep 13, 2007, at 12:22 PM, Marcin 'Qrczak' Kowalczyk wrote:
> What should happen when a command line argument or an environment
> variable is not decodable using the system encoding (on Unix where
> from the OS point of view it is an array of bytes)?
Here's a suggestion I made on the SBCL dev l
On Fri, 14-09-2007 at 06:12 +0900, Stephen J. Turnbull wrote:
> This means that a way of handling such points
> is very useful, and as long as there's enough PUA space, the approach
> I suggested can handle all of these various issues.
PUA already has a representation in UTF-8, so t
"Marcin 'Qrczak' Kowalczyk" <[EMAIL PROTECTED]> writes:
>> Of course, if the input data already contains PUA characters,
>> there would be an ambiguity. We can rule this out for most codecs,
>> as they don't support PUA characters. The major exception would
>> be UTF-8,
> Most codecs other t
On Thu, 13-09-2007 at 19:08 +0200, "Martin v. Löwis" wrote:
> Of course, if the input data already contains PUA characters,
> there would be an ambiguity. We can rule this out for most codecs,
> as they don't support PUA characters. The major exception would
> be UTF-8,
Most codecs
> > We would make a list of all interfaces that use the PUA error
> > handler: file names, environment variables, command line
> > arguments.
>
> In general, I don't consider this an error.
I don't, either. However, given the current codec design, this is
the least intrusive way to enhance "al
"Martin v. Löwis" writes:
> One "universal" solution is to use Unicode private-use-area
> characters.
+1
> Of course, if the input data already contains PUA characters,
> there would be an ambiguity.
That may be true in the implementation, but it shouldn't. What should
happen internally i
> Yes, I have noticed this too. Environment variables, command line
> arguments, locale properties, TZ names, and so on, are often given as
> 8-bit strings in who knows what encoding. I'm not sure what the
> solution is, but we need one.
One "universal" solution is to use Unicode private-use-area
Yes, I have noticed this too. Environment variables, command line
arguments, locale properties, TZ names, and so on, are often given as
8-bit strings in who knows what encoding. I'm not sure what the
solution is, but we need one. I'm guessing one thing we need to do is
research how various systems
What should happen when a command line argument or an environment
variable is not decodable using the system encoding (on Unix where
from the OS point of view it is an array of bytes)?
This is an unfortunate side effect of switching to Unicode. It's
unfortunate because often the data is only passe
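For readers on a current Python 3, one way to get back at the bytes behind command line arguments is os.fsencode(), sketched below. This reflects how later releases behave, not anything settled in this thread, and the helper name argv_as_bytes is hypothetical.

```python
import os
import sys


def argv_as_bytes(args):
    """Return each argument re-encoded with the filesystem encoding.

    os.fsencode() applies the filesystem encoding and its error handler,
    so arguments that were decoded with escapes are turned back into
    their original bytes on POSIX systems.
    """
    return [os.fsencode(arg) for arg in args]


if __name__ == "__main__":
    for raw in argv_as_bytes(sys.argv[1:]):
        print(raw)
```

Passing the str values from sys.argv straight to open() is normally enough. The bytes form is only needed when a program must inspect or store the raw data.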