> What I'm suggesting is to provide a way for processes to record and
> communicate that information without needing to provide a "source
> encoding" slot for strings, and which is able to handle strings
> containing unrecognized (including corrupt) characters from multiple
> source encodings.
Can
On 2007-09-14, Martin v. Löwis wrote:
> >> That's a sorted dict. PEP 3115 wants an insertion-ordered dict.
> >> You're not the first to confuse them. ;)
> >
> > Hmmm, I'd not come across that terminology distinction before.
> > I guess I'll have to rename mine then.
>
> I think "insertion-ordered"
Greg Ewing writes:
> Stephen J. Turnbull wrote:
> > You chose the context of round-tripping *across
> > encodings*, not me. Please stick with your context.
>
> Maybe we have different ideas of what the problem is. I thought
> the problem is to take arbitrary byte sequences coming in as
>
Hagen Fürstenau writes:
> And what if we skillfully conserve unknown bytes in a private use or
> surrogate area and the application author actually knows the encoding
> and wants correctly decoded strings?
This is what my proposal would do, but my proposal would return
a string, not by
"Marcin 'Qrczak' Kowalczyk" <[EMAIL PROTECTED]> writes:
>> And it *is* needed, because these characters by assumption
>> are not present in Unicode at all. (More precisely, they may be
>> present, but the tables we happen to have don't have mappings for
>> them.)
> They are present! For UTF
On 9/14/07, Greg Ewing <[EMAIL PROTECTED]> wrote:
> Guido van Rossum wrote:
> > Great idea, but sys.argv doesn't need to be magic for this approach to work.
>
> Are you sure? I thought part of the problem was that
> if an argv entry couldn't be decoded, you got an error
> too soon to do anything ab
Guido van Rossum wrote:
> Great idea, but sys.argv doesn't need to be magic for this approach to work.
Are you sure? I thought part of the problem was that
if an argv entry couldn't be decoded, you got an error
too soon to do anything about it. Making sys.argv lazy
would avoid that.
--
Greg
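
A minimal sketch of the "lazy sys.argv" idea, for illustration only. The class name LazyArgv and its raw() helper are hypothetical and not part of any proposal in this thread; the assumption is that the raw arguments are available as bytes and that decoding is deferred until an entry is actually read.

```python
from collections.abc import Sequence


class LazyArgv(Sequence):
    """Hypothetical list-like wrapper: holds raw bytes, decodes on access."""

    def __init__(self, raw_args, encoding="utf-8"):
        self._raw = list(raw_args)   # undecoded bytes, kept untouched
        self._encoding = encoding

    def __len__(self):
        return len(self._raw)

    def __getitem__(self, index):
        if isinstance(index, slice):
            return [arg.decode(self._encoding) for arg in self._raw[index]]
        # Decoding happens here, so a bad entry only fails when it is read.
        return self._raw[index].decode(self._encoding)

    def raw(self, index):
        """Return the original bytes of one entry, for recovery."""
        return self._raw[index]


argv = LazyArgv([b"prog", b"caf\xc3\xa9", b"bad\xff"])
print(argv[1])          # decodes without error
try:
    argv[2]
except UnicodeDecodeError:
    print(argv.raw(2))  # the caller can still inspect the raw bytes
```

Constructing the object never raises, so a program gets a chance to choose another encoding before touching the undecodable entry.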
On 9/14/07, Greg Ewing <[EMAIL PROTECTED]> wrote:
> It would be pretty disruptive to ask everyone to change
> their habit of thinking of sys.argv as a list of strings.
Indeed.
> I would suggest doing it the other way around -- have
> sys.argv be an object that automatically converts to
> unicode
Hagen Fürstenau wrote:
> sys.argv could be of type bytes and sys.arguments (or whatever) could be
> a function taking an encoding parameter (which defaults to UTF-8) and
> returning strings.
>
> Of course that's backwards incompatible and I'm not sure if it's too
> late for something like this
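
A rough sketch of what the quoted suggestion might look like, under stated assumptions: arguments() here is a hypothetical stand-in for the proposed sys.arguments, and raw_argv is a made-up list standing in for a bytes-typed sys.argv. Neither exists in the standard library.

```python
def arguments(raw_argv, encoding="utf-8"):
    """Hypothetical stand-in for the proposed sys.arguments()."""
    # Decode every raw argument with the encoding the caller asked for.
    return [arg.decode(encoding) for arg in raw_argv]


raw_argv = [b"prog", b"na\xefve"]     # second entry is Latin-1, not UTF-8
print(arguments(raw_argv, encoding="latin-1"))
```

In the standard case the caller passes no encoding and gets UTF-8; an application that knows better passes its own.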
Stephen J. Turnbull wrote:
> You chose the context of round-tripping *across
> encodings*, not me. Please stick with your context.
Maybe we have different ideas of what the problem is.
I thought the problem is to take arbitrary byte sequences
coming in as command-line args and represent them as
u
>> That's a sorted dict. PEP 3115 wants an insertion-ordered dict.
>> You're not the first to confuse them. ;)
>
> Hmmm, I'd not come across that terminology distinction before.
> I guess I'll have to rename mine then.
I think "insertion-ordered" is over-specification, just to make
the distincti
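
The two terms in the quoted exchange can be illustrated with a short sketch. It uses collections.OrderedDict, which entered the standard library only after this thread, so it is shown as an illustration of the terminology and not as what any poster had in mind.

```python
from collections import OrderedDict

items = [("pear", 1), ("apple", 2), ("fig", 3)]

# Insertion-ordered: iteration follows the order in which keys were added.
insertion_ordered = OrderedDict(items)
print(list(insertion_ordered))

# Sorted: iteration follows the comparison order of the keys.
sorted_keys = sorted(dict(items))
print(sorted_keys)
```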
Mark Summerfield wrote:
> (Personally I've never needed an insertion-ordered dict.)
Then you've never programmed in PHP I take it. PHP's one-size-fits-all
data structure is an insertion-ordered dict; PHP programmers use it
everywhere a Python programmer might use a dict /or/ a list. I've had
On 2007-09-14, Adam Olsen wrote:
> On 9/14/07, Mark Summerfield <[EMAIL PROTECTED]> wrote:
> > On 2007-09-14, Nicko van Someren wrote:
> > > On 11 Sep 2007, at 15:06, Mark Summerfield wrote:
> > > > Is there any chance that an ordered dict will be added to Python 3's
> > > > library?
> > >
> > > It
On 9/14/07, Mark Summerfield <[EMAIL PROTECTED]> wrote:
> On 2007-09-14, Nicko van Someren wrote:
> > On 11 Sep 2007, at 15:06, Mark Summerfield wrote:
> > > Is there any chance that an ordered dict will be added to Python 3's
> > > library?
> >
> > It would make sense, since one of the primary jus
On 2007-09-14, Nicko van Someren wrote:
> On 11 Sep 2007, at 15:06, Mark Summerfield wrote:
> > Is there any chance that an ordered dict will be added to Python 3's
> > library?
>
> It would make sense, since one of the primary justifications for the
> new metaclass system (PEP 3115) is to allow th
On 11 Sep 2007, at 15:06, Mark Summerfield wrote:
> Is there any chance that an ordered dict will be added to Python 3's
> library?
It would make sense, since one of the primary justifications for the
new metaclass system (PEP 3115) is to allow the metaclass to provide
order-preserving diction
On 9/14/07, Hagen Fürstenau <[EMAIL PROTECTED]> wrote:
> Is it too unreasonable to keep the byte strings we get from the OS as
> byte strings in Python (since we're not sure about their encoding) and
> offer functions for getting strings?
> sys.argv could be of type bytes and sys.arguments (or wha
> They can easily roundtrip that then to the encoding that it should have:
>
> good_string = sys.argv[bad_string_index].\
>     encode(sys.argv_encoding, "pua-replace").decode(real_encoding)
To me this doesn't look easier than sys.arguments() in the standard case
and sys.arguments(encoding="whate
> Are you sure that "strings in an unknown encoding" are conceptually
> strings and not rather bytes?
For file names, most definitely. For command line arguments, I am
fairly sure: the argc/argv calling convention does not allow for
arbitrary bytes.
> And what if we skillfully conserve unknown by
> That is not a concern. However, it is fundamentally the wrong thing to
> do. Most people rightfully view command line arguments and file names
> as strings, as they use the keyboard to enter them, and the computer
> uses letters from a font to display them. They are not bytes
> conceptually - the
> Is it too unreasonable to keep the byte strings we get from the OS as
> byte strings in Python (since we're not sure about their encoding) and
> offer functions for getting strings?
I think people will complain if command line arguments aren't strings,
and they will complain even more so if fi
On Sep 14, 2007, at 5:15 AM, Hagen Fürstenau wrote:
> Is it too unreasonable to keep the byte strings we get from the OS as
> byte strings in Python (since we're not sure about their encoding) and
> offer functions for getting strings?
>
> sys.argv co
On Sep 14, 2007, at 1:08 AM, Greg Ewing wrote:
> Stephen J. Turnbull wrote:
>> You can't win that, because Unicode is the only encoding that
>> attempts
>> to guarantee even the possibility of round-tripping.
>
> Rubbish -- I can do print [ord(c) fo
Greg Ewing wrote:
> [EMAIL PROTECTED] wrote:
>> I was just thinking about the folks at places like FermiLab and CERN. ;-)
>
> Those guys probably need picoseconds...
With the suggested %f format character and the mention of Fermilab and
CERN, I started thinking about femtoseconds :)
Cheers,
Nic
Is it too unreasonable to keep the byte strings we get from the OS as
byte strings in Python (since we're not sure about their encoding) and
offer functions for getting strings?
sys.argv could be of type bytes and sys.arguments (or whatever) could be
a function taking an encoding parameter (whi
Greg Ewing writes:
> Stephen J. Turnbull wrote:
> > You can't win that, because Unicode is the only encoding that attempts
> > to guarantee even the possibility of round-tripping.
>
> Rubbish -- I can do print [ord(c) for c in my_unicode_string]
> and get perfect round-trippability if I wan
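
The round trip through a list of code points mentioned in the quote can be sketched as follows, written in Python 3 syntax. The sample string is an assumption chosen to include a character outside the Basic Multilingual Plane.

```python
my_unicode_string = "na\u00efve \U0001d11e"

# Dump the string as a list of integer code points.
code_points = [ord(c) for c in my_unicode_string]
print(code_points)

# Rebuild the string from those integers.
restored = "".join(chr(n) for n in code_points)
print(restored == my_unicode_string)
```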
On 13-09-2007 (Thu) at 23:41 -0400, James Y Knight wrote:
> Here's a suggestion I made on the SBCL dev list a while back, in
> response to the same issues.
After a second thought, this (escaping undecodable UTF-8 bytes by
unpaired low surrogates) might be a good idea.
(I don't rem
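
For illustration, Python later gained a "surrogateescape" error handler (PEP 383, Python 3.1) that escapes undecodable bytes as unpaired low surrogates. The sketch below uses it only to show the idea; it is not the code discussed in this thread, and the sample bytes are an assumption.

```python
raw = b"caf\xe9"                       # Latin-1 bytes, invalid as UTF-8

# The undecodable byte is mapped to a lone low surrogate in the string.
escaped = raw.decode("utf-8", errors="surrogateescape")
print(ascii(escaped))

# Encoding with the same handler restores the original bytes exactly.
recovered = escaped.encode("utf-8", errors="surrogateescape")
print(recovered == raw)

# Once the real encoding is known, the recovered bytes can be re-decoded.
fixed = recovered.decode("latin-1")
print(fixed)
```

Because lone surrogates are not valid in well-formed UTF-8, the escaped form cannot collide with any correctly decoded input.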
On 14-09-2007 (Fri) at 15:02 +0900, Stephen J. Turnbull
wrote:
> > PUA already has a representation in UTF-8, so this is more incompatible
> > with UTF-8 than needed,
>
> Hm? It's not incompatible at all, and we're not interested in a
> representation in UTF-8, but rather in UTF-1