Martin v. Löwis wrote:
>> I can assure you
>> that most of the documents that I work with are not in CP436 - they are
>> a combination of ASCII, ISO8859-1, and UTF-8. I would also guess that
>> this is true of many Windows XP (US-English) users. So, for me and users
>> like me, Python is going t
Josiah Carlson wrote:
> "Anders J. Munch" <[EMAIL PROTECTED]> wrote:
> > I don't expect file methods and system calls to map one to one, but
> > you're right, the first time the length is needed, that's an extra
> > system call.
>
> Every time the length is needed, a system call is required
> (y
On Mon, 11 Sep 2006 18:16:15 -0700, "Paul Prescod" wrote:
> UTF-8 with BOM is the Microsoft preferred format.
I believe this is a gloss. Microsoft uses UTF-16. Because
the basic character unit is larger than one byte it is crucial
for interoperability to prefix a string of UTF-16 text with an
i
On 9/13/06, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> Fredrik Lundh wrote:
> > just noticed that PEP 3100 says that PyString_AsEncodedString and
> > PyString_AsDecodedString is to be removed, but it doesn't mention
> > any other PyString (or PyUnicode) functions.
> > how large changes can w
"John S. Yates, Jr." <[EMAIL PROTECTED]> writes:
> It is a mistake on Microsoft's part to fail to strip the BOM
> during conversion to UTF-8. There is no MEANINGFUL definition
> of BOM in a UTF-8 string. But instead of stripping the wrapper
> and converting only the text payload Microsoft lazily
Jim Jewett wrote:
>> For example, PyString_From{String[AndSize]|Format} would either:
>> - have to grow an encoding argument
>> - assume a default encoding (either ASCII or UTF-8)
>> - change its signature to operate on Py_UNICODE* (although
>> we don't have literals for these) or
>> - be remov
"Anders J. Munch" <[EMAIL PROTECTED]> wrote:
> Josiah Carlson wrote:
> > "Anders J. Munch" <[EMAIL PROTECTED]> wrote:
> > > I don't expect file methods and system calls to map one to one, but
> > > you're right, the first time the length is needed, that's an extra
> > > system call.
> >
> > Ever
"John S. Yates, Jr." <[EMAIL PROTECTED]> wrote:
>
> On Mon, 11 Sep 2006 18:16:15 -0700, "Paul Prescod" wrote:
>
> > UTF-8 with BOM is the Microsoft preferred format.
>
> I believe this is a gloss. Microsoft uses UTF-16. Because
> the basic character unit is larger than one byte it is crucial
On 9/13/06, John S. Yates, Jr. <[EMAIL PROTECTED]> wrote:
On Mon, 11 Sep 2006 18:16:15 -0700, "Paul Prescod" wrote:
> UTF-8 with BOM is the Microsoft preferred format.
It is a mistake on Microsoft's part to fail to strip the BOM
during conversion to UTF-8. There is no MEANINGFUL definition
of BOM in
On 9/13/06, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> > Should encoding be an attribute of the string?
> No. A Python string is a sequence of Unicode characters.
> Even if it was created by converting from some other encoding,
> that original encoding gets lost when doing the conversion
> (ju
Jim Jewett wrote:
> Simply not encoding/decoding until required would save quite a bit of
> time and space -- but then the object would need some way of
> indicating which encoding it is in.
Try implementing that some time. You'll find it will be incredibly
complex and unmaintainable. Start with
On 9/13/06, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> Jim Jewett wrote:
> > Simply not encoding/decoding until required would save quite a bit of
> > time and space -- but then the object would need some way of
> > indicating which encoding it is in.
> Try implementing that some time. You'l
On 9/11/06, Greg Ewing <[EMAIL PROTECTED]> wrote:
> Guido van Rossum wrote:
>
> > All sorts of things are different when reading stdin vs. opening a
> > filename. e.g. stdin may be a pipe.
>
> Which suggests that if anything is going to try
> to guess the encoding, it would be better for it
> to st
Jim Jewett wrote:
> Simply delegate such methods to a hidden per-encoding subclass.
>
> The UTF-8 methods will indeed be complex, unless the solution is
> simply "someone called indexing/slicing/len, so I have to recode after
> all."
>
> The Latin-1 encoding will have no such problem.
I'm not
On 9/13/06, John S. Yates, Jr. <[EMAIL PROTECTED]> wrote:
> It is a mistake on Microsoft's part to fail to strip the BOM
> during conversion to UTF-8.
John, you're mistaken about the reason this BOM is here.
In Notepad at least, the BOM is intentionally generated when writing
the file. It's not
BJörn Lindqvist <[EMAIL PROTECTED]> wrote:
>>> The idea of a standard edu library though is a GREAT one.
>>> [...]
>> I disagree for two reasons:
>>
>> 1) Even a single line of boilerplate is too much
>> when you're trying to pare things down to the
>> bare minimum for a beginner.
>>
>> 2) It tea
On Wednesday, 13 September 2006 at 09:41 -0700, Josiah Carlson wrote:
> And is generally ignored, as per unicode spec; it's a "zero width
> non-breaking space" - an invisible character with no effect on wrapping
> or otherwise.
Well it would be better if Py3K (with all strings unicode) makes thin
Antoine Pitrou wrote:
> On Wednesday, 13 September 2006 at 09:41 -0700, Josiah Carlson wrote:
>> And is generally ignored, as per unicode spec; it's a "zero width
>> non-breaking space" - an invisible character with no effect on wrapping
>> or otherwise.
>
> Well it would be better if Py3K (with a
Jason Orendorff wrote:
> On 9/13/06, John S. Yates, Jr. <[EMAIL PROTECTED]> wrote:
>> It is a mistake on Microsoft's part to fail to strip the BOM
>> during conversion to UTF-8.
>
> John, you're mistaken about the reason this BOM is here.
>
> In Notepad at least, the BOM is intentionally generate
Antoine Pitrou <[EMAIL PROTECTED]> wrote:
>
>
> On Wednesday, 13 September 2006 at 09:41 -0700, Josiah Carlson wrote:
> > And is generally ignored, as per unicode spec; it's a "zero width
> > non-breaking space" - an invisible character with no effect on wrapping
> > or otherwise.
>
> Well it w
Jason Orendorff wrote:
> On 9/13/06, John S. Yates, Jr. <[EMAIL PROTECTED]> wrote:
>
>>It is a mistake on Microsoft's part to fail to strip the BOM
>>during conversion to UTF-8.
>
> John, you're mistaken about the reason this BOM is here.
>
> In Notepad at least, the BOM is intentionally generat
Hi,
On Wednesday, 13 September 2006 at 16:14 -0700, Josiah Carlson wrote:
> In any case, I believe that the above behavior is correct for the
> context. Why? Because utf-8 has no endianness, its 'generic' decoding
> spelling of 'utf-8' is analogous to all three 'utf-16', 'utf-16-be', and
> 'utf
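For readers following the BOM discussion in the snippets above, here is a minimal sketch of how the codecs in question treat a leading BOM. It assumes a current Python 3 interpreter, not the 2006 builds the thread refers to, and the sample bytes are made up for illustration; it does not reproduce anything from the truncated messages.

```python
import codecs

# Hypothetical sample: the text "abc" as a BOM-writing editor might save it in UTF-8.
data = codecs.BOM_UTF8 + "abc".encode("utf-8")

# The plain 'utf-8' codec keeps the BOM as U+FEFF at the start of the result.
plain = data.decode("utf-8")

# The 'utf-8-sig' codec strips a single leading BOM if one is present.
stripped = data.decode("utf-8-sig")

# The generic 'utf-16' codec consumes its BOM in order to pick a byte order.
utf16 = (codecs.BOM_UTF16_LE + "abc".encode("utf-16-le")).decode("utf-16")

print(repr(plain), repr(stripped), repr(utf16))
```

Whether the plain 'utf-8' codec ought to drop the BOM is exactly what the posters disagree on; the sketch only shows what each codec name does.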