On 12 January 2018 at 14:55, Steve Dower wrote:
> On 12Jan2018 0342, Random832 wrote:
>>
>> On Thu, Jan 11, 2018, at 04:55, Serhiy Storchaka wrote:
>>>
>>> The way of solving this issue in Python is using an error handler. The
>>> "surrogateescape" error handler is specially designed for lossless
Executive summary: we already do.
Nathaniel suggests we should conform to the WHAT-WG standard. But
AFAGCT[1], there is no such thing as "WHATWG versions of legacy
encodings". The document at https://encoding.spec.whatwg.org/ has the
following normative specifications (capitalized words are pres
On 12Jan2018 0342, Random832 wrote:
On Thu, Jan 11, 2018, at 04:55, Serhiy Storchaka wrote:
The way of solving this issue in Python is using an error handler. The
"surrogateescape" error handler is specially designed for lossless
reversible decoding. It maps every unassigned byte in the range
0x
I certainly have code that joins __module__ with __name__ to create a
fully-qualified name (with special handling for those builtins that are
not in builtins), and IIUC __qualname__ doesn't normally include the
module name either (it's intended for nested types/functions).
Can we make it visib
I like the idea of having a fully qualified name that "works" (can be
resolved).
I don't think that repr() should change, right?
Can this change break the backward compatibility somehow?
Victor
On 11 Jan 2018 21:00, "Serhiy Storchaka" wrote:
> Currently the classes of functions (implemen
On 2018-01-11 19:42, Rob Speer wrote:
> The question is rather: how often does web-XXX mojibake happen?
Very often. Particularly web-1252 mixed up with UTF-8.
My ftfy library is tested on data from Twitter and the Common Crawl,
both prime sources of mojibake. One common mojibake sequence is w
On Thu, Jan 11, 2018, at 14:55, Rob Speer wrote:
> There is one more difference I have found between Python's encodings and
> WHATWG's. In Python's codepage 1255, b'\xca' is undefined. In WHATWG's, it
> maps to U+05BA HEBREW POINT HOLAM HASER FOR VAV. I haven't tracked down
> what the Unicode Conso
Currently the classes of functions (implemented in Python and builtin),
methods, and different types of descriptors, generators, etc. have the
__module__ attribute equal to "builtins" and a name that can't be
used for accessing the class.
>>> def f(): pass
...
>>> type(f)
<class 'function'>
>>> type(f).__modul
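A minimal sketch of the problem described above, assuming CPython 3: joining __module__ and __name__ for the class of a plain function gives a dotted name that cannot be looked up in the builtins module. The variable names are illustrative only.

```python
import builtins


def f():
    pass


cls = type(f)

# Naive fully-qualified name built from the two attributes.
qualified = f"{cls.__module__}.{cls.__name__}"

# The name does not resolve: builtins has no attribute called "function".
resolvable = hasattr(builtins, cls.__name__)

print(qualified, resolvable)
```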
On Thu, 11 Jan 2018 at 11:43 Random832 wrote:
> Maybe we need a new error handler that maps unassigned bytes in the range
> 0x80-0x9f to a single character in the range U+0080-U+009F. Do any of the
> encodings being discussed have behavior other than the "normal" version of
> the encoding plus wh
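One way such a handler could be sketched with the existing codecs.register_error() hook is shown below. The handler name "c1passthrough" and the function name are hypothetical; nothing like this exists in the stdlib, and this is only an illustration of the idea in the quoted message, not a proposal from the thread.

```python
import codecs


def c1_passthrough(exc):
    """Map undecodable bytes 0x80-0x9F to the code points U+0080-U+009F."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    if all(0x80 <= b <= 0x9F for b in bad):
        # Replacement text, plus the position at which decoding resumes.
        return "".join(chr(b) for b in bad), exc.end
    # Anything outside the C1 range is still reported as an error.
    raise exc


# "c1passthrough" is a made-up handler name used only for this sketch.
codecs.register_error("c1passthrough", c1_passthrough)

# Byte 0x81 is unassigned in Python's cp1252 codec.
decoded = b"caf\x81".decode("cp1252", errors="c1passthrough")
```

With this handler registered, bytes that cp1252 does define still decode normally; only the unassigned bytes in 0x80-0x9F reach the handler.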
> The question is rather: how often does web-XXX mojibake happen?
Very often. Particularly web-1252 mixed up with UTF-8.
My ftfy library is tested on data from Twitter and the Common Crawl, both
prime sources of mojibake. One common mojibake sequence is when a right
curly quote is encoded as UTF-
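To illustrate the kind of mojibake mentioned here, the sketch below assumes the truncated sentence refers to UTF-8 bytes being decoded as windows-1252 (Python's "cp1252"). It uses only the stdlib codecs, not ftfy.

```python
# A right curly quote, U+2019.
original = "\u2019"

# Encode as UTF-8, then decode with the wrong codec.
mojibake = original.encode("utf-8").decode("cp1252")

# Reversing the two steps recovers the original character.
repaired = mojibake.encode("cp1252").decode("utf-8")

print(repr(mojibake), repr(repaired))
```

The three UTF-8 bytes E2 80 99 are all assigned in cp1252, so this particular sequence round-trips with the stock codec.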
On Thu, Jan 11, 2018, at 03:58, M.-A. Lemburg wrote:
> There's a problem with these encodings: they are mostly meant
> for decoding (broken) data, but as soon as we have them in the stdlib,
> people will also start using them for encoding data, producing more
> corrupted data.
Is it really corrupt
On Thu, Jan 11, 2018, at 04:55, Serhiy Storchaka wrote:
> The way of solving this issue in Python is using an error handler. The
> "surrogateescape" error handler is specially designed for lossless
> reversible decoding. It maps every unassigned byte in the range
> 0x80-0xff to a single characte
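A short sketch of the round trip that "surrogateescape" provides, using byte 0x81, which Python's cp1252 codec leaves unassigned. The handler maps each undecodable byte in 0x80-0xFF to a lone surrogate in U+DC80-U+DCFF, and encoding with the same handler restores the byte.

```python
raw = b"caf\x81"

# Undecodable byte 0x81 becomes the lone surrogate U+DC81.
text = raw.decode("cp1252", errors="surrogateescape")

# Encoding with the same handler turns the surrogate back into 0x81.
restored = text.encode("cp1252", errors="surrogateescape")

print(repr(text), restored == raw)
```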
On Thu, 11 Jan 2018 05:18:43 -0800
Nathaniel Smith wrote:
> I'm not an expert here or anything, but from what we've been hearing it
> sounds like it must be used by all standard-compliant HTML parsers. I don't
> *like* the standard much, but I don't think that the stdlib should refuse
> to handle
On Jan 11, 2018 4:05 AM, "Antoine Pitrou" wrote:
Define "widely used". If web-XXX is a superset of windows-XXX, then
perhaps web-XXX is "used" in the sense of "used to decode valid
windows-XXX data" (but windows-XXX could be used just as well to
decode the same data). The question is rather: ho
On Wed, 10 Jan 2018 16:24:33 -0800
Chris Barker wrote:
> On Wed, Jan 10, 2018 at 11:04 AM, M.-A. Lemburg wrote:
>
> > I don't believe it's a good strategy to create the confusion that
> > WHATWG is introducing by using the same names for non-standard
> > encodings.
> >
>
> agreed.
>
>
> > P
On 11 Jan 2018 10:56, "Serhiy Storchaka" wrote:
09.01.18 23:15, Rob Speer wrote:
>
>
> For the sake of discussion, let's call this encoding "web-1252". WHATWG
> calls it "windows-1252",
I'd suggest naming it
"whatwg-windows-1252"
and in general
"whatwg-" + whatwgs_name_of_encoding
S
09.01.18 23:15, Rob Speer wrote:
There is an encoding with no name of its own. It's supported by every
current web browser and standardized by WHATWG. It's so prevalent that
if you ask a Web browser to decode "iso-8859-1" or "windows-1252", you
will get this encoding _instead_. It is probably th
On 11.01.2018 10:01, Chris Angelico wrote:
> On Thu, Jan 11, 2018 at 7:58 PM, M.-A. Lemburg wrote:
>> On 11.01.2018 01:22, Nick Coghlan wrote:
>>> On 11 January 2018 at 05:04, M.-A. Lemburg wrote:
For the stdlib, I think we should stick to standards and
not go for spreading non-standard
On Thu, Jan 11, 2018 at 7:58 PM, M.-A. Lemburg wrote:
> On 11.01.2018 01:22, Nick Coghlan wrote:
>> On 11 January 2018 at 05:04, M.-A. Lemburg wrote:
>>> For the stdlib, I think we should stick to standards and
>>> not go for spreading non-standard ones.
>>>
>>> So -1 on adding WHATWG encodings t
On 11.01.2018 01:22, Nick Coghlan wrote:
> On 11 January 2018 at 05:04, M.-A. Lemburg wrote:
>> For the stdlib, I think we should stick to standards and
>> not go for spreading non-standard ones.
>>
>> So -1 on adding WHATWG encodings to the stdlib.
>
> We already support HTML5 in the standard li