I'm going to push back on the idea that this should only be used for decoding, not encoding.
The use case I started with -- showing people how to fix mojibake using Python -- would *only* use these codecs in the encoding direction. To fix the most common case of mojibake, you encode it as web-1252 and decode it as UTF-8 (because you got the data from someone who did the opposite).

I have implemented some decode-only codecs (such as CESU-8), for exactly the reason of "why would you want more text in this encoding", but the situation is different here.

On Wed, 17 Jan 2018 at 13:00 Chris Barker <chris.bar...@noaa.gov> wrote:

> On Tue, Jan 16, 2018 at 9:30 PM, Stephen J. Turnbull <
> turnbull.stephen...@u.tsukuba.ac.jp> wrote:
>
>> In what context? WHAT-WG's encoding standard is *all about browsers*.
>> If a codec is feeding text into a process that renders them all as
>> glyphs for a human to look at, that's one thing. The codec doesn't
>> want to fatal there, and the likely fallback glyph is something from
>> the control glyphs block if even windows-125x doesn't have a glyph
>> there. I guess it sort of makes sense.
>>
>
> sure it does -- and python is not a browser, and python itself has
> nothing visual -- but we sure want to be able to write code that produces
> visual representations of maybe messy text...
>
> if you're feeding a program
>
> ...
>
>> the codec has no idea when or how that's
>> going to get interpreted.
>
>
> sure -- which is why others have suggested that if WHATWG is supported,
> then it *should* only be used for decoding, not encoding. But we are
> supposed to be consenting adults here -- I see no reason to prevent
> encoding -- maybe it would be useful for testing???
>
> (as with JSON data, which I believe is
>> "supposed" to be UTF-8, but many developers use the legacy charsets
>> they're used to and which are often embedded in the underlying
>> databases etc, ditto XML),
>
>
> OK -- if developers do the wrong thing, then they do the wrong thing -- we
> can't prevent that!
>
> And Python's lovely "text is unicode" model actually makes that hard to do
> wrong. But we do need a way to decode messy text, and then send it off to
> JSON or whatever properly encoded.
>
> -CHB
>
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R (206) 526-6959 voice
> 7600 Sand Point Way NE (206) 526-6329 fax
> Seattle, WA 98115 (206) 526-6317 main reception
>
> chris.bar...@noaa.gov
> _______________________________________________
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
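P.S. To make the mojibake round trip mentioned at the top concrete, here is a minimal sketch. It assumes Python's built-in `cp1252` codec as a stand-in, since the proposed `web-1252` codec is not in the standard library, and `fix_mojibake` is a hypothetical helper name used only for illustration. The last case shows where the stand-in falls short: a browser-style decoder maps byte 0x81 to U+0081, and the built-in codec cannot encode that character back.

```python
def fix_mojibake(text, codec="cp1252"):
    """Undo 'UTF-8 bytes decoded as a single-byte codec'.

    Hypothetical helper: returns the input unchanged when the
    round trip is not possible with the given codec.
    """
    try:
        # Encoding direction: recover the original bytes,
        # then decode them the way they should have been decoded.
        return text.encode(codec).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


original = "café"
# Simulate the sender's mistake: UTF-8 bytes read as cp1252.
garbled = original.encode("utf-8").decode("cp1252")
fixed = fix_mojibake(garbled)
print(repr(garbled), "->", repr(fixed))

# "Á" is C3 81 in UTF-8. A WHATWG-style decoder turns byte 0x81
# into U+0081, which the built-in cp1252 codec refuses to encode,
# so this text cannot be repaired with the stand-in codec.
browser_garbled = "Ã\x81"
still_garbled = fix_mojibake(browser_garbled)
print(repr(still_garbled))
```

Returning the input unchanged on failure is just one possible choice for the sketch; raising or using an error handler would work as well. The point is that the repair step is an *encode*, so a decode-only codec would not cover it.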