Hi Steve,

do you know of a definitive resource for Windows code pages
on MSDN or another official MS website?
I tried to find some links, but only got these ancient ones:

https://msdn.microsoft.com/en-us/library/cc195054.aspx

(this version of cp1252 doesn't even have the euro sign yet)

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com

On 19.01.2018 18:17, M.-A. Lemburg wrote:
> On 19.01.2018 17:24, Random832 wrote:
>> On Fri, Jan 19, 2018, at 08:30, M.-A. Lemburg wrote:
>>>> Someone did discover that Microsoft's current implementations of the
>>>> windows-* encodings match the WHAT-WG spec, rather than the Unicode
>>>> spec that Microsoft originally wrote.
>>>
>>> No, MS implements something called "best fit encodings"
>>> and these are different than what WHATWG uses.
>>
>> NO. I made this absolutely clear in my previous message, best fit mappings
>> can be clearly distinguished from regular mappings by the behavior of the
>> native conversion functions with certain argument flags (the mapping of 0xA0
>> to some private use character in cp932, for example, is a best-fit mapping
>> in the decoding direction - but is treated as a regular mapping for encoding
>> purposes), and the mapping of 0x81 to U+0081 in cp1252 etc is NOT a best fit
>> mapping or in any way different from the rest of the mappings.
>>
>> We are not talking about implementing the best fit mappings. We are talking
>> about real regular mappings that actually exist in these codepages that were
>> for some unknown reason not included in the files published by Unicode.
>
> I only know the best fit encoding maps that are available
> on the Unicode site.
>
> If I read your comment correctly, you are saying that MS has
> moved away from the standard code pages towards something
> else - perhaps even something other than the best fit encodings
> listed on the Unicode site?
>
> Do you have some references for this?
>
> Note that the Windows code page codecs implemented in Python
> are all based on the Unicode mapping files and those were
> created by MS.
>
>>> https://msdn.microsoft.com/en-us/library/windows/desktop/dd374130%28v=vs.85%29.aspx
>>>
>>> unfortunately uses the above mentioned best fit encodings,
>>> but this can and should be switched off by specifying the
>>> WC_NO_BEST_FIT_CHARS for anything that requires validation
>>> or needs to be interoperable:
>>
>> Specifying this flag (and MB_ERR_INVALID_CHARS in the other direction) in
>> fact does not disable the mappings we are discussing.
>
> Interesting. The CP1252 mapping clearly defines 0x81 to map
> to undefined, whereas the bestfit1252 maps it to 0x0081:
>
> http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT
> http://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1252.txt
>
> Same for the example you gave for CP932:
>
> http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT
> http://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit932.txt
>
> So at least following the documentation you'd expect the function
> to implement the regular mappings.
>
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
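
PS: For anyone who wants to check what Python's own codec does with the
bytes under discussion, here is a minimal sketch. It only exercises the
cp1252 codec shipped with CPython (which, as noted above, is based on the
Unicode mapping files), not the native Windows conversion functions. The
helper name undefined_bytes is made up for illustration.

```python
def undefined_bytes(encoding):
    """Return the byte values a single-byte codec refuses to decode.

    Hypothetical helper for illustration: it simply tries every byte
    value and collects those that raise UnicodeDecodeError in strict mode.
    """
    missing = []
    for value in range(256):
        try:
            bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            missing.append(value)
    return missing


# Bytes that Python's cp1252 codec treats as undefined.
cp1252_gaps = undefined_bytes("cp1252")
print([hex(b) for b in cp1252_gaps])

# 0x80 is defined in the current mapping file: it decodes to the euro sign.
print(repr(b"\x80".decode("cp1252")))
```

With the codec as it is today, 0x81 shows up in that list, i.e. Python
follows CP1252.TXT here and not the 0x81 -> U+0081 mapping mentioned in
the thread.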