Martin v. Löwis wrote:
> Walter Dörwald wrote:
>>>> The best way to thoroughly test the patch is of course to check it in. ;)
>>> Is it too risky? ;)
>> At least I'd like to get a second review of the patch.
>
> I've reviewed it, and am likely to check it in.

Great!

> I notice that the
> patch still has problems. In particular, it is limited to "DBCS"
> (and SBCS) character sets in the strict sense; general "MBCS"
> character sets are not supported. There are a few of these, most
> notably the ISO-2022 ones, UTF-8, and GB18030 (can't be bothered
> to look up the code page numbers for them right now).

True, but there's no IsMBCSLeadByte(). And passing MB_ERR_INVALID_CHARS in a call to MultiByteToWideChar() doesn't help either, because AFAICT there's no information about the error location. What could work would be to call MultiByteToWideChar() with various string lengths to determine whether the error is due to an incomplete byte sequence or invalid data. But that sounds ugly and slow to me.

> What I don't know is whether any Windows locale uses a "true"
> MBCS character set as its "ANSI" code page.
>
> The approach taken in the patch could be extended to GB18030 and
> UTF-8 in principle,

Would that mean that we'd have to determine the active code page and implement the incomplete byte sequence detection ourselves?

> but can't possibly work for ISO-2022.

So does that mean that IsDBCSLeadByte() returns garbage in this case?

Servus,
   Walter
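
For illustration, here is a minimal sketch of the "try various string lengths" idea, written in Python rather than against the Win32 API. It assumes a strict decode call that only reports success or failure, which is how MultiByteToWideChar() with MB_ERR_INVALID_CHARS behaves according to the discussion above; bytes.decode() stands in for it, and the position information Python's exception carries is deliberately ignored. The names decode_prefix and max_trail are hypothetical, not part of the patch.

```python
def decode_prefix(data, encoding, max_trail=3):
    """Decode the longest decodable prefix of `data`.

    Returns (text, consumed). Up to `max_trail` trailing bytes are left
    unconsumed on the assumption that they are the start of a multi-byte
    sequence whose remaining bytes have not arrived yet.
    """
    # Retry with 0, 1, 2, ... trailing bytes cut off, using only the
    # success/failure of each strict decode attempt.
    for pending in range(min(max_trail, len(data)) + 1):
        prefix = data[:len(data) - pending]
        try:
            return prefix.decode(encoding, "strict"), len(prefix)
        except UnicodeDecodeError:
            continue
    # No short cut at the end makes the input decodable, so the problem
    # is invalid data rather than a truncated sequence.
    raise ValueError("invalid data, not just an incomplete byte sequence")


# A lone UTF-8 lead byte at the end is held back, not reported as an error.
text, consumed = decode_prefix(b"a\xc3", "utf-8")
```

Note that a bad final byte is indistinguishable from an incomplete one at this point; it only surfaces as an error once more data arrives, or when the caller signals end of input while bytes are still pending. Each failed attempt also rescans the whole buffer, which is the "ugly and slow" part.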