Patches item #1455898, was opened at 2006-03-22 16:31 Message generated for change (Comment added) made by ocean-city You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1455898&group_id=5470
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Windows Group: Python 2.4 Status: Open Resolution: None Priority: 7 Submitted By: ocean-city (ocean-city) Assigned to: Walter Dörwald (doerwalter) Summary: patch for mbcs codecs Initial Comment: Hello. I have noticed mbcs codecs sometimes generates broken string. I'm using Windows(Japanese) so mbcs is mapped to cp932 (close to shift_jis) When I run the attached script "a.zip", the entry "Error 00007"'s message becomes broken like attached file "b.txt". I think this happens because the string passed to PyUnicode_DecodeMBCS() sometimes terminates with leading byte, and MultiByteToWideChar() counts it for size of result string.buffer size. I hope attached patch "mbcs.patch" may fix the problem. It would be nice if this bug will be fixed in 2.4.3... Thank you. ---------------------------------------------------------------------- >Comment By: ocean-city (ocean-city) Date: 2006-03-23 11:51 Message: Logged In: YES user_id=1200846 Hello. This is my final patch. (mbcs4.patch) - mbcs3a.patch: _mbsbtype depends on locale not system ANSI code page. so probably it's not good to use it with MultiByteToWideChar. - mbcs3b.patch: CharNext may cause buffer overflow. and this patch always calls CharPrev but it's not needed if string is not terminated with "potensial" lead byte. I hope this is stable enough to commit on repositry. Thank you. ---------------------------------------------------------------------- Comment By: ocean-city (ocean-city) Date: 2006-03-22 23:36 Message: Logged In: YES user_id=1200846 Sorry, I was stupid. MSDN (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_0o2t.asp) saids, > IsDBCSLeadByte can only indicate a potential lead byte value. IsDBCSLeadByte was returning 1 for some trail byte (ex: "æ´"[1]) The patch "mbcs3a.patch" worked for me, but _mbsbtype is probably compiler specific. Is that OK? The patch "mbcs3b.patch" also worked for me and it only uses Win32API, but I don't have enough faith on this implementation... ---------------------------------------------------------------------- Comment By: ocean-city (ocean-city) Date: 2006-03-22 19:31 Message: Logged In: YES user_id=1200846 Sorry, I found problem when tried more long text file... Please wait. I'll investigate more intensibly. ---------------------------------------------------------------------- Comment By: ocean-city (ocean-city) Date: 2006-03-22 19:13 Message: Logged In: YES user_id=1200846 Thank you for reply. How about this? (I'm a newbie, I hope this is right tex format but... can you confirm this? I created this patch by copy & paste from PyUnicode_DecodeUTF16Stateful and some modification) ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2006-03-22 18:12 Message: Logged In: YES user_id=38388 One more nit: the doc patch is missing. Please add a patch for the API docs. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2006-03-22 18:11 Message: Logged In: YES user_id=38388 As I understand your comment, the mbcs codec will have a problem if the input string terminates with a lead byte. Could you add a comment regarding this to the patch ?! I can't test the patch, since I don't have a Japanese Windows to check on, but from looking at the patch, it seems OK. ---------------------------------------------------------------------- Comment By: ocean-city (ocean-city) Date: 2006-03-22 16:42 Message: Logged In: YES user_id=1200846 I forgot to mention this. "mbcs.patch" is for release24-maint branch. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1455898&group_id=5470 _______________________________________________ Patches mailing list [email protected] http://mail.python.org/mailman/listinfo/patches
