Bugs item #1532726, was opened at 2006-08-02 06:20
Message generated for change (Comment added) made by ocean-city
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1532726&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Interpreter Core
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Jan-Willem (jwnmulder)
Assigned to: Nobody/Anonymous (nobody)
Summary: incorrect behaviour of PyUnicode_EncodeMBCS?

Initial Comment:
Using Python 2.4.3

This behaviour is not reproducible on a Windows or Linux machine. I found the
bug when trying to find a problem on Python 2.4.3 ported to the Xbox.

Running the next two commands

    test_string = 'encode me'
    print repr(test_string.encode('mbcs'))

results on Windows in:

    'encode me'

and on the Xbox:

    'encode me\x00'

The problem is that 'PyUnicode_EncodeMBCS' returns a PyStringObject that
contains the data 'encode me' but with an object size of 10.
string_repr(test_string) assumes the string contains a 0 character and
encodes it as '\x00'.

Looking at the function
'PyUnicode_EncodeMBCS(const Py_UNICODE *p, int size, const char *errors)'
there are basically two function calls:

    {
        mbcssize = WideCharToMultiByte(CP_ACP, 0, p, size, NULL, 0, NULL, NULL);
        repr = PyString_FromStringAndSize(NULL, mbcssize);
    }

WideCharToMultiByte returns the number of bytes needed for the buffer;
because of the string termination this function returns 10.
PyString_FromStringAndSize assumes its second argument to be the number of
needed characters, not bytes.

So an easy fix would be to change

    repr = PyString_FromStringAndSize(NULL, mbcssize);

into

    repr = PyString_FromStringAndSize(NULL, mbcssize - 1);

Just checked the 2.4.3 svn trunk and it contains the same bug.

----------------------------------------------------------------------

Comment By: Hirokazu Yamamoto (ocean-city)
Date: 2006-08-02 14:31

Message:
Logged In: YES
user_id=1200846

I think this is not related to that patch. On my win2000sp4, the terminating
null character is not passed to PyUnicode_EncodeMBCS.

//////////////////////////////////////////////
// patch for debug (release24-maint branch)

Index: Objects/unicodeobject.c
===================================================================
--- Objects/unicodeobject.c (revision 51033)
+++ Objects/unicodeobject.c (working copy)
@@ -2782,6 +2782,20 @@
     char *s;
     DWORD mbcssize;
 
+{ /* debug */
+
+    int i;
+
+    printf("------------> %d\n", size);
+
+    for (i = 0; i < size; ++i) {
+        printf("%d ", (int)p[i]);
+    }
+
+    printf("\n");
+
+} /* debug */
+
     /* If there are no characters, bail now! */
     if (size==0)
         return PyString_FromString("");

//////////////////////////////////
// a.py

test_string = 'encode me'
print repr(test_string.encode('mbcs'))

//////////////////////////////////
// result

R:\>py a.py
------------> 9
101 110 99 111 100 101 32 109 101
'encode me'
[7660 refs]

And I tried this.

#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

void count(LPCWSTR w, int size)
{
    char *buf;
    int i;

    /* First call: ask only for the required buffer size in bytes. */
    const int ret = ::WideCharToMultiByte(
        CP_ACP, 0, w, size, NULL, 0, NULL, NULL
    );

    if (ret == 0)
    {
        printf("error\n");
    }
    else
    {
        printf("%d\n", ret);
    }

    buf = (char*)malloc(ret);

    /* Second call: perform the actual conversion into buf. */
    ::WideCharToMultiByte(
        CP_ACP, 0, w, size, buf, ret, NULL, NULL
    );

    for (i = 0; i < ret; ++i)
    {
        printf("%d ", (int)buf[i]);
    }

    printf("\n");

    free(buf);
}

int main()
{
    count(L"encode me", 9);
    count(L"encode me", 10); /* include null character */
}

/*
9
101 110 99 111 100 101 32 109 101
10
101 110 99 111 100 101 32 109 101 0
*/

As stated in
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_2bj9.asp ,
WideCharToMultiByte never outputs a null character if the source string
doesn't contain a null character. So I think the usage of WideCharToMultiByte
is correct.

I don't know why, but probably some behavior difference exists between
win2000 and xbox. (i.e. xbox calls PyUnicode_EncodeMBCS with size 10 ... or
WideCharToMultiByte on xbox outputs a null character even if the source string
doesn't contain it?)

Can you try the above C code and debug patch on xbox?

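For readers without a Windows or Xbox toolchain, the two calls in the C test
above can be modelled in a few lines of Python 3. This is only a sketch under
stated assumptions: model_wide_to_multibyte is a hypothetical stand-in for
WideCharToMultiByte, restricted to ASCII input, and it assumes that exactly
`size` characters are converted and that no terminator is appended. It says
nothing about what the Xbox port actually does.

```python
# Python 3 sketch. model_wide_to_multibyte is a hypothetical helper,
# not a real API; it only mimics the "convert exactly `size` characters"
# contract for ASCII text.

def model_wide_to_multibyte(wide, size):
    """Convert the first `size` code units of a NUL-terminated buffer."""
    buffer = wide + "\0"  # the terminator a C wide string literal carries
    if size > len(buffer):
        raise ValueError("size reads past the end of the buffer")
    return buffer[:size].encode("ascii")


# size 9: the terminator is not part of the input, so it is not in the output
without_nul = model_wide_to_multibyte("encode me", 9)

# size 10: the caller included the terminator, so the result carries it too
with_nul = model_wide_to_multibyte("encode me", 10)

print(len(without_nul), repr(without_nul))  # 9 b'encode me'
print(len(with_nul), repr(with_nul))        # 10 b'encode me\x00'
```

In this model the trailing '\x00' only shows up in the repr when the caller
passes a size that already counts the terminator, which is why the size
printed by the debug patch is the interesting value to compare between the
two platforms.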
----------------------------------------------------------------------

Comment By: Jan-Willem (jwnmulder)
Date: 2006-08-02 06:30

Message:
Logged In: YES
user_id=770969

related to patch 1455898 ?

----------------------------------------------------------------------