I found a bug in this iconv patch: 04-unzip60-alt-iconv-utf8. The patch allocates the buffer that stores the converted string at only twice the size of the source string plus one byte, as seen in lines 81-84 of the patch:
+ slen = strlen(string);
+ s = string;
+ dlen = buflen = 2*slen;
+ d = buf = malloc(buflen + 1);

This causes the conversion to fail in some cases, because a character can need more than twice as many octets in the target encoding as in the source encoding (especially when the target is UTF-8, Ubuntu's default encoding).

For example, consider the HALFWIDTH KATAKANA letters. In the Shift_JIS and CP932 encodings, each halfwidth katakana letter is represented in one octet, but in UTF-8 it requires three octets. 'ｱ' (U+FF71: HALFWIDTH KATAKANA LETTER A) is encoded as 0xB1 in Shift_JIS and CP932, which is one octet. In UTF-8 it is encoded as 0xEF, 0xBD, 0xB1, which is three octets.

So, because the current unzip allocates only twice the size of the source string for the buffer, it fails to handle a zip file containing a file name that consists entirely or largely of halfwidth katakana letters.

I suggest changing the buffer size to four times the size of the source string plus one byte, because Ubuntu's default encoding is UTF-8 and the longest valid UTF-8 sequence for one character is 4 octets.

Replace line 83 of 04-unzip60-alt-iconv-utf8 with the following:

+ dlen = buflen = 4*slen;

Bug: https://bugs.launchpad.net/bugs/580961
Title: unzip fails to deal correctly with filename encodings
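
The sizing argument in this report can be checked with a small C sketch. It is not part of the patch: `utf8_len` and `conv_buf_size` are hypothetical helper names used only for illustration. The first returns how many octets UTF-8 needs for a given code point, and the second computes the proposed worst-case buffer size (four output octets per source octet, plus one for the terminating NUL), with an overflow guard that the original patch does not have.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: octets UTF-8 needs for one Unicode code point.
   Returns 0 for surrogates and for values above U+10FFFF. */
static size_t utf8_len(uint32_t cp)
{
    if (cp < 0x80)
        return 1;
    if (cp < 0x800)
        return 2;
    if (cp < 0x10000)
        return (cp >= 0xD800 && cp <= 0xDFFF) ? 0 : 3;
    if (cp <= 0x10FFFF)
        return 4;
    return 0;
}

/* Hypothetical helper: worst-case buffer size for converting slen source
   octets to UTF-8, assuming at most 4 output octets per source octet,
   plus 1 for the terminating NUL. Returns 0 if the size would overflow. */
static size_t conv_buf_size(size_t slen)
{
    if (slen > (SIZE_MAX - 1) / 4)
        return 0;
    return 4 * slen + 1;
}
```

For a file name of 10 halfwidth katakana letters, the source is 10 octets in Shift_JIS, the UTF-8 output needs 10 * utf8_len(0xFF71) = 30 octets, the current 2*slen buffer holds only 20, and conv_buf_size(10) gives 41.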
