[Bug 580961] Re: unzip fails to deal correctly with filename encodings

ryou ezoe Wed, 04 Apr 2012 14:06:19 -0700

I found a bug in this iconv patch: 04-unzip60-alt-iconv-utf8.
The problem is that the patch allocates the buffer for the converted
string with only twice the size of the source string, plus one byte,
as seen in lines 81-84 of the patch:


+    slen = strlen(string);
+    s = string;
+    dlen = buflen = 2*slen;
+    d = buf = malloc(buflen + 1);

This causes the conversion to fail in some cases, because some characters
that take one octet in the source encoding need more than two octets in
the target encoding (in particular UTF-8, Ubuntu's default encoding).

The HALFWIDTH KATAKANA LETTER characters are one such case.
In the Shift_JIS and CP932 encodings, each halfwidth katakana letter is
represented in one octet, but in UTF-8 it requires three octets.

For example,
'ｱ' (U+FF71: HALFWIDTH KATAKANA LETTER A)
is encoded as 0xB1 in Shift_JIS and CP932.
This is one octet.
But in UTF-8, it is encoded as 0xEF, 0xBD, 0xB1.
This is three octets.
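
These octet counts follow from the UTF-8 encoding rules. As a minimal sketch (the function name utf8_encode is hypothetical and not part of unzip or the patch), one code point can be encoded like this. U+FF71 falls in the three-octet range U+0800..U+FFFF, and no code point needs more than four octets:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8 into
 * out[0..3]. Returns the number of octets written (1 to 4), or 0 if cp
 * is not a valid Unicode scalar value (a surrogate or above U+10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                      /* U+0000..U+007F: 1 octet */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* U+0080..U+07FF: 2 octets */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)     /* surrogates are not characters */
        return 0;
    if (cp < 0x10000) {                   /* U+0800..U+FFFF: 3 octets */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* U+10000..U+10FFFF: 4 octets */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                             /* beyond the Unicode range */
}
```

For U+FF71 this writes 0xEF, 0xBD, 0xB1 and returns 3, while the same letter is the single octet 0xB1 in CP932.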

So, because unzip currently allocates only twice the size of the source
string for the buffer, it fails to handle zip files containing file names
that consist entirely or largely of halfwidth katakana letters.
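
How much katakana is "a lot" can be estimated with a rough model, assuming a file name made only of ASCII letters (one octet in both encodings) and halfwidth katakana (one octet in CP932, three in UTF-8). With n source octets, k of them katakana, the UTF-8 name needs n + 2k octets, so a 2*slen buffer overflows as soon as k > n/2. The helper names below are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a CP932 file name containing only single-octet
 * characters: `ascii` ASCII letters (1 octet in CP932 and UTF-8) and
 * `kana` halfwidth katakana (1 octet in CP932, 3 octets in UTF-8).
 * Returns the UTF-8 size in octets, excluding the terminating NUL. */
size_t utf8_size_of_name(size_t ascii, size_t kana)
{
    return ascii + 3 * kana;
}

/* Does the converted name fit in an output buffer of factor * slen
 * octets? factor 2 models the current patch, factor 4 the proposal. */
bool fits(size_t ascii, size_t kana, size_t factor)
{
    size_t slen = ascii + kana;           /* source length in octets */
    return utf8_size_of_name(ascii, kana) <= factor * slen;
}
```

For example, a 10-octet name with 6 halfwidth katakana letters needs 4 + 18 = 22 octets: that does not fit in the 20-octet buffer of the current patch, but fits easily in 40.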

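I suggest changing the buffer size to four times the size of the source
string, plus one byte. Ubuntu's default encoding is UTF-8, the longest
valid UTF-8 sequence for one character is four octets, and every character
in the source string occupies at least one octet, so 4*slen octets are
enough for the converted string.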
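
The bound can be written out as follows, under the simplifying assumption (not stated in the patch) that each source character converts to exactly one Unicode character:

```latex
% Source string: m characters c_1, ..., c_m occupying slen octets.
% b(c_i) >= 1 : octets of c_i in the source encoding (e.g. CP932)
% u(c_i) <= 4 : octets of c_i in UTF-8
\[
  \sum_{i=1}^{m} u(c_i) \;\le\; 4m \;\le\; 4\sum_{i=1}^{m} b(c_i) \;=\; 4\,\mathit{slen}
\]
% Halfwidth katakana in CP932: b = 1, u = 3, so a name made only of them
% needs 3 slen octets: more than 2 slen, but no more than 4 slen.
```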

Replace line 83 of 04-unzip60-alt-iconv-utf8 with the following:
+    dlen = buflen = 4*slen;
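
For context, the changed allocation might sit in a conversion routine like the one below. This is a hypothetical sketch, not the patch's actual code: the function name convert_name and its error handling are assumptions, and it assumes a stateless target encoding such as UTF-8, so no final shift-state flush is done. Only the variable names and the allocation lines mirror the patch.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert the NUL-terminated `string` from
 * `from_charset` to `to_charset`, sizing the output buffer with the
 * proposed 4*slen bound. Returns a malloc'd, NUL-terminated string,
 * or NULL on failure. */
char *convert_name(const char *string, const char *from_charset,
                   const char *to_charset)
{
    size_t slen, dlen, buflen;
    char *s, *d, *buf;
    iconv_t cd;

    cd = iconv_open(to_charset, from_charset);
    if (cd == (iconv_t)-1)
        return NULL;                      /* conversion not supported */

    slen = strlen(string);
    s = (char *)string;                   /* iconv() takes char **; input is not modified */
    dlen = buflen = 4 * slen;             /* proposed replacement for 2*slen */
    d = buf = malloc(buflen + 1);         /* + 1 for the terminating NUL */
    if (buf == NULL) {
        iconv_close(cd);
        return NULL;
    }

    if (iconv(cd, &s, &slen, &d, &dlen) == (size_t)-1) {
        /* EILSEQ/EINVAL: invalid input; E2BIG: buffer still too small */
        free(buf);
        iconv_close(cd);
        return NULL;
    }
    *d = '\0';                            /* d points one past the last output octet */
    iconv_close(cd);
    return buf;
}
```

A caller would pass e.g. "CP932" and "UTF-8" as the charset names, if the local iconv implementation supports them, and free() the result.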

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/580961

Title:
  unzip fails to deal correctly with filename encodings

To manage notifications about this bug go to:
https://bugs.launchpad.net/file-roller/+bug/580961/+subscriptions

-- 
ubuntu-bugs mailing list
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
