This is from one of my machines running Lubuntu:

$ export |grep LANG
declare -x LANG="en_US.UTF-8"

$ export |grep LC
declare -x LC_ADDRESS="en_US.UTF-8"
declare -x LC_IDENTIFICATION="en_US.UTF-8"
declare -x LC_MEASUREMENT="en_US.UTF-8"
declare -x LC_MONETARY="en_US.UTF-8"
declare -x LC_NAME="en_US.UTF-8"
declare -x LC_NUMERIC="en_US.UTF-8"
declare -x LC_PAPER="en_US.UTF-8"
declare -x LC_TELEPHONE="en_US.UTF-8"
declare -x LC_TIME="en_US.UTF-8"

$ unzip -h
UnZip 6.00 of 20 April 2009, by Debian. Original by Info-ZIP.
...

Use the file from here: http://www1.axfc.net/uploader/Sc/so/325701.zip
(passwd: backer) (CP932)

$ unzip celluloid.zip 
Archive:  celluloid.zip
  inflating: celluloid/readme.txt    
  inflating: celluloid/В╣ВщВчВдВ╟.ust  
  inflating: celluloid/В╣ВщВчВдВ╟2Ф╘.ust  
  inflating: celluloid/В╣ВщВчВдВ╟СхГTГrСOВйВч.ust  

$ unzip -O cp932 celluloid.zip 
Archive:  celluloid.zip
  inflating: celluloid/readme.txt    
  inflating: celluloid/せるらうど.ust  
  inflating: celluloid/せるらうど2番.ust  
  inflating: celluloid/せるらうど大サビ前から.ust  

$ unzip -O cp936 celluloid.zip 
Archive:  celluloid.zip
  inflating: celluloid/readme.txt    
  inflating: celluloid/偣傞傜偆偳.ust  
  inflating: celluloid/偣傞傜偆偳2斣.ust  
  inflating: celluloid/偣傞傜偆偳戝僒價慜偐傜.ust  

$ unzip -O cp950 celluloid.zip 
Archive:  celluloid.zip
  inflating: celluloid/readme.txt    
  inflating: celluloid/�����炤��.ust  
  inflating: celluloid/�����炤��2��.ust  
  inflating: celluloid/�����炤�Ǒ��T�r�O����.ust  

Another file, from here: http://3jf.wodemo.com/file/310894 (CP936)

$ unzip -L 王妃.zip 
Archive:  王妃.zip
  inflating: ═їх·_a.ust         
  inflating: ═їх·_b.ust         

$ unzip -O cp932 王妃.zip 
Archive:  王妃.zip
  inflating: ヘ銈A.ust          
  inflating: ヘ銈B.ust          

$ unzip -O cp936 王妃.zip 
Archive:  王妃.zip
  inflating: 王妃_A.ust            
  inflating: 王妃_B.ust            

$ unzip -O cp950 王妃.zip 
Archive:  王妃.zip
  inflating: 卼漦_A.ust            
  inflating: 卼漦_B.ust            

Actually, not all of the wrong encodings produce illegal UTF-8 strings
(question marks); some decode cleanly into valid but meaningless text.
I guess that is why auto-detection is not so straightforward?
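
To illustrate the point outside of unzip, here is a minimal Python sketch
that takes the raw CP932 bytes of one of the file names above and tries to
decode them with several code pages. The helper name try_decodings and the
candidate list are made up for this example; cp866 is only my guess at what
the default run fell back to, judging by the Cyrillic and box-drawing
output. Python's codecs may also differ in detail from the conversion
tables unzip uses.

```python
# Hypothetical helper, not part of unzip: try several legacy code pages on
# the raw bytes of a file name and record which ones decode without error.
CANDIDATES = ["cp866", "cp932", "cp936", "cp950"]  # assumed candidate list


def try_decodings(raw, candidates=CANDIDATES):
    """Map each candidate encoding to the decoded text, or None on failure."""
    results = {}
    for enc in candidates:
        try:
            # Strict decoding raises UnicodeDecodeError on invalid sequences.
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results


# The bytes a CP932 (Japanese Windows) archiver would store for this name.
raw_name = "せるらうど".encode("cp932")

results = try_decodings(raw_name)
for enc, text in results.items():
    print(enc, repr(text))

# Encodings that decoded without error. More than one entry means that
# "does it decode?" alone cannot pick the right code page.
plausible = [enc for enc, text in results.items() if text is not None]
print("decoded without error:", plausible)
```

Since more than one candidate decodes without error, a detector would need
something beyond validity checking (the locale, character statistics, or a
hint from the user) to choose between them.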

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1422290

Title:
  Default charsets handling for Windows archives in CJKV+th locale

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/unzip/+bug/1422290/+subscriptions
