[issue41928] ZipFile does not supports Unicode Path Extra Field (0x7075) zip header field

2021-01-21 Thread Andrea Giudiceandrea


Andrea Giudiceandrea  added the comment:

More than a month ago I submitted a PR that adds support for the Unicode Path 
Extra Field in ZipFile.
The PR https://github.com/python/cpython/pull/23736 is awaiting review before 
it can be merged.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue41928] ZipFile does not supports Unicode Path Extra Field (0x7075) zip header field

2020-12-10 Thread Andrea Giudiceandrea


Change by Andrea Giudiceandrea :


--
keywords: +patch
pull_requests: +22595
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/23736




[issue41928] ZipFile does not supports Unicode Path Extra Field (0x7075) zip header field

2020-12-09 Thread Andrea Giudiceandrea


Change by Andrea Giudiceandrea :


--
nosy: +andreaerdna




[issue41928] ZipFile does not supports Unicode Path Extra Field (0x7075) zip header field

2020-10-04 Thread Ivan Sorokin

Ivan Sorokin  added the comment:

Grand unified algorithm to read filenames from zip files correctly:

1. Does the zip entry have a «Unicode Path Extra Field» (0x7075)? Use it for 
the file name.
2. Is the Unicode flag (0x800) set in the «Flags» field of the zip entry? 
Assume the «Filename» field is in UTF-8.
3. Does the «HostOS» field of the zip entry have a value of 0 (FAT) or 11 
(NTFS)? Assume the «Filename» field is in the OEM charset corresponding to the 
system locale.
4. Otherwise, assume the «Filename» field is in UTF-8.

p7zip with the oemcp patch (https://github.com/unxed/oemcp/) uses exactly this 
method and is able to process all zip files in my test set correctly (my test 
set contains several zips generated by different packers on Windows, macOS, and 
Linux, and by online services). The same algorithm should be used in any zip 
unpacker that wants to handle non-Latin filenames as gracefully as possible.
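
The four steps above can be sketched in Python roughly as follows. This is 
only an illustration under stated assumptions, not the code used by p7zip or 
the oemcp patch: the function name decode_zip_name, its parameters, and the 
cp437 default for oem_codec are hypothetical, and the caller is expected to 
pass the OEM code page that matches the relevant locale.

```python
FLAG_UTF8 = 0x800          # "language encoding" bit in the general purpose flags
HOST_FAT, HOST_NTFS = 0, 11  # HostOS values named in the steps above


def decode_zip_name(raw_name, flag_bits, host_os,
                    unicode_path=None, oem_codec="cp437"):
    """Pick a file name for a zip entry (hypothetical helper).

    raw_name     -- bytes of the «Filename» field
    flag_bits    -- the «Flags» field of the entry
    host_os      -- the «HostOS» field of the entry
    unicode_path -- name taken from a 0x7075 extra field, or None
    oem_codec    -- assumed OEM code page; cp437 is only a placeholder
    """
    # Step 1: a Unicode Path Extra Field wins over everything else.
    if unicode_path is not None:
        return unicode_path
    # Step 2: the Unicode flag says the name is UTF-8.
    if flag_bits & FLAG_UTF8:
        return raw_name.decode("utf-8")
    # Step 3: archives made on FAT/NTFS hosts use the OEM code page.
    if host_os in (HOST_FAT, HOST_NTFS):
        return raw_name.decode(oem_codec, errors="replace")
    # Step 4: fall back to UTF-8, replacing bytes that do not decode.
    return raw_name.decode("utf-8", errors="replace")
```

For a Greek archive created on Windows, a caller would presumably pass 
oem_codec="cp737" so that step 3 yields the Greek name instead of cp437 
mojibake.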

--




[issue41928] ZipFile does not supports Unicode Path Extra Field (0x7075) zip header field

2020-10-04 Thread Ivan Sorokin

New submission from Ivan Sorokin :

See the attached sample. The well-known unzip command-line tool lists its 
contents correctly:

$ unzip -l 23.zip
Archive:  23.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
    81408  2012-10-23 19:03   Β' ΦΑΣΗ ΠΕ06 ΣΧΟΛΕΙΑ ΕΑΕΠ (ΙΝΤ).xls
---------                     -------
    81408                     1 file

But ZipFile lists the same file inside this archive as
ü' öÇæå Åä06 æòÄèäêÇ äÇäÅ (êîÆ).xls

This is because ZipFile completely ignores the Unicode Path Extra Field 
(0x7075) zip header field.
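
As a rough illustration of what reading that field involves, here is a minimal 
Python sketch. It assumes the layout given in the .ZIP specification for the 
Info-ZIP Unicode Path Extra Field: a version byte (1), the CRC-32 of the 
header file name, then the UTF-8 name. The helper name unicode_path_from_extra 
is hypothetical and this is not the code from any ZipFile patch.

```python
import struct
import zlib

UNICODE_PATH_ID = 0x7075  # header ID of the Unicode Path Extra Field


def unicode_path_from_extra(extra, raw_name):
    """Return the UTF-8 name from a 0x7075 extra field, or None.

    extra    -- raw bytes of the entry's extra field area
    raw_name -- raw bytes of the entry's header file name
    """
    pos = 0
    # The extra area is a sequence of (id, size, data) records.
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        data = extra[pos + 4:pos + 4 + size]
        pos += 4 + size
        # Skip other records and truncated or too-short ones.
        if header_id != UNICODE_PATH_ID or len(data) != size or size < 5:
            continue
        version, name_crc = struct.unpack_from("<BL", data, 0)
        # The stored CRC must match the header name, otherwise the
        # field is stale and should not be trusted.
        if version != 1 or name_crc != zlib.crc32(raw_name):
            continue
        try:
            return data[5:].decode("utf-8")
        except UnicodeDecodeError:
            return None
    return None
```

The CRC check matters: if a tool renamed the entry without updating the extra 
field, the stored Unicode name no longer describes the header name and is 
better ignored.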

See the .ZIP specification for details on this field's meaning and usage:
https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT

--
components: Library (Lib)
files: 23.zip
messages: 377931
nosy: ivan.sorokin.tech
priority: normal
severity: normal
status: open
title: ZipFile does not supports Unicode Path Extra Field (0x7075) zip header 
field
type: enhancement
versions: Python 3.10
Added file: https://bugs.python.org/file49491/23.zip
