[issue45981] Get raw file name in bytes from ZipFile

2021-12-15 Thread Devourer Station
Devourer Station added the comment: I do think providing a rawfile field in the ZipInfo struct helps. As a library, ZipFile should let users know what they are dealing with. Users can get data from zip files, and ZipFile shouldn't corrupt them. I don't mean that we should provide everything in

[issue45981] Get raw file name in bytes from ZipFile

2021-12-05 Thread Daniel Hillier
Daniel Hillier added the comment: Handling different character sets is not completely supported yet. There are a couple of open issues relating to this: https://bugs.python.org/issue40407 (reading file names), https://bugs.python.org/issue41928 (support for reading and writing filenames usin

[issue45981] Get raw file name in bytes from ZipFile

2021-12-04 Thread Eric V. Smith
Eric V. Smith added the comment: UTF-16 uses null bytes. I'm sure there are other encodings that do, too. But I don't know if these encodings are permitted or common in zip files.
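A quick check of the UTF-16 point, as a minimal sketch; the sample name "a.txt" is only an illustration and does not come from the thread:

```python
# Encode the same ASCII name under a few codecs and look for null bytes.
name = "a.txt"

utf16 = name.encode("utf-16-le")  # two bytes per ASCII character
utf8 = name.encode("utf-8")
gbk = name.encode("gbk")

print(utf16)             # b'a\x00.\x00t\x00x\x00t\x00'
print(b"\x00" in utf16)  # True
print(b"\x00" in utf8)   # False
print(b"\x00" in gbk)    # False
```

So a UTF-16 name would be cut short by any logic that stops at the first null byte, while UTF-8 and GBK names of this kind would not.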

[issue45981] Get raw file name in bytes from ZipFile

2021-12-04 Thread Devourer Station
Devourer Station added the comment: Null bytes appear in abnormal zip files. (I haven't seen any multibyte encoding that represents a character with null bytes.) But non-UTF-8 encodings are common in normal zip files, as Windows uses different encodings for different language settings. (On the

[issue45981] Get raw file name in bytes from ZipFile

2021-12-04 Thread Eric V. Smith
Eric V. Smith added the comment: You would also need to decide what to do with these lines, just before the os.sep test:

    # Terminate the file name at the first null byte.  Null bytes in file
    # names are used as tricks by viruses in archives.
    null_byte = filename.find(c
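The behaviour described by that comment can be illustrated with a small stand-alone sketch. The helper name `truncate_at_null` is hypothetical and this is not the library's code, only an approximation of what the comment describes:

```python
def truncate_at_null(filename: str) -> str:
    """Return the part of filename before the first null character.

    Hypothetical helper: it mimics the behaviour the quoted comment
    describes and is not copied from Lib/zipfile.py.
    """
    head, _sep, _tail = filename.partition("\x00")
    return head


print(truncate_at_null("safe.txt\x00.exe"))  # safe.txt
print(truncate_at_null("plain.txt"))         # plain.txt
```

Anything after the first null character is lost, which is why a raw-bytes field would have to be captured before this step if it is meant to be lossless.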

[issue45981] Get raw file name in bytes from ZipFile

2021-12-04 Thread Devourer Station
Devourer Station added the comment: In file Lib/zipfile.py:

    flags = centdir[5]
    if flags & 0x800:
        # UTF-8 file names extension
        filename = filename.decode('utf-8')
    else:
        # Historical ZIP filename encoding
        filename = filename.decode('cp437')
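Until a raw-bytes field exists, a common workaround is to reverse the decoding step quoted above. The following is a minimal sketch under stated assumptions: the helpers `raw_name` and `decode_name` are hypothetical names, not part of zipfile, and the fallback codec (GBK here) has to be guessed by the caller because the archive does not record it.

```python
import zipfile

UTF8_FLAG = 0x800  # general-purpose flag bit tested in the quoted code


def raw_name(info: zipfile.ZipInfo) -> bytes:
    """Best-effort recovery of the name bytes stored in the archive.

    Hypothetical helper: it re-encodes with the codec that the quoted
    code used for decoding. It cannot undo null-byte truncation or
    path-separator replacement.
    """
    codec = "utf-8" if info.flag_bits & UTF8_FLAG else "cp437"
    return info.filename.encode(codec)


def decode_name(info: zipfile.ZipInfo, fallback: str) -> str:
    """Decode the name as UTF-8 if flagged, else with a caller-chosen codec."""
    raw = raw_name(info)
    if info.flag_bits & UTF8_FLAG:
        return raw.decode("utf-8")
    return raw.decode(fallback)


# Simulate an entry whose name was stored as GBK bytes without the UTF-8
# flag, so it would have been decoded as cp437 on reading.
stored = "中文.txt".encode("gbk")
info = zipfile.ZipInfo(stored.decode("cp437"))
info.flag_bits = 0

print(raw_name(info) == stored)  # True
print(decode_name(info, "gbk"))  # 中文.txt
```

This works because Python's cp437 codec maps every byte value to a distinct character, so the round trip is exact. It is still only a partial answer to the request in this issue: bytes already altered by separator replacement or null-byte truncation cannot be restored this way.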

[issue45981] Get raw file name in bytes from ZipFile

2021-12-04 Thread Devourer Station
New submission from Devourer Station: It's quite annoying that ZipFile corrupts the filename by simply replacing '\\' with '/', without providing the raw file name in bytes to us. -- components: Library (Lib) messages: 407665 nosy: accelerator0099 priority: normal severity: normal status: