[issue41035] zipfile.Path does not work properly with zip archives where paths start with /

Jason R. Coombs Sun, 21 Jun 2020 08:37:48 -0700


Jason R. Coombs <[email protected]> added the comment:


>>It seems you may have discovered a use-case that violates that expectation, a 
>>case where `/a.txt` is identical to `a.txt`.

> The thing is: it's not.

I think maybe you misunderstood. I mean that the zipfile you have seems to be 
treating `/a.txt` as a file `a.txt` at the root of the zipfile, identical to 
another zipfile that has an item named `a.txt`.

I'm not saying that zipfile.Path handles that situation; your example clearly 
contradicts that notion.

> I provided minimal example where archive created with zipfile.ZipFile itself 
> reproduces this behaviour. Just prepend all paths with / and it does not work.

Thank you. I'm grateful for the minimal example. What I'm trying to assess here 
is the impact: how common this use-case is and whether it should be supported. 
One option here might be to document the library as not supporting files whose 
names begin with a leading slash.

Digging into [the 
spec](https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT), Section 
4.4.17.1 explicitly states:

> The path stored MUST NOT contain a drive or device letter, or a leading slash.

It appears that both the file your client sent and the minimal example you 
generated are invalid zip files.
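
As a rough illustration of that rule, the sketch below builds two in-memory archives with `zipfile.ZipFile` and lists the entry names that start with a slash or a drive letter. The helpers `nonconforming_names` and `build_archive` are hypothetical names for this example, not part of `zipfile`, and the drive-letter pattern is only an approximation of what the spec forbids.

```python
import io
import re
import zipfile

# Approximate pattern for a drive-letter prefix such as "C:" (an assumption
# about how to read the spec wording, not an official definition).
_DRIVE = re.compile(r"^[A-Za-z]:")


def nonconforming_names(zf):
    """Return entry names that start with a slash or a drive letter."""
    return [
        name
        for name in zf.namelist()
        if name.startswith("/") or _DRIVE.match(name)
    ]


def build_archive(names):
    """Build an in-memory archive with one small entry per name."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            # writestr stores the name as given, leading slash included.
            zf.writestr(name, "content")
    buf.seek(0)
    return zipfile.ZipFile(buf)


bad = build_archive(["/a.txt", "/b/c.txt"])
good = build_archive(["a.txt", "b/c.txt"])

print(nonconforming_names(bad))
print(nonconforming_names(good))
```

A check like this could be run on incoming archives before handing them to `zipfile.Path`.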

In [this branch](https://github.com/jaraco/zipp/tree/bugfix/bpo-41035), I 
started exploring what it would take to support this format. Unfortunately, 
just patching the namelist was not enough. Supporting this format interacts 
with behaviors across a number of methods, so it would add substantial 
complexity to the implementation. It becomes inelegant to manage the position in the file 
(`.at` property) when there's ambiguity about the underlying format. It opens 
up lots of questions, like:

- should `at` include the leading slash?
- should the class support zip files with mixed leading and non-leading slashes?
- at what point does `Path` become aware of the format used?
- are there emergent performance concerns?

In other words, the design relies heavily on the assumption that there's one 
way to store a file and two ways to store a directory (explicitly and 
implicitly).
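
To make "explicitly and implicitly" concrete, here is a small sketch. It assumes that an explicit directory is an entry whose name ends in `/` and that an implicit one exists only as a prefix of some file's name. The helpers `names_of` and `implied_dirs` are hypothetical and do not mirror the actual `zipfile.Path` internals.

```python
import io
import posixpath
import zipfile


def names_of(entries):
    """Write empty entries to an in-memory archive and return its namelist."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in entries:
            zf.writestr(name, "")
    with zipfile.ZipFile(buf) as zf:
        return zf.namelist()


def implied_dirs(names):
    """Return parent directories that have no entry of their own."""
    seen = set(names)
    implied = []
    for name in names:
        parent = posixpath.dirname(name.rstrip("/"))
        while parent and parent != "/":
            entry = parent + "/"
            if entry not in seen:
                seen.add(entry)
                implied.append(entry)
            parent = posixpath.dirname(parent)
    return implied


explicit = names_of(["b/", "b/c.txt"])  # directory stored as its own entry
implicit = names_of(["b/c.txt"])        # directory only implied by the file

print(explicit, implied_dirs(explicit))
print(implicit, implied_dirs(implicit))
```

Both archives describe the same tree, which is the only kind of ambiguity the current design expects to reconcile.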

Based on these findings, I'm disinclined to support the format in the canonical 
Path implementation.

What I recommend is that you develop a subclass of zipfile.Path that supports 
the abnormal format, use that for your work, and publish it (perhaps here, 
perhaps as a library) for others with the same problem to use. If enough people 
report finding it useful, then I'd definitely consider incorporating it into 
the library, either as a separate implementation or perhaps integrating it 
(especially if that can be done without substantially complicating the 
canonical implementation).

Alternately, ask if the client can generate valid zip files. I'm eager to hear 
your thoughts in light of my work. Can we close this as invalid?
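
If regenerating the files at the source turns out not to be possible, one possible workaround is to copy the entries into a new archive with the leading slashes removed. This is an assumption on my part and not something the library provides. The function `strip_leading_slashes` below is a hypothetical helper. It does not preserve timestamps or compression settings, and it does not handle an archive that contains both `/a.txt` and `a.txt`.

```python
import io
import zipfile


def strip_leading_slashes(src, dst):
    """Copy every entry from src to dst, dropping leading slashes from names."""
    for info in src.infolist():
        name = info.filename.lstrip("/")
        if not name:
            continue  # an entry named only "/" has no usable name left
        dst.writestr(name, src.read(info))


# Build a non-conforming archive in memory for demonstration.
src_buf = io.BytesIO()
with zipfile.ZipFile(src_buf, "w") as zf:
    zf.writestr("/a.txt", "hello")
    zf.writestr("/b/c.txt", "world")

# Rewrite it into a second in-memory archive with relative names.
dst_buf = io.BytesIO()
with zipfile.ZipFile(src_buf) as src, zipfile.ZipFile(dst_buf, "w") as dst:
    strip_leading_slashes(src, dst)

fixed = zipfile.ZipFile(dst_buf)
print(fixed.namelist())
print(zipfile.Path(fixed, "a.txt").exists())
```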

----------

_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue41035>
_______________________________________