[issue41035] zipfile.Path does not work properly with zip archives where paths start with /

2020-06-22 Thread sorrow


sorrow  added the comment:

>`iteritems()`

I meant `iterdir()` of course.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue41035] zipfile.Path does not work properly with zip archives where paths start with /

2020-06-22 Thread sorrow


sorrow  added the comment:

Here's what I came up with:

```python
class ZipPath(zipfile.Path):
def __init__(self, root, at=""):
super().__init__(root, at)
if not at.startswith("/") and self.root.namelist()[0].startswith("/"):
self.at = f"/{at}"

def __repr__(self):
return (
f"{self.__class__.__name__}({self.root.filename!r}, "
f"{self.at.lstrip('/')!r})"
)

def __str__(self):
return posixpath.join(self.root.filename, self.at.lstrip("/"))

def _is_child(self, path):
return posixpath.dirname(path.at.strip("/")) == self.at.strip("/")

def _next(self, at):
return self.__class__(self.root, at)
```

Pretty simple. The main things are going on in `__init__` and `_is_child` 
methods. These changes are enough for `iteritems()` to work. I decided to 
include the leading slash in the `at`, but strip it for the outside world 
(__str__ and __repr__). Also, I had to override the `_next` method because it 
makes `Path` objects no matter what class it's called from.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue41035] zipfile.Path does not work properly with zip archives where paths start with /

2020-06-21 Thread Jason R. Coombs


Jason R. Coombs  added the comment:

Yes, I generally agree with your assessment. Let me know if you have any 
questions about the implementation as you're exploring a solution.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue41035] zipfile.Path does not work properly with zip archives where paths start with /

2020-06-21 Thread Jason R. Coombs


Change by Jason R. Coombs :


--
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue41035] zipfile.Path does not work properly with zip archives where paths start with /

2020-06-21 Thread sorrow


sorrow  added the comment:

>When the class is created? 

I mean the class instance

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue41035] zipfile.Path does not work properly with zip archives where paths start with /

2020-06-21 Thread sorrow


sorrow  added the comment:

>`Path` public AP.

API of course

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue41035] zipfile.Path does not work properly with zip archives where paths start with /

2020-06-21 Thread sorrow


sorrow  added the comment:

>how common is this use-case

I don't know about that. I just know that I have to make my program work with 
these files. Asking the clients to stop using this (presumably incorrect) 
format (or the program that makes it) is not an option.

>It appears the file your client has sent and the minimal example you've 
>generated represents an invalid zip file.

Well, I didn't know that. But it doesn't change the matter for me.

>What I recommend is that you develop a subclass of zipfile.Path that supports 
>the abnormal format, use that for your work, and publish it (perhaps here, 
>perhaps as a library) for others with the same problem to use.

I guess I'll have to go this way.

>It opens up lots of questions, like:

I'll share my considerations about how I see it should be implemented.

>- should `at` include the leading slash?

I think it doesn't matter, because `at` property is not part of `Path` public 
AP. Ideally, the files with leading slashes and without them should be treated 
exactly the same (as if they didn't have the leading slash). So, the only thing 
I'm concerned with is `at` argument of `Path.__init__`. It shouldn't require 
leading slash in any case.

>- should the class support zip files with mixed leading and non-leading 
>slashes?

No, this is definitely an error and shouldn't work.

>- at what point does `Path` become aware of the format used?

When the class is created? 

>- are there emergent performance concerns?

For me - no, there aren't. I don't know how this kind of questions is resolved 
in python community though.

>Can we close this as invalid?

I guess you can.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue41035] zipfile.Path does not work properly with zip archives where paths start with /

2020-06-21 Thread Jason R. Coombs


Jason R. Coombs  added the comment:

>>It seems you may have discovered a use-case that violates that expectation, a 
>>case where `/a.txt` is identical to `a.txt`.

> The thing is: it's not.

I think maybe you misunderstood. I mean that the zipfile you have seems to be 
treating `/a.txt` as a file `a.txt` at the root of the zipfile, identical to 
another zipfile that has an item named `a.txt`.

I'm not saying that zipfile.Path handles that situation; your example clearly 
contradicts that notion.

> I provided minimal example where archive created with zipfile.ZipFile itself 
> reproduces this behaviour. Just prerpend all paths with / an it does not work.

Thank you. I'm grateful for the minimal example. What I'm trying to assess here 
is the impact - how common is this use-case and should it be supported. One 
option here might be to document the library as not supporting files whose 
names begin with a leading slash.

Digging into [the 
spec](https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT), Section 
4.4.17.1 explicitly states:

> The path stored MUST NOT contain a drive or device letter, or a leading slash.

It appears the file your client has sent and the minimal example you've 
generated represents an invalid zip file.

In [this branch](https://github.com/jaraco/zipp/tree/bugfix/bpo-41035), I 
started exploring what it would take to support this format. Unfortunately, 
just patching the namelist was not enough. Supporting this change interacts 
with behaviors across a number of methods, so would add substantial complexity 
to the implementation. It becomes inelegant to manage the position in the file 
(`.at` property) when there's ambiguity about the underlying format. It opens 
up lots of questions, like:

- should `at` include the leading slash?
- should the class support zip files with mixed leading and non-leading slashes?
- at what point does `Path` become aware of the format used?
- are there emergent performance concerns?

In other words, the design relies heavily on the assumption that there's one 
way to store a file and two ways to store a directory (explicitly and 
implicitly).

Based on these findings, I'm disinclined to support the format in the canonical 
Path implementation.

What I recommend is that you develop a subclass of zipfile.Path that supports 
the abnormal format, use that for your work, and publish it (perhaps here, 
perhaps as a library) for others with the same problem to use. If enough people 
report it having usefulness, then I'd definitely consider incorporating it into 
the library, either as a separate implementation or perhaps integrating it 
(especially if that can be done without substantially complicating the 
canonical implementation).

Alternately, ask if the client can generate valid zip files. I'm eager to hear 
your thoughts in light of my work. Can we close this as invalid?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue41035] zipfile.Path does not work properly with zip archives where paths start with /

2020-06-21 Thread Jason R. Coombs


Jason R. Coombs  added the comment:

I created [jaraco/zipp#56](https://github.com/jaraco/zipp/issues/56) to track 
the issue in the backport.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue41035] zipfile.Path does not work properly with zip archives where paths start with /

2020-06-20 Thread sorrow


sorrow  added the comment:

>It seems you may have discovered a use-case that violates that expectation, a 
>case where `/a.txt` is identical to `a.txt`.

The thing is: it's not.

>Can you tell me more about your use-case and why zipp.Path/zipfile.Path should 
>support it?

I received a .zip file and zipfile.Path in my code didn't work with it. I did 
some testing and discovered that id does not work properly with these archives. 
It cannot list the contents of such archive.

>Is this behavior a result of a real-world example

Yes, it is.

>(please share details about the origin)

I can't. First, the origin of this archive is not totally clear to me (and I do 
not want to investigate). And second, I'm under NDA.

I provided minimal example where archive created with zipfile.ZipFile itself 
reproduces this behaviour. Just prerpend all paths with / an it does not work.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue41035] zipfile.Path does not work properly with zip archives where paths start with /

2020-06-20 Thread Jason R. Coombs


Jason R. Coombs  added the comment:

Thanks sorrow for filing a report.

I primarily developed this functionality. As I did, I found the 'zip' format to 
be under-specified, so I used real-world examples as models to infer a spec.

It seems you may have discovered a use-case that violates that expectation, a 
case where `/a.txt` is identical to `a.txt`.

My instinct is that `zipfile.Path` should support 99% of real-world use-cases 
and that other use-cases may not be supported or may require additional 
consideration (wrappers, subclasses) to support.

Can you tell me more about your use-case and why zipp.Path/zipfile.Path should 
support it? Is this behavior a result of a real-world example (please share 
details about the origin) or contrived?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue41035] zipfile.Path does not work properly with zip archives where paths start with /

2020-06-20 Thread Brett Cannon


Change by Brett Cannon :


--
nosy: +alanmcintyre, serhiy.storchaka, twouters

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue41035] zipfile.Path does not work properly with zip archives where paths start with /

2020-06-20 Thread Brett Cannon


Change by Brett Cannon :


--
nosy: +jaraco

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue41035] zipfile.Path does not work properly with zip archives where paths start with /

2020-06-19 Thread sorrow


New submission from sorrow :

I encountered errors when I had to work with ZPI file where path start with "/"

--
components: Library (Lib)
messages: 371880
nosy: sorrow
priority: normal
severity: normal
status: open
title: zipfile.Path does not work properly with zip archives where paths start 
with /
type: behavior
versions: Python 3.8

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue41035] zipfile.Path does not work properly with zip archives where paths start with /

2020-06-19 Thread sorrow


sorrow  added the comment:

>>> import zipfile
>>> import io
>>> data = io.BytesIO()
>>> zf = zipfile.ZipFile(data, 'w')
>>> zf.writestr('/a.txt', 'content of a')
>>> zf.filename = 'abcde.zip'
>>> root = zipfile.Path(zf)
>>> list(root.iterdir())
[]
>>> root.exists()
False

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com