On 5/23/26 14:55, Bill Barry wrote:

My Python is not that good, but this fix from Claude does work.

The bug is subtle. The ZipInfo object (info) is being mutated under
the hood when it's passed to dst_zip.open(info, 'w').

Specifically, when you open a ZipInfo for writing into the destination
zip, the zipfile library updates fields on that object (like the
internal header_offset) to reflect its position in the new file. Since
Python passes objects by reference, the same info object that came
from src_zip.getinfo() gets modified. When you later try to use it
again to read from src_zip, the stale/corrupted offset causes the
"directory and header differ" error.

The fix is simple: pass a copy of the ZipInfo to the destination:

import copy
import zipfile

def create_dst(src_zip, archive_name_dst, member_name):
    print(archive_name_dst)
    dst_zip = zipfile.ZipFile(archive_name_dst, 'w',
                              compression=zipfile.ZIP_DEFLATED)
    info = src_zip.getinfo(member_name)
    src_member = src_zip.open(info)
    # Pass a copy so the ZipInfo held by src_zip is left untouched.
    dst_member = dst_zip.open(copy.copy(info), 'w')  # <-- copy here
    dst_member.write(src_member.read())
    dst_member.close()
    src_member.close()
    dst_zip.close()

copy.copy() (a shallow copy) is sufficient since ZipInfo is a flat object.
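
The mutation described above can be observed directly. The following is a minimal sketch, not the original zipissue.py: it builds a hypothetical two-member archive in memory (the member contents and the BytesIO buffers are assumptions made for illustration) and records header_offset before and after the same ZipInfo is opened for writing in a second archive.

```python
import io
import zipfile

# Hypothetical stand-in for src.zip: two small members, built in memory.
src_buf = io.BytesIO()
with zipfile.ZipFile(src_buf, 'w', compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr('foo.txt', 'foo contents\n')
    zf.writestr('bar.txt', 'bar contents\n')

src_zip = zipfile.ZipFile(src_buf)
info = src_zip.getinfo('bar.txt')
offset_before = info.header_offset  # position of bar.txt's header in the source

# Open the very same ZipInfo for writing in a fresh destination archive.
dst_zip = zipfile.ZipFile(io.BytesIO(), 'w', compression=zipfile.ZIP_DEFLATED)
with src_zip.open(info) as src_member, dst_zip.open(info, 'w') as dst_member:
    dst_member.write(src_member.read())
dst_zip.close()

offset_after = info.header_offset  # same object, inspected after the write
print(offset_before, offset_after)
```

On Python versions where the failure in this thread reproduces, the two numbers should differ, because the object now records its position in the destination archive instead of the source.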

Why it only fails on the 4th call: The first two calls both use
foo.txt and the third uses bar.txt, but each of those is the first
(and only) use of that particular info object in a given call — so the
mutation doesn't matter because the object isn't reused. The fourth
call reuses bar.txt's info object which was already mutated during the
third call, so the offset now points to the wrong place in src.zip.

An alternative that also works is to call src_zip.getinfo(member_name)
after closing dst_member, so you get a fresh object each time, but the
copy.copy() approach is cleaner.

Bill and Claude

Thank you Bill! (and begrudgingly Claude. I guess I'll have to tone down my anti-AI rhetoric.)

The reuse of the info object was something that I thought might be occurring, but my attempts to address it were insufficient. I'm not sure I entirely understand your explanation of why the 4th call fails. After reading your email, I tried to fix the problem without using copy. To my surprise, getinfo() appears to return the same info object whenever it is passed the same filename, even after a close.

Here's the new, also broken, create_dst() function. This gives the same exception as before.

def create_dst(src_zip, archive_name_dst, member_name):
    print(archive_name_dst)
    dst_zip = zipfile.ZipFile(archive_name_dst, 'w',
                              compression=zipfile.ZIP_DEFLATED)
    src_info = src_zip.getinfo(member_name)
    dst_info = src_zip.getinfo(member_name)
    print(id(src_info))
    print(id(dst_info))
    src_member = src_zip.open(src_info)
    dst_member = dst_zip.open(dst_info, 'w')
    dst_member.write(src_member.read())
    dst_member.close()
    src_member.close()
    dst_zip.close()


$ ./zipissue.py
dst0.zip
140551532133472
140551532133472
dst1.zip
140551532133472
140551532133472
dst2.zip
140551532133888
140551532133888
dst3.zip
140551532133888
140551532133888
Traceback (most recent call last):
...
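
The identical id() values are consistent with getinfo() handing back the one ZipInfo object that the ZipFile keeps for each member name, rather than building a new one per call. A small check of that, assuming a hypothetical one-member archive built in memory:

```python
import copy
import io
import zipfile

# Hypothetical one-member archive, built in memory for illustration.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('foo.txt', 'foo contents\n')

src_zip = zipfile.ZipFile(buf)
first = src_zip.getinfo('foo.txt')
second = src_zip.getinfo('foo.txt')

same_object = first is second   # two lookups, compared by identity
clone = copy.copy(first)        # an independent ZipInfo with the same fields
clone_is_distinct = clone is not first

print(same_object, clone_is_distinct)
```

If both values print as True, calling getinfo() twice cannot serve as a way to get a second, independent object, while copy.copy() can.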


It's still not clear to me why the 2nd call to create_dst doesn't fail, but it is clear that I need to use copy.copy.
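
One possible explanation, offered as a guess rather than a confirmed diagnosis: if foo.txt is the first member of src.zip, its header sits at offset 0, and it also lands at offset 0 as the only member of each new destination archive, so the mutated object still happens to describe the source correctly. bar.txt would start at a nonzero offset in src.zip, so after the third call its info would point at offset 0, where foo.txt's header lives. The sketch below tries that theory on a hypothetical in-memory archive; the member order, the contents, and the helper name try_copy are all assumptions.

```python
import io
import zipfile

# Hypothetical src.zip: foo.txt first (offset 0), bar.txt second (offset > 0).
src_buf = io.BytesIO()
with zipfile.ZipFile(src_buf, 'w', compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr('foo.txt', 'foo contents\n')
    zf.writestr('bar.txt', 'bar contents\n')
src_zip = zipfile.ZipFile(src_buf)

def try_copy(src_zip, member_name):
    """Copy one member into a throwaway archive, reusing the source ZipInfo."""
    info = src_zip.getinfo(member_name)
    offset_on_entry = info.header_offset  # what the shared object says right now
    dst_zip = zipfile.ZipFile(io.BytesIO(), 'w',
                              compression=zipfile.ZIP_DEFLATED)
    try:
        with src_zip.open(info) as src_member, \
                dst_zip.open(info, 'w') as dst_member:
            dst_member.write(src_member.read())
        status = 'ok'
    except zipfile.BadZipFile:
        status = 'BadZipFile'
    finally:
        dst_zip.close()
    return member_name, offset_on_entry, status

# Same call pattern as described earlier in the thread: foo, foo, bar, bar.
results = [try_copy(src_zip, name)
           for name in ('foo.txt', 'foo.txt', 'bar.txt', 'bar.txt')]
for row in results:
    print(row)
```

If the theory holds, the two foo.txt calls both report offset 0 and succeed, the first bar.txt call reports a nonzero offset and succeeds, and the second bar.txt call reports offset 0 and fails.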

galen
--
Galen Seitz