On Sat, May 23, 2026 at 12:51 PM Galen Seitz <[email protected]> wrote:
>
> Hi,
>
> Are there any Python experts in the house? I've got some code that attempts
> to open an existing zip file and then copy one of the files within the zip file
> to two new zip files. The code is throwing an exception indicating that the
> directory and header of the existing zip file differ.
>
> To illustrate the problem, I've reduced the code down to the following. This
> code creates a simple zip file that contains two files, foo.txt and bar.txt.
> It then attempts to create four new zip files. The first two zip files
> contain foo.txt, and the second two zip files should contain bar.txt.
> However, the fourth call to create_dst() results in an exception, and I
> can't figure out what I'm doing wrong.
>
> Note that running this code will create src.zip and dst[0-3].zip in the
> current directory.
>
>
> #!/usr/bin/env python
>
> import zipfile
> import sys
>
>
> def create_dst(src_zip, archive_name_dst, member_name):
>     print(archive_name_dst)
>     dst_zip = zipfile.ZipFile(archive_name_dst, 'w',
>                               compression=zipfile.ZIP_DEFLATED)
>     info = src_zip.getinfo(member_name)
>     src_member = src_zip.open(info)
>     dst_member = dst_zip.open(info, 'w')
>     dst_member.write(src_member.read())
>     dst_member.close()
>     src_member.close()
>     dst_zip.close()
>
>
> def main() -> int:
>     # Create source zip
>     new_zip = zipfile.ZipFile("src.zip", 'w',
>                               compression=zipfile.ZIP_DEFLATED)
>     new_zip.writestr("foo.txt", "Hello")
>     new_zip.writestr("bar.txt", "World")
>     new_zip.close()
>
>     # Open source zip
>     src_zip = zipfile.ZipFile("src.zip")
>
>     create_dst(src_zip, "dst0.zip", "foo.txt")
>     create_dst(src_zip, "dst1.zip", "foo.txt")
>     create_dst(src_zip, "dst2.zip", "bar.txt")
>     create_dst(src_zip, "dst3.zip", "bar.txt")
>
>     src_zip.close()
>     return 0
>
>
> if __name__ == '__main__':
>     sys.exit(main())
>
>
>
> $ python --version
> Python 3.11.2
>
> $ cat /etc/debian_version
> 12.13
>
> $ ./zipissue.py
> dst0.zip
> dst1.zip
> dst2.zip
> dst3.zip
> Traceback (most recent call last):
> File "/home/galens/jobs/mth/sw/pkgupdate/zipbug/./zipissue.py", line 41,
> in <module>
> sys.exit(main())
> ^^^^^^
> File "/home/galens/jobs/mth/sw/pkgupdate/zipbug/./zipissue.py", line 34,
> in main
> create_dst(src_zip, "dst3.zip", "bar.txt")
> File "/home/galens/jobs/mth/sw/pkgupdate/zipbug/./zipissue.py", line 12,
> in create_dst
> src_member = src_zip.open(info)
> ^^^^^^^^^^^^^^^^^^
> File "/usr/lib/python3.11/zipfile.py", line 1595, in open
> raise BadZipFile(
> zipfile.BadZipFile: File name in directory 'bar.txt' and header b'foo.txt'
> differ.
>
>
>
> Any hints or pointers would be much appreciated. Thanks!
>
>
> galen
> --
> Galen Seitz
> [email protected]
>
My Python is not that good, but this fix from Claude does work.

The bug is subtle. The ZipInfo object (info) is mutated under the hood
when it's passed to dst_zip.open(info, 'w').

Specifically, when you open a ZipInfo for writing into the destination
zip, the zipfile library updates fields on that object (notably
header_offset) to reflect its position in the new file. getinfo()
returns the ZipInfo that src_zip itself keeps for that member, not a
copy, so the object src_zip relies on gets modified. When src_zip later
uses it to read the member again, the stale offset points at the wrong
local header, which causes the "directory and header differ" error.

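The mutation can be observed directly. The following is only a sketch,
using in-memory buffers instead of src.zip and dst*.zip (that substitution
and the variable names are assumptions, not part of the original script):

```python
import io
import zipfile

# Build a small two-member archive in memory (a stand-in for src.zip).
src_buf = io.BytesIO()
with zipfile.ZipFile(src_buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("foo.txt", "Hello")
    zf.writestr("bar.txt", "World")

src_zip = zipfile.ZipFile(src_buf)
info = src_zip.getinfo("bar.txt")
# getinfo() hands back the object the archive stores, not a copy.
same_object = info is src_zip.getinfo("bar.txt")
offset_before = info.header_offset

# Write bar.txt into a new archive using the *same* ZipInfo,
# as the original create_dst() does.
with zipfile.ZipFile(io.BytesIO(), "w",
                     compression=zipfile.ZIP_DEFLATED) as dst_zip:
    with src_zip.open(info) as src_member, \
            dst_zip.open(info, "w") as dst_member:
        dst_member.write(src_member.read())

# The shared object now describes the member's position in the new archive.
offset_after = info.header_offset

# Reading bar.txt from the source again now fails.
error = None
try:
    src_zip.read("bar.txt")
except zipfile.BadZipFile as exc:
    error = str(exc)

print(same_object, offset_before, offset_after)
print(error)
```

If the explanation above holds, offset_after differs from offset_before
and the final read raises BadZipFile.
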
The fix is simple: pass a copy of the ZipInfo to the destination:

    import copy

    def create_dst(src_zip, archive_name_dst, member_name):
        print(archive_name_dst)
        dst_zip = zipfile.ZipFile(archive_name_dst, 'w',
                                  compression=zipfile.ZIP_DEFLATED)
        info = src_zip.getinfo(member_name)
        src_member = src_zip.open(info)
        dst_member = dst_zip.open(copy.copy(info), 'w')  # <-- copy here
        dst_member.write(src_member.read())
        dst_member.close()
        src_member.close()
        dst_zip.close()

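copy.copy() (a shallow copy) is sufficient because the ZipInfo fields
involved are plain values (ints, strings, bytes, tuples), so the copy
shares no mutable state with the original.
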
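For anyone who wants to try the fix without touching the filesystem, here
is a self-contained sketch of the same idea. The helper name copy_member
and the use of io.BytesIO buffers are illustrative assumptions; the only
essential part is the copy.copy(info) passed to the destination:

```python
import copy
import io
import zipfile


def copy_member(src_zip, dst_file, member_name):
    """Copy one member of src_zip into a new archive written to dst_file."""
    info = src_zip.getinfo(member_name)
    with zipfile.ZipFile(dst_file, "w",
                         compression=zipfile.ZIP_DEFLATED) as dst_zip:
        # Give the destination its own ZipInfo so the source's stays untouched.
        with src_zip.open(info) as src_member, \
                dst_zip.open(copy.copy(info), "w") as dst_member:
            dst_member.write(src_member.read())


# Stand-in for src.zip.
src_buf = io.BytesIO()
with zipfile.ZipFile(src_buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("foo.txt", "Hello")
    zf.writestr("bar.txt", "World")

# Same sequence of four copies as in the original script.
results = []
with zipfile.ZipFile(src_buf) as src_zip:
    for name in ["foo.txt", "foo.txt", "bar.txt", "bar.txt"]:
        dst_buf = io.BytesIO()
        copy_member(src_zip, dst_buf, name)
        with zipfile.ZipFile(dst_buf) as check:
            results.append((name, check.read(name)))

print(results)
```

The with blocks also make sure the members and archives are closed if an
exception is raised partway through.
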
Why it only fails on the 4th call: the second call does reuse foo.txt's
info object, which the first call already mutated, but foo.txt is the
first member of src.zip, so its header_offset is 0. Writing it as the
first member of a new zip sets header_offset to 0 again, so nothing
visibly changes. bar.txt sits at a nonzero offset in src.zip. The third
call resets its header_offset to 0, so on the fourth call src_zip reads
the local header at offset 0, which belongs to foo.txt. That is exactly
what the message says: directory 'bar.txt', header b'foo.txt'.

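To check which offsets are involved, something like the following lists
each member's header_offset in an archive built the same way as src.zip
(an in-memory buffer is assumed here for convenience). A first member at
offset 0 is unaffected by having its offset reset to 0; a later member is
not:

```python
import io
import zipfile

# Build the same two-member archive as the original script, in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("foo.txt", "Hello")
    zf.writestr("bar.txt", "World")

# Map each member name to the position of its local header in the archive.
with zipfile.ZipFile(buf) as zf:
    offsets = {zi.filename: zi.header_offset for zi in zf.infolist()}

print(offsets)
```
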
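Note that calling src_zip.getinfo(member_name) again does not help,
because getinfo() returns the same stored object each time rather than
a fresh one. An alternative that does work is to pass the member name
(a string) to dst_zip.open(member_name, 'w'), which makes zipfile build
a new ZipInfo, but that drops the original timestamp and attributes, so
the copy.copy() approach is cleaner.
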
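Another route, sketched here as an assumption rather than something from
the original script, is to give dst_zip.open() the member name as a
string. zipfile then builds a new ZipInfo for the destination and leaves
the source's object alone, but the copied member gets the default
1980-01-01 timestamp instead of the original one:

```python
import io
import zipfile

# Stand-in for src.zip.
src_buf = io.BytesIO()
with zipfile.ZipFile(src_buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("foo.txt", "Hello")
    zf.writestr("bar.txt", "World")

with zipfile.ZipFile(src_buf) as src_zip:
    original = src_zip.getinfo("bar.txt")
    offset_before = original.header_offset

    dst_buf = io.BytesIO()
    with zipfile.ZipFile(dst_buf, "w",
                         compression=zipfile.ZIP_DEFLATED) as dst_zip:
        # Passing a str (not a ZipInfo) makes zipfile create a fresh ZipInfo.
        with src_zip.open("bar.txt") as src_member, \
                dst_zip.open("bar.txt", "w") as dst_member:
            dst_member.write(src_member.read())

    # The source's ZipInfo was never handed to the destination.
    offset_after = original.header_offset

with zipfile.ZipFile(dst_buf) as check:
    copied = check.getinfo("bar.txt")
    data = check.read("bar.txt")

print(offset_before == offset_after, data, copied.date_time)
```
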
Bill and Claude