On Sat, May 23, 2026 at 4:05 PM Galen Seitz <[email protected]> wrote:
>
> On 5/23/26 14:55, Bill Barry wrote:
>
> > My python is not that good, but this fix from Claude does work.
> >
> > The bug is subtle. The ZipInfo object (info) is being mutated under
> > the hood when it's passed to dst_zip.open(info, 'w').
> >
> > Specifically, when you open a ZipInfo for writing into the destination
> > zip, the zipfile library updates fields on that object (like the
> > internal header_offset) to reflect its position in the new file. Since
> > Python passes objects by reference, the same info object that came
> > from src_zip.getinfo() gets modified. When you later try to use it
> > again to read from src_zip, the stale/corrupted offset causes the
> > "directory and header differ" error.
> >
> > The fix is simple: pass a copy of the ZipInfo to the destination:
> >
> >
> > import copy
> > import zipfile
> >
> > def create_dst(src_zip, archive_name_dst, member_name):
> >     print(archive_name_dst)
> >     dst_zip = zipfile.ZipFile(archive_name_dst, 'w',
> >                               compression=zipfile.ZIP_DEFLATED)
> >     info = src_zip.getinfo(member_name)
> >     src_member = src_zip.open(info)
> >     dst_member = dst_zip.open(copy.copy(info), 'w')  # <-- copy here
> >     dst_member.write(src_member.read())
> >     dst_member.close()
> >     src_member.close()
> >     dst_zip.close()
> >
> > copy.copy() (a shallow copy) is sufficient since ZipInfo is a flat object.
> >
> > Why it only fails on the 4th call: The first two calls both use
> > foo.txt and the third uses bar.txt, but each of those is the first
> > (and only) use of that particular info object in a given call — so the
> > mutation doesn't matter because the object isn't reused. The fourth
> > call reuses bar.txt's info object which was already mutated during the
> > third call, so the offset now points to the wrong place in src.zip.
> >
> > An alternative that also works is to call src_zip.getinfo(member_name)
> > after closing dst_member, so you get a fresh object each time, but the
> > copy.copy() approach is cleaner.

When you first open the src file, info.header_offset for foo.txt is 0,
the beginning of the file.
When you run src_member = src_zip.open(info), the header_offset is
still 0 and foo.txt is at offset 0. open() checks that foo.txt is at
the offset you sent it, and the test passes.
Now you use the same info for the write. The write is always writing
to the beginning of a file because it is only writing one file. After
the write it leaves header_offset at 0, the place it wrote to.
Now header_offset is zero and you try to copy foo.txt again. It
checks again to see if foo.txt and the header_offset match, and they
do.
Now you go for bar.txt. info.header_offset is 44, where bar.txt is.
src_zip.open checks if bar.txt is at 44, which it is, and you are
good.
Then the write puts bar.txt at the beginning of the new file and
leaves header_offset at 0, which is no longer where bar.txt lives in
src.zip.

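Here is a small sketch for watching header_offset change. The file
contents and the name trace_header_offset are made up for illustration
(they are not Galen's test files), and I am assuming the CPython
behavior we saw here, where open(info, 'w') stores the new position
back into the info object it was given:

```python
import os
import tempfile
import zipfile


def trace_header_offset():
    """Return {member: (offset_before, offset_after, same_object)}."""
    result = {}
    with tempfile.TemporaryDirectory() as tmp:
        # Build a two-member source archive (hypothetical contents).
        src_path = os.path.join(tmp, "src.zip")
        with zipfile.ZipFile(src_path, "w") as z:
            z.writestr("foo.txt", "foo\n")
            z.writestr("bar.txt", "bar\n")

        with zipfile.ZipFile(src_path) as src_zip:
            for name in ("foo.txt", "bar.txt"):
                info = src_zip.getinfo(name)
                before = info.header_offset

                # Reading checks the local header at info.header_offset.
                with src_zip.open(info) as src_member:
                    data = src_member.read()

                # Writing with the SAME info object into a fresh archive.
                dst_path = os.path.join(tmp, "dst_" + name + ".zip")
                with zipfile.ZipFile(dst_path, "w") as dst_zip:
                    with dst_zip.open(info, "w") as dst_member:
                        dst_member.write(data)

                after = info.header_offset
                # getinfo() hands back the object stored in the archive's
                # name table, not a fresh copy.
                same_object = src_zip.getinfo(name) is info
                result[name] = (before, after, same_object)
    return result


if __name__ == "__main__":
    for name, values in trace_header_offset().items():
        print(name, values)
```

For foo.txt the before and after values agree, so nothing looks wrong.
For bar.txt the value after the write no longer matches where bar.txt
sits in src.zip.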
> >
> > Bill and Claude
>
> Thank you Bill! (and begrudgingly Claude. I guess I'll have to tone
> down my anti-AI rhetoric.)
>
> The reuse of the info object was something that I thought might be
> occurring, but my attempts to address it were insufficient. I'm not
> sure I entirely understand your explanation of why the 4th call fails.
> After reading your email, I tried to fix the problem without using copy,
> but surprisingly to me, the object seems to get reused as long as the
> same filename is passed to getinfo(). It appears that the info object
> gets reused even after a close.
>
> Here's the new, also broken, create_dst() function. This gives the same
> exception as before.
>
> def create_dst(src_zip, archive_name_dst, member_name):
>     print(archive_name_dst)
>     dst_zip = zipfile.ZipFile(archive_name_dst, 'w',
>                               compression=zipfile.ZIP_DEFLATED)
>     src_info = src_zip.getinfo(member_name)
>     dst_info = src_zip.getinfo(member_name)
>     print(id(src_info))
>     print(id(dst_info))
>     src_member = src_zip.open(src_info)
>     dst_member = dst_zip.open(dst_info, 'w')
>     dst_member.write(src_member.read())
>     dst_member.close()
>     src_member.close()
>     dst_zip.close()
>
>
> $ ./zipissue.py
> dst0.zip
> 140551532133472
> 140551532133472
> dst1.zip
> 140551532133472
> 140551532133472
> dst2.zip
> 140551532133888
> 140551532133888
> dst3.zip
> 140551532133888
> 140551532133888
> Traceback (most recent call last):
> ...
>
>
> It's still not clear to me why the 2nd call to create_dst doesn't fail,
> but it is clear that I need to use copy.copy.
>
The 2nd call did not fail because of luck. The header_offset for
foo.txt in src.zip is 0, and the header_offset it was written to in
dst0.zip was also 0. The creation of dst0.zip overwrote the
header_offset, but it overwrote it with the same value.

The 4th call is not so lucky. On the 3rd call,
info = src_zip.getinfo("bar.txt") reads the header_offset from a
dictionary as 44. Then the creation of dst2.zip writes bar.txt to the
beginning of dst2.zip with header_offset 0 and overwrites the value in
the object held in the dictionary. On the 4th call,
info = src_zip.getinfo("bar.txt") gets the overwritten value of 0, not
the correct value of 44. It looks in the zip file at offset 0 for
bar.txt, sees that it is not bar.txt but foo.txt, and gives up.

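To see the lucky and unlucky cases side by side, here is a rough
reproduction. It is a sketch, not Galen's zipissue.py: the archive
contents, the helper names copy_member and run, and the use_copy flag
are my own inventions, and it assumes the zipfile behavior described
above. It copies foo.txt twice and bar.txt twice and counts how many
calls get through, with and without copy.copy():

```python
import copy
import os
import tempfile
import zipfile


def copy_member(src_zip, dst_path, member_name, use_copy):
    """Copy one member from src_zip into a new single-member archive."""
    info = src_zip.getinfo(member_name)
    with src_zip.open(info) as src_member:
        data = src_member.read()
    # With use_copy, the destination gets its own ZipInfo to scribble on.
    dst_info = copy.copy(info) if use_copy else info
    with zipfile.ZipFile(dst_path, "w") as dst_zip:
        with dst_zip.open(dst_info, "w") as dst_member:
            dst_member.write(data)


def run(use_copy):
    """Return how many of the four copy calls succeeded."""
    done = 0
    with tempfile.TemporaryDirectory() as tmp:
        src_path = os.path.join(tmp, "src.zip")
        with zipfile.ZipFile(src_path, "w") as z:
            z.writestr("foo.txt", "foo\n")
            z.writestr("bar.txt", "bar\n")

        names = ["foo.txt", "foo.txt", "bar.txt", "bar.txt"]
        with zipfile.ZipFile(src_path) as src_zip:
            for i, name in enumerate(names):
                dst_path = os.path.join(tmp, "dst%d.zip" % i)
                try:
                    copy_member(src_zip, dst_path, name, use_copy)
                except zipfile.BadZipFile:
                    # The read found a different file name at the
                    # offset stored in the reused info object.
                    break
                done += 1
    return done


if __name__ == "__main__":
    print("without copy:", run(False))
    print("with copy:   ", run(True))
```

Without the copy, the run should stop on the second bar.txt. With the
copy, the info object belonging to src.zip is never touched by the
write, so all four calls should go through.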
Bill