Glenn Linderman wrote:
It does seem, however, that there could be a large class of users of
A::Z that are read-only, and that solutions within A::Z would be better
overall than building caching structures on the outside.
Steffen, you mention "the array implementation" and replacing it with a
"hash-based" implementation. Is the root problem, then, some O(n^2)
algorithm for lookups by name searching through "the array
implementation" to find the name? Or what?
I don't understand where or why "this bites if somebody does a
$member->fileName("NewName")." A little more verbose
explanation of the problem might result in more ideas being generated.
And David mentions caching internally... but only for readonly use... my
point, perhaps already thought of and discarded for some valid reason,
is that perhaps it would be possible for A::Z to do the caching
internally, but support updates to the cache for modification
operations? Caches don't necessarily need to be read-only, they can be
updatable... if there is a central place where the "array" data
structure is maintained, could not that code be modified to also update
the (internal) cache, to allow higher performance interfaces?
The problem is that A::Z implements all matches on file names of Zip
members as a search in the array. It's O(n). That might seem decent, but
if you only want to know whether a specific file is in the Zip, you want
an O(1) hash lookup. Additionally, A::Z calls ->fileName() on every
file object every time it looks for an object by its file name. That's
immensely costly.
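
A toy sketch of the difference between the two lookup styles. ToyMember,
find_by_scan and %by_name are made-up names for illustration; this is not
the actual A::Z code, only a model of the pattern described above (a linear
scan that calls the name accessor on each member, versus a prebuilt hash).

```perl
use strict;
use warnings;

# Hypothetical stand-in for a zip member; NOT the real Archive::Zip class.
package ToyMember;
my $calls = 0;    # counts how often the name accessor is invoked
sub new { my ($class, $name) = @_; return bless { name => $name }, $class }
sub fileName { my $self = shift; $calls++; return $self->{name} }
sub calls { return $calls }

package main;

my @members = map { ToyMember->new("file$_.txt") } 1 .. 1000;

# Array-style lookup: walk the list and ask each member for its name.
sub find_by_scan {
    my ($name) = @_;
    for my $m (@members) {
        return $m if $m->fileName() eq $name;
    }
    return;
}

# Hash-style lookup: build the index once, then each query is one hash access.
my %by_name;
$by_name{ $_->{name} } = $_ for @members;

my $hit        = find_by_scan('file1000.txt');
my $scan_calls = ToyMember::calls();    # accessor calls spent on one scan
my $direct     = $by_name{'file1000.txt'};
```

In this model, finding the last of 1000 members by scanning costs 1000
accessor calls, while the hash answers the same question with a single
lookup.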
I first replaced the array with a file_name => $file_obj hash. That was
a problem because the A::Z tests and possibly some user code rely on the
order of the zip members. Then, I hacked it to become a Tie::IxHash
(ordered hash) implementation. That worked until the following kicked in:
$file_obj has a file name! Associating it with a static copy of that
name outside of the object itself breaks its encapsulation because the
user can do "$file_obj->fileName($newName)" to set it. But that doesn't
change the association in the Archive object (which is now a
hash/IxHash). Working around that case requires that a $file_obj has a
reference to its "parent" Archive. But I'm not even sure that $file_obj
can't belong to several archives, etc. That's really, really tedious and
error-prone to implement. (I'm not going to do it.)
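
The stale-key problem can be reproduced in a few lines. Again, ToyMember
and %index are hypothetical names for illustration and not the A::Z
internals; the sketch only assumes a member object with a combined
getter/setter for its name and an archive-side hash keyed by a copy of
that name.

```perl
use strict;
use warnings;

# Hypothetical member class, for illustration only.
package ToyMember;
sub new { my ($class, $name) = @_; return bless { name => $name }, $class }
sub fileName {
    my $self = shift;
    $self->{name} = shift if @_;    # acts as a setter when given an argument
    return $self->{name};
}

package main;

my $member = ToyMember->new('old.txt');

# The archive-side index stores a *copy* of the name as the hash key.
my %index = ( $member->fileName() => $member );

# The user renames the member; nothing tells the index about it.
$member->fileName('new.txt');

my $found_new  = exists $index{'new.txt'} ? 1 : 0;    # rename is invisible
my $found_old  = exists $index{'old.txt'} ? 1 : 0;    # stale key survives
my $stale_name = $index{'old.txt'}->fileName();       # object under old key
```

After the rename, the index still answers for 'old.txt' and knows nothing
about 'new.txt', even though the object stored under the old key now
reports 'new.txt' as its name. Keeping the two in sync would need the
member to notify every index that holds it.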
Does that make more sense?
(David: Laugh at me now for not realizing the encapsulation issue earlier!)
Steffen