I'm parsing a package file, say foo.tar.gz. AutoDetectParser does the right thing in the sense that it returns an XHTML document containing an entry for each file in the tar archive inside the gzip file. However, the metadata object returned by the top-level AutoDetectParser contains only the metadata for the outermost package, i.e. the gzip. The metadata for each file within the tar must exist somewhere; otherwise PackageParser wouldn't be able to use AutoDetectParser to chain down correctly within the file. (That is, somewhere foo.tar/foo.pdf is tagged as application/pdf so that PDFParser can convert it to text.)

Examining the returned XHTML reveals no per-entry metadata. It's just a series of <div class="package-entry"> elements delineating the different entries in the tar. Is there a way to get the metadata for each entry within a package file that I'm just missing? If not, it seems PackageParser could be modified to emit, for each entry, a set of divs of the form: <div class="metadata" name="METADATA-KEY">METADATA-VALUE</div>
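
To make the proposed output concrete, here is a rough sketch of what one entry might look like if each metadata key/value pair were written out in that form. It uses only the JDK, not Tika: the class MetadataDivSketch, its methods, and the plain Map standing in for an entry's metadata are all hypothetical names for illustration, and the sample keys and values are made up. It is not how PackageParser works today.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Tika.
class MetadataDivSketch {

    // Escape the characters that are unsafe in XHTML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Render one package entry: the wrapper div, one metadata div per key,
    // then the already-converted XHTML body of the entry.
    static String renderEntry(Map<String, String> metadata, String bodyXhtml) {
        StringBuilder out = new StringBuilder("<div class=\"package-entry\">\n");
        for (Map.Entry<String, String> e : metadata.entrySet()) {
            out.append("  <div class=\"metadata\" name=\"")
               .append(escape(e.getKey()))
               .append("\">")
               .append(escape(e.getValue()))
               .append("</div>\n");
        }
        out.append("  ").append(bodyXhtml).append("\n</div>\n");
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed sample metadata for foo.tar/foo.pdf; a LinkedHashMap keeps insertion order.
        Map<String, String> md = new LinkedHashMap<>();
        md.put("Content-Type", "application/pdf");
        md.put("resourceName", "foo.pdf");
        System.out.print(renderEntry(md, "<p>extracted text</p>"));
    }
}
```

A consumer of the XHTML could then pick up the per-entry metadata by looking for divs with class "metadata" inside each "package-entry" div.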


--
Jonathan Koren
jonat...@soe.ucsc.edu
http://www.soe.ucsc.edu/~jonathan/

