I've written readers for both from scratch. Tar isn't that bad since it's organized in fixed-size blocks - you read the header, skip forward N blocks, continue. The hardest part is setting up the decompression libraries if you want to support tar.gz or tar.bz2 files.
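To make the "read the header, skip forward N blocks" loop concrete, here is a minimal Python sketch. It assumes a plain, uncompressed ustar-style archive with 512-byte blocks, and it ignores the ustar prefix field, GNU long names, and pax extended headers. The name list_tar_members is made up for illustration.

```python
import io

BLOCK = 512  # tar archives are a sequence of 512-byte blocks


def list_tar_members(stream):
    """Yield (name, size) per member by reading headers and skipping payloads.

    Hypothetical helper: handles only short names and octal size fields.
    """
    while True:
        header = stream.read(BLOCK)
        # An all-zero block marks the end of the archive; a short read
        # means the file is truncated. Stop in either case.
        if len(header) < BLOCK or header == b"\0" * BLOCK:
            return
        # Bytes 0..99 hold the NUL-terminated member name.
        name = header[0:100].split(b"\0", 1)[0].decode("utf-8", "replace")
        # Bytes 124..135 hold the payload size as an octal string.
        size_field = header[124:136].strip(b"\0 ").decode("ascii")
        size = int(size_field, 8) if size_field else 0
        yield name, size
        # Skip the payload, rounded up to a whole number of blocks.
        stream.seek((size + BLOCK - 1) // BLOCK * BLOCK, io.SEEK_CUR)
```

Note that this only works on a seekable stream. For tar.gz or tar.bz2 you would have to read and discard the payload bytes through the decompressor instead of seeking.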
Zip files are more complex. You have (iirc) 5 control blocks - start of archive, start of file, end of file, start of index, end of archive - and the information in each control block is pretty limited. That's not a huge burden since there's support for extensions for things like the Unix file metadata. One complication is that you need to support compression from the start.

Zip files support two types of encryption. There's a really weak version that almost nobody supports and a much stronger modern version that's subject to license restrictions. (Some people use the weak version on embedded systems because of legal requirements to /do something/, no matter how lame.)

There are third-party libraries, of course, but that introduces dependencies. Both formats are simple enough to write from scratch.

I guess my bigger question is whether there's interest in either or both for "real" use. I'm doing this as an exercise but am willing to contribute the code if there's general interest in it.

(BTW the more complex object I'm working on is the .p12 keystore for digital certificates and private keys. We have everything we need in the openssl library so there are no additional third-party dependencies. I have a minimal FDW for the digital certificate itself and am now working on a way to access keys stored in a standard format on the filesystem instead of in the database itself. A natural fit is a specialized archive FDW. Unlike tar and zip it will have two payloads, the digital certificate and the (optionally encrypted) private key. It has searchable metadata, e.g., finding all records with a specific subject.)

Bear

On Mon, Aug 17, 2015 at 8:29 AM, Greg Stark <st...@mit.edu> wrote:

> On Mon, Aug 17, 2015 at 3:14 PM, Bear Giles <bgi...@coyotesong.com> wrote:
> > I'm starting to work on a tar FDW as a proxy for a much more specific FDW.
> > (It's the 'faster to build two and toss the first away' approach - tar lets
> > me get the FDW stuff nailed down before attacking the more complex
> > container.) It could also be useful in its own right, or as the basis for a
> > zip file FDW.
>
> Hm. tar may be a bad fit where zip may be much easier. Tar has no
> index or table of contents. You have to scan the entire file to find
> all the members. IIRC Zip does have a table of contents at the end of
> the file.
>
> The most efficient way to process a tar file is to describe exactly
> what you want to happen with each member and then process it linearly
> from start to end (or until you've found the members you're looking
> for). Trying to return meta info and then go looking for individual
> members will be quite slow and have a large startup cost.
>
> --
> greg
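
For comparison with the linear tar scan, the zip table of contents Greg mentions (the central directory at the end of the file) can be listed without touching any member data. The following is a rough Python sketch, not FDW code. It assumes a single-disk archive with no zip64 records and no encryption, and it assumes the end-of-directory signature does not also appear later in the file, for example inside an archive comment. The name list_zip_members is hypothetical.

```python
import struct

EOCD_SIG = b"PK\x05\x06"  # end of central directory record
CDFH_SIG = b"PK\x01\x02"  # central directory file header


def list_zip_members(data):
    """Return [(name, uncompressed_size, local_header_offset), ...].

    Hypothetical helper: reads only the central directory of an
    in-memory archive; zip64 and multi-disk archives are not handled.
    """
    eocd = data.rfind(EOCD_SIG)  # the record sits near the end of the file
    if eocd < 0:
        raise ValueError("end of central directory record not found")
    # At offset 10: total entries (2), directory size (4), directory offset (4).
    total, _cd_size, cd_offset = struct.unpack_from("<HII", data, eocd + 10)

    members = []
    pos = cd_offset
    for _ in range(total):
        if data[pos:pos + 4] != CDFH_SIG:
            raise ValueError("bad central directory entry at %d" % pos)
        # At offset 20: compressed size, uncompressed size, then the
        # lengths of the variable-size name, extra, and comment fields.
        _csize, usize, name_len, extra_len, comment_len = struct.unpack_from(
            "<IIHHH", data, pos + 20)
        # At offset 42: where this member's local header starts.
        (local_offset,) = struct.unpack_from("<I", data, pos + 42)
        name = data[pos + 46:pos + 46 + name_len].decode("utf-8", "replace")
        members.append((name, usize, local_offset))
        pos += 46 + name_len + extra_len + comment_len  # 46-byte fixed part
    return members
```

The local header offset is what lets a reader jump straight to one member, which is the startup-cost difference from tar described above.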