Don Cragun <don.cragun at sun.com> wrote:

> There are no extensible headers in cpio format.  The only place to

In theory, there is a way to extend the cpio header. Glenn Fowler and
David Korn use this method to add a small amount of extra data to the
cpio header.

The method they use is to extend the space "occupied" by the file name
by setting c_namesize to a value > strlen(pathname) + 1 and to write
extensions into this space.

On POSIX cpio archives, you may add up to 262143 - strlen(pathname) - 1 bytes.
On SVr4 cpio archives, you may add up to 4294967295 - strlen(pathname) - 1 bytes.
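
A minimal sketch of the arithmetic above, in Python for brevity. The two
limits are the largest values that fit into the c_namesize field (6 octal
digits for POSIX cpio, 8 hex digits for SVr4 cpio). The function names are
made up for illustration and are not taken from any existing implementation.

```python
# Largest c_namesize value each header format can represent.
POSIX_NAMESIZE_MAX = 0o777777    # 6 octal digits -> 262143
SVR4_NAMESIZE_MAX = 0xFFFFFFFF   # 8 hex digits   -> 4294967295


def extension_capacity(pathname, namesize_max):
    """Bytes left for extensions after the pathname and its '\\0'."""
    return namesize_max - len(pathname) - 1


def build_name_field(pathname, extensions, namesize_max):
    """Return (c_namesize, name field bytes) with the extensions appended.

    The pathname keeps its terminating '\\0', so a reader that stops at
    the first '\\0' still sees the plain file name.
    """
    if len(extensions) > extension_capacity(pathname, namesize_max):
        raise ValueError("extensions do not fit into c_namesize")
    field = pathname + b"\0" + extensions
    return len(field), field
```

With no extensions, build_name_field() yields the usual
c_namesize == strlen(pathname) + 1.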

I would guess that many cpio implementations will dump core if this
method is used to set c_namesize > 1024, but this is a location that
could be used to add a vendor fingerprint that allows the modified
archive format to be detected reliably.

Glenn Fowler uses:

        d<hex number>   For a "long" st_rdev in POSIX cpio archives
        g<hex number>   For a "long" group ID.
        s<hex number>   For a "long" file size.
        u<hex number>   For a "long" user ID.
        G<name>         For a group name.
        U<name>         For a user name.

All fields are '\0' terminated and the end of the list is a double '\0'.
"long" numbers are written out as intmax_t but it seems that they are
read back as native C "long" only.
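
The field list described above could be written and read roughly like
this. This is only a sketch: it assumes that the hex numbers are bare
lowercase digits without a "0x" prefix and that names are plain ASCII,
neither of which I have checked against the AT&T source. The function
names are hypothetical.

```python
def encode_fields(fields):
    """Encode (key letter, value) pairs as '\\0' terminated fields.

    The list ends with one extra '\\0', so a non-empty list ends in a
    double '\\0'.
    """
    out = bytearray()
    for key, value in fields:
        if len(key) != 1 or "\0" in value:
            raise ValueError("bad field: %r" % ((key, value),))
        out += key.encode("ascii") + value.encode("ascii") + b"\0"
    out += b"\0"
    return bytes(out)


def decode_fields(data):
    """Decode the field list; stop at the empty field that ends it."""
    fields = []
    pos = 0
    while pos < len(data):
        end = data.find(b"\0", pos)
        if end == -1:
            raise ValueError("unterminated field")
        if end == pos:
            break  # empty field: second '\0' of the terminator
        fields.append((chr(data[pos]), data[pos + 1:end].decode("ascii")))
        pos = end + 1
    return fields
```

A "long" user ID of 100000 together with a user name would then be
encode_fields([("u", format(100000, "x")), ("U", "joerg")]).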

I recommend using:

        V<vendor>       In our case "VSUN" as a marker that this is
                        a cpio archive with Sun extensions.

and reporting this to Glenn Fowler and David Korn. The current pax
implementation from AT&T silently skips unknown fields.
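
A reader could look for the vendor marker and skip everything it does
not know, along these lines. Again only a sketch: find_vendor() is a
hypothetical name, and the list layout is the '\0' terminated one
described above.

```python
def find_vendor(extensions):
    """Return the vendor from the first V<vendor> field, or None.

    Fields with any other key letter are skipped silently, and an empty
    field marks the end of the list.
    """
    for raw in extensions.split(b"\0"):
        if not raw:
            break  # end of the field list
        if raw[:1] == b"V":
            return raw[1:].decode("ascii")
    return None
```

For an archive marked with "VSUN", find_vendor() returns "SUN"; for an
archive without a V field it returns None.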

> store data on where the holes go in cpio format is in the file data
> area.  The project team had the option of storing hole information
> followed by the complete file contents or storing the hole information
> followed by the file with the holes removed.  Since the cpio format
> only has 33 bits to store the size of the file data area, the project
> team chose to remove the holes to increase the size of a sparse file
> that can be archived in cpio format.  While it is true that it would be
> possible to encode holes data in a slightly more compact form, it is
> nice to have a common format for the holes data in the ustar/pax
> extended header records for sparse files and in the data area in the
> cpio file data area.

The decision to put the hole information into the file data area is fine with
me, but the simplest way to encode this information more compactly is to use
data_offset/data_size pairs instead of your current proposal.


> >If there are, then should we be deliberately incompatible?
>
> The star and the recent AT&T pax archivers encode sparse files using
> ustar/pax format archives.  The project team has not seen any other
> attempt to encode sparse files using cpio format.  So, there is no
> other known cpio format that handles this case.  We are not being
> deliberately incompatible; nothing else handles this case.

If you mark the cpio archives using the proposal above and use
comma-separated data_offset/data_size pairs to encode the hole list,
I would be willing to implement this in star too.
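
To illustrate what I mean by comma-separated data_offset/data_size
pairs, here is a sketch. The exact syntax is an assumption made for this
example (decimal numbers, one flat list "offset,size,offset,size,...");
the details would still have to be agreed on. Only the areas that
contain data are listed, the holes are everything in between.

```python
def encode_extents(extents):
    """Encode (data_offset, data_size) pairs as one comma-separated string."""
    return ",".join("%d,%d" % (offset, size) for offset, size in extents)


def decode_extents(text):
    """Decode the string written by encode_extents() back into pairs."""
    if not text:
        return []
    numbers = [int(n) for n in text.split(",")]
    if len(numbers) % 2:
        raise ValueError("odd number of values in extent list")
    return list(zip(numbers[0::2], numbers[1::2]))
```

A file with 4096 bytes of data at offset 0 and 8192 bytes of data at
offset 1048576 would be described as "0,4096,1048576,8192".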

Jörg

-- 
 EMail:joerg at schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js at cs.tu-berlin.de                (uni)  
       schilling at fokus.fraunhofer.de     (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
