Add sparse file support to cpio [PSARC/2008/727 Self Review]

Don Cragun Mon, 24 Nov 2008 16:20:51 -0800 (PST)

>Date: Mon, 24 Nov 2008 16:17:06 +0100
>From: Joerg.Schilling at fokus.fraunhofer.de (Joerg Schilling)
>
>Don Cragun <don.cragun at sun.com> wrote:
>
>> >>   PSARC case 2006/331 (Add holey file support to pax) created
>> >
>> >I have never seen such a case, and the Sun pax man page includes
>> >neither "hole" nor "sparse".
>>
>> The case was approved before PSARC cases were handled in cases open to
>> people who are not Sun employees.  But, all of the important
>> information is included on Sun's current pax(1) man page.  The pax
>> utility added an extended header to ustar and pax archive format
>> archives as described in the USAGE section, where it says:
>
>I cannot speak for _very_ recent Solaris versions but for a case from 2006,
>I would expect to see results before the latest (Build89) I checked.
>
>Build 89 comes with a pax man page from 2004


I don't know when the man page got out to OpenSolaris, but the pax
fixes went into Nevada build 54.

>
>> and in the EXTENDED DESCRIPTION, where it says:
>>
>>     "SUN.holesdata    A Solaris extension to pax extended  header
>>                       keywords. Specifies the data and hole pairs
>>                       for a sparse file.
>>
>>                      "In write or copy modes and when the  xustar
>>                       or pax format (see -x format) is specified,
>>                       pax  includes  a   SUN.holesdata   extended
>>                       header record if the underlying file system
>>                       supports the detection of files with  holes
>>                       (see  fpathconf(2))  and reports that there
>>                       is at least one  hole  in  the  file  being
>>                       archived.  value  consists  of  two or more
>>                       consecutive entries of the following form:
>>
>>                         SPACEdata_offsetSPACEhole_offset
>>
>>
>>                      "where the data and  hole  offsets  are  the
>>                       long  values  returned by passing SEEK_DATA
>>                       and SEEK_HOLE  to  lseek(2),  respectively.
>>                       For  example,  the  following  entry  is an
>>                       example of the SUN.holesdata entry  in  the
>>                       extended   header  for  a  file  with  data
>>                       offsets at bytes 0, 24576, and  49152,  and
>>                       hole  offsets  at  bytes  8192,  32768, and
>>                       49159:
>>
>>                         49 SUN.holesdata= 0 8192 24576 32768 49152 49159
>
>Looks like it introduces the same problem as the cpio case.

As Cindy said in the case materials, the changes to cpio format
archives are similar to the changes approved by PSARC/2006/361.
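
The SUN.holesdata record quoted above can be reproduced with a minimal sketch, assuming the standard pax extended header record layout ("length keyword=value" plus a newline, where the length counts the whole record including its own digits).  The helper names pax_record and holesdata_value are illustrative only and are not taken from Sun's pax source.

```python
def pax_record(keyword: str, value: str) -> bytes:
    """Build one pax extended header record: "<length> <keyword>=<value>\\n".

    The length counts the whole record, including its own decimal digits,
    so it is found by iterating until the total stops changing.
    """
    body = f" {keyword}={value}\n".encode("utf-8")
    length = len(body)
    while True:
        total = len(str(length)) + len(body)
        if total == length:
            return str(length).encode("ascii") + body
        length = total


def holesdata_value(pairs) -> str:
    """Format (data_offset, hole_offset) pairs as SPACEdata_offsetSPACEhole_offset."""
    return "".join(f" {data} {hole}" for data, hole in pairs)


# Offsets from the man page example quoted in this thread.
pairs = [(0, 8192), (24576, 32768), (49152, 49159)]
record = pax_record("SUN.holesdata", holesdata_value(pairs))
print(record)
```

With the man page's offsets this yields a 49-byte record, which is where the leading "49" in the quoted example comes from.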

>
>
>> >How do you intend to switch between the sparse support mode and the
>> >non-sparse mode in "copy mode"?
>>
>> There is no switch in copy mode.  If the source filesystem reports
>> holes in a file, the holes will be duplicated in the destination file
>> as long as the destination file is seekable.
>
>This should be marked as a deficit in the man page.

That is not this case.  The original proposal for that case had options
to add holesdata extended headers or not when creating an archive and
to extract the file with or without holes reinstated.  The ARC directed
the project team to always store holes data when creating a ustar or
pax format archive and to always restore sparse files as sparse files
when they are extracted from an archive.
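
The copy mode behavior described above (holes reported by the source are duplicated in a seekable destination) could look roughly like the following sketch.  It assumes a platform where Python exposes os.SEEK_DATA and os.SEEK_HOLE; the function names data_hole_pairs and copy_sparse are hypothetical, and this is not the code used by pax or cpio.

```python
import errno
import os


def data_hole_pairs(fd: int):
    """Return [(data_offset, hole_offset), ...] for an open file descriptor."""
    pairs = []
    size = os.fstat(fd).st_size
    offset = 0
    while offset < size:
        try:
            data = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as exc:
            if exc.errno == errno.ENXIO:  # only a trailing hole remains
                break
            raise
        hole = os.lseek(fd, data, os.SEEK_HOLE)  # end of file counts as a hole
        pairs.append((data, hole))
        offset = hole
    return pairs


def copy_sparse(src_fd: int, dst_fd: int, chunk: int = 1 << 16) -> None:
    """Copy only the data regions; skipping the gaps leaves holes behind."""
    for data, hole in data_hole_pairs(src_fd):
        pos = data
        while pos < hole:
            buf = os.pread(src_fd, min(chunk, hole - pos), pos)
            if not buf:  # source shrank while copying
                break
            pos += os.pwrite(dst_fd, buf, pos)
    # Extend the destination to the full length so a trailing hole survives.
    os.ftruncate(dst_fd, os.fstat(src_fd).st_size)
```

On a filesystem that does not report holes, data_hole_pairs returns a single pair covering the whole file, so the copy degrades to an ordinary full copy.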

>
>
>> >>   In copy out mode (-o) the following new option arguments to the
>> >>   cpio -H option will be added to provide sparse file support:
>> >>           ascii_sparse    - assumes -c is specified.  Only available
>> >>                           in copy out (-o) mode.
>> >>           odc_sparse      - assumes -H odc is specified.  Only available
>> >>                           in copy out (-o) mode.
>> >
>> >Adding sparse file support does not introduce a new archive format unless
>> >you create a new archive format that may be detected by reading the first
>> >archive header from a random archive.
>>
>> Correct.  When using -H ascii_sparse and -H odc_sparse, cpio uses ascii
>> and odc format archives, respectively; but it uses a different file
>> type when adding a sparse file to the archive.  If an archiver
>> understands cpio ascii and odc format archives, it will understand the
>> archives.  If an archiver doesn't recognize the extended file types,
>> the standards require that it extract the file data as a regular file
>> (which will contain the data needed to recreate the file contents with
>> holes in the proper positions).
>
>Does the code set bit 17 and clear the file type bits, or does it set bit 17
>in addition?

It sets bit 17 in addition.  This allows cpio to handle sparse files of
other file types if the case ever comes up.
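
As a rough illustration of "sets bit 17 in addition", the sketch below ORs a flag into c_mode without touching the file type bits.  The thread does not give the numeric value, so SPARSE_BIT = 1 << 16 is an assumption (bit 17 counted from 1, the first bit above the 16-bit mode field), and the names are hypothetical.

```python
import stat

# Assumption: "bit 17" is counted from 1, i.e. the first bit above the
# 16-bit mode field.  The thread does not state the numeric value.
SPARSE_BIT = 1 << 16  # hypothetical name and value


def mark_sparse(mode: int) -> int:
    """Set the sparse flag in addition to the existing file type bits."""
    return mode | SPARSE_BIT


def split_mode(c_mode: int):
    """Return (is_sparse, plain_mode) for a c_mode read from a header."""
    return bool(c_mode & SPARSE_BIT), c_mode & ~SPARSE_BIT


c_mode = mark_sparse(stat.S_IFREG | 0o644)
is_sparse, plain = split_mode(c_mode)
print("%06o" % c_mode, is_sparse, stat.S_ISREG(plain))
```

Under this assumption the value still fits in the six octal digits of the c_mode field, and a reader that masks the type with S_IFMT sees a regular file, while one that compares all of the upper bits would see an unfamiliar type.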

>
>> >
>> >If you want to avoid introducing a new option, you would need to document
>> >this as a dirty hack. BTW: Where is the new man page?
>>
>> Quoting from the references section of this case:
>>     5.4 PSARC/2008/727/materials/cpio.1: Updated cpio.1 man page
>
>OK, but I see no description of the archive format in this man page.

The archive format will be documented either in archive(4) or in pax(1),
where other formats are described in detail.  When that happens, a
cross reference will be added to the cpio(1) man page.

>
>
>> >>   The following will apply when either '-H ascii_sparse' or
>> >>   '-H odc_sparse' is specified with -o: 
>> >>           - The c_mode field in the archive header will
>> >>             indicate that the file is a sparse file. In the old
>> >>             stat structure, the mode field is an unsigned short
>> >>             (16 bit) field.  To avoid conflicts with other file
>> >>             types, a high order bit (17) in the c_mode field of
>> >>             the header will be set.
>> >
>> >This is beyond the cpio specs. How do you plan to mark the archives
>> >as "Sun cpio" specific, to avoid incorrect behavior for non-Sun
>> >archives?
>>
>> It is indicated by the file type.
>
>As a result of not marking the archive, archivers that carefully implement
>add-on features depending on the archive format will not unpack the sparse files.
>
>star and AT&T pax will ignore bit 17, other archivers may include this
>bit in the file type with unknown results.

That is a bug in star and AT&T pax (as well as in Sun's current cpio
and pax).  A bug report will be filed to correct Sun's cpio and pax
when this case is resolved.

>
>Vendor unique extensions that do not use explicit vendor specific tags
>are something we had in the 1980s.
>
>
>> >>           - A string of the following format will be prepended to
>> >>             the compressed file data:
>> >>                   "%lu %llu%s", prepended_info_size,
>> >>                           expanded_file_size, data/hole_offsets
>> >
>> >Is this data _inside_ the file data area or is it in conflict with the 
>> >cpio extensions from David Korn and Glenn Fowler?
>>
>> It is inside the file data area as indicated above.  (The file size
>> field is the size of this header plus the size of the file contents
>> after removing the holes.)
>
>OK; how about marking the archive in the header area past the filename?

As I'm sure you know, the data describing the contents of the file
immediately follow the header in a cpio archive.  The standard does not
allow any new fields to be added to the cpio archive format.
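
The string prepended to the file data, and the resulting file size field, might be computed along these lines.  This is only a sketch: it assumes prepended_info_size counts the whole prefix including its own digits, and it emits no terminator after the prefix, neither of which is stated in this thread.  The function names are hypothetical.

```python
def sparse_prefix(expanded_size: int, pairs) -> bytes:
    """Build "<prefix_size> <expanded_size><offsets>" for the start of the data.

    Assumption: prefix_size counts the whole prefix, including its own digits.
    """
    offsets = "".join(f" {data} {hole}" for data, hole in pairs)
    tail = f" {expanded_size}{offsets}"
    size = len(tail)
    # Iterate until the size also accounts for its own decimal digits.
    while len(str(size)) + len(tail) != size:
        size = len(str(size)) + len(tail)
    return f"{size}{tail}".encode("ascii")


def archived_size(prefix: bytes, pairs) -> int:
    """File size field: the prefix plus the contents with the holes removed."""
    return len(prefix) + sum(hole - data for data, hole in pairs)


# Same offsets as the SUN.holesdata man page example; the file is 49159 bytes.
pairs = [(0, 8192), (24576, 32768), (49152, 49159)]
prefix = sparse_prefix(49159, pairs)
print(prefix, archived_size(prefix, pairs))
```

For that example file the three data regions hold 8192 + 8192 + 7 bytes, so the archived member is far smaller than the 49159-byte expanded file.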

>
>
>
>> >>           where data/hole_offsets contains 2 or more entries of the
>> >>           following format:
>> >>                   " %llu %llu", data_offset, hole_offset
>> >
>> >If you ever want to debug this, I would recommend using:
>> >
>> >                    " %llu,%llu", data_offset, hole_offset
>> >
>> >to make the data parsable by human eyes.
>>
>> Maybe to European human eyes.  In the U.S., some possible data offset,
>> hole offset pairs could look like a single number, with the "," read
>> as a thousands separator instead of a pair separator.  Besides, the
>> proposed format matches the string given as the data in a ustar/pax
>> SUN.holesdata extended header.
>
>It seems that you are too US centric and thus do not see the problem of
>being unable to see number pairs in a possibly extremely long data 
>stream.

As you know, the cpio header is a string of 76 octal digits followed
immediately by the pathname of the file (c_namesize bytes including a
trailing NUL byte) followed immediately by the contents of the file
(c_filesize bytes).  This header is not easy for humans to read.  This
header is not intended to be read by humans; it is intended to be read
by cpio and other archivers that want to extract data from cpio
archives.  The same is true of the holes data record both in the
ustar/pax archive format extended headers and in the initial portion of
the file data in the cpio archive format.
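
For readers who want to see that layout concretely, here is a minimal sketch of reading one member of an odc format archive.  The field names and widths are those of the POSIX.1-1988 cpio interchange format (they total 76 octal digits); the function names are illustrative, and nothing here handles the sparse extension discussed in this case.

```python
ODC_FIELDS = [  # (field, width in octal digits); the widths total 76
    ("c_magic", 6), ("c_dev", 6), ("c_ino", 6), ("c_mode", 6),
    ("c_uid", 6), ("c_gid", 6), ("c_nlink", 6), ("c_rdev", 6),
    ("c_mtime", 11), ("c_namesize", 6), ("c_filesize", 11),
]
HEADER_LEN = sum(width for _, width in ODC_FIELDS)


def parse_odc_member(buf: bytes, start: int = 0):
    """Return (header dict, pathname, file data, offset of the next member)."""
    header, pos = {}, start
    for field, width in ODC_FIELDS:
        header[field] = int(buf[pos:pos + width], 8)
        pos += width
    if header["c_magic"] != 0o070707:
        raise ValueError("not an odc header")
    name_end = pos + header["c_namesize"]
    path = buf[pos:name_end - 1].decode()  # drop the trailing NUL
    data_end = name_end + header["c_filesize"]
    return header, path, buf[name_end:data_end], data_end


def build_odc_member(path: str, data: bytes, mode: int = 0o100644) -> bytes:
    """Build a member with most fields zeroed, for demonstration only."""
    name = path.encode() + b"\0"
    values = {"c_magic": 0o070707, "c_mode": mode, "c_nlink": 1,
              "c_namesize": len(name), "c_filesize": len(data)}
    head = "".join(f"{values.get(field, 0):0{width}o}"
                   for field, width in ODC_FIELDS)
    return head.encode("ascii") + name + data


member = build_odc_member("hello.txt", b"hi\n")
header, path, data, next_offset = parse_odc_member(member)
print(HEADER_LEN, path, data, next_offset == len(member))
```

The parser never looks inside the file data, which is why a holes record placed at the start of that data does not disturb an archiver that only understands the plain format.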

 - Don

>
 ... ... ...
>
>Jörg

Add sparse file support to cpio [PSARC/2008/727 Self Review]
