>Date: Sun, 23 Nov 2008 15:01:41 +0100
>From: Joerg.Schilling at fokus.fraunhofer.de (Joerg Schilling)
>
>Don Cragun <don.cragun at sun.com> wrote:
>
>> I'm sponsoring this case for Cynthia Eastham.
>>
>> Since this case follows the same general practices used when sparse
>> file support was added to the pax archiving utility, I'm marking this
>
>???
>
>AFAIK, the "pax" implementation that comes with Solaris does not support 
>sparse files.

It does.

>
>> case as close approved automatic.  If any members believe this needs to
>> be promoted to a fast track let me know.
>
>I see many small problems that need to be addressed before an implementation 
>starts.
>
>> Template Version: @(#)sac_nextcase %I% %G% SMI
>> This information is Copyright 2008 Sun Microsystems
>> 1. Introduction
>>     1.1. Project/Component Working Name:
>>       Add sparse file support to cpio
>>     1.2. Name of Document Author/Supplier:
>>       Author:  Cynthia Eastham
>>     1.3  Date of This Document:
>>      21 November, 2008
>> 4. Technical Description
>>      4.1 Details
>>
>>      PSARC case 2006/331 (Add holey file support to pax) created
>
>I did never see such a case and the Sun pax man page does neither 
>include "hole" nor "sparse".

The case was approved before PSARC cases were reviewed in forums open to
people who are not Sun employees.  But all of the important
information is included on Sun's current pax(1) man page.  The pax
utility added an extended header record to ustar and pax format
archives, as described in the USAGE section, where it says:

    "When using the -x xustar and -x -pax archive formats, if the
     underlying  file system reports that the file being archived
     contains holes, the Solaris pax utility records the presence
     of  holes  in  an  extended  header  record when the file is
     archived. If this extended header record is associated  with
     a  file  in  the archive, those holes are recreated whenever
     that file is extracted from the archive. See  the  SEEK_DATA
     and SEEK_HOLE whence values in lseek(2). In all other cases,
     any NUL (\0) characters found in the archive is  written  to
     the file when it is extracted."

and in the EXTENDED DESCRIPTION, where it says:

    "SUN.holesdata    A Solaris extension to pax extended  header
                      keywords. Specifies the data and hole pairs
                      for a sparse file.

                     "In write or copy modes and when the  xustar
                      or pax format (see -x format) is specified,
                      pax  includes  a   SUN.holesdate   extended
                      header record if the underlying file system
                      supports the detection of files with  holes
                      (see  fpathconf(2))  and reports that there
                      is at least one  hole  in  the  file  being
                      archived.  value  consists  of  two or more
                      consecutive entries of the following form:

                        SPACEdata_offsetSPACEhole_offset


                     "where the data and  hole  offsets  are  the
                      long  values  returned by passing SEEK_DATA
                      and SEEK_HOLE  to  lseek(2),  respectively.
                      For  example,  the  following  entry  is an
                      example of the SUN.holesdata entry  in  the
                      extended   header  for  a  file  with  data
                      offsets at bytes 0, 24576, and  49152,  and
                      hole  offsets  at  bytes  8192,  32768, and
                      49159: 49 SUN.holesdata= 0 8192 24576 32768
                      49152 49159:

                        49 SUN.holesdata= 0 8192 24576 32768 49152 49159


                     "When extracting a file from an  archive  in
                      read  or  copy  modes, if a SUN.holesdata =
                      pair is found in the  extended  header  for
                      the  file,  then  the file is restored with
                      the holes identified using this  data.  For
                      example,  for the SUN.holesdata provided in
                      the example above, bytes from 0 to 8192 are
                      restored  as  data, a hole is created up to
                      the next data position (24576), bytes 24576
                      to 32768 is restored as data, and so forth."
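
For illustration only (this is not taken from the pax sources), the
record quoted above can be taken apart as follows.  The function names
are made up for this sketch; the only facts relied on are the format
quoted from the man page and the standard pax rule that the leading
decimal length counts the whole record, including the length digits and
the trailing newline.

```python
def parse_holesdata(value):
    """Split a SUN.holesdata value into (data_offset, hole_offset) pairs."""
    numbers = [int(tok) for tok in value.split()]
    if len(numbers) < 2 or len(numbers) % 2:
        raise ValueError("expected data/hole offset pairs")
    return list(zip(numbers[0::2], numbers[1::2]))


def layout(pairs):
    """Describe the file as alternating data and hole regions."""
    regions = []
    pos = 0
    for data, hole in pairs:
        if data > pos:
            # Everything between the previous hole offset and this data
            # offset is a hole.
            regions.append(("hole", pos, data))
        regions.append(("data", data, hole))
        pos = hole
    return regions


# The example record from the man page, with its trailing newline.
record = "49 SUN.holesdata= 0 8192 24576 32768 49152 49159\n"
length, _, rest = record.partition(" ")
keyword, _, value = rest.rstrip("\n").partition("=")
pairs = parse_holesdata(value)

for kind, start, end in layout(pairs):
    print(kind, start, end)
```

The last pair's hole offset (49159) is simply the end of the file, which
is what lseek(2) reports for SEEK_HOLE when no further hole exists.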


>
>>      This case adds similar sparse file support to the cpio utility.
>
>Similar to what?

Similar to pax.

>
>
>>      In pass mode, (cpio -p), sparse files will be recreated at the
>>      destination with the same holes that were present in the source
>>      file, as long as the source file system supports reporting
>>      holes (as described by PSARC case 2004/770) and the destination
>>      file is seekable.  Otherwise, holes in sparse files will be
>>      filled with '\0' bytes in corresponding destination files as
>>      they are now.
>
>How do you intend to switch between the sparse support mode and the non-sparse 
>mode in "copy mode"?

There is no switch in copy mode.  If the source filesystem reports
holes in a file, the holes will be duplicated in the destination file
as long as the destination file is seekable.
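
A minimal sketch of that behavior, assuming a platform where Python
exposes os.SEEK_DATA and os.SEEK_HOLE (Solaris and Linux do).  This is
not the cpio implementation; data_extents() and copy_sparse() are names
invented for the example.  On a file system that cannot report holes,
lseek(2) treats the whole file as one data extent, so the copy degrades
to an ordinary full copy.

```python
import errno
import os


def data_extents(fd):
    """Yield (data_offset, hole_offset) pairs for an open descriptor."""
    size = os.fstat(fd).st_size
    offset = 0
    while offset < size:
        try:
            data = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as exc:
            if exc.errno == errno.ENXIO:
                break  # only a trailing hole remains
            raise
        hole = os.lseek(fd, data, os.SEEK_HOLE)
        yield data, hole
        offset = hole


def copy_sparse(src_path, dst_path, chunk=1 << 16):
    """Copy src to dst, writing only the data extents of src."""
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        size = os.fstat(src).st_size
        extents = list(data_extents(src))
        for data, hole in extents:
            pos = data
            while pos < hole:
                buf = os.pread(src, min(chunk, hole - pos), pos)
                if not buf:
                    break  # source shrank underneath us
                # pwrite() may write less than requested; advance by
                # what was actually written and re-read the rest.
                pos += os.pwrite(dst, buf, pos)
        # Setting the length last recreates a trailing hole, if any.
        os.ftruncate(dst, size)
        return extents
    finally:
        os.close(src)
        os.close(dst)
```

Because the destination is written with pwrite() at explicit offsets,
this only works for a seekable destination, which is the same
restriction the case places on pass mode.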

>
>>      In copy out mode (-o) the following new option arguments to the
>>      cpio -H option will be added to provide sparse file support:
>>              ascii_sparse    - assumes -c is specified.  Only available
>>                              in copy out (-o) mode.
>>              odc_sparse      - assumes -H odc is specified.  Only available
>>                              in copy out (-o) mode.
>
>Adding sparse file support does not introduce a new archive format unless you
>create a new archive format that may be detected by reading the first archive
>header from a random archive.

Correct.  When using -H ascii_sparse and -H odc_sparse, cpio uses ascii
and odc format archives, respectively; but it uses a different file
type when adding a sparse file to the archive.  If an archiver
understands cpio ascii and odc format archives, it will understand the
archives.  If an archiver doesn't recognize the extended file types,
the standards require that it extract the file data as a regular file
(which will contain the data needed to recreate the file contents with
holes in the proper positions).

>
>If you like to avoid introducing a new option, you would need to document 
>this as a dirty hack. BTW: Where is the new man page?

Quoting from the references section of this case:
    5.4 PSARC/2008/727/materials/cpio.1: Updated cpio.1 man page

>
>
>...
>
>>      The following will apply when either '-H ascii_sparse' or
>>      '-H odc_sparse' is specified with -o: 
>>              - The c_mode field in the archive header will
>>                indicate that the file is a sparse file. In the old
>>                stat structure, the mode field is an unsigned short
>>                (16 bit) field.  To avoid conflicts with other file
>>                types, a high order bit (17) in the c_mode field of
>>                the header will be set.
>
>This is beyond the cpio specs. How do you plan to mark the archives 
>as "Sun cpio" specific to allow to avoid incorrect behavior for non-Sun 
>archives?

It is indicated by the file type.
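
As a rough sketch of what checking that file type could look like: the
case text only says "a high order bit (17)", so the constant below
ASSUMES the bits are counted from 1, which gives 1 << 16 (octal
0200000).  That value, and the helper names, are assumptions made for
this example and are not quoted from the case or from cpio.

```python
import stat

# ASSUMPTION: "bit 17" counted from 1, i.e. the first bit above the
# 16-bit mode field of the old stat structure.
SPARSE_BIT = 0o200000


def is_sparse_entry(c_mode):
    """True if the archive header marks this member as a sparse file."""
    return bool(c_mode & SPARSE_BIT)


def plain_mode(c_mode):
    """Strip the marker to recover the ordinary 16-bit mode."""
    return c_mode & ~SPARSE_BIT


marked = stat.S_IFREG | 0o644 | SPARSE_BIT
print("%06o" % marked)            # still fits the 6-digit odc c_mode field
print(is_sparse_entry(marked))
print(stat.S_ISREG(plain_mode(marked)))
```

Under that assumption the marked mode still fits in the six octal
digits of an odc header, so the header layout itself is unchanged.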

>
>>              - the file size field of the header will be the size of
>>                the compressed sparse file (i.e., the size of the
>>                header below plus the size of the file contents after
>>                removing the holes).
>
>OK
>
>>              - A string of the following format will be prepended to
>>                the compressed file data:
>>                      "%lu %llu%s", prepended_info_size,
>>                              expanded_file_size, data/hole_offsets
>
>Is this data _inside_ the file data area or is it in conflict with the 
>cpio extensions from David Korn and Glenn Fowler?

It is inside the file data area as indicated above.  (The file size
field is the size of this header plus the size of the file contents
after removing the holes.)
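
To make the arithmetic concrete, here is a hedged sketch of building
the prepended string and the resulting file size field.  The case text
does not say whether prepended_info_size counts its own digits; this
example ASSUMES it is the length of the entire prepended string, so the
value is found by iterating until it is self-consistent.  The function
names are invented for the sketch.

```python
def offsets_field(extents):
    """Format the data/hole pairs as " %llu %llu" entries."""
    return "".join(" %d %d" % (data, hole) for data, hole in extents)


def sparse_prefix(expanded_size, extents):
    """Build "<prepended_info_size> <expanded_file_size><offsets>"."""
    tail = " %d%s" % (expanded_size, offsets_field(extents))
    # ASSUMPTION: the size counts the whole prefix, its own digits
    # included, so grow the guess until the digit count stops changing.
    length = len(tail) + 1
    while len(str(length)) + len(tail) != length:
        length = len(str(length)) + len(tail)
    return "%d%s" % (length, tail)


def archived_size(prefix, extents):
    """File size field: prefix plus the data left after removing holes."""
    return len(prefix) + sum(hole - data for data, hole in extents)


extents = [(0, 8192), (24576, 32768), (49152, 49159)]
prefix = sparse_prefix(49159, extents)
print(repr(prefix))
print(archived_size(prefix, extents))
```

For the same offsets used in the pax man page example, the data left
after removing the holes is 8192 + 8192 + 7 = 16391 bytes, and the file
size field is that plus the length of the prefix.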

>
>
>>              where data/hole_offsets contains 2 or more entries of the
>>              following format:
>>                      " %llu %llu", data_offset, hole_offset
>
>If you ever like to debug this, I would recommend to use:
>
>                       " %llu,%llu", data_offset, hole_offset
>
>to make the data parsable by human eyes..

Maybe to European human eyes.  In the U.S., some possible data offset,
hole offset pairs could look like a single number, with the "," read as
a thousands separator instead of as a pair separator.  Besides that,
the format matches the string given as the data in a ustar/pax
SUN.holesdata extended header.

>
>But why don't you follow existing other implementations that use 
>offset/numbytes pairs for data chunks? This results in a lower archive size.

I'm not going to argue decisions that were agreed upon for PSARC
2006/331.  But the format follows naturally from the data provided by
the lseek(2) SEEK_HOLE and SEEK_DATA operations.

>
>
>>      When the c_mode field is set, cpio will detect the sparse file
>>      upon file extraction, and use the prepended sparse file
>>      information to restore the holes in the file if the
>>      destination file is seekable.  If the destination file is not
>>      seekable, the sparse file information will be used to fill the
>>      holes with '\0' bytes.  Archivers that do not recognize the
>>      sparse file mode bit will restore the compressed file and its
>>      prepended data as a regular file.
>
>As it is unlikely that the first file in an archive is a sparse file, how
>do you intend to detect an archive that contain this Sun specific cpio 
>extension?

By the file type.

>
>How do you intend to switch between the sparse support mode and the non-sparse 
>mode in "extract mode"?

There is no switching.  If a file is archived as a sparse file, it will
be extracted as a sparse file.
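
A sketch of the two extraction paths described in the case, under the
assumption that the prepended information has already been parsed into
(data_offset, hole_offset) pairs and an expanded size.  The function
names are hypothetical; this is not the cpio code.

```python
import os


def restore_sparse(dst_fd, packed, extents, expanded_size):
    """Seekable destination: write data at its offsets, leave holes."""
    pos = 0
    for data, hole in extents:
        chunk = packed[pos:pos + (hole - data)]
        written = 0
        while written < len(chunk):
            written += os.pwrite(dst_fd, chunk[written:], data + written)
        pos += hole - data
    # Setting the final length also recreates a trailing hole.
    os.ftruncate(dst_fd, expanded_size)


def expand_with_zeros(packed, extents, expanded_size):
    """Non-seekable destination: fill the holes with '\\0' bytes."""
    out = bytearray(expanded_size)  # starts out as all zero bytes
    pos = 0
    for data, hole in extents:
        out[data:hole] = packed[pos:pos + (hole - data)]
        pos += hole - data
    return bytes(out)
```

Both paths produce the same byte stream when read back; they differ
only in whether the holes occupy disk blocks in the destination.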

 - Don

>
>
>Jörg
