>Date: Sun, 23 Nov 2008 15:01:41 +0100
>From: Joerg.Schilling at fokus.fraunhofer.de (Joerg Schilling)
>
>Don Cragun <don.cragun at sun.com> wrote:
>
>> I'm sponsoring this case for Cynthia Eastham.
>>
>> Since this case follows the same general practices used when sparse
>> file support was added to the pax archiving utility, I'm marking this
>
>???
>
>AFAIK, the "pax" implementation that comes with Solaris does not support
>sparse files.
It does.

>
>> case as close approved automatic.  If any members believe this needs to
>> be promoted to a fast track let me know.
>
>I see many small problems that need to be addressed before an implementation
>starts.
>
>> Template Version: @(#)sac_nextcase %I% %G% SMI
>> This information is Copyright 2008 Sun Microsystems
>> 1. Introduction
>>     1.1. Project/Component Working Name:
>>          Add sparse file support to cpio
>>     1.2. Name of Document Author/Supplier:
>>          Author:  Cynthia Eastham
>>     1.3  Date of This Document:
>>          21 November, 2008
>> 4. Technical Description
>>     4.1 Details
>>
>>         PSARC case 2006/331 (Add holey file support to pax) created
>
>I did never see such a case and the Sun pax man page does neither
>include "hole" nor "sparse".

That case was approved before PSARC cases were opened to people who are
not Sun employees.  But all of the important information is included on
Sun's current pax(1) man page.  The pax utility added an extended header
to ustar and pax archive format archives as described in the USAGE
section, where it says:

"When using the -x xustar and -x pax archive formats, if the underlying
file system reports that the file being archived contains holes, the
Solaris pax utility records the presence of holes in an extended header
record when the file is archived.  If this extended header record is
associated with a file in the archive, those holes are recreated
whenever that file is extracted from the archive.  See the SEEK_DATA and
SEEK_HOLE whence values in lseek(2).  In all other cases, any NUL (\0)
characters found in the archive are written to the file when it is
extracted."

and in the EXTENDED DESCRIPTION, where it says:

"SUN.holesdata
    A Solaris extension to pax extended header keywords.  Specifies the
    data and hole pairs for a sparse file.
"In write or copy modes and when the xustar or pax format (see -x
format) is specified, pax includes a SUN.holesdata extended header
record if the underlying file system supports the detection of files
with holes (see fpathconf(2)) and reports that there is at least one
hole in the file being archived.  value consists of two or more
consecutive entries of the following form:

    SPACEdata_offsetSPACEhole_offset

"where the data and hole offsets are the long values returned by
passing SEEK_DATA and SEEK_HOLE to lseek(2), respectively.  For example,
the following entry is an example of the SUN.holesdata entry in the
extended header for a file with data offsets at bytes 0, 24576, and
49152, and hole offsets at bytes 8192, 32768, and 49159:

    49 SUN.holesdata= 0 8192 24576 32768 49152 49159

"When extracting a file from an archive in read or copy modes, if a
SUN.holesdata = pair is found in the extended header for the file, then
the file is restored with the holes identified using this data.  For
example, for the SUN.holesdata provided in the example above, bytes from
0 to 8192 are restored as data, a hole is created up to the next data
position (24576), bytes 24576 to 32768 are restored as data, and so
forth."

>
>> This case adds similar sparse file support to the cpio utility.
>
>Similar to what?

Similar to pax.

>
>
>>      In pass mode, (cpio -p), sparse files will be recreated at the
>>      destination with the same holes that were present in the source
>>      file, as long as the source file system supports reporting
>>      holes (as described by PSARC case 2004/770) and the destination
>>      file is seekable.  Otherwise, holes in sparse files will be
>>      filled with '\0' bytes in corresponding destination files as
>>      they are now.
>
>How do you intend to switch between the sparse support mode and the non-sparse
>mode in "copy mode"?

There is no switch in copy mode.
If the source filesystem reports holes in a file, the holes will be
duplicated in the destination file as long as the destination file is
seekable.

>
>>      In copy out mode (-o) the following new option arguments to the
>>      cpio -H option will be added to provide sparse file support:
>>          ascii_sparse - assumes -c is specified.  Only available
>>                         in copy out (-o) mode.
>>          odc_sparse   - assumes -H odc is specified.  Only available
>>                         in copy out (-o) mode.
>
>Adding sparse file support does not introduce a new archive format unless you
>create a new archive format that may be detected by reading the first archive
>header from a random archive.

Correct.  When using -H ascii_sparse and -H odc_sparse, cpio uses ascii
and odc format archives, respectively; but it uses a different file type
when adding a sparse file to the archive.  If an archiver understands
cpio ascii and odc format archives, it will understand these archives.
If an archiver doesn't recognize the extended file types, the standards
require that it extract the file data as a regular file (which will
contain the data needed to recreate the file contents with holes in the
proper positions).

>
>If you like to avoid to introduce a new option, you would need to document
>this as a dirty hack. BTW: Where is the new man page?

Quoting from the references section of this case:
    5.4 PSARC/2008/727/materials/cpio.1: Updated cpio.1 man page

>
>
>...
>
>>      The following will apply when either '-H ascii_sparse' or
>>      '-H odc_sparse' is specified with -o:
>>          - The c_mode field in the archive header will
>>            indicate that the file is a sparse file.  In the old
>>            stat structure, the mode field is an unsigned short
>>            (16 bit) field.  To avoid conflicts with other file
>>            types, a high order bit (17) in the c_mode field of
>>            the header will be set.
>
>This is beyond the cpio specs. How do you plan to mark the archives
>as "Sun cpio" specific to allow to avoid incorrect behavior for non-Sun
>archives?
It is indicated by the file type.

>
>>          - the file size field of the header will be the size of
>>            the compressed sparse file (i.e., the size of the
>>            header below plus the size of the file contents after
>>            removing the holes).
>
>OK
>
>>          - A string of the following format will be prepended to
>>            the compressed file data:
>>              "%lu %llu%s", prepended_info_size,
>>                  expanded_file_size, data/hole_offsets
>
>Is this data _inside_ the file data area or is it in conflict with the
>cpio extensions from David Korn and Glenn Fowler?

It is inside the file data area as indicated above.  (The file size
field is the size of this header plus the size of the file contents
after removing the holes.)

>
>
>>            where data/hole_offsets contains 2 or more entries of the
>>            following format:
>>              " %llu %llu", data_offset, hole_offset
>
>If you ever like to debug this, I would recommend to use:
>
>       " %llu,%llu", data_offset, hole_offset
>
>to make the data parsable by human eyes..

Maybe to European human eyes.  In the U.S., some possible data offset,
hole offset pairs could look like a single number, with the "," being a
thousands separator instead of a pair separator.  Besides that, the
proposed format matches the string given as the data in a ustar/pax
SUN.holesdata extended header.

>
>But why don't you follow existing other implementations that use
>offset/numbytes pairs for data chunks? This results in a lower archive size.

I'm not going to argue decisions that were agreed upon for PSARC
2006/361.  But, it follows naturally from the data provided by the
lseek(2) SEEK_HOLE and SEEK_DATA operations.

>
>
>>      When the c_mode field is set, cpio will detect the sparse file
>>      upon file extraction, and use the prepended sparse file
>>      information to restore the holes in the file if the
>>      destination file is seekable.  If the destination file is not
>>      seekable, the sparse file information will be used to fill the
>>      holes with '\0' bytes.
>>      Archivers that do not recognize the
>>      sparse file mode bit will restore the compressed file and its
>>      prepended data as a regular file.
>
>As it is unlikely that the first file in an archive is a sparse file, how
>do you intend to detect an archive that contains this Sun specific cpio
>extension?

By the file type.

>
>How do you intend to switch between the sparse support mode and the non-sparse
>mode in "extract mode"?

There is no switching.  If a file is archived as a sparse file, it will
be extracted as a sparse file.

 - Don

>
>
>Jörg
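
For readers who want to check the arithmetic in the SUN.holesdata
example quoted above, here is a minimal sketch in C of how such a
data/hole offset list could be parsed.  It is not taken from the pax or
cpio sources; the names holes_pair, parse_holesdata, and
holes_data_bytes are hypothetical, and it assumes the value is a
space-separated list of decimal offsets in ascending order, as in the
man page example.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* One data region of a sparse file: bytes [data_off, hole_off) hold data. */
struct holes_pair {
    unsigned long long data_off;  /* where data starts (cf. SEEK_DATA) */
    unsigned long long hole_off;  /* where the next hole starts (cf. SEEK_HOLE) */
};

/*
 * Parse a value such as " 0 8192 24576 32768 49152 49159" into pairs.
 * Returns the number of pairs stored, or -1 if the value is malformed:
 * an odd number of offsets, non-numeric text, offsets out of order,
 * an offset too large for unsigned long long, or more than max pairs.
 */
int parse_holesdata(const char *value, struct holes_pair *pairs, int max)
{
    const char *p = value;
    unsigned long long prev = 0;  /* end of the previous data region */
    int n = 0;

    assert(value != NULL && pairs != NULL);
    for (;;) {
        char *end;
        unsigned long long d, h;

        while (*p == ' ')
            p++;
        if (*p == '\0')
            break;                     /* clean end of the list */
        if (*p < '0' || *p > '9')
            return -1;                 /* strtoull alone would accept '-' */
        errno = 0;
        d = strtoull(p, &end, 10);
        if (errno != 0)
            return -1;
        p = end;

        while (*p == ' ')
            p++;
        if (*p < '0' || *p > '9')
            return -1;                 /* data offset without a hole offset */
        errno = 0;
        h = strtoull(p, &end, 10);
        if (errno != 0)
            return -1;
        p = end;

        if (d < prev || h < d || n >= max)
            return -1;
        pairs[n].data_off = d;
        pairs[n].hole_off = h;
        n++;
        prev = h;
    }
    return n;
}

/* Number of bytes that are stored as data once the holes are removed. */
unsigned long long holes_data_bytes(const struct holes_pair *pairs, int n)
{
    unsigned long long total = 0;

    for (int i = 0; i < n; i++)
        total += pairs[i].hole_off - pairs[i].data_off;
    return total;
}
```

For the man page value " 0 8192 24576 32768 49152 49159" this yields
three data regions covering 8192 + 8192 + 7 = 16391 bytes; the ranges
between a hole offset and the next data offset are what an extractor
would recreate as holes.  A comma-separated variant such as " 0,8192"
is rejected by this sketch, since it only accepts the space-separated
form described in the case.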