>Date: Mon, 24 Nov 2008 16:17:06 +0100
>From: Joerg.Schilling at fokus.fraunhofer.de (Joerg Schilling)
>
>Don Cragun <don.cragun at sun.com> wrote:
>
>> >> PSARC case 2006/331 (Add holey file support to pax) created
>> >
>> >I never saw such a case, and the Sun pax man page includes neither
>> >"hole" nor "sparse".
>>
>> The case was approved before PSARC cases were handled in cases open to
>> people who are not Sun employees.  But, all of the important
>> information is included on Sun's current pax(1) man page.  The pax
>> utility added an extended header to ustar and pax archive format
>> archives as described in the USAGE section, where it says:
>
>I cannot speak for _very_ recent Solaris versions but for a case from 2006,
>I would expect to see results before the latest (Build89) I checked.
>
>Build 89 comes with a pax man page from 2004
I don't know when the man page got out to OpenSolaris, but the pax
fixes went into Nevada build 54.

>
>> and in the EXTENDED DESCRIPTION, where it says:
>>
>>     "SUN.holesdata  A Solaris extension to pax extended header
>>                     keywords.  Specifies the data and hole pairs
>>                     for a sparse file.
>>
>>                     "In write or copy modes and when the xustar
>>                     or pax format (see -x format) is specified,
>>                     pax includes a SUN.holesdata extended
>>                     header record if the underlying file system
>>                     supports the detection of files with holes
>>                     (see fpathconf(2)) and reports that there
>>                     is at least one hole in the file being
>>                     archived.  value consists of two or more
>>                     consecutive entries of the following form:
>>
>>                         SPACEdata_offsetSPACEhole_offset
>>
>>
>>                     "where the data and hole offsets are the
>>                     long values returned by passing SEEK_DATA
>>                     and SEEK_HOLE to lseek(2), respectively.
>>                     For example, the following entry is an
>>                     example of the SUN.holesdata entry in the
>>                     extended header for a file with data
>>                     offsets at bytes 0, 24576, and 49152, and
>>                     hole offsets at bytes 8192, 32768, and
>>                     49159:
>>
>>                     49 SUN.holesdata= 0 8192 24576 32768 49152 49159
>
>Looks like it introduces the same problem as the cpio case.

As Cindy said in the case materials, the changes to cpio format
archives are similar to the changes approved by PSARC/2006/361.

>
>
>> >How do you intend to switch between the sparse support mode and the
>> >non-sparse mode in "copy mode"?
>>
>> There is no switch in copy mode.  If the source filesystem reports
>> holes in a file, the holes will be duplicated in the destination file
>> as long as the destination file is seekable.
>
>This should be marked as a deficit in the man page.

That is not this case.  The original proposal for that case had options
to add holesdata extended headers or not when creating an archive and
to extract the file with or without holes reinstated.
The ARC directed the project team to always store holes data when
creating a ustar or pax format archive and to always restore sparse
files as sparse files when they are extracted from an archive.

>
>
>> >>      In copy out mode (-o) the following new option arguments to the
>> >>      cpio -H option will be added to provide sparse file support:
>> >>          ascii_sparse - assumes -c is specified.  Only available
>> >>                         in copy out (-o) mode.
>> >>          odc_sparse   - assumes -H odc is specified.  Only available
>> >>                         in copy out (-o) mode.
>> >
>> >Adding sparse file support does not introduce a new archive format unless
>> >you create a new archive format that may be detected by reading the first
>> >archive header from a random archive.
>>
>> Correct.  When using -H ascii_sparse and -H odc_sparse, cpio uses ascii
>> and odc format archives, respectively; but it uses a different file
>> type when adding a sparse file to the archive.  If an archiver
>> understands cpio ascii and odc format archives, it will understand the
>> archives.  If an archiver doesn't recognize the extended file types,
>> the standards require that it extract the file data as a regular file
>> (which will contain the data needed to recreate the file contents with
>> holes in the proper positions).
>
>Does the code set bit 17 and clear the file type bits, or does it set
>bit 17 in addition?

It sets bit 17 in addition.  This allows cpio to handle sparse files of
other file types if the case ever comes up.

>
>> >
>> >If you would like to avoid introducing a new option, you would need to
>> >document this as a dirty hack. BTW: Where is the new man page?
>>
>> Quoting from the references section of this case:
>>     5.4  PSARC/2008/727/materials/cpio.1: Updated cpio.1 man page
>
>OK, but I see no description of the archive format in this man page.

The archive format will be documented either in archive(4) or in pax(1),
where other formats are described in detail.
When that happens, a cross reference will be added to the cpio(1) man
page.

>
>
>> >>      The following will apply when either '-H ascii_sparse' or
>> >>      '-H odc_sparse' is specified with -o:
>> >>          - The c_mode field in the archive header will
>> >>            indicate that the file is a sparse file.  In the old
>> >>            stat structure, the mode field is an unsigned short
>> >>            (16 bit) field.  To avoid conflicts with other file
>> >>            types, a high order bit (17) in the c_mode field of
>> >>            the header will be set.
>> >
>> >This is beyond the cpio specs. How do you plan to mark the archives
>> >as "Sun cpio" specific, to avoid incorrect behavior for non-Sun
>> >archives?
>>
>> It is indicated by the file type.
>
>As a result of not marking the archive, archivers that carefully implement
>add-on features depending on the archive format will not unpack the sparse
>files.
>
>star and AT&T pax will ignore bit 17; other archivers may include this
>bit in the file type with unknown results.

That is a bug in star and AT&T pax (as well as in Sun's current cpio
and pax).  A bug report will be filed to correct Sun's cpio and pax
when this case is resolved.

>
>Vendor unique extensions that do not use explicit vendor specific tags
>are something we had in the 1980s.
>
>
>> >>          - A string of the following format will be prepended to
>> >>            the compressed file data:
>> >>                "%lu %llu%s", prepended_info_size,
>> >>                    expanded_file_size, data/hole_offsets
>> >
>> >Is this data _inside_ the file data area or is it in conflict with the
>> >cpio extensions from David Korn and Glenn Fowler?
>>
>> It is inside the file data area as indicated above.  (The file size
>> field is the size of this header plus the size of the file contents
>> after removing the holes.)
>
>OK; how about marking the archive in the header area past the filename?

As I'm sure you know, the data describing the contents of the file
immediately follows the header in a cpio archive.
The standard does not allow any new fields to be added to the cpio
archive format.

>
>
>
>> >>            where data/hole_offsets contains 2 or more entries of the
>> >>            following format:
>> >>                " %llu %llu", data_offset, hole_offset
>> >
>> >If you ever want to debug this, I would recommend using:
>> >
>> >	" %llu,%llu", data_offset, hole_offset
>> >
>> >to make the data parsable by human eyes.
>>
>> Maybe to European human eyes.  In the U.S., some possible data offset,
>> hole offset pairs could look like a single number with the "," being
>> a thousands separator instead of a pair separator.  Besides that, it
>> matches the string given as the data in a ustar/pax SUN.holesdata
>> extended header.
>
>It seems that you are too US centric and thus do not see the problem of
>being unable to see number pairs in a possibly extremely long data
>stream.

As you know, the cpio header is a string of 76 octal digits followed
immediately by the pathname of the file (c_namesize bytes including a
trailing NUL byte) followed immediately by the contents of the file
(c_filesize bytes).  This header is not easy for humans to read, and it
is not intended to be read by humans; it is intended to be read by cpio
and other archivers that want to extract data from cpio archives.  The
same is true of the holes data record both in the ustar/pax archive
format extended headers and in the initial portion of the file data in
the cpio archive format.

 - Don

> ... ... ...
>
>Jörg
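
For anyone who wants to check the holes data record discussed above by
hand, here is a minimal sketch in Python.  It is not taken from Sun's
pax or cpio sources.  The function names parse_holesdata and pax_record
are hypothetical, and the sketch assumes only what is quoted in this
thread: the value is a run of " data_offset hole_offset" pairs, and a
pax extended header record has the form "<length> <keyword>=<value>\n",
where <length> counts the whole record including its own digits.

```python
def parse_holesdata(value):
    """Split a SUN.holesdata value into (data_offset, hole_offset) pairs.

    Hypothetical helper: assumes whitespace-separated decimal offsets,
    as in the man page example quoted in this thread.
    """
    fields = [int(tok) for tok in value.split()]
    if len(fields) < 2 or len(fields) % 2 != 0:
        raise ValueError("expected one or more data/hole offset pairs")
    return list(zip(fields[0::2], fields[1::2]))


def pax_record(keyword, value):
    """Build a pax extended header record "<length> <keyword>=<value>\\n".

    The length prefix counts every character of the record, including
    the digits of the length itself, so it is found by iteration.
    Assumes ASCII keyword and value, so characters equal bytes.
    """
    body = " " + keyword + "=" + value + "\n"
    length = len(body)
    while True:
        total = len(str(length)) + len(body)
        if total == length:
            break
        length = total
    return str(length) + body


if __name__ == "__main__":
    value = " 0 8192 24576 32768 49152 49159"
    pairs = parse_holesdata(value)
    for data_offset, hole_offset in pairs:
        # Each pair describes one run of data ending where a hole begins.
        print("data %d..%d (%d bytes)" % (
            data_offset, hole_offset, hole_offset - data_offset))
    print(pax_record("SUN.holesdata", value), end="")
```

Run against the value from the man page example, the record built by
pax_record carries the same length prefix, 49, as the example quoted
above, which is a quick consistency check on the record format.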