Don Cragun <don.cragun at sun.com> wrote:

> >> PSARC case 2006/331 (Add holey file support to pax) created
> >
> >I did never see such a case and the Sun pax man page does neither
> >include "hole" nor "sparse".
>
> The case was approved before PSARC cases were handled in cases open to
> people who are not Sun employees. But, all of the important
> information is included on Sun's current pax(1) man page. The pax
> utility added an extended header to ustar and pax archive format
> archives as described in the USAGE section, where it says:

I cannot speak for _very_ recent Solaris versions, but for a case from 2006
I would expect to see results before the latest build (Build 89) I checked.
Build 89 comes with a pax man page from 2004.

> and in the EXTENDED DESCRIPTION, where it says:
>
>     "SUN.holesdata  A Solaris extension to pax extended header
>                     keywords. Specifies the data and hole pairs
>                     for a sparse file.
>
>                    "In write or copy modes and when the xustar
>                     or pax format (see -x format) is specified,
>                     pax includes a SUN.holesdata extended
>                     header record if the underlying file system
>                     supports the detection of files with holes
>                     (see fpathconf(2)) and reports that there
>                     is at least one hole in the file being
>                     archived. value consists of two or more
>                     consecutive entries of the following form:
>
>                         SPACEdata_offsetSPACEhole_offset
>
>                    "where the data and hole offsets are the
>                     long values returned by passing SEEK_DATA
>                     and SEEK_HOLE to lseek(2), respectively.
>                     For example, the following entry is an
>                     example of the SUN.holesdata entry in the
>                     extended header for a file with data
>                     offsets at bytes 0, 24576, and 49152, and
>                     hole offsets at bytes 8192, 32768, and
>                     49159:
>
>                         49 SUN.holesdata= 0 8192 24576 32768 49152 49159

Looks like it introduces the same problem as the cpio case.

> >How do you intend to switch between the sparse support mode and the
> >non-sparse
> >mode in "copy mode"?
>
> There is no switch in copy mode. If the source filesystem reports
> holes in a file, the holes will be duplicated in the destination file
> as long as the destination file is seekable.

This should be marked as a deficit in the man page.

> >> In copy out mode (-o) the following new option arguments to the
> >> cpio -H option will be added to provide sparse file support:
> >>      ascii_sparse - assumes -c is specified. Only available
> >>                     in copy out (-o) mode.
> >>      odc_sparse   - assumes -H odc is specified. Only available
> >>                     in copy out (-o) mode.
> >
> >Adding sparse file support does not introduce a new archive format unless you
> >create a new archive format that may be detected by reading the first archive
> >header from a random archive.
>
> Correct. When using -H ascii_sparse and -H odc_sparse, cpio uses ascii
> and odc format archives, respectively; but it uses a different file
> type when adding a sparse file to the archive. If an archiver
> understands cpio ascii and odc format archives, it will understand the
> archives. If an archiver doesn't recognize the extended file types,
> the standards require that it extract the file data as a regular file
> (which will contain the data needed to recreate the file contents with
> holes in the proper positions).

Does the code set bit 17 and clear the file type bits, or does it set
bit 17 in addition?

> >
> >If you like to avoid to introduce a new option, you would need to
> >document
> >this as a dirty hack. BTW: Where is the new man page?
>
> Quoting from the references section of this case:
>       5.4 PSARC/2008/727/materials/cpio.1: Updated cpio.1 man page

OK, but I see no description of the archive format in this man page.

> >> The following will apply when either '-H ascii_sparse' or
> >> '-H odc_sparse' is specified with -o:
> >>      - The c_mode field in the archive header will
> >>        indicate that the file is a sparse file. In the old
> >>        stat structure, the mode field is an unsigned short
> >>        (16 bit) field. To avoid conflicts with other file
> >>        types, a high order bit (17) in the c_mode field of
> >>        the header will be set.
> >
> >This is beyond the cpio specs. How do you plan to mark the archives
> >as "Sun cpio" specific to allow to avoid incorrect behavior for non-Sun
> >archives?
>
> It is indicated by the file type.

As a result of not marking the archive, archivers that carefully
implement add-on features depending on the archive format will not unpack
the sparse files.
star and AT&T pax will ignore bit 17; other archivers may include this bit
in the file type, with unknown results. Vendor-unique extensions that do not
use explicit vendor-specific tags are something we had in the 1980s.

> >>      - A string of the following format will be prepended to
> >>        the compressed file data:
> >>           "%lu %llu%s", prepended_info_size,
> >>           expanded_file_size, data/hole_offsets
> >
> >Is this data _inside_ the file data area or is it in conflict with the
> >cpio extensions from David Korn and Glenn Fowler?
>
> It is inside the file data area as indicated above. (The file size
> field is the size of this header plus the size of the file contents
> after removing the holes.)

OK; how about marking the archive in the header area past the filename?

> >>        where data/hole_offsets contains 2 or more entries of the
> >>        following format:
> >>           " %llu %llu", data_offset, hole_offset
> >
> >If you ever like to debug this, I would recommend to use:
> >
> >     " %llu,%llu", data_offset, hole_offset
> >
> >to make the data parsable by human eyes..
>
> Maybe to European human eyes. In the U.S., some possible data offset,
> hole offset pairs could look like a single number with the "," being
> a thousands separator instead of a pair separator. Besides that it
> matches the string given as the data in a ustar/pax SUN.holesdata
> extended header.

It seems that you are too US-centric and thus do not see the problem of
being unable to see number pairs in a possibly extremely long data stream.

> >But why don't you follow existing other implementations that use
> >offset/numbytes pairs for data chunks? This results in a lower archive size.
>
> I'm not going to argue decisions that were agreed upon for PSARC
> 2006/361. But, it follows naturally from the data provided by the
> lseek(2) SEEK_HOLE and SEEK_DATA operations.

I offered my help, especially for tar/cpio-specific archive format
questions, even before this case was approved. So why didn't you ask me then?
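
To make the two encodings discussed above concrete, here is a minimal
Python sketch. It is not taken from the Sun or star sources. It assumes the
usual pax extended header record layout "<length> <keyword>=<value>\n",
where the length counts the whole record including its own digits, and the
helper names (holes_value, to_extents, pax_record) are made up for
illustration. It rebuilds the SUN.holesdata record quoted from the man page
and converts the same data to offset/numbytes pairs.

```python
def holes_value(pairs):
    """Format (data_offset, hole_offset) pairs as " d h d h ..."."""
    return "".join(f" {data} {hole}" for data, hole in pairs)


def to_extents(pairs):
    """Convert (data_offset, hole_offset) pairs to (offset, numbytes) pairs."""
    return [(data, hole - data) for data, hole in pairs]


def pax_record(keyword, value):
    """Build one pax extended header record as bytes (assumed layout)."""
    body = f" {keyword}={value}\n".encode("utf-8")
    length = len(body)
    # The length field counts its own digits, so iterate until it is stable.
    while True:
        total = len(str(length)) + len(body)
        if total == length:
            return str(length).encode("ascii") + body
        length = total


# Offsets from the man page example quoted above.
pairs = [(0, 8192), (24576, 32768), (49152, 49159)]
record = pax_record("SUN.holesdata", holes_value(pairs))
print(record)             # the data/hole offset form
print(to_extents(pairs))  # the same data as offset/numbytes pairs
```

For the man page example the record comes out at 49 bytes, which matches
the leading "49" in the quoted entry. The offset/numbytes form of the same
three data chunks needs fewer digits, since a length is never larger than
the end offset it replaces.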

Jörg

-- 
 EMail:joerg at schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js at cs.tu-berlin.de                (uni)
       schilling at fokus.fraunhofer.de     (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily