On 23/02/12 17:09, Dan McGee wrote:
> On Tue, Feb 21, 2012 at 10:02 PM, Allan McRae <al...@archlinux.org> wrote:
>> When installing a package, write an mtree of the package files into
>> the local database. This will be useful for doing validation of all
>> files on a system.
>>
>> Signed-off-by: Allan McRae <al...@archlinux.org>
>> ---
>>
>> Query: should we keep the info on .INSTALL and .CHANGELOG files? Changing a
>> .INSTALL file would be an interesting tactic, but if someone is doing that then
>> they can already adjust the mtree file...
>>
>> Also, from http://goo.gl/Uq6X5 it appears that this could be made more efficient
>> by reusing the file descriptor, but I could not get that working after many, many,
>> many attempts.
>
> Did you rewind the file descriptor? You should just have to call
> `lseek(fd, 0, SEEK_SET)` first. Of course, since the current version
> of _alpm_open_archive does both the open() and archive_read_new()
> business, the abstraction there would have to change.

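A minimal sketch of the rewind suggested above, using only plain POSIX calls rather than the pacman or libarchive code. The helpers count_bytes() and two_passes() are made-up names for illustration; count_bytes() merely stands in for one full pass over the package file.

```c
#include <sys/types.h>
#include <unistd.h>

/* Read the descriptor from its current offset to EOF and return the
 * number of bytes seen, or -1 on a read error. This stands in for one
 * full pass over the package file. */
static long count_bytes(int fd)
{
	char buf[4096];
	long total = 0;
	ssize_t n;

	while((n = read(fd, buf, sizeof(buf))) > 0) {
		total += n;
	}
	return n < 0 ? -1 : total;
}

/* Make two passes over the same descriptor without reopening the file. */
static int two_passes(int fd, long *first, long *second)
{
	*first = count_bytes(fd);
	/* the first pass leaves the offset at EOF, so rewind before reusing */
	if(lseek(fd, 0, SEEK_SET) == (off_t)-1) {
		return -1;
	}
	*second = count_bytes(fd);
	return (*first < 0 || *second < 0) ? -1 : 0;
}
```

Without the lseek() call the second pass would start at EOF and see no data, which may be what made the earlier attempts fail.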
Ah... lseek was the key. I can do that and change the abstraction in
_alpm_open_archive(). But it will not be needed if...

> With that said, not having to decompress everything twice would also
> be a win; I saw some chatter about this on IRC but I would definitely
> prefer to not iterate again; removing the iteration from the diskspace
> sped it up enough that I enabled that by default; I don't want to lose
> those gains.

I think this can be done. But it is far from simple. It involves us
doing an archive_read_data() to read the data into a buffer, duplicating
that buffer and then passing one copy to the archive_write_data() for
the file on disk and the other to the write for the mtree archive. It
means that we cannot use the convenience function archive_read_extract(),
and that is a big convenience...

  archive_read_extract(), archive_read_extract_set_skip_file():
    A convenience function that wraps the corresponding
    archive_write_disk(3) interfaces. The first call to
    archive_read_extract() creates a restore object using
    archive_write_disk_new(3) and
    archive_write_disk_set_standard_lookup(3), then transparently
    invokes archive_write_disk_set_options(3), archive_write_header(3),
    archive_write_data(3), and archive_write_finish_entry(3) to create
    the entry on disk and copy data into it. The flags argument is
    passed unmodified to archive_write_disk_set_options(3).

So we would have to duplicate that entire functionality...

<snip>

>> +	/* output the type, uid, gid, mode, size, time, md5 and link fields */
>> +	archive_write_set_options(mtree, "use-set,!device,!flags,!gname,!nlink,!uname,md5");

> Did 'use-set' end up being a net-win on size and/or speed?

The size is much smaller for the raw file when using 'use-set', but that
difference entirely disappears when compressing with gzip. In the brief
tests I did, the reading was slightly faster using 'use-set'.

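To make the single-pass idea concrete (read each block once, hand it to two writers), here is a rough sketch with plain file descriptors standing in for the libarchive objects. In the real code the read() would be archive_read_data() and the two writes would be archive_write_data() on the disk writer and on the mtree writer; that mapping is an assumption of this sketch and is not tested here. The names write_all() and tee_copy() are invented for the example.

```c
#include <sys/types.h>
#include <unistd.h>

/* Write all of buf to fd, retrying on short writes. A failed or
 * zero-length write is treated as an error. Returns 0 or -1. */
static int write_all(int fd, const char *buf, size_t len)
{
	while(len > 0) {
		ssize_t n = write(fd, buf, len);
		if(n <= 0) {
			return -1;
		}
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}

/* Read src exactly once and feed every block to both destinations.
 * Returns the number of bytes copied, or -1 on error. */
static long tee_copy(int src, int dst_a, int dst_b)
{
	char buf[8192];
	long total = 0;
	ssize_t n;

	while((n = read(src, buf, sizeof(buf))) > 0) {
		/* the same buffer goes to both writers; neither modifies it */
		if(write_all(dst_a, buf, (size_t)n) != 0
				|| write_all(dst_b, buf, (size_t)n) != 0) {
			return -1;
		}
		total += n;
	}
	return n < 0 ? -1 : total;
}
```

In this sketch the buffer is not copied, since each write only reads from it. Whether the same holds once both consumers are libarchive writers would need checking. The harder part remains everything else archive_read_extract() does around the data copy (headers, options, finishing the entry), which this loop does not cover.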
So, should I go ahead and write our own version of archive_read_extract()
as a function that does both the extraction and the mtree creation? Or do
people see another way around this?

Allan