On Tue, Dec 16, 2014 at 04:34:47PM -0600, Rob Landley wrote:
>
> On 12/15/2014 10:28 PM, Isaac Dunham wrote:
> > These have subtle variations in what they do:
> > cpio archive creation:
> > write garbage and continue on short read
> > Corrupts one file, but allows to continue: among other things,
> > this makes cpio -p keep going even if some of the files are on bad blocks.
>
> Not so much bad blocks as "file got truncated between the time we wrote
> the header data and the time we read the file data". This is a problem,
> and should cause it to exit with an error, but shouldn't cause archiving
> an entire directory to abort. (Especially if the archive is being
> generated to stdout, we _can't_ go back and fix the header info after
> the fact, but there's an inherent race condition between the metadata
> and the file data, especially when reading large files.)
>
> Another use case is archiving log files that constantly get appended to:
> we need to read the amount the header said and not to the end of the file.
>
> > cpio archive extraction:
> > die on short read, die on short write (reasonable)
>
> Because then we don't get the next file's metadata and can't move on.
> (Premature end of file.)
>
> > tar copy_in_out():
> > die on short read, try to avoid but ignore short write (calls writeall())
>
> It's in pending because I haven't cleaned it up yet, but should
> presumably work like cpio.

I had been thinking about making it abort on short read or write.

> > Aborts when hitting a bad block and creating archive
> > as well as on extracting truncated archive.
> > Blindly continues on running out of space.
>
> See "not cleaned up yet". If it's still in pending, there's generally a
> reason. (I do glance over them when I merge stuff into pending. If it
> looks like 15 minutes or less to fix it, I generally do.)
>
> > xsendfile():
> > no concept of file length, dies on write less than read.
> > Will truncate files on bad blocks and continue.
>
> xread() and xwrite() both exit if they get an actual error. And xwrite()
> exits if it can't write all the data. (It's on top of writeall() which
> retries short writes until success or error; given a nonblock filehandle
> it should spin.)
>
> See also "man 2 sendfile" which is on the todo list.
>
> > I'm wondering how best to generalize this.
> > It seems that die on short read/write is currently the most relevant one.
>
> What's important is what's the _correct_ behavior.
>
> (Thunderbird giving me a spelling squiggle for nonblock, filehandle, and
> todo just means its developers are clueless. But "behavior" is, in fact,
> spelled that way and it should darn well cope.)
>
> > But ar would seem to want a slightly different approach for some
> > functions, which would not be compatible with any of the current
> > archivers (ar is the only non-streaming archiver so far):
>
> Because I want to eventually implement even things like gene2fs,
> mkfatfs, and mkisofs as streaming archivers. (I can't do streaming zip
> decompression because the metadata's at the end and I'd have to buffer
> the whole archive anyway, but absent a reason like that being able to
> pipe straight to/from gzip or ssh is worth a little extra effort.)

ar could theoretically be extracted by a streaming archiver, but it
cannot be written by one.
There are two special entries that always happen at the start: one
stores a symbol index, the other is a list of long file names.
It is also *required* to support insertion at arbitrary locations
(before or after specified files).
Besides that, POSIX says "STDIN: Not used."
Which I'm *quite* happy with.

> > -create:
> > die on short write (after deleting new archive/file?)
>
> There's existing logic in patch and sed -i to delete error path
> temporary files via atexit(). I keep meaning to genericize that further
> into something stackable.
>
> See tempfile_handler() in lib/lib.c and yes the static there should go
> in struct toy_context, that's part of the cleanup I need to do.
>
> > indicate bytes written on short read
>
> indicate?
>
> > This *roughly* corresponds to xsendfile(), but returning an off_t.
>
> ssize_t according to man 2 sendfile. I just hadn't yet because nothing
> was using it. If I expand it, I'd want to move towards convergence with
> the syscall... except that gratuitously wants one of its arguments to be
> a network socket for NO OBVIOUS REASON.

By "indicate bytes written" I mean "return the total number of bytes
written".

According to my man pages, "In Linux kernels before 2.6.33, out_fd must
refer to a socket. Since Linux 2.6.33 it can be any file."
Using sendfile will of course require a loop if you have a file larger
than half the address space; I was hoping for something that won't croak
if given a 9-gigabyte file on a 32-bit computer.
ar was clearly *intended* to use 32-bit off_t, but the record can store
any file size less than 10^10 bytes.

>
> When the pipe improvements went into the kernel they were talking
> about improving it to work with any two arbitrary filehandles, but I'd
> need some sort of version probe to see whether I could use it or have to
> fall back to the C implementation, and it's on the todo list...
>
> http://blog.superpat.com/2010/06/01/zero-copy-in-linux-with-sendfile-and-splice/
>
> I dunno if the genericization work made it upstream or if it needs
> splice() still. (There was talk about it on lwn.net at one point...)
>
> Dear thunderbird, genericization is so a word. Oh good grief it's got a
> squiggle under thunderbird. I am not capitalizing the t. Deal with it.
>
> Ok, if the above two are separate sentences, full of squiggles. Together
> as a paragraph: no squiggles. But "Ok" has a squiggle. I'm going to give
> up trying to understand this email client now.

Yeah, I gave up on spill chuckers years ago.
I think the rules are something like "here's a massive list of words;
anything lowercase that's in it can be capitalized after [?!.\n],
anything uppercase must be matched exactly, and everything else is a
mistake".
Check if you have wamerican-huge installed and the language set to some
variant of en_US...in all the dozens of places it should be set.

> > I suppose I could use xsendfile() and then lseek() rather than refactoring
> > xsendfile().
>
> Is there a reason working like cpio is not an option?

Because
(0) I didn't want to refactor the input loop of cpio (it seems to have
gotten a bit more daunting...),
(1) we don't have to worry about things like passthrough mode, since we
*always* are dealing with seekable files, and
(2) it makes no sense to have ar create even partly corrupt libraries
or archives. (Its only common applications are static libraries and
Debian packages, where it should fail hard if it doesn't work perfectly.)
GNU ar won't leave an archive around if you specify to archive a file
that exists and one that doesn't--and that's the right course of action,
for once.

With an atexit() handler, I could simply use the extended xsendfile().

> You know how much data you're sending to it, right? I can see extending
> xsendfile with an argument "send this many bytes", with -1 meaning "just
> keep going until the source runs out".
> But the x prefix means "die if
> you can't", and this might need more nuanced error handling than that...
>
> > -extract:
> > die on short read (corrupt file), die on short write (out of space).
>
> Sounds about right.
>
> > So I guess the sensible course is to write xcopyall() and make all the
> > archivers use it where relevant.
>
> I lost the thread, you need it to do what?
>
> It sounds like you want xsendfile() to take a third argument, a max
> length to copy with -1 meaning the current "until the end" behavior, and
> then return the number of bytes copied? (With premature EOF being
> perror_exit() territory?) Except this implies a non-x sendfile that
> _won't_ exit for a short read, and that name's taken, but I can come up
> with something.

That's what I was wanting, complete with perror_exit on premature EOF;
(x)copyall seems to be a reasonable name for it when sendfile is taken.

I suppose that if xsendfile()/xcopyall with a positive length will
always copy the full length, the return value can stay as void.

Thanks,
Isaac Dunham
_______________________________________________
Toybox mailing list
[email protected]
http://lists.landley.net/listinfo.cgi/toybox-landley.net
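
For illustration only, here is a minimal sketch of the helper discussed in this thread: copy a given number of bytes, with -1 meaning "until the source runs out", and exit on premature EOF. The name xcopyall() comes from the message above; the signature, the off_t return value, the 4096-byte buffer, and the write_or_die() helper are assumptions made for this sketch, not toybox code. It uses plain read()/write() rather than sendfile(), and it assumes a build with -D_FILE_OFFSET_BITS=64 so that off_t can hold a 9-gigabyte length on a 32-bit machine.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (not the toybox writeall()/xwrite()): keep writing
// until all of buf is out, retrying short writes, and exit on a real error.
static void write_or_die(int fd, const char *buf, size_t len)
{
  while (len) {
    ssize_t n = write(fd, buf, len);

    if (n < 0) {
      if (errno == EINTR) continue;
      perror("write");
      exit(1);
    }
    buf += n;
    len -= n;
  }
}

// Hypothetical xcopyall(): copy len bytes from in to out, or copy until
// EOF when len is -1. Exits on read error, write error, or premature EOF.
// Returns the number of bytes copied.
off_t xcopyall(int in, int out, off_t len)
{
  char buf[4096];
  off_t total = 0;

  while (len < 0 || total < len) {
    size_t want = sizeof(buf);
    ssize_t got;

    // Never read past the requested length (matters for growing log files).
    if (len >= 0 && len - total < (off_t)want) want = len - total;

    got = read(in, buf, want);
    if (got < 0) {
      if (errno == EINTR) continue;
      perror("read");
      exit(1);
    }
    if (!got) {
      if (len < 0) break;  // EOF is the normal end in "until EOF" mode
      fprintf(stderr, "premature EOF\n");
      exit(1);
    }
    write_or_die(out, buf, got);
    total += got;
  }

  return total;
}
```

With a positive length this either copies the full length or exits, so the return value only carries information in the -1 case; that is the reason the message above suggests it could stay void.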
