On 12/15/2014 10:28 PM, Isaac Dunham wrote:
> This is mostly musings about design/refactoring that hasn't happened.
>
> I'm working on ar, and have gotten to the point that calls for read-write
> loops.
> Yes, I know they're fairly small, and I've written a few in the past,
> and I could easily write another...
> But by now, we have at least four:
> -two in cpio
> -one in tar: void copy_in_out(int in, int out, off_t size)
> -one for cp/mv/...: xsendfile(int in, int out)
I've been focusing more on poll/select loops, as found in netcat, tail -f,
telnet/telnetd...

> I'm not counting cat, since it only does one byte at a time.

Well, it has the _option_ of only doing one byte at a time. :)

> The one in dd is also not relevant, since it's got a bunch of
> special requirements.

Ditto.

> These have subtle variations in what they do:
> cpio archive creation:
>   write garbage and continue on short read
> Corrupts one file, but allows to continue: among other things,
> this makes cpio -p keep going even if some of the files are on bad blocks.

Not so much bad blocks as "file got truncated between the time we wrote the
header data and the time we read the file data". This is a problem, and
should cause it to exit with an error, but shouldn't cause archiving an
entire directory to abort. (Especially if the archive is being generated to
stdout, we _can't_ go back and fix the header info after the fact, but
there's an inherent race condition between the metadata and the file data,
especially when reading large files.)

Another use case is archiving log files that constantly get appended to: we
need to read the amount the header said and not to the end of the file.

> cpio archive extraction:
>   die on short read, die on short write (reasonable)

Because then we don't get the next file's metadata and can't move on.
(Premature end of file.)

> tar copy_in_out():
>   die on short read, try to avoid but ignore short write (calls writeall())

It's in pending because I haven't cleaned it up yet, but should presumably
work like cpio.

> Aborts when hitting a bad block and creating archive
> as well as on extracting truncated archive.
> Blindly continues on running out of space.

See "not cleaned up yet". If it's still in pending, there's generally a
reason. (I do glance over them when I merge stuff into pending. If it looks
like 15 minutes or less to fix it, I generally do.)

> xsendfile():
>   no concept of file length, dies on write less than read.
> Will truncate files on bad blocks and continue.

xread() and xwrite() both exit if they get an actual error. And xwrite()
exits if it can't write all the data. (It's on top of writeall(), which
retries short writes until success or error; given a nonblock filehandle it
should spin.)

See also "man 2 sendfile", which is on the todo list.

> I'm wondering how best to generalize this.
> It seems that die on short read/write is currently the most relevant one.

What's important is what's the _correct_ behavior.

(Thunderbird giving me a spelling squiggle for nonblock, filehandle, and
todo just means its developers are clueless. But "behavior" is, in fact,
spelled that way and it should darn well cope.)

> But ar would seem to want a slightly different approach for some
> functions, which would not be compatible with any of the current
> archivers (ar is the only non-streaming archiver so far):

Because I want to eventually implement even things like gene2fs, mkfatfs,
and mkisofs as streaming archivers. (I can't do streaming zip decompression
because the metadata's at the end and I'd have to buffer the whole archive
anyway, but absent a reason like that, being able to pipe straight to/from
gzip or ssh is worth a little extra effort.)

> -create:
>   die on short write (after deleting new archive/file?)

There's existing logic in patch and sed -i to delete error path temporary
files via atexit(). I keep meaning to genericize that further into
something stackable. See tempfile_handler() in lib/lib.c, and yes the
static there should go in struct toy_context; that's part of the cleanup I
need to do.

>   indicate bytes written on short read

indicate?

> This *roughly* corresponds to xsendfile(), but returning an off_t.

ssize_t, according to "man 2 sendfile". I just hadn't yet because nothing
was using it. If I expand it, I'd want to move towards convergence with the
syscall... except that gratuitously wants one of its arguments to be a
network socket for NO OBVIOUS REASON.
Then when the pipe improvements went into the kernel they were talking
about improving it to work with any two arbitrary filehandles, but I'd need
some sort of version probe to see whether I could use it or have to fall
back to the C implementation, and it's on the todo list...

http://blog.superpat.com/2010/06/01/zero-copy-in-linux-with-sendfile-and-splice/

I dunno if the genericization work made it upstream or if it still needs
splice(). (There was talk about it on lwn.net at one point...)

Dear thunderbird: genericization is so a word. Oh good grief, it's got a
squiggle under thunderbird. I am not capitalizing the t. Deal with it. Ok,
if the above two are separate sentences, full of squiggles. Together as a
paragraph: no squiggles. But "Ok" has a squiggle. I'm going to give up
trying to understand this email client now.

> I suppose I could use xsendfile() and then lseek() rather than refactoring
> xsendfile().

Is there a reason working like cpio is not an option? You know how much
data you're sending to it, right?

I can see extending xsendfile() with an argument "send this many bytes",
with -1 meaning "just keep going until the source runs out". But the x
prefix means "die if you can't", and this might need more nuanced error
handling than that...

(I'm regularly tempted to add w versions to complement the x ones, that
warn if they can't but continue anyway. But the caller would need error
handling as is, so it wouldn't save much and would tempt callers into not
handling errors, so I've stuck with the all-or-nothing approach at the
expense of some redundancy, ala 179 calls to error_msg() or perror_msg() in
toys/*/*.c.)

> -extract:
>   die on short read (corrupt file), die on short write (out of space).

Sounds about right.

> So I guess the sensible course is to write xcopyall() and make all the
> archivers use it where relevant.

I lost the thread: you need it to do what?
It sounds like you want xsendfile() to take a third argument, a max length
to copy, with -1 meaning the current "until the end" behavior, and then
return the number of bytes copied? (With premature EOF being perror_exit()
territory?) Except this implies a non-x sendfile that _won't_ exit for a
short read, and that name's taken, but I can come up with something.

No need for offset as an argument; we can lseek separately if we need that.
(Do one thing and do it well...)

> Thanks,
> Isaac Dunham

Rob

_______________________________________________
Toybox mailing list
[email protected]
http://lists.landley.net/listinfo.cgi/toybox-landley.net
