On Wed, Dec 17, 2014 at 01:45:19PM -0600, Rob Landley wrote:
>
>
> On 12/17/2014 02:20 AM, Isaac Dunham wrote:
> > On Tue, Dec 16, 2014 at 04:34:47PM -0600, Rob Landley wrote:
> >>> tar copy_in_out():
> >>> die on short read, try to avoid but ignore short write (calls
> >>> writeall())
> >>
> >> It's in pending because I haven't cleaned it up yet, but should
> >> presumably work like cpio.
> >
> > I had been thinking about making it abort on short read or write.
>
> Archiving directories that are in use (such as a user's entire home
> directory) is a reasonably common use case. Aborting on the first file
> that doesn't work as expected means you have to quiesce the machine to
> snapshot its state. You don't have to do that with busybox or ubuntu tar
> implementations.
>
> Doing that during extract is understandable because the data's already
> serialized so you can't recover from premature EOF. But doing that on
> create is just lazy. (It should exit with an error code at the end, but
> it should create the archive in the meantime.)
>
> Same for cp and mv and rm and such. Inability to deal properly with a
> file in the tree doesn't abort the tree traversal. (Some of this is
> explicit in posix. Tar is one posix dropped the ball on, but the right
> thing to do is still obvious.)
OK, then tar should do the same thing as cpio.
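
For the record, a minimal sketch of that create-side policy as I read it
(not toybox code; archive_one() is a hypothetical stand-in for the real
per-file work):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

// Try to archive one file; nonzero means "couldn't", not "abort".
static int archive_one(const char *name)
{
  int fd = open(name, O_RDONLY);

  if (fd < 0) return 1;  // unreadable or vanished: report it, keep going
  // ... write the header and file contents to the archive here ...
  close(fd);

  return 0;
}

int main(int argc, char *argv[])
{
  int i, exitval = 0;

  for (i = 1; i < argc; i++) {
    if (archive_one(argv[i])) {
      fprintf(stderr, "skipping '%s'\n", argv[i]);
      if (!exitval) exitval = 1;  // keep an earlier, more specific error
    }
  }

  // The archive still gets created; the exit code says it wasn't complete.
  return exitval;
}
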
> > It is also *required* to support insertion at arbitrary locations
> > (before or after specified files).
>
> Meaning it reads the old one and writes a new one.
>
> > Besides that, POSIX says "STDIN: Not used."
> > Which I'm *quite* happy with.
>
> Posix doesn't mention sed -i either. Posix only mentions tar in the
> context of pax (and its version won't work on modern systems, the file
> length and symlink behavior is wrong), and it deprecated cpio in 2001
> and only ever documented the 6 byte (not 8 byte) version...
Show me *any* way to get an existing ar implementation to use stdin
or write to stdout, and I'll shut up about this ;).
GNU ar is the one GNU application I've found that treats '-' as a
literal file name; when I run ar rc /proc/self/fd/1, it dies due to
lseek failing.
I'm not arguing that GNU is always or even generally right, but there
comes a point when a feature is not worth the complications it
entails--and GNU generally ends up on the overfeatured side.
> >> ssize_t according to man 2 sendfile. I just hadn't yet because nothing
> >> was using it. If I expand it, I'd want to move towards convergence with
> >> the syscall... except that gratuitously wants one of its arguments to be
> >> a network socket for NO OBVIOUS REASON.
> >
> > By "indicate bytes written" I mean "return the total number of
> > bytes written".
> >
> > According to my man pages, "In Linux kernels before 2.6.33, out_fd
> > must refer to a socket. Since Linux 2.6.33 it can be any file."
>
> That's still recent enough (2010) I need a probe, but yeah we should use it.
>
> > Using sendfile will of course require a loop if you have a file larger
> > than half the address space;
>
> Why? If you enable long file support in libc (hardwired on in musl,
> present in 2.4, _not_ enabling that is pilot error) then size_t should
> be 64 bit?
size_t == maximum size *in memory* == long or unsigned long on *nix
(musl defines size_t with the same width as ssize_t, since C sporadically
intermixes the two...)
sendfile() works by mmap()ing infd and writing it to outfd in one syscall,
so it's memory-limited.
off_t == maximum size *on disk* == long long with long file support.
Hence the desire for off_t rather than size_t.
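
To make that concrete (this assumes a 32-bit build with _FILE_OFFSET_BITS=64,
which lib/portability.h forces):

#include <stdio.h>
#include <sys/types.h>

int main(void)
{
  // With large file support on 32 bit, size_t stays 32 bits while off_t
  // is widened to 64 bits, so only off_t can describe a 9 gigabyte file.
  printf("sizeof(size_t) = %zu\n", sizeof(size_t));  // 4 on 32 bit
  printf("sizeof(off_t)  = %zu\n", sizeof(off_t));   // 8 with LFS
  return 0;
}
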
> > I was hoping for something that won't
> > croak if given a 9-gigabyte file on a 32 bit computer.
> > ar was clearly *intended* to use 32-bit off_t, but the record can store
> > any file size less than 10^10 bytes.
>
> lib/portability.h line 27:
>
> // Always use long file support.
> #define _FILE_OFFSET_BITS 64
>
> Limitations of the file format are another matter, but storing a single
> .o file larger than 2 gigs should never happen even on a 64 bit system.
> (Truncate and error_message("too long '%s'", filename); which
> automatically sets the error number we exit with to 1 if it was 0. Doesn't
> set it if it was already nonzero, to preserve the specific error value
> if something cares.)
It *shouldn't* happen, and even Debian packages never get that big,
but if xsendfile() is used elsewhere we'd still need to call sendfile()
in a loop.
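
Something like this is what I have in mind -- a sketch only, with a
hypothetical xsendlen() name, and assuming the 2.6.33+ behavior where
out_fd can be a regular file (older kernels would need the read/write
fallback behind the probe you mentioned):

#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/sendfile.h>
#include <sys/types.h>

// Copy len bytes (or everything until EOF if len is -1) from in_fd to
// out_fd, calling sendfile() repeatedly so an off_t length bigger than
// one size_t-sized request still works on a 32 bit system.
off_t xsendlen(int in_fd, int out_fd, off_t len)
{
  off_t total = 0;

  while (len == -1 || total < len) {
    size_t chunk = SSIZE_MAX;
    ssize_t sent;

    if (len != -1 && len - total < (off_t)chunk) chunk = len - total;

    sent = sendfile(out_fd, in_fd, NULL, chunk);
    if (sent < 0) {
      if (errno == EINTR) continue;
      perror("sendfile");
      exit(1);  // the "x" prefix means die if we can't
    }
    if (!sent) break;  // EOF on input

    total += sent;
  }

  return total;
}
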
> >>> I suppose I could use xsendfile() and then lseek() rather than
> >>> refactoring
> >>> xsendfile().
> >>
> >> Is there a reason working like cpio is not an option?
> >
> > Because (0) I didn't want to refactor the input loop of cpio (it seems to
> > have gotten a bit more daunting...), (1) we don't have to worry about
> > things like passthrough mode, since we *always* are dealing with
> > seekable files,
>
> ar c <(ssh user@addr cat filename.o)
>
> I have done stranger things. On a fairly regular basis, actually.
Doesn't work with anyone else's ar, and it *couldn't* work unless you
stored the whole file, then checked how large it was and guessed a file name.
> > (2) and it makes no sense to have ar create even
> > partly corrupt libraries or archives.
>
> Since this archiver has a very dominant primary use, you have a point.
> But for things like tar, cpio, mv, cp, rm, sed -i, filesystem
> generators... not so much.
>
> That argues _against_ trying to make it use a generic function that
> would go the other way.
Archivers would be using
xsendfile(archive, curfile, filelen);
//lskip(pad);
to extract.
I'm guessing ar can use the extended xsendfile() on create, as well.
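
Spelled out a bit more, the extract loop I'm picturing (a sketch only,
using <ar.h>'s struct ar_hdr and the hypothetical xsendlen() from above;
the output name handling is elided):

#include <ar.h>        // struct ar_hdr, ARFMAG
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

off_t xsendlen(int in_fd, int out_fd, off_t len);  // hypothetical helper

// One member: 60 byte header, decimal size field (up to 10 digits, hence
// the 10^10 limit), then the data, padded to an even offset.
static void extract_member(int archive)
{
  struct ar_hdr hdr;
  char sizestr[sizeof(hdr.ar_size)+1];
  off_t len;
  int out;

  if (read(archive, &hdr, sizeof(hdr)) != sizeof(hdr)) exit(1);  // corrupt
  if (memcmp(hdr.ar_fmag, ARFMAG, sizeof(hdr.ar_fmag))) exit(1); // bad magic

  memcpy(sizestr, hdr.ar_size, sizeof(hdr.ar_size));
  sizestr[sizeof(hdr.ar_size)] = 0;
  len = strtoll(sizestr, NULL, 10);

  // ... trim trailing '/' and spaces off hdr.ar_name for the real name ...
  out = open("member.out", O_WRONLY|O_CREAT|O_TRUNC, 0644);
  if (out < 0) exit(1);

  xsendlen(archive, out, len);               // copy exactly len bytes
  close(out);

  if (len & 1) lseek(archive, 1, SEEK_CUR);  // skip the padding byte
}
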
> > (its only common applications are static libraries and debian packages,
> > where it should fail hard if it doesn't work perfectly.)
> > GNU ar won't leave an archive around if you specify to archive a file
> > that exists and one that doesn't--and that's the right course of action,
> > for once.
>
> It would be nice if
> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/ar.html
> specified this, but yeah you're probably right.
>
> > With an atexit() handler, I could simply use the extended xsendfile().
>
> Right now only cp, tail, and bootchartd use it directly. (Which is odd
> because it was created for patch, but that uses replace_tempfile() in
> lib/lib.c which calls xsendfile() internally.)
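
The atexit() part I had in mind is just something like this (a sketch with
hypothetical names, not how toybox's exit path is actually wired up): set a
"partial archive" name before writing, so that if one of the x*() helpers
calls exit() partway through we don't leave a partly corrupt archive behind.

#include <stdlib.h>
#include <unistd.h>

static const char *partial_archive;  // set while the new archive is written

static void cleanup_partial(void)
{
  if (partial_archive) unlink(partial_archive);
}

void archive_start(const char *name)
{
  partial_archive = name;
  atexit(cleanup_partial);
}

void archive_done(void)
{
  partial_archive = 0;  // success: keep the finished file
}
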
>
> >> You know how much data you're sending to it, right? I can see extending
> >> xsendfile with an argument "send this many bytes", with -1 meaning "just
> >> keep going until the source runs out". But the x prefix means "die if
> >> you can't", and this might need more nuanced error handling than that...
> >>
> >>> -extract:
> >>> die on short read (corrupt file), die on short write (out of space).
> >>
> >> Sounds about right.
Thanks,
Isaac Dunham