On Tue, 2009-11-24 at 17:00 -0500, Edward Ned Harvey wrote:
> [...] when I use dd to
> read from stdin, it can only read partial records,
> [example command is:]
> cat random-1G-file | time dd of=/dev/rmt/0n bs=256k ; time mt -f /dev/rmt/0 rewind

For the partial-record problem, try "obs=256k" instead -- note the "o". Here's why.

A pipe has a limited capacity. Once it's full, the writer blocks until the reader has freed up some space. More to the point, a read from a pipe is like a read from a tty; if a read() call requests more bytes than the pipe has buffered at the moment, it returns only what's currently in the pipe, without blocking. read() only blocks if the pipe was *already* empty. And I presume that on OSol, a pipe's capacity is less than 256 KB. (On the Linux system that I'm writing this on, it's 64 KB.)

The upshot of all this is that dd *can't* read 256 KB from the pipe in one go, but only in multiple chunks. Thus, to generate output records of that size, dd has to "reblock". But "bs=256k" tells it not to do that; it says to copy "each input block [...] to the output as a single block without aggregating short blocks" (quote from dd(1)). By contrast, specifying (or defaulting) ibs= and obs= separately tells dd that it should decouple read sizes from write sizes in order to force each write to be of the requested obs= size. (Handling of a partial final block is controlled by other options.)

<aside>
This is actually the original point of dd -- reblocking tapes, and doing the other arcane tapeish stuff that was the meat and potatoes of data processing back when UNIX was invented, but which UNIX itself, by its "all the world's a byte stream" philosophy, has since made thankfully obsolete. So people have found other uses for dd, and its many tape-wrangling capabilities go mostly unused and ignored :-/

(There are various stories of dd's origins. Most revolve around a "DD" command in IBM OS/360 JCL; hence the name and the atypical option syntax. One version I once heard -- no idea whether it's true -- is that dd's man page came first. According to this telling, the man page was a joke -- an attempt to imagine what "cat" would have to look like on an IBM mainframe, or some such thing.
But then someone realized that the program itself would actually be useful...)
</aside>

> and then when it's done,
> it will never terminate properly. Meaning - "time" reports the finished time
> of the process, but then I have to hit Ctrl-C to progress any further.

Not sure what this is about. Perhaps the "time" is confounding things, by holding one of the stdxxx file descriptors open. Try removing that, or moving it outside the pipeline:

    time sh -c 'cat ... | dd ...'

Of course that includes the cat command in the timings :-(

- Eric
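
P.S. In case it helps to see the short-read behaviour and the reblocking idea outside of dd, here is a minimal sketch in Python. It is not how dd itself is implemented. The function names (read_short, reblock, reblock_demo) are made up for illustration, and the sizes (4096 and 10000 bytes) are picked only because they should fit inside any pipe's capacity, so the writes don't block.

```python
import os


def read_short(chunk=4096, request=256 * 1024):
    """Put `chunk` bytes into a pipe, then ask read() for `request` bytes.

    Returns the number of bytes read() actually handed back.
    (Hypothetical helper; sizes are illustrative assumptions.)
    """
    r, w = os.pipe()
    try:
        os.write(w, b"x" * chunk)
        # read() returns what is buffered right now instead of waiting
        # for the full `request` bytes to arrive.
        return len(os.read(r, request))
    finally:
        os.close(r)
        os.close(w)


def reblock(fd, obs):
    """Yield blocks of exactly `obs` bytes from fd, whatever the read sizes.

    Only the final block may be shorter. This mimics the idea behind
    dd's obs= option; it is a sketch, not dd's actual code.
    """
    buf = bytearray()
    while True:
        data = os.read(fd, obs)
        if not data:            # empty result means end of file
            break
        buf += data
        while len(buf) >= obs:  # emit every complete output block
            yield bytes(buf[:obs])
            del buf[:obs]
    if buf:                     # partial final block
        yield bytes(buf)


def reblock_demo():
    """Write 10000 bytes in ten small pieces, reblock them to 4096 bytes."""
    r, w = os.pipe()
    for _ in range(10):
        os.write(w, b"y" * 1000)
    os.close(w)                 # writer done, so the reader will see EOF
    sizes = [len(block) for block in reblock(r, 4096)]
    os.close(r)
    return sizes


if __name__ == "__main__":
    print(read_short())
    print(reblock_demo())
```

The first call asks for 256 KB but gets back only what was sitting in the pipe; the second shows ten 1000-byte writes coming out as fixed-size blocks plus one short tail, which is roughly the difference between bs= and obs= described above.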