On Tue, 2009-11-24 at 17:00 -0500, Edward Ned Harvey wrote:
> [...] when I use dd to
> read from stdin, it can only read partial records,
> [example command is:]
> cat random-1G-file | time dd of=/dev/rmt/0n bs=256k ;  time mt -f /dev/rmt/0 rewind

For the partial-record problem, try "obs=256k" instead -- note
the "o".

Here's why.  A pipe has a limited capacity.  Once it's full, the
writer blocks until the reader has freed up some space.

More to the point, a read from a pipe is like a read from a tty;
if a read() call requests more bytes than the pipe has buffered
at the moment, it returns only what's currently in the pipe,
without blocking.  read() only blocks if the pipe was *already*
empty.

And I presume that on OpenSolaris, a pipe's capacity is less than
256 KB.  (On the Linux system that I'm writing this on, it's 64 KB.)

The upshot of all this is that dd *can't* read 256 KB from the
pipe in one go, but only in multiple chunks.
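
One rough way to check this on a given system is to push some data
through a pipe and look at dd's own record counts.  This is only a
sketch: it assumes /dev/zero exists and a dd that prints the usual
"F+P records in" summary on stderr (F full reads, P partial reads).
The 1 MiB size and the variable names are arbitrary.

```shell
# Feed 1 MiB through a pipe and ask the reading dd for 256 KB at a time.
# LC_ALL=C keeps the summary text in English so grep can find it.
records_in=$(dd if=/dev/zero bs=1024 count=1024 2>/dev/null |
    LC_ALL=C dd bs=256k of=/dev/null 2>&1 | grep 'records in')

# The summary looks like "F+P records in"; keep the F (full reads) part.
full_reads=${records_in%%+*}

echo "$records_in"
echo "full 256 KB reads: $full_reads"
```

If the pipe holds less than 256 KB, the full-read count should come
out as 0, with every read landing on the partial side of the "+".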

Thus, to generate output records of that size, dd has to
"reblock".  But "bs=256k" tells it not to do that; it says to
copy "each input block [...] to the output as a single block
without aggregating short blocks" (quote from dd(1)).

By contrast, specifying (or defaulting) ibs= and obs= separately
tells dd that it should decouple read sizes from write sizes in
order to force each write to be of the requested obs= size.
(Handling of a partial final block is controlled by other
options.)
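
The difference between bs= and obs= can be seen on a small scale.
The sketch below is illustrative only: slow_writer is a made-up
helper that writes 4 bytes, pauses, then writes 4 more, so the
reading dd should see two short reads; the 8-byte block size just
stands in for 256k.  It assumes a dd that reports "records out" on
stderr in the usual F+P form.

```shell
# Hypothetical producer: two 4-byte writes separated by a pause, so the
# reader gets two short reads instead of a single 8-byte read.
slow_writer() {
    printf 'aaaa'
    sleep 1
    printf 'bbbb'
}

# bs=8: each short read is copied straight to the output as a short block.
with_bs=$(slow_writer | LC_ALL=C dd bs=8 2>&1 >/dev/null | grep 'records out')

# obs=8: short reads are collected until a full 8-byte output block exists.
with_obs=$(slow_writer | LC_ALL=C dd obs=8 2>&1 >/dev/null | grep 'records out')

echo "bs=8  -> $with_bs"
echo "obs=8 -> $with_obs"
```

With GNU dd this should report "0+2 records out" for bs=8 (two short
writes) and "1+0 records out" for obs=8 (one full block).  On the
tape drive, that is the difference between partial records and full
256 KB records.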

<aside>
This is actually the original point of dd -- reblocking tapes,
and doing the other arcane tapeish stuff that was the meat and
potatoes of data processing back when UNIX was invented, but
which UNIX itself, by its "all the world's a byte stream"
philosophy, has since made thankfully obsolete.  So people have
found other uses for dd, and its many tape-wrangling capabilities
go mostly unused and ignored :-/

(There are various stories of dd's origins.  Most revolve around
a "DD" command in IBM OS/360 JCL; hence the name and the atypical
option syntax.  One version I once heard -- no idea whether it's
true -- is that dd's man page came first.  According to this
telling, the man page was a joke -- an attempt to imagine what
"cat" would have to look like on an IBM mainframe, or some such
thing.  But then someone realized that the program itself would
actually be useful...)
</aside>


> and then when it's done,
> it will never terminate properly.  Meaning - "time" reports the finished time
> of the process, but then I have to hit Ctrl-C to progress any further.

Not sure what this is about.  Perhaps the "time" is confounding
things, by holding one of the stdxxx file descriptors open.  Try
removing that, or moving it outside the pipeline:
        time sh -c 'cat ... | dd ...'

Of course that includes the cat command in the timings :-(

  - Eric

