Simon Wong wrote:
ssh [EMAIL PROTECTED] "cat client.img.gz" | gunzip | dd of=/dev/hda1

This seems to work reasonably well until it gets somewhere around the 1GB mark, at which point everything seems to have slowed down to a crawl.
That first 1GB is the oddity. It's just some buffers filling. The results after that are the "real" results where buffering no longer gives an advantage and you see the true sustained (lack of) throughput.
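For what it's worth, giving dd an explicit block size cuts per-write overhead once those buffers are exhausted. A minimal local stand-in for the original pipeline (temporary files replace the real client.img.gz and /dev/hda1, and the 1 MB "image" is faked, so the sketch is self-contained):

```shell
#!/bin/sh
# Stand-in for: ssh host "cat client.img.gz" | gunzip | dd of=/dev/hda1
img=$(mktemp)
out=$(mktemp)
head -c 1048576 /dev/zero | gzip > "$img"          # fake 1 MB compressed image
gunzip < "$img" | dd of="$out" bs=1M 2>/dev/null   # bs=1M: fewer, larger writes
wc -c < "$out"                                     # all 1048576 bytes came through
rm -f "$img" "$out"
```

The only change from the original pipeline is the bs=1M; dd's default 512-byte writes are a lot of syscalls for an 8GB partition.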
The result is that it takes nearly as long as when I don't compress it and just pull the whole 8GB partition through ssh.
That would indicate that you're filling the bus to the hard disk, not the network between the CPUs. dd is writing a lot of empty blocks down to that disk. You might want to consider using tar if the filesystem has a lot of free space.
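The tar route can be sketched with local directories standing in for the ssh leg (the hostname and mount points in the comment are assumptions; the point is that tar moves only allocated files, while dd copies every block, used or not):

```shell
#!/bin/sh
# The real thing would look something like:
#   ssh host "tar -C /mnt/client -cf - ." | tar -C /mnt/copy -xf -
# Local stand-in with temp directories:
src=$(mktemp -d)
dst=$(mktemp -d)
echo "hello" > "$src/file"
tar -C "$src" -cf - . | tar -C "$dst" -xf -   # free space never hits the wire
cat "$dst/file"                               # prints: hello
rm -rf "$src" "$dst"
```

Unlike dd, this needs the target filesystem to already exist, but for a mostly-empty 8GB partition it avoids pushing gigabytes of zeros through the disk bus.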
Anyone got any clues on how to do this more on the fly so all data is passed through without any buffering?
That's a very hand-wavy question, since you haven't told us anything about the computer. The bottleneck obviously varies with the hardware.

Firstly, work out if what you are doing is possible. Look up the sustained throughput of your drive and make sure your disk interface can supply that. What's the slowest network link: 100Mbps or 1000Mbps? How long is the network link, in thousands of kilometres (length sets latency, and thus the bandwidth-delay product)? For example, 8GB will take at least 160s to be written to a single disk, and at least 640s to cross a fast ethernet link.

Now set the machine for maximum performance. hdparm should report 32-bit I/O, UDMA, write caching, look-ahead, APM disabled, fast AAM goodness. TCP buffers should be larger than the bandwidth-delay product. ssh shouldn't be doing (de)compression, since you've already compressed the file.

Test. Are you CPU-bound, I/O-bound or network-bound (top, vmstat, etc)? If you are, is that bound reasonable (eg, 90% of the sustained disk write rate)? You might want to try transferring from and to /dev/null to check network+CPU performance. Note that ssh has a special performance issue: it uses a 128MB window, so the bandwidth-delay product needs to be under that for ssh to run at maximum speed.

Now you've found the bottleneck, fix it. Repeat the test until one of the hard bounds is reached. If possible, choose a better method to move away from that bound (eg, dd v tar; ssh v ftp of a gpg-encrypted file).

If you're going to be doing this a lot (eg, a disk image server) then you might want to think about the disk subsystem. For example, disk striping can double the sustained throughput.

If you've only got a LAN and a desktop machine then working through this takes an afternoon. If you've got a supercomputer and a long-haul network it can take several weeks. Remember to record your results at each step. Let us know the interesting numbers (eg, what is the real throughput of your particular drive).
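Those back-of-envelope bounds are quick to reproduce (the 50MB/s sustained write rate is an assumed figure for a single drive; fast ethernet is 100Mbps, roughly 12.5MB/s):

```shell
#!/bin/sh
# Hard lower bounds on moving 8GB (8000MB decimal):
awk 'BEGIN { printf "disk:     %d s\n", 8000 / 50   }'   # single-disk write at 50MB/s
awk 'BEGIN { printf "ethernet: %d s\n", 8000 / 12.5 }'   # fast ethernet at 12.5MB/s
```

That prints 160s and 640s: whatever you tune, you can't beat the larger of the two on this hardware.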
There are surprisingly few "real" benchmarks out there, so you'll be helping a lot by posting your numbers.

Cheers, Glen

--
Glen Turner      Tel: (08) 8303 3936 or +61 8 8303 3936
Australia's Academic & Research Network      www.aarnet.edu.au

--
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
