Hey pluggers, question for you all.

One thing this smaller project I'm working on is going to need is some way of automatically transferring files across when a new system is added to the network. The actual method of booting isn't important right now (probably a flash drive or something). What I am facing is that I expect around 600GB of files will need to be written to each machine when it first boots. 600GB on a 1GbE network is roughly an hour and a half (80 minutes at full line rate, before protocol overhead) if the math off the top of my head is correct. And that's if only the one system is doing any network I/O at that time. Any other network traffic would slow that down even further.

I thought I might reduce that time (and network traffic) significantly by having a compressed archive of the various files (like the old .tar.bz2 files). But I know that bzip2 is not the best compressor anymore. It's not too bad, but there are better ones. So I ask: what would you guys recommend as the compression system?

The only restriction I have on it is that it must either handle the peculiarities of Unix vs. DOS/Windows systems (i.e. ownerships, permissions, device files, and symlinks, like tar) OR be able to compress from stdin and decompress to stdout (like bzip2). The goal here is to get the archive as tiny as possible so that it uses as little network traffic as possible during the extraction.

A two-step process is unfortunately out of the question. The machines will only have either 750GB or 1TB HDDs, which obviously won't work for decompressing the tar to disk and then extracting from the tar on disk. tar's extraction process would run out of space before it finished.

Libraries aren't an issue, because I could put the libraries in the NFS directory and call the compression program with LD_LIBRARY_PATH=<nfs path>, assuming I don't just build the program as static in the first place (assuming I can get the source).
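For what it's worth, here is a rough Python sketch of the transfer-time arithmetic and of the single-pass extraction I have in mind. The function names, the link_share figure, and the compression ratio parameter are all made up for illustration; the only thing it relies on is the standard tarfile module reading a compressed tar sequentially from a stream, so no intermediate .tar ever lands on disk.

```python
import sys
import tarfile


def transfer_minutes(size_gb, link_gbit=1.0, link_share=1.0, ratio=1.0):
    """Estimate minutes needed to move size_gb (decimal GB) over the link.

    link_share: fraction of the link this transfer actually gets (assumed).
    ratio: compressed size / original size (assumed; depends on the data).
    Protocol overhead is ignored, so this is a best-case figure.
    """
    bits = size_gb * 1e9 * 8 * ratio
    seconds = bits / (link_gbit * 1e9 * link_share)
    return seconds / 60


def extract_stream(stream, dest):
    """Unpack a compressed tar from a non-seekable stream in one pass.

    Mode "r|*" reads the tar sequentially and detects gzip, bzip2, or xz
    on its own. Restoring ownership and device files needs root, and newer
    Python versions filter extraction by default, so a real system image
    may need filter="fully_trusted" passed to extractall().
    """
    with tarfile.open(fileobj=stream, mode="r|*") as archive:
        archive.extractall(dest)


if __name__ == "__main__":
    print(f"600 GB at line rate: {transfer_minutes(600):.0f} min")
    print(f"600 GB with 25% of the link: "
          f"{transfer_minutes(600, link_share=0.25):.0f} min")
    # Optional: unpack an archive piped in on stdin into the given directory.
    if len(sys.argv) > 1:
        extract_stream(sys.stdin.buffer, sys.argv[1])
```

Run with a target directory as its only argument and the archive piped in on stdin, it unpacks as the bytes arrive. The same one-pass idea applies to any compressor that can decompress to stdout and feed tar through a pipe.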
Any recommendations are welcome. The only archivers I've worked with in the past are the DOS-style archivers (zip, arc, arj, lzh, & rar), which would not handle permissions, ownership, or symlinks at all.

If it were just me, I'd say stick with .tar.bz2, as it works and works well. Maybe not the tightest, but it gets the job done well enough. However, this guy seems to think that the network will be running at a rather high usage rate (he claims about 75% average bandwidth usage), so every packet is at a premium, since a single computer doing the transfer could easily saturate the network. Add in a second or more, and it's REALLY full and everything slows down.

Thanks, folks!

--- Dan
