If you're backing up a large data set, you're probably not going across a
slow network link.  You're probably going disk to disk (or disk to tape)
locally, and you probably want the compression rate to keep pace with the
hardware I/O.

 

In my backups, I found that with compression enabled, jobs run slower: at
minimum 9% longer (via lzop), typically 16% longer (via gzip --fast) or 37%
longer (via gzip), and at worst 138% longer, i.e. 2.4x the baseline time
(via bzip2).

 

The root cause is that piping the data stream through a single compression
process means everything is compressed serially on one CPU core.  So I
created threadzip.

http://code.google.com/p/threadzip/
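For anyone curious how the parallel approach works in general, here is a
rough sketch in Python of the idea (this is not threadzip's actual source;
the chunk size, worker count, and compression level are illustrative
assumptions): read the input in fixed-size chunks, compress each chunk as an
independent gzip member in a pool of worker processes, and write the members
to stdout in their original order.  Concatenated gzip members still form a
valid stream, so plain gzip -d / zcat can decompress the result.

#!/usr/bin/env python3
# Rough sketch of chunked parallel compression -- NOT threadzip's actual
# code.  Chunk size, worker count, and compression level are arbitrary
# illustrative values.
import gzip
import sys
from multiprocessing import Pool

CHUNK_SIZE = 4 * 1024 * 1024   # 4 MB per chunk (assumed value)
WORKERS = 4                    # roughly the idea behind "-t 4"

def read_chunks(stream):
    """Yield fixed-size binary chunks from a stream until EOF."""
    while True:
        chunk = stream.read(CHUNK_SIZE)
        if not chunk:
            break
        yield chunk

def compress_chunk(chunk):
    """Compress one chunk as a standalone gzip member (level 1 ~ --fast)."""
    return gzip.compress(chunk, compresslevel=1)

def main():
    with Pool(WORKERS) as pool:
        # imap compresses chunks in parallel but yields results in order,
        # so the output stream stays correctly sequenced.
        for member in pool.imap(compress_chunk, read_chunks(sys.stdin.buffer)):
            sys.stdout.buffer.write(member)

if __name__ == "__main__":
    main()

You'd use it the same way as in the benchmarks below, e.g.
tar cf - /usr/share | ./sketch.py > backup.tar.gz (the script name here is
just for illustration), and decompress with ordinary zcat.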

 

This project is in its infancy, but release 1.0 is stable.  It's tiny and
simple, so for now there's not much room for mistakes in the code.

 

Based on the results below, the clear winners are:

If you care about speed:  threadzip

If you care about size:  pbzip2

 

Time   Size    Command

164s   930MB   tar cf - /usr/share | cat > /dev/null
167s   377MB   tar cf - /usr/share | threadzip.py -t 4 --fast > /dev/null
179s   433MB   tar cf - /usr/share | lzop -c > /dev/null
190s   378MB   tar cf - /usr/share | gzip --fast > /dev/null
200s   301MB   tar cf - /usr/share | pbzip2 -c > /dev/null
225s   345MB   tar cf - /usr/share | gzip > /dev/null
391s   300MB   tar cf - /usr/share | bzip2 -c > /dev/null
