Hello, we ran various tests with our expected HBase schema and settled on the Snappy support included in CDH3 update 1.
While GZ-compressed data was half the size of LZO- or Snappy-compressed data, Snappy showed much better performance than both GZ and LZO in our tests. The compression ratios of LZO and Snappy are pretty much the same. LZO is a hassle deployment-wise because it needs to be installed separately; Snappy is a no-brainer with CDH3 update 1.

Regards,
Thomas

-----Original Message-----
From: Wayne [mailto:[email protected]]
Sent: Wednesday, 14 September 2011 14:34
To: [email protected]
Subject: Compression

I wanted to do a poll on what compression libraries people are using and why. We currently use LZO but are considering other alternatives for various reasons. We would like to move to CDH3, but adding LZO ourselves is a hassle we are not looking to take on. It kind of defeats the purpose of using CDH3 to begin with. We currently run 20.0 append.

I know there are a lot of variables that affect the best decision, but we are looking for general trends in the community. Is LZO still the most recommended? Is there benefit in using the LZO professional library, and does anyone use it? Is Snappy just as good as LZO and a lot easier to deal with in terms of node builds/releases? Does zlib/gzip have any traction? Compression ratios are important, but as always performance/speed is our biggest requirement.

What are people using and why? Where is the momentum going? Compression is a huge benefit of Hadoop/HBase, and having high compression ratios with solid performance is a major benefit. Any recommendations would be appreciated.

Thanks.
