On Mon, Oct 1, 2012 at 3:03 PM, ameet kini <[email protected]> wrote:
>
> My understanding of compression in Accumulo 1.4.1 is that it is on by
> default and that data is decompressed by the tablet server, so data on the
> wire between server/client is decompressed. Is there a way to shift the
> decompression from happening on the server to the client? I have a use case
> where each Value in my table is relatively large (~ 8MB) and I can benefit
> from compression over the wire. I don't have any server side iterators, so
> the values don't need to be decompressed by the tablet server. Also, each
> scan returns a few rows, so client-side decompression can be fast.
>
> The only way I can think of now is to disable compression on that table, and
> handle compression/decompression in the application. But if there is a way
> to do this in Accumulo, I'd prefer that.
>

There are two levels of compression in Accumulo. First, redundant parts of
the key are not stored: if the row in a key is the same as the row in the
previous key, it is not stored again. The same is done for columns and
timestamps. After this relative encoding is done, a block of key/values is
compressed with gzip.

As data is read from an RFile, when the row of a key is the same as the row
of the previous key, it just points to the previous key's row. This is
carried forward over the wire: as keys are transferred, duplicate fields in
the key are not transferred.

As far as decompressing on the client side vs. the server side, the server at
least needs to decompress keys. On the server side you usually need to read
from multiple sorted files and order the result, so the keys have to be
decompressed on the server side in order to compare them. Also, iterators on
the server side need the keys and values decompressed.

> Thanks,
> Ameet
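
To make the two levels described above concrete, here is a rough Python
sketch. It is not Accumulo's RFile code or its on-disk format. The names
(Key, relative_encode, relative_decode, compress_block, decompress_block),
the use of None as the "same as previous key" marker, and JSON as the
serialization are all assumptions made up for illustration.

```python
import gzip
import json
from typing import List, NamedTuple, Optional, Tuple

# Hypothetical, simplified key: real Accumulo keys have more fields.
class Key(NamedTuple):
    row: str
    column: str
    timestamp: int

EncodedKey = Tuple[Optional[str], Optional[str], Optional[int]]

def relative_encode(keys: List[Key]) -> List[EncodedKey]:
    """Level 1: drop every field that repeats the previous key's field.

    None is used here as the "same as previous key" marker (an assumption).
    """
    encoded = []
    prev = None
    for key in keys:
        if prev is None:
            encoded.append(tuple(key))  # first key is stored in full
        else:
            encoded.append(
                tuple(None if cur == old else cur for cur, old in zip(key, prev))
            )
        prev = key
    return encoded

def relative_decode(encoded: List[EncodedKey]) -> List[Key]:
    """Rebuild full keys by copying marked fields from the previous key."""
    keys = []
    prev = None
    for fields in encoded:
        if prev is None:
            key = Key(*fields)
        else:
            key = Key(*(old if cur is None else cur for cur, old in zip(fields, prev)))
        keys.append(key)
        prev = key
    return keys

def compress_block(encoded: List[EncodedKey]) -> bytes:
    """Level 2: gzip a whole block of relatively encoded keys.

    JSON stands in for a real binary serialization.
    """
    return gzip.compress(json.dumps(encoded).encode("utf-8"))

def decompress_block(blob: bytes) -> List[EncodedKey]:
    """Undo compress_block; JSON lists are turned back into tuples."""
    return [tuple(fields) for fields in json.loads(gzip.decompress(blob).decode("utf-8"))]

keys = [
    Key("row1", "cf:a", 5),
    Key("row1", "cf:b", 5),  # same row and timestamp as the previous key
    Key("row2", "cf:b", 7),  # same column as the previous key
]
encoded = relative_encode(keys)
restored = relative_decode(decompress_block(compress_block(encoded)))
```

In this sketch the sorted keys can only be compared after both steps are
undone, which is the same reason the server has to decompress keys before it
can merge multiple sorted files.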
