In re-reading your response, I may have overlooked one key point.

>> columns and time stamps. After the relative encoding is done a block
>> of key values is then compressed with gzip.

Are the keys+values compressed together as one block? If that's the case, I
can see why it's not possible to decompress only the keys and leave the
values compressed.

Also, I've switched to double compression as per previous posts and it's
working nicely. I see about 10-15% more compression over just
application-level Value compression.

Thanks for your responses,
Ameet

On Tue, Oct 2, 2012 at 2:30 PM, ameet kini <[email protected]> wrote:

>> need to decompress keys on the server side to compare them. Also
>> iterators on the server side need the keys and values decompressed.
>
> keys, I understand, but why do values need to be decompressed if there were
> no user iterators installed on the server? Are there system iterators that
> look inside the value?
>
> Ameet
>
> On Tue, Oct 2, 2012 at 2:24 PM, Keith Turner <[email protected]> wrote:
>>
>> On Mon, Oct 1, 2012 at 3:03 PM, ameet kini <[email protected]> wrote:
>> >
>> > My understanding of compression in Accumulo 1.4.1 is that it is on by
>> > default and that data is decompressed by the tablet server, so data on
>> > the wire between server/client is decompressed. Is there a way to shift
>> > the decompression from happening on the server to the client? I have a
>> > use case where each Value in my table is relatively large (~ 8MB) and I
>> > can benefit from compression over the wire. I don't have any server side
>> > iterators, so the values don't need to be decompressed by the tablet
>> > server. Also, each scan returns a few rows, so client-side decompression
>> > can be fast.
>> >
>> > The only way I can think of now is to disable compression on that table,
>> > and handle compression/decompression in the application. But if there is
>> > a way to do this in Accumulo, I'd prefer that.
>> >
>>
>> There are two levels of compression in Accumulo. First redundant
>> parts of the key are not stored. If the row in a key is the same as
>> the previous row, then it's not stored again.
>> The same is done for columns and time stamps. After the relative
>> encoding is done a block of key values is then compressed with gzip.
>>
>> As data is read from an RFile, when the row of a key is the same as
>> the previous key it will just point to the previous key's row. This is
>> carried forward over the wire. As keys are transferred, duplicate
>> fields in the key are not transferred.
>>
>> As far as decompressing on the client side vs server side, the server
>> at least needs to decompress keys. On the server side you usually
>> need to read from multiple sorted files and order the result. So you
>> need to decompress keys on the server side to compare them. Also
>> iterators on the server side need the keys and values decompressed.
>>
>> > Thanks,
>> > Ameet
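
For anyone following the thread, here is a rough Python sketch of the two
levels described above, plus application-level Value compression on the
client. It is only an illustration under stated assumptions: the function
names, the (row, column, timestamp) key shape, and the use of pickle for
serialization are all made up for this example and are not Accumulo's actual
RFile format or API. It shows why a key cannot be read without decompressing
the block it shares with its values, and why a Value gzipped by the
application passes through the server untouched.

```python
import gzip
import pickle


def relative_encode(entries):
    """Drop key fields that repeat the previous key (hypothetical encoding)."""
    prev = (None, None, None)
    out = []
    for key, value in entries:
        # None marks "same as the previous key's field".
        encoded = [None if cur == old else cur for cur, old in zip(key, prev)]
        out.append((encoded, value))
        prev = key
    return out


def relative_decode(encoded_entries):
    """Rebuild full keys by carrying repeated fields forward."""
    prev = (None, None, None)
    out = []
    for encoded, value in encoded_entries:
        key = tuple(old if cur is None else cur for cur, old in zip(encoded, prev))
        out.append((key, value))
        prev = key
    return out


def compress_block(entries):
    """Second level: gzip one block holding keys and values together."""
    return gzip.compress(pickle.dumps(relative_encode(entries)))


def decompress_block(block):
    """Keys and values share one gzip stream, so both come out together."""
    return relative_decode(pickle.loads(gzip.decompress(block)))


# Application-level compression: the client gzips each Value before writing.
raw_value = b"large payload " * 1000
entries = [
    (("row1", "cf:a", 5), gzip.compress(raw_value)),
    (("row1", "cf:b", 5), gzip.compress(raw_value)),
    (("row2", "cf:b", 7), gzip.compress(raw_value)),
]

encoded = relative_encode(entries)
block = compress_block(entries)       # what a block on disk might look like
restored = decompress_block(block)    # server side: keys are readable again

# The Value is still gzipped here; only the client needs to inflate it.
client_side = gzip.decompress(restored[0][1])
```

In this sketch the server-side step recovers the keys for comparison while the
Values remain opaque compressed bytes until the client inflates them, which is
the effect of handling Value compression in the application.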
