On Tue, Oct 2, 2012 at 2:48 PM, ameet kini <[email protected]> wrote:
> In re-reading your response, I may have overlooked one key point.
>
>>> columns and time stamps. After the relative encoding is done a block
>>> of key values is then compressed with gzip.
>
> Are the keys+values compressed together as one block? If that's the
> case, I can see why it's not possible to only decompress keys and leave
> values compressed.

Yes, it currently compresses a sequence of key values into a single block.

>
> Also, I've switched to double compression as per previous posts and
> it's working nicely. I see about 10-15% more compression over just
> application level Value compression.
>
> Thanks for your responses,
> Ameet
>
> On Tue, Oct 2, 2012 at 2:30 PM, ameet kini <[email protected]> wrote:
>>> need to decompress keys on the server side to compare them. Also
>>> iterators on the server side need the keys and values decompressed.
>>
>> keys, I understand, but why do values need to be decompressed if there were
>> no user iterators installed on the server? Are there system iterators that
>> look inside the value?
>>
>> Ameet
>>
>> On Tue, Oct 2, 2012 at 2:24 PM, Keith Turner <[email protected]> wrote:
>>>
>>> On Mon, Oct 1, 2012 at 3:03 PM, ameet kini <[email protected]> wrote:
>>> >
>>> > My understanding of compression in Accumulo 1.4.1 is that it is on by
>>> > default and that data is decompressed by the tablet server, so data on
>>> > the wire between server/client is decompressed. Is there a way to shift
>>> > the decompression from happening on the server to the client? I have a
>>> > use case where each Value in my table is relatively large (~ 8MB) and I
>>> > can benefit from compression over the wire. I don't have any server side
>>> > iterators, so the values don't need to be decompressed by the tablet
>>> > server. Also, each scan returns a few rows, so client-side decompression
>>> > can be fast.
>>> >
>>> > The only way I can think of now is to disable compression on that table,
>>> > and handle compression/decompression in the application. But if there is
>>> > a way to do this in Accumulo, I'd prefer that.
>>> >
>>>
>>> There are two levels of compression in Accumulo. First redundant
>>> parts of the key are not stored. If the row in a key is the same as
>>> the previous row, then it's not stored again. The same is done for
>>> columns and time stamps. After the relative encoding is done a block
>>> of key values is then compressed with gzip.
>>>
>>> As data is read from an RFile, when the row of a key is the same as
>>> the previous key it will just point to the previous key's row. This is
>>> carried forward over the wire. As keys are transferred, duplicate
>>> fields in the key are not transferred.
>>>
>>> As far as decompressing on the client side vs server side, the server
>>> at least needs to decompress keys. On the server side you usually
>>> need to read from multiple sorted files and order the result. So you
>>> need to decompress keys on the server side to compare them. Also
>>> iterators on the server side need the keys and values decompressed.
>>>
>>> > Thanks,
>>> > Ameet
>>
>>
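
For anyone reading this thread later, here is a rough sketch of the application-level Value compression discussed above. It is an illustration only, not code from the thread and not an Accumulo API: the class name ValueCodec and its two methods are made up for this example, and it uses nothing beyond java.util.zip. The assumption is that the client gzips the payload before wrapping it in a Value for a Mutation, and gunzips the bytes it gets back from a scan, so the tablet server only ever sees and ships the compressed bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical client-side helper (not part of Accumulo): gzip a value's
// bytes before writing and gunzip them after reading.
final class ValueCodec {
    private ValueCodec() {}

    // Compress the raw payload; the result is what the client would store.
    static byte[] compress(byte[] raw) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The gzip trailer is written when the stream closes, so read the
        // buffer only after the try-with-resources block has finished.
        return buf.toByteArray();
    }

    // Restore the original payload from bytes returned by a scan.
    static byte[] decompress(byte[] packed) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPInputStream gz =
                 new GZIPInputStream(new ByteArrayInputStream(packed))) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = gz.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }
}
```

With something like this, the write path would store ValueCodec.compress(payload) and the read path would call ValueCodec.decompress on the bytes of each returned Value. Leaving table compression enabled on top of this corresponds to the "double compression" setup Ameet describes; how much extra it saves will depend on the data.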
