Stargate is an HBase client. Like any other client, what it gets from the cluster is uncompressed cell data. SNAPPY compression happens at a lower layer, the HFile layer, which is entirely server-side. I think you are seeing base64 encoding and assuming it is compression. Stargate encodes keys and values in base64 because keys and values in HBase are not strings, they are byte[]. When you request the XML or JSON representation, we have to pessimistically assume that any key or value may contain bytes that are not ASCII or not XML-safe. We don't use base64 encoding for the protobuf (Accept: application/protobuf) or binary (Accept: application/octet-stream) representations.
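To illustrate, here is a minimal Python sketch of undoing the base64 layer on the client side. The sample body is the JSON response quoted below, except for the row key, which I have replaced with a hypothetical stand-in (base64 of "row-key") because the key in the original mail is a placeholder. The function name `decode_rows` is also just something made up for this example.

```python
import base64
import json

# Sample body taken from the JSON response in the question. The row key is a
# hypothetical stand-in (base64 of "row-key"), not the real key.
SAMPLE = (
    '{"Row":[{"key":"cm93LWtleQ==","Cell":[{"column":"ZDo=",'
    '"timestamp":1440632833058,"$":"dHJ1ZQ=="}]}]}'
)


def decode_rows(body):
    """Return (row key, column, timestamp, value) tuples with base64 removed.

    Keys, columns, and values come back as bytes, because HBase stores
    byte[] and not strings.
    """
    decoded = []
    for row in json.loads(body)["Row"]:
        key = base64.b64decode(row["key"])
        for cell in row["Cell"]:
            column = base64.b64decode(cell["column"])
            value = base64.b64decode(cell["$"])  # "$" holds the cell value
            decoded.append((key, column, cell["timestamp"], value))
    return decoded


for key, column, timestamp, value in decode_rows(SAMPLE):
    print(key, column, timestamp, value)
```

With the sample above, the column "ZDo=" decodes to d: and the value "dHJ1ZQ==" decodes to true, so no decompression step is involved.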
On Tue, Sep 8, 2015 at 8:45 AM, David Weiser <[email protected]> wrote:

> I have a table which uses SNAPPY compression:
>
> ```bash
> curl http://my-host/my-table/schema
> { NAME=> 'my-table', IS_META => 'false', IS_ROOT => 'false', COLUMNS => [ {
> NAME => 'd', BLOCKSIZE => '65536', BLOOMFILTER => 'NONE', MIN_VERSIONS =>
> '0', KEEP_DELETED_CELLS => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE =>
> 'true', COMPRESSION => 'SNAPPY', VERSIONS => '1', REPLICATION_SCOPE => '0',
> TTL => '2592000', DATA_BLOCK_ENCODING => 'NONE', IN_MEMORY => 'false' }, {
> NAME => 'h', BLOCKSIZE => '65536', BLOOMFILTER => 'NONE', MIN_VERSIONS =>
> '0', KEEP_DELETED_CELLS => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE =>
> 'true', COMPRESSION => 'SNAPPY', VERSIONS => '1', REPLICATION_SCOPE => '0',
> TTL => '2592000', DATA_BLOCK_ENCODING => 'NONE', IN_MEMORY => 'false' }, {
> NAME => 'j', BLOCKSIZE => '65536', BLOOMFILTER => 'NONE', MIN_VERSIONS =>
> '0', KEEP_DELETED_CELLS => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE =>
> 'true', COMPRESSION => 'SNAPPY', VERSIONS => '1', REPLICATION_SCOPE => '0',
> TTL => '31104000', DATA_BLOCK_ENCODING => 'NONE', IN_MEMORY => 'false' } ]
> }%
> ```
>
> When I GET a row, I notice that the cell value is still compressed:
>
> ```bash
> $ curl -H "Accept: application/json" http://my-host/my-table/row-key/d:
>
> {"Row":[{"key":"somesuperfunkysnappyencodedstring==","Cell":[{"column":"ZDo=","timestamp":1440632833058,"$":"dHJ1ZQ=="}]}]}%
> ```
>
> Naively, I'd expect that when I GET a cell, that the cell value would be
> decoded before it is sent over the wire. However this is not the case.
>
> I've tried to google-fu various HTTP headers to instruct Stargate to decode
> the values before it sends it back, but I've discovered nothing.
>
> Is there some other way to get Stargate to decompress the cell values for
> me? Or do I need to do that myself?
>
> --
> Thanks,
> dw

--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
