Stargate is an HBase client. Like any other client, what it gets from the
cluster is uncompressed cell data. SNAPPY compression happens at a lower
layer, the HFile layer, which is entirely server-side. I think you are
seeing base64 encoding and assuming it is compression. Stargate encodes
keys and values in base64 because keys and values in HBase are not strings,
they are byte[]. When you request XML or JSON representations, we have to
pessimistically assume any key or value may contain non-ASCII or
non-XML-safe characters. We do not use base64 encoding for the protobuf
(Accept: application/protobuf) or binary (Accept: application/octet-stream)
representations.
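
To illustrate, here is a minimal Python sketch of decoding the JSON
representation on the client side. It assumes a response shaped like the one
in the quoted message below; the helper name decode_cells is made up for this
example and is not part of Stargate. The row key is left out of the sample
body because the one in the quoted message is a placeholder.

```python
import base64
import json

# Response body modeled on the quoted message. The "key" field is omitted
# because the value shown there is a placeholder, not real base64.
response_body = (
    '{"Row":[{"Cell":[{"column":"ZDo=",'
    '"timestamp":1440632833058,"$":"dHJ1ZQ=="}]}]}'
)


def decode_cells(body):
    """Return (column, timestamp, value) tuples with base64 fields as bytes.

    Hypothetical helper for illustration only.
    """
    cells = []
    for row in json.loads(body)["Row"]:
        for cell in row["Cell"]:
            column = base64.b64decode(cell["column"])  # family:qualifier
            value = base64.b64decode(cell["$"])        # raw cell bytes
            cells.append((column, cell["timestamp"], value))
    return cells


decoded = decode_cells(response_body)
print(decoded)
```

The decoded column is the bytes d: and the decoded value is the bytes true,
so nothing in the response is SNAPPY-compressed. The results stay as bytes
because HBase does not guarantee that a key or value is valid text.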



On Tue, Sep 8, 2015 at 8:45 AM, David Weiser <[email protected]> wrote:

> I have a table which uses SNAPPY compression:
>
> ```bash
> curl http://my-host/my-table/schema
> { NAME=> 'my-table', IS_META => 'false', IS_ROOT => 'false', COLUMNS => [ {
> NAME => 'd', BLOCKSIZE => '65536', BLOOMFILTER => 'NONE', MIN_VERSIONS =>
> '0', KEEP_DELETED_CELLS => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE =>
> 'true', COMPRESSION => 'SNAPPY', VERSIONS => '1', REPLICATION_SCOPE => '0',
> TTL => '2592000', DATA_BLOCK_ENCODING => 'NONE', IN_MEMORY => 'false' }, {
> NAME => 'h', BLOCKSIZE => '65536', BLOOMFILTER => 'NONE', MIN_VERSIONS =>
> '0', KEEP_DELETED_CELLS => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE =>
> 'true', COMPRESSION => 'SNAPPY', VERSIONS => '1', REPLICATION_SCOPE => '0',
> TTL => '2592000', DATA_BLOCK_ENCODING => 'NONE', IN_MEMORY => 'false' }, {
> NAME => 'j', BLOCKSIZE => '65536', BLOOMFILTER => 'NONE', MIN_VERSIONS =>
> '0', KEEP_DELETED_CELLS => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE =>
> 'true', COMPRESSION => 'SNAPPY', VERSIONS => '1', REPLICATION_SCOPE => '0',
> TTL => '31104000', DATA_BLOCK_ENCODING => 'NONE', IN_MEMORY => 'false' } ]
> }%
> ```
>
> When I GET a row, I notice that the cell value is still compressed:
>
> ```bash
> $ curl -H "Accept: application/json" http://my-host/my-table/row-key/d:
>
> {"Row":[{"key":"somesuperfunkysnappyencodedstring==","Cell":[{"column":"ZDo=","timestamp":1440632833058,"$":"dHJ1ZQ=="}]}]}%
> ```
>
> Naively, I'd expect that when I GET a cell, that the cell value would be
> decoded before it is sent over the wire.  However this is not the case.
>
> I've tried to google-fu various HTTP headers to instruct Stargate to decode
> the values before it sends it back, but I've discovered nothing.
>
> Is there some other way to get Stargate to decompress the cell values for
> me?  Or do I need to do that myself?
>
> --
> Thanks,
> dw
>



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
