Inline. J-D
On Fri, Jul 6, 2012 at 3:21 AM, Christian Schäfer <[email protected]> wrote:

> a) Where does compression (like Snappy) actually occur?
>
> I set Snappy on a column family and filled it with some data (30 MB) ->
> a 640x480 array of 11-bit values.
>
> After flushing the memstore, the size of the data stayed exactly the
> same, but flushing was 10x faster than flushing the table without
> compression.
>
> So is it "only" the transfer that is compressed? Or is there a way to
> apply compression to the HFiles?

The files are compressed on flush/compact, and it's done per 64KB block. I
doubt the file was the same size as the memstore; look at your logs, which
report the numbers for each flush.

> (I'm still using 0.90.4-cdh3u2 because the upgrading instructions seem
> quite tedious to me)

Stop everything, deploy the new version, restart.

> b) Is there a way to apply delta compression to HBase to minimize disk
> usage due to duplicated data?
>
> Does it have to be added or even built, or is it already included in
> HBase?

The first hit when googling "hbase delta compression" returns this:
https://issues.apache.org/jira/browse/HBASE-4218

As you can see, it was included in 0.94 (no clue how that translates to
CDH... CDH5??). There is also prefix compression in the pipeline:
https://issues.apache.org/jira/browse/HBASE-4676

Hope this helps,

J-D
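P.P.S. Once you're on 0.94+, both features are enabled per column family from the HBase shell. 'mytable' and 'cf' below are placeholders, and FAST_DIFF is one of the delta encodings added by HBASE-4218:

```
alter 'mytable', {NAME => 'cf', COMPRESSION => 'SNAPPY'}
alter 'mytable', {NAME => 'cf', DATA_BLOCK_ENCODING => 'FAST_DIFF'}
```

Existing HFiles pick up the new settings as they are rewritten by compactions; a major compaction forces the rewrite.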
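P.S. To make the per-block point concrete, here is a rough Python sketch of how an HFile-style writer compresses in 64KB chunks rather than as one stream. zlib stands in for Snappy (python-snappy isn't in the standard library), and the 640x480 array of 11-bit values is filled with a made-up pattern, so the exact ratio is illustrative only:

```python
import zlib

# Hypothetical stand-in for the 640x480 array of 11-bit values
# from the question, stored as 2-byte big-endian integers.
values = [(x * 7 + y * 13) % 2048 for y in range(480) for x in range(640)]
raw = b"".join(v.to_bytes(2, "big") for v in values)

# HFiles compress per block (64 KB by default), not the whole file at once.
BLOCK_SIZE = 64 * 1024
compressed_blocks = [
    zlib.compress(raw[i:i + BLOCK_SIZE])
    for i in range(0, len(raw), BLOCK_SIZE)
]
compressed_size = sum(len(b) for b in compressed_blocks)

print(f"raw: {len(raw)} bytes, compressed: {compressed_size} bytes")
```

The block boundary matters for random reads: HBase only has to decompress the 64KB block containing the row it needs, not the whole file.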
