Inline. J-D
On Fri, Jul 6, 2012 at 3:21 AM, Christian Schäfer <[email protected]> wrote:

> a) Where does compression (like Snappy) actually occur?
>
> I set Snappy on a column family and filled it with some data (30 MB) ->
> a 640x480 array of 11-bit values.
>
> After flushing the memstore, the size of the data stayed exactly the
> same, but flushing was 10x faster than flushing the table without
> compression.
>
> So is it "only" the transfer that is compressed? Or is there a way to
> apply compression to the HFiles?

The files are compressed on flush/compact, and it's done per 64KB block. I
doubt the file was the same size as the memstore; look at your logs, which
report the numbers for each flush.

> (I'm still using 0.90.4-cdh3u2 because the upgrading instructions seem
> quite tedious to me)

Stop everything, deploy the new version, restart.

> b) Is there a way to apply delta compression to HBase to minimize disk
> usage due to duplicated data?
>
> Does it have to be added or even built, or is it already included in
> HBase?

The first hit when googling "hbase delta compression" returns this:
https://issues.apache.org/jira/browse/HBASE-4218

As you can see, it was included in 0.94 (no clue how that translates to
CDH... CDH5??). There is also prefix compression in the pipeline:
https://issues.apache.org/jira/browse/HBASE-4676

Hope this helps,

J-D
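P.P.S. Once you're on 0.94+, both features are enabled per column family from the HBase shell. 'mytable' and 'cf' below are placeholders, and FAST_DIFF is one of the delta encodings added by HBASE-4218:

```
alter 'mytable', {NAME => 'cf', COMPRESSION => 'SNAPPY'}
alter 'mytable', {NAME => 'cf', DATA_BLOCK_ENCODING => 'FAST_DIFF'}
```

Existing HFiles pick up the new settings as they are rewritten by compactions; a major compaction forces the rewrite.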
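P.S. To make the per-block point concrete, here is a rough Python sketch of how an HFile-style writer compresses in 64KB chunks rather than as one stream. zlib stands in for Snappy (python-snappy isn't in the standard library), and the 640x480 array of 11-bit values is filled with a made-up pattern, so the exact ratio is illustrative only:

```python
import zlib

# Hypothetical stand-in for the 640x480 array of 11-bit values
# from the question, stored as 2-byte big-endian integers.
values = [(x * 7 + y * 13) % 2048 for y in range(480) for x in range(640)]
raw = b"".join(v.to_bytes(2, "big") for v in values)

# HFiles compress per block (64 KB by default), not the whole file at once.
BLOCK_SIZE = 64 * 1024
compressed_blocks = [
    zlib.compress(raw[i:i + BLOCK_SIZE])
    for i in range(0, len(raw), BLOCK_SIZE)
]
compressed_size = sum(len(b) for b in compressed_blocks)

print(f"raw: {len(raw)} bytes, compressed: {compressed_size} bytes")
```

The block boundary matters for random reads: HBase only has to decompress the 64KB block containing the row it needs, not the whole file.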
