On Jul 07, Andrew Purtell wrote:
>> Since HDFS is mostly write once how are updates/deletes handled?
>
> Not mostly, only write once.
>
> Deletes are just another write, but one that writes tombstones
> "covering" data with older timestamps.
>
> When serving queries, HBase searches store files back in time until it
> finds data at the coordinates requested or a tombstone.
>
> The process of compaction not only merge-sorts a bunch of accumulated
> store files (from flushes) into fewer store files (or one) for read
> efficiency, it also performs housekeeping, dropping data "covered" by
> the delete tombstones. Incidentally, this is also how TTLs are
> supported: expired values are dropped as well.
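The quoted read/compaction behavior can be sketched as a toy model. This is not HBase code, just a minimal illustration under simplifying assumptions: each "store file" is a dict from (row, column) coordinates to (timestamp, value) entries, `TOMBSTONE` is a hypothetical sentinel for a delete marker, and compaction keeps only the newest surviving version (i.e. maxVersions=1) while dropping covered and TTL-expired cells.

```python
import time

TOMBSTONE = object()  # sentinel marking a delete that covers older timestamps


def query(store_files, row, col):
    """Find the newest entry for a coordinate across all store files.

    If the newest entry is a tombstone, the older data is "covered"
    and the coordinate reads as absent.
    """
    best = None
    for sf in store_files:
        for ts, value in sf.get((row, col), []):
            if best is None or ts > best[0]:
                best = (ts, value)
    if best is None or best[1] is TOMBSTONE:
        return None
    return best[1]


def compact(store_files, ttl_seconds=None, now=None):
    """Merge store files into one, dropping covered and expired cells."""
    now = now if now is not None else time.time()
    merged = {}
    for sf in store_files:
        for coord, entries in sf.items():
            merged.setdefault(coord, []).extend(entries)
    out = {}
    for coord, entries in merged.items():
        entries.sort(key=lambda e: -e[0])  # newest first
        newest_ts, newest_val = entries[0]
        if newest_val is TOMBSTONE:
            continue  # tombstone covers all older versions: drop them all
        if ttl_seconds is not None and now - newest_ts > ttl_seconds:
            continue  # expired by TTL: drop as well
        out[coord] = [(newest_ts, newest_val)]
    return [out]  # one merged store file
```

A delete is just a newer entry whose value is the tombstone; queries see it first and stop, and compaction later physically discards everything it covers.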
Just wanted to talk about the WAL. My understanding is that updates are journaled to HDFS by appending them sequentially, as they happen, to a per-region log. This is where the need for HDFS append comes in, something I don't recall seeing in the GFS paper. Even with append support in HDFS, syncing the log on every edit is still expensive, and this is where the WAL flushing policies come in.
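The trade-off between durability and sync cost can be sketched as a batching flush policy. This is a hypothetical model, not HBase's actual WAL implementation: the class names and thresholds (`sync_batch`, `sync_interval`) are invented for illustration, and the "durable" list stands in for an HDFS file that a real WAL would hflush/sync.

```python
import time


class WriteAheadLog:
    """Toy WAL: edits are appended sequentially, but the expensive
    durable sync is deferred and batched by a flush policy."""

    def __init__(self, sync_batch=16, sync_interval=1.0):
        self.buffer = []                    # appended but not yet synced
        self.durable = []                   # stands in for the synced log file
        self.sync_batch = sync_batch        # sync after this many edits...
        self.sync_interval = sync_interval  # ...or after this many seconds
        self.last_sync = time.monotonic()

    def append(self, edit):
        self.buffer.append(edit)
        # Flush policy: sync when the batch is large enough or enough time
        # has passed, instead of paying the sync cost on every single edit.
        if (len(self.buffer) >= self.sync_batch
                or time.monotonic() - self.last_sync >= self.sync_interval):
            self.sync()

    def sync(self):
        # In a real WAL this is the expensive call (an HDFS sync/hflush);
        # here we just promote buffered edits to the durable list.
        self.durable.extend(self.buffer)
        self.buffer.clear()
        self.last_sync = time.monotonic()
```

The obvious cost of deferring the sync is the durability window: edits sitting in the buffer when the process dies are lost, which is exactly the knob these policies expose.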
