Re: Regarding data storage in HBase

Doug Meil Thu, 19 Jan 2012 10:14:16 -0800

Hi there-

Re #1:

The memstore is flushed to HDFS, not to the local file system.

See http://hbase.apache.org/book.html#regions.arch

Also see http://hbase.apache.org/book.html#trouble.namenode.hbase.objects
for what the directory structure looks like in HDFS.
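
To make the write path from the question concrete, here is a toy sketch in
Python. It is not HBase code: the class name, the byte counting, and the
threshold are all made up for illustration, and plain lists stand in for the
commit log and the StoreFiles that would live in HDFS.

```python
class ToyRegionStore:
    """Toy model of the write path: commit log, then memstore, then flush."""

    def __init__(self, flush_threshold_bytes):
        self.flush_threshold_bytes = flush_threshold_bytes
        self.commit_log = []     # stands in for the commit log (WAL) on HDFS
        self.memstore = {}       # in-memory buffer of recent writes
        self.memstore_bytes = 0  # rough size: len(row) + len(value) per entry
        self.store_files = []    # stands in for StoreFiles written to HDFS

    def put(self, row, value):
        # 1. Append to the commit log first, so the edit survives a crash.
        self.commit_log.append((row, value))
        # 2. Add to the memstore, replacing any older value for the same row.
        if row in self.memstore:
            self.memstore_bytes -= len(row) + len(self.memstore[row])
        self.memstore[row] = value
        self.memstore_bytes += len(row) + len(value)
        # 3. When the memstore fills, flush it.
        if self.memstore_bytes >= self.flush_threshold_bytes:
            self.flush()

    def flush(self):
        # Write the memstore out as one sorted, immutable "StoreFile".
        # Unlike a real system, this toy never trims the commit log.
        if not self.memstore:
            return
        self.store_files.append(sorted(self.memstore.items()))
        self.memstore = {}
        self.memstore_bytes = 0


store = ToyRegionStore(flush_threshold_bytes=16)
store.put("row2", "bbbb")  # 8 bytes buffered
store.put("row1", "aaaa")  # 16 bytes buffered, which triggers a flush
store.put("row3", "cc")    # stays in the memstore
```

After these three puts the toy store holds one flushed file with row1 and
row2 in sorted order, and row3 is still only in the memstore and the log.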


Re #2:

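Flushes are written as StoreFiles in HDFS, and HDFS divides those files
into blocks like any other file, so the blocks of a region's StoreFiles
can end up on several DataNodes.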

See http://hbase.apache.org/book.html#regions.arch

Also see the section on "Region-RegionServer Locality"
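
For the numbers in the question, the block arithmetic can be sketched as
follows. This is only illustrative Python with a hypothetical helper name. It
assumes a single StoreFile that has grown to 200MB, and it says nothing about
which DataNodes HDFS picks for each block.

```python
def hdfs_block_sizes(file_size_mb, block_size_mb):
    """Return the sizes (in MB) of the blocks a file of this size occupies."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block is only partially filled
    return sizes


blocks = hdfs_block_sizes(200, 64)
print(blocks)  # [64, 64, 64, 8]
```

So a 200MB file is stored as four blocks (three full ones and one of 8MB),
but it is the file that is divided into blocks, not the region itself.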

Re #3:

The flushed files: it is the total size of StoreFiles per region, not the
size of the commit log.


See http://hbase.apache.org/book.html#regions.arch

Re #4:

I'm not entirely sure what you are asking, but see the WAL section
in the Regions section.




On 1/19/12 6:34 AM, "Praveen Sripati" <[email protected]> wrote:

>Hi,
>
>According to the `Hadoop - The Definitive Guide`
>
>Writes arriving at a regionserver are first appended to a commit log and
>then are added to an in-memory memstore. When a memstore fills, its
>content is flushed to the filesystem.
>The commit log is hosted on HDFS, so it remains available through a
>regionserver crash.
>
>A couple of questions:
>
>1. When the memstore fills, is it flushed to HDFS or local file system?
>
>2. If the region size (hbase.hregion.max.filesize) is set to 200MB and the
>HDFS Block Size is set to 64MB, will the region be split across 4 data
>nodes? I know that it doesn't make sense to split a single region's data
>across nodes in HDFS, but how is it handled in HBase?
>
>3. Is region size (hbase.hregion.max.filesize) the size of commit log or
>the size of the file that has been flushed?
>
>4. The commit log might become big over time. Is there a similar concept
>of a checkpoint in HBase for the commit logs?
>
>I am familiar with HDFS and trying to map it to HBase.
>
>Regards,
>Praveen
