Hi there-

re: #1
HDFS. See http://hbase.apache.org/book.html#regions.arch
Also see http://hbase.apache.org/book.html#trouble.namenode.hbase.objects
for what the directory structure looks like in HDFS.

re: #2
Flushes are written as StoreFiles in HDFS. See
http://hbase.apache.org/book.html#regions.arch
Also see the section on "Region-RegionServer Locality".

re: #3
Flushed files: the total size of StoreFiles per region. See
http://hbase.apache.org/book.html#regions.arch

re: #4
Not entirely sure what you are asking, but see the WAL section in the
Regions section.


On 1/19/12 6:34 AM, "Praveen Sripati" <[email protected]> wrote:

>Hi,
>
>According to the `Hadoop - The Definitive Guide`
>
>Writes arriving at a regionserver are first appended to a commit log and
>then are added to an in-memory memstore. When a memstore fills, its
>content is flushed to the filesystem.
>The commit log is hosted on HDFS, so it remains available through a
>regionserver crash.
>
>Couple of questions
>
>1. When the memstore fills, is it flushed to HDFS or local file system?
>
>2. If the region size (hbase.hregion.max.filesize) is set to 200MB and the
>HDFS Block Size is set to 64MB, will the region be split across 4 data
>nodes? I know that this doesn't make sense to split a single regions data
>across nodes in HDFS, but how is it handled in HBase?
>
>3. Is region size (hbase.hregion.max.filesize) the size of commit log or
>the size of the file that has been flushed?
>
>4. The commit log might become big over time, is there similar concept of
>checkpoint in HBase for the commit logs?
>
>I am familiar with HDFS and trying to map it to HBase.
>
>Regards,
>Praveen
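
P.S. On the block arithmetic in #2, here is a rough sketch, assuming a single StoreFile has grown all the way to the 200MB limit (in practice a region usually holds several smaller StoreFiles). The function name is made up for illustration. To HDFS a StoreFile is an ordinary file, so it is cut into blocks like any other file, and HDFS rather than HBase decides which datanodes hold each block's replicas.

```python
MB = 1024 * 1024

def hdfs_block_count(file_size_bytes, block_size_bytes):
    """Number of HDFS blocks needed to hold a file (hypothetical helper)."""
    if file_size_bytes <= 0:
        return 0
    # Integer ceiling division: a partially filled last block still counts.
    return -(-file_size_bytes // block_size_bytes)

# Values taken from the question; the single-StoreFile case is an assumption.
store_file_size = 200 * MB   # hbase.hregion.max.filesize
block_size = 64 * MB         # HDFS block size

blocks = hdfs_block_count(store_file_size, block_size)
last_block = store_file_size - (blocks - 1) * block_size

print(blocks, "blocks; last block holds", last_block // MB, "MB")
```

So a StoreFile of that size occupies 4 blocks (three full ones plus a partial one), but that does not mean the region itself is split 4 ways. The region is still served by one RegionServer, which reads those blocks through HDFS.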
