Thanks Doug. I read the regions section of the book as you recommended, but I still have some questions.
When running a massive write job, the regionserver logs show the memsize that is flushed. The problem is that most of the time the memsize is either much smaller than the configured hbase.hregion.memstore.flush.size (resulting in more files being written, which leads to frequent compactions) or bigger than hbase.hregion.memstore.flush.size * hbase.hregion.memstore.block.multiplier (resulting in "Blocking updates for 'IPC Server handler # on <port>..." messages).

In some cases I also see HBaseServer throwing a ClosedChannelException:

"WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler <handler #> on <port #> caught: java.nio.channels.ClosedChannelException"

I guess these problems are also the cause of the long pauses (a few minutes) and, in extreme cases, full GCs during the write jobs.

Any ideas, anyone?

In general, I did some digging and couldn't find much about the write process in HBase from a "memory usage" point of view, besides the configuration descriptions. Maybe it's worth adding to the book.

Thank you for all your help,

Amit.

On Mon, Jun 11, 2012 at 3:22 PM, Doug Meil <[email protected]> wrote:

> Hi there-
>
> Your understanding is on track.
>
> You probably want to read this section...
>
> http://hbase.apache.org/book.html#regions.arch
>
> ... as it covers those topics in more detail.
>
> On 6/10/12 1:02 PM, "Amit Sela" <[email protected]> wrote:
>
> >Hi all,
> >
> >I'm trying to better understand what's going on in the region server
> >during a write to HBase.
> >
> >As I understand the process:
> >
> >1. Data is written to the memstore.
> >2. Once the memstore has reached hbase.hregion.memstore.flush.size, the
> >memstore executes a flush and writes a new StoreFile.
> >3. The number of StoreFiles increases until a compaction is triggered.
> >
> >To my understanding, the compaction is triggered after a compaction check
> >is done, either by the CheckCompaction thread running in the background or
> >by the memstore flush that was just executed.
> >The compaction triggered will be a minor compaction, BUT it could be
> >promoted to major if it includes all store files.
> >When will it NOT include all store files? Say I set compactionThreshold to
> >3; then when the 3rd (or 4th) flush is executed, a compaction will be
> >triggered and will be promoted to major since it includes all store files.
> >
> >Is this right? Can anyone elaborate?
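
For reference, the comparison between a logged memsize and the two thresholds discussed in this thread (the flush size, and the flush size times the block multiplier) can be sketched roughly as follows. This is not HBase code: classify_flush is a hypothetical helper, the three labels are made up for illustration, and the 128 MB flush size and multiplier of 2 are example values only, not a claim about any cluster's configuration.

```python
# Hypothetical helper (not part of HBase): classify a memsize taken from a
# regionserver flush log line against the two thresholds discussed above.

MB = 1024 * 1024


def classify_flush(memsize_bytes, flush_size_bytes, block_multiplier):
    """Return 'early', 'normal', or 'blocking' for a logged memsize.

    flush_size_bytes stands for hbase.hregion.memstore.flush.size and
    block_multiplier for hbase.hregion.memstore.block.multiplier.
    """
    # Above this size, updates to the region are reported as blocked.
    blocking_threshold = flush_size_bytes * block_multiplier
    if memsize_bytes >= blocking_threshold:
        return "blocking"
    # A flush well below the configured size produces a small StoreFile.
    if memsize_bytes < flush_size_bytes:
        return "early"
    return "normal"


if __name__ == "__main__":
    # Example values only (assumed for illustration).
    flush_size = 128 * MB
    multiplier = 2
    for memsize in (20 * MB, 130 * MB, 300 * MB):
        label = classify_flush(memsize, flush_size, multiplier)
        print(memsize // MB, "MB ->", label)
```

Running a classification like this over the memsize values in the log makes it easier to see how often flushes fall into the "much smaller" case versus the "blocking updates" case.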
