LSM trees are the basis for reducing random I/O, which is a major performance factor in big data systems. A good overview can be found in Lars George's book, HBase: The Definitive Guide. The basic idea is that you have an in-memory structure for the latest changes and a structure stored in files. The contents of each file are always ordered by key, and each entry in a file is just the row key, column family identifier, column name, timestamp and value (plus a marker). When the in-memory structure is full, it is flushed to disk as a new file. When a certain number of files has accumulated on the filesystem, they are merged into bigger ones. Since the files are already ordered, the merge is very fast (like the merge step of the mergesort algorithm).
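To make the flush-and-merge cycle above a bit more concrete, here is a minimal Python sketch. It is not HBase code: the class name TinyLSM, the two thresholds and the integer clock used as a timestamp are all made up for the example, the "files" are just sorted lists in memory, and the marker is left out. It also keeps only the newest version of each cell when merging, which is a simplification.

```python
import heapq


class TinyLSM:
    """Toy LSM store (hypothetical, for illustration only)."""

    def __init__(self, memstore_limit=3, max_files=2):
        self.memstore = {}   # (row_key, column) -> (timestamp, value)
        self.files = []      # each "file" is a list of cells sorted by key
        self.memstore_limit = memstore_limit  # assumed flush threshold
        self.max_files = max_files            # assumed merge threshold
        self.clock = 0       # stand-in for a real timestamp

    def put(self, row_key, column, value):
        self.clock += 1
        self.memstore[(row_key, column)] = (self.clock, value)
        if len(self.memstore) >= self.memstore_limit:
            self.flush()

    def flush(self):
        # Write the in-memory cells out as one sorted, immutable "file".
        # The timestamp is negated so that newer cells sort first per key.
        cells = sorted(
            (row_key, column, -ts, value)
            for (row_key, column), (ts, value) in self.memstore.items()
        )
        self.files.append(cells)
        self.memstore = {}
        if len(self.files) > self.max_files:
            self.compact()

    def compact(self):
        # Every file is already sorted, so a single streaming k-way merge
        # is enough (same idea as the merge step of mergesort).
        merged = []
        for cell in heapq.merge(*self.files):
            if merged and merged[-1][:2] == cell[:2]:
                continue  # older version of a cell we already kept
            merged.append(cell)
        self.files = [merged]

    def get(self, row_key, column):
        # Newest data lives in memory; otherwise check the files.
        if (row_key, column) in self.memstore:
            return self.memstore[(row_key, column)][1]
        best = None
        for cells in self.files:
            for rk, col, neg_ts, value in cells:
                if (rk, col) == (row_key, column):
                    if best is None or neg_ts < best[0]:
                        best = (neg_ts, value)
        return best[1] if best else None
```

With memstore_limit=2 and max_files=2, the sixth put triggers a third flush, which in turn triggers a merge of the three sorted files into one. Writes only ever go to memory or to a new sequential file, which is where the reduction in random I/O comes from.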
On Sun, Dec 8, 2013 at 8:42 AM, Ted Yu <[email protected]> wrote:

> Searching for 'lsm tree hbase' would give you several articles.
>
> I am in China - the search results are mostly in Chinese.
>
> You should be able to read this:
>
> http://stackoverflow.com/questions/13762992/log-structured-merge-tree-in-hbase
>
> Cheers
>
>
> On Wed, Dec 4, 2013 at 6:49 PM, AnilKumar B <[email protected]> wrote:
>
> > Hi,
> >
> > We are trying to understand how and where exactly LSM tress are used in
> > HBase. Currently as per our understanding, while flushing memstore to Store
> > files and while HFile compaction it is used. And sits on top of HFiles at
> > memstore level.
> >
> > Is this understanding correct. Can you please give more insight on this?
> > How exactly is the merging done?
> >
> > Thanks & Regards,
> > B Anil Kumar.
> >
>
