I found the following snippet in the HBase book [0]:

"It is not recommended setting the number of max versions to an exceedingly high level (e.g., hundreds or more) unless those old values are very dear to you because this will greatly increase StoreFile size."

Does this validate hypothesis #2 above?

[0]: http://hbase.apache.org/book.html#schema.versions

On Tue, Nov 29, 2016 at 4:07 PM, Sachin Jain <[email protected]> wrote:
> Hi,
>
> I am curious to understand the impact of having large number of versions
> in HBase. Suppose I want to maintain previous 100 versions for a row/cell.
>
> My thoughts are:-
>
> Having large number of versions means more number of HFiles
> More number of HFiles can increase lookup time of a rowKey.
>
> Hypothesis 1 : Region server has to check each HFile for the presence of
> that rowKey and then based on timestamp it will accumulate the latest
> version.
>
> Hypothesis 2 : Region server may not scan each HFile. Based on last
> creation date of HFile,as soon as it gets rowKey in the last created HFile
> it will not scan HFiles further. Because we are interested in latest
> version only and we have got in the file recently created.
>
> Want to confirm what is true among 1 and 2.
>
> Similarly, large number of versions can also degrade the performance of
> full scan for joins etc.
>
> Thanks
> -Sachin
>
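
To make the two hypotheses concrete, here is a toy Python model. It is not HBase code: the `HFILES` layout and the functions `read_all_files` and `read_newest_first` are made up for illustration, and the sample data assumes a cell was written later with an explicit, older timestamp. It only shows how the two lookup strategies can disagree, not which one the region server actually uses.

```python
# Toy model only: each "HFile" is a dict with a creation order and its cells.
# Cells map rowKey -> list of (timestamp, value). These are not HBase structures.
HFILES = [
    {"created": 1, "cells": {"row1": [(100, "a")]}},
    {"created": 2, "cells": {"row1": [(300, "c")]}},
    # Assumption for illustration: written last, but with an older explicit timestamp.
    {"created": 3, "cells": {"row1": [(200, "b")]}},
]


def read_all_files(hfiles, row_key):
    """Hypothesis 1: check every file, then keep the cell with the highest timestamp."""
    candidates = [cell for f in hfiles for cell in f["cells"].get(row_key, [])]
    return max(candidates) if candidates else None


def read_newest_first(hfiles, row_key):
    """Hypothesis 2: stop at the most recently created file that contains the row."""
    for f in sorted(hfiles, key=lambda f: f["created"], reverse=True):
        cells = f["cells"].get(row_key)
        if cells:
            return max(cells)
    return None


h1 = read_all_files(HFILES, "row1")
h2 = read_newest_first(HFILES, "row1")
print("hypothesis 1 returns:", h1)
print("hypothesis 2 returns:", h2)
```

With this sample data the two strategies return different cells, so hypothesis 2 is only safe when file creation order and cell timestamps always agree.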
