Hi, I am curious to understand the impact of keeping a large number of versions in HBase. Suppose I want to maintain the previous 100 versions for a row/cell.
My thoughts are:

Having a large number of versions means a larger number of HFiles, and a larger number of HFiles can increase the lookup time for a rowKey.

Hypothesis 1: The region server has to check each HFile for the presence of that rowKey and then, based on timestamp, pick out the latest version.

Hypothesis 2: The region server may not scan every HFile. Going by HFile creation time, as soon as it finds the rowKey in the most recently created HFile it stops scanning further HFiles, because we are interested only in the latest version and have already found it in the most recently created file.

I want to confirm which of 1 and 2 is true.

Similarly, a large number of versions could also degrade the performance of full scans, e.g. for joins.

Thanks,
Sachin
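
P.S. To make the two hypotheses concrete, here is a toy sketch in plain Python. It is not HBase code and does not claim to show what the region server actually does: the dict-based "HFiles", the function names and the sample data are all made up for illustration, and it assumes the files are listed oldest to newest.

```python
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, str]           # (timestamp, value); hypothetical layout
HFile = Dict[str, List[Cell]]    # row key -> cells stored in this toy "file"


def read_latest_check_all(
    hfiles: List[HFile], row_key: str
) -> Tuple[Optional[Cell], int]:
    """Hypothesis 1: look in every file and keep the cell with the newest timestamp."""
    best: Optional[Cell] = None
    files_checked = 0
    for hfile in hfiles:
        files_checked += 1
        for cell in hfile.get(row_key, []):
            if best is None or cell[0] > best[0]:
                best = cell
    return best, files_checked


def read_latest_stop_early(
    hfiles: List[HFile], row_key: str
) -> Tuple[Optional[Cell], int]:
    """Hypothesis 2: walk files newest first and stop at the first one holding the row key."""
    files_checked = 0
    for hfile in reversed(hfiles):  # assumption: list is ordered oldest -> newest
        files_checked += 1
        cells = hfile.get(row_key)
        if cells:
            return max(cells, key=lambda c: c[0]), files_checked
    return None, files_checked


# Made-up sample data: three toy files, oldest first.
hfiles: List[HFile] = [
    {"row1": [(1, "v1"), (2, "v2")]},
    {"row1": [(3, "v3")], "row2": [(3, "x")]},
    {"row1": [(4, "v4")]},
]

if __name__ == "__main__":
    for key in ("row1", "row2", "missing"):
        print(key, read_latest_check_all(hfiles, key), read_latest_stop_early(hfiles, key))
```

In this toy data both strategies return the same cell and differ only in how many files they touch. My question is which of the two is closer to the real read path.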
