Hi,

I would like to understand the impact of keeping a large number of versions in
HBase. Suppose I want to retain the previous 100 versions of each row/cell.

My thoughts so far:

Having a large number of versions means a larger number of HFiles.
A larger number of HFiles can increase the lookup time for a rowKey.

  Hypothesis 1: The region server has to check every HFile for the presence of
that rowKey and then, based on the timestamps, select the latest
version.

  Hypothesis 2: The region server may not scan every HFile. It goes by HFile
creation time: as soon as it finds the rowKey in the most recently created
HFile, it stops scanning further HFiles, because we are interested only in
the latest version and have already found it in the newest file.

I would like to confirm which of 1 and 2 is true.
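
To make the two hypotheses concrete, here is a toy Python sketch of what I
mean. It is not HBase code: each HFile is modeled as a plain dict, the
function names are made up for illustration, and it assumes files are listed
oldest first and that a newer file always holds newer timestamps.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for an HFile: row key -> (timestamp, value).
# Real HFiles are sorted, indexed files; this is only a toy model.
HFile = Dict[str, Tuple[int, str]]


def lookup_check_all(hfiles: List[HFile], row: str) -> Tuple[Optional[str], int]:
    """Hypothesis 1: consult every file, keep the cell with the highest timestamp."""
    best: Optional[Tuple[int, str]] = None
    files_checked = 0
    for hfile in hfiles:
        files_checked += 1
        cell = hfile.get(row)
        if cell is not None and (best is None or cell[0] > best[0]):
            best = cell
    return (best[1] if best is not None else None), files_checked


def lookup_newest_first(hfiles: List[HFile], row: str) -> Tuple[Optional[str], int]:
    """Hypothesis 2: walk files from newest to oldest and stop at the first hit."""
    files_checked = 0
    for hfile in reversed(hfiles):  # the list is ordered oldest -> newest
        files_checked += 1
        cell = hfile.get(row)
        if cell is not None:
            return cell[1], files_checked
    return None, files_checked


# Three toy files, oldest first; row1 has one version in each file.
hfiles: List[HFile] = [
    {"row1": (100, "v1"), "row2": (100, "a")},
    {"row1": (200, "v2")},
    {"row1": (300, "v3")},
]

print(lookup_check_all(hfiles, "row1"))     # ('v3', 3)
print(lookup_newest_first(hfiles, "row1"))  # ('v3', 1)
```

In this toy model both approaches return the same value, but Hypothesis 1
touches all three files while Hypothesis 2 touches only one when the row is
present in the newest file.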

Similarly, a large number of versions could also degrade the performance of
full scans, e.g. for joins.

Thanks
-Sachin
