Hi,

We are seeing unexpectedly low performance in the memstore. We have gone through some of the previous JIRAs and seen the inefficiencies reported in ConcurrentSkipListMap. The symptom is a RegionServer hitting 100% CPU at seemingly random points in time. The bug is hard to reproduce, and there is no large number of extra reads going to that region server or any substantial hotspot. The region server recovers the moment we flush the memstores or restart it. Our queries retrieve wide rows of up to 10-20 columns.

A stack trace shows two things:

1) Time is spent inside MemstoreScanner.reseek() and inside the ConcurrentSkipListMap.
2) The reseek() is being called at the "SEEK_NEXT" column inside StoreScanner. This is understandable, since the rows contain many columns and StoreScanner iterates one KeyValue at a time.

Looking at the code, it seems that every time reseek() is called on the same memstore scanner, we make a fresh call to build an iterator() on the skip list set. This means we do an additional skip list lookup for every column retrieved. Skip list lookups are O(log n), not O(1).

Related JIRA HBASE-3855 made reseek() scan some KVs first and do a lookup only if that number is exceeded. However, it seems this behaviour was reverted by HBASE-4195, and every next row/next column is now a reseek() and a skip list lookup rather than an iterator step.

Are there any strong reasons against restoring the previous behaviour of scanning a small number of keys before falling back to a skip list lookup? It seems like it would really help for sequential memstore scans and for memstore gets on wide tables (even 10-20 columns).

Thanks,
Varun
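
P.S. To make the question concrete, here is a minimal sketch of the "scan a few keys, then fall back to a lookup" idea. It is not the HBase code and not the HBASE-3855 patch: the class name, the MAX_LINEAR_STEPS constant and its value, and the use of Integer keys in place of KeyValue are all assumptions made for illustration.

```java
import java.util.Iterator;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical stand-in for a memstore scanner; Integer keys replace KeyValue.
class BoundedReseekSketch {
    // Assumed threshold: how many entries to step over before giving up
    // on the existing iterator and doing a fresh skip list lookup.
    static final int MAX_LINEAR_STEPS = 8;

    private final ConcurrentSkipListSet<Integer> set;
    private Iterator<Integer> it;
    private Integer current;   // entry the scanner is positioned on, or null
    int skipListLookups = 0;   // counts fallbacks, for illustration only

    BoundedReseekSketch(ConcurrentSkipListSet<Integer> set) {
        this.set = set;
        this.it = set.iterator();
        this.current = it.hasNext() ? it.next() : null;
    }

    // Positions the scanner on the first entry >= key and returns it
    // (null if there is no such entry). Only moves forward.
    Integer reseek(int key) {
        for (int steps = 0; current != null && current < key; steps++) {
            if (steps == MAX_LINEAR_STEPS) {
                // Target is too far ahead: pay for one O(log n) lookup.
                skipListLookups++;
                it = set.tailSet(key, true).iterator();
                current = it.hasNext() ? it.next() : null;
                return current;
            }
            // Cheap path: advance the iterator we already hold.
            current = it.hasNext() ? it.next() : null;
        }
        return current;
    }

    public static void main(String[] args) {
        ConcurrentSkipListSet<Integer> s = new ConcurrentSkipListSet<>();
        for (int i = 0; i < 100; i++) s.add(i);
        BoundedReseekSketch scanner = new BoundedReseekSketch(s);
        System.out.println(scanner.reseek(3) + " lookups=" + scanner.skipListLookups);
        System.out.println(scanner.reseek(50) + " lookups=" + scanner.skipListLookups);
    }
}
```

With this shape, a reseek to the next column of the same row usually stays on the cheap path, and only a jump further than the threshold costs a new lookup.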
