The newly split regions might be moved to another server by load balancing. But aren't you experiencing classic hotspotting, with only one RegionServer getting all the write traffic? Just put a single byte in front of the timestamp and round-robin each put over the values 1 to the number of region servers.
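
To make the prefix idea concrete, here is a minimal sketch in plain Java. It is only an illustration, not HBase API: the class name `SaltedRowKey`, its methods, and the key layout (1 salt byte followed by the 8-byte big-endian timestamp) are assumptions made for this example. In a real client the resulting byte array would be passed as the row key of the put.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper (not part of HBase): builds row keys of the form
// [1-byte salt][8-byte big-endian timestamp], cycling the salt 1..numBuckets.
final class SaltedRowKey {
    private final int numBuckets;
    private final AtomicLong counter = new AtomicLong();

    SaltedRowKey(int numBuckets) {
        // The salt must fit in one unsigned byte, and values start at 1.
        if (numBuckets < 1 || numBuckets > 255) {
            throw new IllegalArgumentException("numBuckets must be in 1..255");
        }
        this.numBuckets = numBuckets;
    }

    // Returns the next salted key; consecutive calls rotate through the buckets.
    byte[] next(long timestamp) {
        byte salt = (byte) (1 + (int) (counter.getAndIncrement() % numBuckets));
        return ByteBuffer.allocate(1 + Long.BYTES)
                .put(salt)
                .putLong(timestamp)
                .array();
    }

    // Recovers the original timestamp by skipping the salt byte.
    static long timestampOf(byte[] rowKey) {
        return ByteBuffer.wrap(rowKey, 1, Long.BYTES).getLong();
    }
}
```

With `numBuckets` set to the number of region servers and the table pre-split on the salt values, consecutive writes land in different regions instead of all going to the last one. The cost is on the read side: a scan over a time range has to be issued once per salt value and the results merged.
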
On Wednesday, June 19, 2013, yun peng wrote:

> Hi, All,
> Our use case requires to persist a stream into system like HBase. The
> stream data is in format of <timestamp, value>. In other word, timestamp is
> used as rowkey. We want to explore whether HBase is suitable for such kind
> of data.
>
> The problem is that the domain of row key (or timestamp) grow constantly.
> For example, given 3 nodes, n1 n2 n3, they are resp. hosting row key
> partition [0,4], [5, 9], [10,12]. Currently it is the last node n3 who is
> busy receiving upcoming writes (of row key 13 and 14). This continues until
> the region reaches max size 5 (that is, partition grows to [10,14]) and
> potentially splits.
>
> I am not expert on HBase split, but I am wondering after split, will the
> new writes still go to node n3 (for [10,14]) or the write stream can be
> intelligently redirected to other less busy node, like n1.
>
> In case HBase can't do things like this, how easy is it to extend HBase for
> such functionality? Thanks...
> Yun
