As I understand it, HBaseWD essentially implements salted row keys, much like the approach described in HBase: The Definitive Guide... Thanks for the note though.
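To make the idea concrete for the archives, here is a minimal sketch of what a salted put could look like on the client side. The bucket count, column family name, and key layout below are my own choices for illustration, not what HBaseWD actually does internally (addColumn is the newer client API; older clients use Put.add):

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class SaltedKeyExample {

    // Assumed bucket count; in practice you would size this to the number
    // of regions/region servers you want to spread the writes over.
    private static final int NUM_BUCKETS = 8;

    // Deterministic salt: the same timestamp always maps to the same
    // bucket, so a point read knows which prefix to look under.
    // Assumes non-negative epoch-millis timestamps.
    static byte[] saltedRowKey(long timestamp) {
        byte salt = (byte) (timestamp % NUM_BUCKETS);
        return Bytes.add(new byte[] { salt }, Bytes.toBytes(timestamp));
    }

    static Put buildPut(long timestamp, byte[] value) {
        Put put = new Put(saltedRowKey(timestamp));
        put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("v"), value);
        return put;
    }
}

The trade-off, of course, is that a time-range scan now has to fan out over all the buckets and merge the results, which, as I read it, is what HBaseWD's distributed scanner handles for you.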
On Thu, Jun 20, 2013 at 12:37 AM, Bing Jiang <[email protected]> wrote:

> Maybe you can try HBaseWD,
>
> http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/
> https://github.com/sematext/HBaseWD
>
>
> 2013/6/20 谢良 <[email protected]>
>
> > Or maybe you could try to reverse your rowkey :)
> > ________________________________________
> > From: yun peng [[email protected]]
> > Sent: June 20, 2013 5:59
> > To: [email protected]
> > Subject: Re: Possibility of using timestamp as row key in HBase
> >
> > Thanks for the reply. The idea is interesting, but in practice our
> > client doesn't know in advance how much data should be put to one RS.
> > The data write is redirected to the next RS only when the current RS
> > is initialising a flush() and begins to block the stream.
> >
> > The real problem is not about splitting an existing region, but about
> > adding a new region (or new key range).
> > In the original example, before node n3 overflows, the system is like
> > n1 [0,4],
> > n2 [5,9],
> > n3 [10,14]
> > then n3 starts to flush() (say Memstore.size = 5), which may block the
> > write stream to n3. We want the subsequent write stream to redirect
> > back to, say, n1, so that n1 is now accepting 15, 16... for range
> > [15,19].
> >
> > As I understand it, the above behaviour would change HBase's normal
> > way of managing the region-key mapping, and we want to know how much
> > effort it would take to change HBase.
> >
> > Besides, I found that Chapter 9, Advanced Usage, in the Definitive
> > Guide talks a bit about this issue, based on the idea of adding a
> > prefix or hash. In their terminology, we need the "sequential key"
> > approach, but with managed region mapping.
> >
> > Yun
> >
> >
> > On Wed, Jun 19, 2013 at 5:26 PM, Asaf Mesika <[email protected]>
> > wrote:
> >
> > > You can use the prefix split policy. Put the same prefix on the data
> > > you need in the same region and thus achieve locality of this data,
> > > and also get a good spread of your load and avoid the split policy.
> > > I'm not sure you really need the requirement you described below,
> > > unless I didn't follow your business requirements very well.
> > >
> > > On Thursday, June 20, 2013, yun peng wrote:
> > >
> > > > It is our requirement that one batch of data writes (say of
> > > > Memstore size) should be in one RS, and a salting prefix, while it
> > > > evens out the load, may not have this property.
> > > >
> > > > Our problem is really how to manipulate/customise the mapping of
> > > > row keys (or row key ranges) to the region servers, so that after
> > > > one region overflows and starts to flush, the write stream can be
> > > > automatically redirected to the next region server, like in a
> > > > round robin way?
> > > >
> > > > Is it possible to customize such a policy on the HMaster? Or is
> > > > there a similar way, as with what a coprocessor does on region
> > > > servers...
> > > >
> > > >
> > > > On Wed, Jun 19, 2013 at 4:58 PM, Asaf Mesika <[email protected]>
> > > > wrote:
> > > >
> > > > > The newly split region might be moved due to load balancing.
> > > > > Aren't you experiencing the classic hot spotting? Only 1 RS
> > > > > getting all write traffic? Just place a preceding byte before
> > > > > the timestamp and round robin each put on values 1..num of
> > > > > region servers.
> > > > >
> > > > > On Wednesday, June 19, 2013, yun peng wrote:
> > > > >
> > > > > > Hi, All,
> > > > > > Our use case requires persisting a stream into a system like
> > > > > > HBase. The stream data is in the format of <timestamp, value>;
> > > > > > in other words, the timestamp is used as the rowkey. We want
> > > > > > to explore whether HBase is suitable for this kind of data.
> > > > > >
> > > > > > The problem is that the domain of the row key (or timestamp)
> > > > > > grows constantly. For example, given 3 nodes, n1 n2 n3, they
> > > > > > are respectively hosting the row key partitions [0,4], [5,9],
> > > > > > [10,12]. Currently it is the last node, n3, that is busy
> > > > > > receiving the upcoming writes (of row keys 13 and 14). This
> > > > > > continues until the region reaches max size 5 (that is, the
> > > > > > partition grows to [10,14]) and potentially splits.
> > > > > >
> > > > > > I am not an expert on HBase splits, but I am wondering whether,
> > > > > > after the split, the new writes still go to node n3 (for
> > > > > > [10,14]) or the write stream can be intelligently redirected
> > > > > > to another, less busy node, like n1.
> > > > > >
> > > > > > In case HBase can't do things like this, how easy is it to
> > > > > > extend HBase for such functionality? Thanks...
> > > > > > Yun
>
>
> --
> Bing Jiang
> Tel: (86)134-2619-1361
> weibo: http://weibo.com/jiangbinglover
> BLOG: http://blog.sina.com.cn/jiangbinglover
> National Research Center for Intelligent Computing Systems
> Institute of Computing Technology
> Graduate University of Chinese Academy of Science
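As for Asaf's suggestion above (a preceding byte round-robined over 1..num of region servers), a rough sketch of that variant, again with made-up names and a counter of my own, assuming the table is pre-split on the prefix byte:

import java.util.concurrent.atomic.AtomicLong;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class RoundRobinPrefixExample {

    // Assumed: one prefix value per region server, table pre-split on it.
    private static final int NUM_REGION_SERVERS = 3;
    private static final AtomicLong COUNTER = new AtomicLong();

    // Rotate the leading byte so that consecutive puts land on different
    // pre-split regions instead of the single "latest" region.
    static Put nextPut(long timestamp, byte[] value) {
        byte prefix = (byte) (COUNTER.getAndIncrement() % NUM_REGION_SERVERS);
        byte[] rowKey = Bytes.add(new byte[] { prefix }, Bytes.toBytes(timestamp));
        Put put = new Put(rowKey);
        put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("v"), value);
        return put;
    }
}

Unlike the deterministic salt above, a pure round-robin counter can put the same timestamp under different prefixes on different writes, so point reads would have to check every prefix.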
