Re: Creating HBase table with presplits

Saad Mufti Fri, 02 Dec 2016 05:49:15 -0800

One way to do this without knowing your data (you still need some idea of the
size of the keyspace) is to prepend a fixed-width numeric prefix, taken from a
suitable range and computed with a good hash like MD5. For example, let us say
you can predict your data will fit in about 1024 regions. You can decide to
prepend a prefix from 0000 to 1023 to all your keys based on a suitable hash.
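
A minimal sketch of the idea in Python, just to make the mechanics concrete.
The function names, the 4-digit prefix width, and the choice of taking the
first 4 bytes of the MD5 digest modulo the bucket count are my own
assumptions for illustration, not anything HBase prescribes:

```python
import hashlib
from typing import List

NUM_BUCKETS = 1024  # assumed region count, taken from the example above
PREFIX_WIDTH = 4    # zero-padded width, so prefixes sort as "0000".."1023"


def bucket_for(key: bytes, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a row key to a bucket using the first 4 bytes of its MD5 digest."""
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_buckets


def salted_key(key: bytes, num_buckets: int = NUM_BUCKETS) -> bytes:
    """Prepend the zero-padded bucket number to the original row key."""
    prefix = f"{bucket_for(key, num_buckets):0{PREFIX_WIDTH}d}".encode("ascii")
    return prefix + key


def split_points(num_buckets: int = NUM_BUCKETS) -> List[bytes]:
    """One split key per prefix except the first, giving num_buckets regions."""
    return [f"{i:0{PREFIX_WIDTH}d}".encode("ascii") for i in range(1, num_buckets)]
```

The list from split_points() is what you would hand over as the split keys
when creating the table (for example via SPLITS in the HBase shell), and
every write and point read would go through salted_key() first.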

The pros:

1. you get to pre-split without knowing your keyspace
2. very hard if not impossible for unknown data providers to send you data
in some order that generates hotspots (unless of course the same key is
repeated over and over, still have to watch out for that)
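
To illustrate the second point, here is a small sketch (again with
hypothetical names and an assumed MD5-modulo bucketing) that counts how
monotonically increasing keys spread over the buckets, and what happens
when one key is repeated:

```python
import hashlib
from collections import Counter

NUM_BUCKETS = 1024  # assumed region count, taken from the example above


def bucket_for(key: bytes) -> int:
    """Bucket number from the first 4 bytes of the key's MD5 digest."""
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % NUM_BUCKETS


# Monotonically increasing keys: without a prefix these would all hit
# the last region of the table.
sequential = [f"event-{i:08d}".encode("ascii") for i in range(10_000)]
per_bucket = Counter(bucket_for(k) for k in sequential)

# The same key repeated over and over always maps to the same bucket,
# so the prefix does not help in that case.
repeated = Counter(bucket_for(b"hot-key") for _ in range(10_000))

print("buckets used by sequential keys:", len(per_bucket))
print("largest bucket for sequential keys:", max(per_bucket.values()))
print("buckets used by repeated key:", len(repeated))
```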

The cons:

1. you lose the ability to do a scan in the "natural" sorted order of your
keyspace, as that order is no longer preserved in HBase
2. if you miscalculate your keyspace size by a lot, you are stuck with the
hash function and range you selected even if you later get more regions,
unless you're willing to do a complete migration to a new table
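
Both cons can be seen in a short sketch. As before, the helper names and the
MD5-modulo bucketing are assumptions for illustration only; the second
bucket count of 2048 is just an example of "resizing" the range later:

```python
import hashlib


def bucket_for(key: bytes, num_buckets: int) -> int:
    """Bucket number from the first 4 bytes of the key's MD5 digest."""
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_buckets


def salted_key(key: bytes, num_buckets: int) -> bytes:
    """Prepend the zero-padded bucket number to the original row key."""
    return f"{bucket_for(key, num_buckets):04d}".encode("ascii") + key


# Keys that are already in natural sorted order.
keys = [f"row-{i:05d}".encode("ascii") for i in range(1000)]

# Con 1: HBase stores rows sorted by the full (prefixed) key, so the
# natural order of the original keys is gone.
stored_order = [k for _, k in sorted((salted_key(k, 1024), k) for k in keys)]
order_preserved = stored_order == keys

# Con 2: changing the bucket count changes the prefix of many existing
# keys, so rows written under the old scheme cannot be found under the new one.
moved = sum(bucket_for(k, 1024) != bucket_for(k, 2048) for k in keys)

print("natural order preserved:", order_preserved)
print("keys whose prefix changes going from 1024 to 2048 buckets:", moved)
```

A range scan over the natural keyspace then has to be issued once per
prefix and the results merged on the client side.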

Hope the above helps.

----
Saad

On Tue, Nov 29, 2016 at 4:28 AM, Sachin Jain <[email protected]>
wrote:

> Thanks Dave for your suggestions!
> Will let you know if I find some approach to tackle this situation.
>
> Regards
>
> On Mon, Nov 28, 2016 at 9:05 PM, Dave Latham <[email protected]> wrote:
>
> > If you truly have no way to predict anything about the distribution of
> > your data across the row key space, then you are correct that there is no
> > way to presplit your regions in an effective way.  Either you need to make
> > some starting guess, such as a small number of uniform splits, or wait
> > until you have some information about what the data will look like.
> >
> > Dave
> >
> > On Mon, Nov 28, 2016 at 12:42 AM, Sachin Jain <[email protected]>
> > wrote:
> >
> > > Hi,
> > >
> > > I was going through the pre-splitting a table article [0] and it is
> > > mentioned that it is generally best practice to presplit your table. But
> > > don't we need to know the data in advance in order to presplit it?
> > >
> > > Question: What should be the best practice when we don't know what data
> > > is going to be inserted into HBase? Essentially I don't know the key
> > > range, so if I specify wrong splits, then either the first or last split
> > > can be a hot region in my system.
> > >
> > > [0]: https://hbase.apache.org/book.html#rowkey.regionsplits
> > >
> > > Thanks
> > > -Sachin
> > >
> >
>
