Thanks Saad!!

This is very similar to what I had planned to implement, i.e. to map the
unknown keyspace to a known keyspace by using a hash algorithm like MD5, then
pre-split the table on that known keyspace. Thanks once again!!
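
To make the prefix scheme concrete, here is a minimal sketch in Python. It
assumes the 1024-bucket example from Saad's mail and a 4-digit zero-padded
prefix; the names salted_key and split_keys are made up for illustration and
are not part of any HBase API.

```python
import hashlib

NUM_BUCKETS = 1024   # assumption: the 1024-region example from this thread
PREFIX_WIDTH = 4     # "0000" .. "1023"


def salted_key(row_key: bytes, num_buckets: int = NUM_BUCKETS) -> bytes:
    """Prepend a zero-padded bucket number derived from the MD5 of the key.

    The same row key always maps to the same bucket, so a point read can
    recompute the prefix from the original key.
    """
    digest = hashlib.md5(row_key).digest()
    # Use the first 4 bytes of the digest as an unsigned integer.
    bucket = int.from_bytes(digest[:4], "big") % num_buckets
    prefix = str(bucket).zfill(PREFIX_WIDTH).encode("ascii")
    return prefix + row_key


def split_keys(num_buckets: int = NUM_BUCKETS) -> list:
    """Return the region boundaries "0001" .. "1023" as bytes.

    "0000" is left out on purpose: the first region starts at the empty key.
    """
    return [str(i).zfill(PREFIX_WIDTH).encode("ascii")
            for i in range(1, num_buckets)]


if __name__ == "__main__":
    print(salted_key(b"user42"))
    print(len(split_keys()), split_keys()[0], split_keys()[-1])
```

As far as I understand, HBase creates one more region than the number of
split keys passed at table creation, so these 1023 boundaries would give the
1024 regions, with the first region covering prefix "0000". Reads by key have
to recompute the prefix the same way, and a range scan over the original keys
has to fan out over all buckets.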


On Fri, Dec 2, 2016 at 7:18 PM, Saad Mufti <[email protected]> wrote:

> Forgot to mention in above example you would presplit into 1024 regions,
> starting from "0000" to "1023" (start keys).
>
> Cheers.
>
> ----
> Saad
>
>
> On Fri, Dec 2, 2016 at 8:47 AM, Saad Mufti <[email protected]> wrote:
>
> > One way to do this without knowing your data (still need some idea of
> size
> > of keyspace) is to prepend a fixed numeric prefix from a suitable range
> > based on a good hash like MD5. For example, let us say you can predict
> your
> > data will fit in about 1024 regions. You can decide to prepend a prefix
> > from 0000 to 1023 to all your keys based on a suitable hash.
> >
> > The pros:
> >
> > 1. you get to pre-split without knowing your keyspace
> > 2. very hard if not impossible for unknown data providers to send you
> data
> > in some order that generates hotspots (unless of course the same key is
> > repeated over and over, still have to watch out for that)
> >
> > The cons:
> >
> > 1. lose the ability to scan in the "natural" sorted order of your keyspace
> > as that order is not preserved anymore in HBase
> > 2. if you miscalculate your keyspace size by a lot, you are stuck with
> the
> > hash function and range you selected even if you later get more regions
> > unless you're willing to do complete migration to a new table
> >
> > Hope above helps.
> >
> > ----
> > Saad
> >
> >
> > On Tue, Nov 29, 2016 at 4:28 AM, Sachin Jain <[email protected]>
> > wrote:
> >
> >> Thanks Dave for your suggestions!
> >> Will let you know if I find some approach to tackle this situation.
> >>
> >> Regards
> >>
> >> On Mon, Nov 28, 2016 at 9:05 PM, Dave Latham <[email protected]>
> wrote:
> >>
> >> > If you truly have no way to predict anything about the distribution of
> >> your
> >> > data across the row key space, then you are correct that there is no
> >> way to
> >> > presplit your regions in an effective way.  Either you need to make
> some
> >> > starting guess, such as a small number of uniform splits, or wait
> until
> >> you
> >> > have some information about what the data will look like.
> >> >
> >> > Dave
> >> >
> >> > On Mon, Nov 28, 2016 at 12:42 AM, Sachin Jain <
> [email protected]>
> >> > wrote:
> >> >
> >> > > Hi,
> >> > >
> >> > > I was going through an article on pre-splitting a table [0] and it is
> >> mentioned
> >> > > that it is generally best practice to presplit your table. But don't
> >> we
> >> > > need to know the data in advance in order to presplit it?
> >> > >
> >> > > Question: What should be the best practice when we don't know what
> >> data
> >> > is
> >> > > going to be inserted into HBase. Essentially I don't know the key
> >> range
> >> > so
> >> > > if I specify wrong splits, then either first or last split can be a
> >> hot
> >> > > region in my system.
> >> > >
> >> > > [0]: https://hbase.apache.org/book.html#rowkey.regionsplits
> >> > >
> >> > > Thanks
> >> > > -Sachin
> >> > >
> >> >
> >>
> >
> >
>
