RegionSplitPolicy only lets you customize the split point (a row key). All rows that sort before the split point go to the first daughter region; all rows at or after it go to the second.
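To make that concrete, here is a minimal sketch in plain Java (no HBase dependency) of how a single split point divides a sorted key range. The class and method names are hypothetical, and the comparison is a hand-written unsigned lexicographic compare standing in for the byte ordering HBase applies to row keys. Because the timestamp leads the key, the guid only breaks ties within the same timestamp and cannot decide which daughter a row lands in.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; this is not HBase code.
final class SplitPointSketch {

    // Unsigned lexicographic comparison of two row keys.
    static int compareRows(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        // A key that is a strict prefix of another sorts first.
        return a.length - b.length;
    }

    // Returns 0 for the first daughter (row < splitPoint), 1 for the second.
    static int daughterFor(String row, String splitPoint) {
        byte[] r = row.getBytes(StandardCharsets.UTF_8);
        byte[] s = splitPoint.getBytes(StandardCharsets.UTF_8);
        return compareRows(r, s) < 0 ? 0 : 1;
    }

    public static void main(String[] args) {
        String split = "2015-05-22 00:02:02#"; // assumed split point
        String[] rows = {
            "2015-05-22 00:02:01#AB12EC77778888945",
            "2015-05-22 00:02:02#CD9870001234AB457",
        };
        for (String row : rows) {
            System.out.println(row + " -> daughter " + daughterFor(row, split));
        }
    }
}
```

Whatever split point a policy picks, each daughter still holds one contiguous range of timestamps, so new writes keep landing in the region that holds the latest timestamps.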
The answer to the original question is no: you cannot base a custom split policy on the second part of the key.

-Vlad

On Fri, May 22, 2015 at 2:43 AM, Michael Segel <michael_se...@hotmail.com> wrote:

> This is why I created HBASE-12853.
>
> So you don’t have to specify a custom split policy.
>
> Of course the simple solutions are often passed over because of NIH. ;-)
>
> To be blunt… You encapsulate the bucketing code so that you have a single
> API in to HBase regardless of the type of storage underneath.
> KISS is maintained and you stop people from attempting to do stupid
> things. (cc’ing dev@hbase) As a product owner, (read PMC / committers)
> you want to keep people from mucking about in the internals. While its
> true that its open source, and you will have some who want to muck around,
> you also have to consider the corporate users who need something that is
> reliable and less customized so that its supportable. This is the vendor’s
> dilemma. (hint Cloudera , Horton, IBM, MapR) You’re selling support to
> HBase and if a customer starts to overload internals with their own code,
> good luck in supporting it. This is why you do things like 12853 because
> it makes your life easier.
>
> This isn’t a sexy solution. Its core engineering work.
>
> HTH
>
> -Mike
>
> > On May 22, 2015, at 4:22 AM, Shushant Arora <shushantaror...@gmail.com> wrote:
> >
> > since custom split policy is based on second part i.e guid so key with
> > first part as 2015-05-22 00:01:02 will be in which region how will that be
> > identified?
> >
> >
> > On Fri, May 22, 2015 at 1:12 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> >> The custom split policy needs to respect the fact that timestamp is the
> >> leading part of the rowkey.
> >>
> >> This would avoid the overlap you mentioned.
> >>
> >> Cheers
> >>
> >>
> >>
> >>> On May 21, 2015, at 11:55 PM, Shushant Arora <shushantaror...@gmail.com> wrote:
> >>>
> >>> guid change with every key, patterns is
> >>> 2015-05-22 00:02:01#AB12EC77778888945
> >>> 2015-05-22 00:02:02#CD9870001234AB457
> >>>
> >>> When we specify custom split algorithm , it may happen that keys of same
> >>> sorting order range say (1-7) lies in region R1 as well as in region R2?
> >>> Then how .META. table will make further lookups at read time, say I search
> >>> for key 3, then will it search in both the regions R1 and R2 ?
> >>>
> >>>> On Fri, May 22, 2015 at 10:48 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> >>>>
> >>>> Does guid change with every key ?
> >>>>
> >>>> bq. use second part of key
> >>>>
> >>>> I don't think so. Suppose first row in the parent region is
> >>>> '1432104178817#321'. After split, the first row in first daughter region
> >>>> would still be '1432104178817#321'. Right ?
> >>>>
> >>>> Cheers
> >>>>
> >>>> On Thu, May 21, 2015 at 9:57 PM, Shushant Arora <shushantaror...@gmail.com> wrote:
> >>>>
> >>>>> Can I avoid hotspot of region with custom region split policy in hbase
> >>>>> 0.96 .
> >>>>>
> >>>>> Key is of the form timestamp#guid.
> >>>>> So can I have custom region split policy and use second part of key (i.e)
> >>>>> guid as region split criteria and avoid hot spot??
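
On the bucketing that Mike's message in the quoted thread refers to: the sketch below shows one generic way to prefix a timestamp#guid key with a bucket derived from the guid, so that consecutive timestamps spread across several key ranges. It is plain Java with hypothetical names, an assumed bucket count of 8, and an assumed key layout of bucket#timestamp#guid; it is not the design proposed in HBASE-12853 and not an HBase API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical illustration of key bucketing; not HBase code.
final class BucketedKeySketch {

    // Assumed bucket count; in practice it would be chosen to match the cluster.
    static final int BUCKETS = 8;

    // Builds "<bucket>#<timestamp>#<guid>", where the bucket is a stable
    // hash of the guid reduced to the range [0, BUCKETS).
    static String bucketedKey(String timestamp, String guid) {
        int hash = 0;
        for (byte b : guid.getBytes(StandardCharsets.UTF_8)) {
            hash = 31 * hash + (b & 0xff);
        }
        int bucket = Math.floorMod(hash, BUCKETS);
        return String.format(Locale.ROOT, "%02d#%s#%s", bucket, timestamp, guid);
    }

    public static void main(String[] args) {
        System.out.println(bucketedKey("2015-05-22 00:02:01", "AB12EC77778888945"));
        System.out.println(bucketedKey("2015-05-22 00:02:02", "CD9870001234AB457"));
    }
}
```

The trade-off is on the read side: a scan over a time range has to be issued once per bucket and the results merged, which is why it helps to hide the bucketing behind a single access layer.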