On Tue, Jun 14, 2011 at 1:41 AM, Shuja Rehman wrote:
> Well... There are a couple of reasons
>
> 1- The data is coming from different regions of the country and I want to
> distribute the data w.r.t. regions, e.g.
> RegionServer1-RegionServer4 contain east region data only.
> RegionServer2-RegionServer6 contain west region data only.
>
Can you do this with a table per region? Otherwise, prefix the key w/ region. This won't be perfect in that the boundary won't be clean, but perhaps it is sufficient?

> 2- The cluster is a combination of different machines w.r.t. hardware (RAM,
> processor speed, number of cores). Some tables are accessed frequently and some
> less often, so I want to place the most accessed tables on the
> machines with the most RAM and highest processing speeds, e.g. create table1, colFam1
> @10.10.10.2,10.10.10.3,10.10.10.4 (list of region servers)
>

In general, a heterogeneous cluster is probably going to cause you headaches. HBase has rarely been run on a cluster that was not homogeneous, so my guess is that you'll run into 'interesting' issues.

Currently the levers for manually balancing the cluster are not exposed. Our balancer *should* do this for you, factoring in the machine resources, but currently it does not. One thing you could do is turn the balancer off and do the balancing yourself externally. You can move regions either via the shell or with a script.

> 4- I need to implement different priority scanning, so the highest priority
> query should be served by the good machines, and this can be done if I am able
> to place the priority data on the good machines, e.g. if time = busy hours then
> place data on the good region servers, else if time = night then place data on
> the normal servers.
>

HBase will never let you do this. It won't scale.

St.Ack
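A minimal Java sketch of the key-prefix suggestion for point 1, assuming a `<region>|<id>` row-key layout. The `|` separator, the `RegionKeys` class, its method names, and the sample ids are all hypothetical; nothing here is an HBase API. Because HBase keeps rows sorted by their key bytes, the prefix keeps one geographic region's rows contiguous, and the `[startRow, stopRow)` pair bounds a scan to exactly that region (those two arrays are what you would hand to a `Scan` as its start and stop rows). Requires Java 9+ for `Arrays.compareUnsigned`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RegionKeys {
    // Hypothetical separator; it must never appear inside a region name.
    static final String SEP = "|";

    // Row key "<region>|<id>", e.g. "east|meter-0042". Rows sort by key bytes,
    // so all rows for one geographic region end up next to each other.
    static byte[] rowKey(String geoRegion, String id) {
        return (geoRegion + SEP + id).getBytes(StandardCharsets.UTF_8);
    }

    // Inclusive start row for one geographic region: the bare prefix "<region>|".
    static byte[] startRow(String geoRegion) {
        return (geoRegion + SEP).getBytes(StandardCharsets.UTF_8);
    }

    // Exclusive stop row: the prefix with its last byte incremented, so
    // [startRow, stopRow) covers exactly the keys that begin with the prefix.
    static byte[] stopRow(String geoRegion) {
        byte[] stop = startRow(geoRegion);
        stop[stop.length - 1]++; // '|' (0x7C) becomes '}' (0x7D)
        return stop;
    }

    // Unsigned lexicographic comparison, the same ordering HBase uses for row keys.
    static boolean inScanRange(byte[] key, String geoRegion) {
        return Arrays.compareUnsigned(key, startRow(geoRegion)) >= 0
            && Arrays.compareUnsigned(key, stopRow(geoRegion)) < 0;
    }

    public static void main(String[] args) {
        byte[] east = rowKey("east", "meter-0042");
        byte[] west = rowKey("west", "meter-0042");
        System.out.println(new String(east, StandardCharsets.UTF_8)); // east|meter-0042
        System.out.println(inScanRange(east, "east"));                // true
        System.out.println(inScanRange(west, "east"));                // false
    }
}
```

The separator matters: without it, a scan over the "east" prefix would also pick up rows for a region named "eastern".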

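Turning the balancer off and moving regions from a script, as suggested in the reply to point 2, might look like the following sketch. It only prints HBase shell commands: `balance_switch false`, then one `move 'ENCODED_REGIONNAME', 'SERVER_NAME'` per region, where a server name has the form `host,port,startcode`. The `ManualPlacement` class, the round-robin assignment, and every region and server name in `main` are hypothetical placeholders that you would replace with the real values from your own cluster.

```java
import java.util.ArrayList;
import java.util.List;

public class ManualPlacement {
    // Builds HBase shell commands that place a table's regions on a chosen set of
    // servers: disable the balancer first so it does not move them back, then
    // issue one `move` per region, spreading regions round-robin over the servers.
    static List<String> plan(List<String> encodedRegions, List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("need at least one target server");
        }
        List<String> cmds = new ArrayList<>();
        cmds.add("balance_switch false");
        for (int i = 0; i < encodedRegions.size(); i++) {
            String server = servers.get(i % servers.size());
            cmds.add("move '" + encodedRegions.get(i) + "', '" + server + "'");
        }
        return cmds;
    }

    public static void main(String[] args) {
        // Hypothetical encoded region names of a frequently accessed table.
        List<String> hotRegions = List.of("0a1b2c", "3d4e5f", "6a7b8c");
        // Hypothetical "host,port,startcode" names of the larger machines.
        List<String> bigServers = List.of(
            "10.10.10.2,60020,1307000000000",
            "10.10.10.3,60020,1307000000000");
        for (String cmd : plan(hotRegions, bigServers)) {
            System.out.println(cmd);
        }
    }
}
```

The printed lines could be piped into `hbase shell`. The balancer stays off until you run `balance_switch true`, so any later region splits or server failures also have to be handled by hand.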