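Hi Arun,

The MD5 checksum of your row keys. HBase decides which region stores a row based on its row key. With HexStringSplit, the RegionSplitter generates region start keys in the same format as an MD5 checksum (fixed-width hexadecimal strings), spread evenly across that key space, so row keys that are uniformly distributed hexadecimal values fall evenly into the pre-split regions.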
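
To make the boundaries concrete, here is a minimal Python sketch (not HBase code). It assumes the split points come from dividing the 8-digit hex range 00000000 to FFFFFFFF evenly, which is my understanding of the HexStringSplit default. The function names and the key scheme (the first 8 hex digits of the key's MD5, then "_", then the original key) are hypothetical, chosen only for illustration.

```python
import bisect
import hashlib


def hex_split_points(num_regions, first=0x00000000, last=0xFFFFFFFF, width=8):
    """Return num_regions - 1 evenly spaced hex boundaries.

    Assumption: this mirrors an even division of the default hex range;
    it is an illustration, not the HBase implementation.
    """
    step = (last - first + 1) // num_regions
    return [format(first + step * i, "0{}x".format(width))
            for i in range(1, num_regions)]


def salted_key(row_key, width=8):
    """Hypothetical key scheme: MD5 hex prefix + '_' + original key."""
    digest = hashlib.md5(row_key.encode("utf-8")).hexdigest()
    return digest[:width] + "_" + row_key


def region_index(key, split_points):
    """Index of the region whose [start, end) range holds the key.

    Keys are compared as sorted strings, the way a range-partitioned
    store compares row keys; region 0 has an empty start key.
    """
    return bisect.bisect_right(split_points, key)


if __name__ == "__main__":
    splits = hex_split_points(15)  # 15 regions need 14 boundaries
    print("first boundary:", splits[0], "last boundary:", splits[-1])
    for user in ("user1", "user2", "user3"):
        key = salted_key(user)
        print(key, "-> region", region_index(key, splits))
```

Because the MD5 prefix is uniformly distributed, keys built this way should land in the 15 regions roughly evenly, while plain keys such as "user1", "user2" would all sort into the same region.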

Talat

2015-05-12 15:48 GMT+03:00 Arun Patel <[email protected]>:

> Thank you. This helps.
>
> So, when I pre-split regions with the command below, is SPLITALGO creating
> the rowkey boundaries for each region?
>
> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}
>
> I am failing to understand HexStringSplit. As per the documentation, the
> format of a HexStringSplit region boundary is the ASCII representation of
> an MD5 checksum, or any other uniformly distributed hexadecimal value.
>
> My question is: MD5 checksum of what?
>
> Regards,
> Arun
>
> On Mon, May 11, 2015 at 8:57 PM, Nick Dimiduk <[email protected]> wrote:
>
>> On Mon, May 11, 2015 at 3:38 PM, Arun Patel <[email protected]>
>> wrote:
>>
>> > 1) I have a 10 node HBase cluster. When I create a table in HBase,
>> > how many regions will be allocated by default?
>>
>> In HBase, the number of region servers is orthogonal to table partitions.
>> These two operational details are related but managed independently.
>>
>> > I looked at the HBase Master UI and it seems regions are not allocated
>> > to all the Regionservers by default. How can I allocate the regions in
>> > all Region Servers?
>>
>> HBase will evenly balance the regions of all tables it's hosting across all
>> region servers in the cluster. If you have fewer regions than region
>> servers, some servers will have no regions to host.
>>
>> > Basically, this distributes the data in a better way if I am using a
>> > salted key. My requirement is to distribute the data across the cluster
>> > using salted keys. But, having few regions is a constraint?
>>
>> You're moving in the right direction. The next step would be to split your
>> table according to some prefix value, presumably related to your "salting"
>> choice. This will depend on what value you're prepending to the row keys
>> and the cardinality of those values. Apache Phoenix does this, for example,
>> with a fixed byte prefix and one pre-split per salt-byte value (i.e., 0,
>> 1, 2, 3, ... 15).
>>
>> > 2) How does the rowkey to region mapping work? In Cassandra, we have a
>> > concept of assigning token range for each node. Rowkey will be assigned
>> > to a node based on the token range. How does this work in HBase?
>>
>> HBase is ordered and range-partitioned. Basically, your row keys are sorted
>> and region boundaries are determined at points within that range. So if you
>> have rows 'a' - 'z', HBase will define regions as contiguous segments of
>> this range, 'a' - 'f', and 'g' - 'k' for example. The range of a region is
>> dictated primarily by the amount of data contained therein. When a region
>> becomes too big, it will be split in half and two child regions are created
>> (i.e., 'a' - 'f' becomes 'a' - 'c' and 'd' - 'f'). Once a region splits,
>> the children are independent and can be moved to other region servers.
>>
>> I explain a bit of this and more in my talk "HBase for Architects". I link
>> to a video from my blog [0]. As Michael mentioned, there's more detail
>> published in both our book [1], as well as our other books [2], [3].
>>
>> Welcome to HBase ;)
>> -n
>>
>> [0]: http://www.n10k.com/blog/hbase-for-architects-redux/
>> [1]: https://hbase.apache.org/book.html#regions.arch
>> [2]: http://www.manning.com/dimidukkhurana/
>> [3]: http://shop.oreilly.com/product/0636920033943.do

--
Talat UYARER
Website: http://talat.uyarer.com
Twitter: http://twitter.com/talatuyarer
Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
