Re: Regions and Rowkeys

Talat Uyarer Tue, 12 May 2015 06:13:41 -0700

Hi Arun,

rowKeys. HBase decides which region stores each row based on its rowkey.
With HexStringSplit, the RegionSplitter generates region start keys in the
format of an MD5 checksum, i.e. fixed-width hexadecimal strings.
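
To make the boundaries concrete, here is a minimal sketch of evenly dividing
a hexadecimal keyspace into 15 regions. It assumes an 8-character keyspace
from 00000000 to ffffffff and plain integer division; the function names are
made up for illustration, and the real RegionSplitter may differ in detail.
The MD5 of "user-42" is only an example of a uniformly distributed hex
rowkey, not something taken from your table.

```python
import hashlib
from bisect import bisect_right


def hex_split_points(num_regions, first="00000000", last="ffffffff"):
    """Return num_regions - 1 evenly spaced hex boundaries (hypothetical helper)."""
    width = len(last)
    lo, hi = int(first, 16), int(last, 16)
    step = (hi - lo + 1) // num_regions
    # Zero-pad each boundary to the keyspace width so byte order matches numeric order.
    return [format(lo + step * i, "0{}x".format(width)) for i in range(1, num_regions)]


def region_for(rowkey_hex, splits):
    """Index of the region whose key range contains rowkey_hex (0-based)."""
    return bisect_right(splits, rowkey_hex)


splits = hex_split_points(15)

# A rowkey that is itself a hex digest lands in one of the 15 ranges.
rowkey = hashlib.md5(b"user-42").hexdigest()
print(splits)
print(rowkey, "-> region", region_for(rowkey, splits))
```

So the split algorithm only fixes the boundaries. Whether the data spreads
evenly depends on the rowkeys you write being uniformly distributed hex
strings as well.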


Talat



2015-05-12 15:48 GMT+03:00 Arun Patel <[email protected]>:
> Thank you.  This helps.
>
> So, when I pre-split regions with the command below, is SPLITALGO creating
> the rowkey boundaries for each region?
>
> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}
>
> I am failing to understand HexStringSplit.  As per the documentation, the
> format of a HexStringSplit region boundary is the ASCII representation of an
> MD5 checksum, or any other uniformly distributed hexadecimal value.
>
> My question is: MD5 checksum of what?
>
> Regards,
> Arun
>
>
>
>
>
> On Mon, May 11, 2015 at 8:57 PM, Nick Dimiduk <[email protected]> wrote:
>
>> On Mon, May 11, 2015 at 3:38 PM, Arun Patel <[email protected]>
>> wrote:
>>
>> > 1) I have a 10 node HBase cluster.  When I create a table in HBase,
>> > how many regions will be allocated by default?
>>
>>
>> In HBase, the number of region servers is orthogonal to table partitions.
>> These two operational details are related but managed independently.
>>
>> > I looked at the HBase Master UI and it seems regions are not allocated to
>> > all the Region Servers by default.  How can I allocate the regions in all
>> > Region Servers?
>>
>>
>> HBase will evenly balance the regions of all tables it's hosting across all
>> region servers in the cluster. If you have fewer regions than region
>> servers, some servers will have no regions to host.
>>
>> > Basically, this distributes the data in a better way if I am using a salted
>> > key. My requirement is to distribute the data across the cluster using
>> > salted keys.  But is having few regions a constraint?
>> >
>>
>> You're moving in the right direction. The next step would be to split your
>> table according to some prefix value, presumably related to your "salting"
>> choice. This will depend on what value you're prepending to the row keys
>> and the cardinality of those values. Apache Phoenix does this, for example,
>> with a fixed byte prefix and one pre-split per salt-byte value (i.e., 0,
>> 1, 2, 3, ... 15).
>>
>> > 2) How does the rowkey to region mapping work?  In Cassandra, we have a
>> > concept of assigning a token range to each node.  A rowkey is assigned to
>> > a node based on the token range.  How does this work in HBase?
>>
>>
>> HBase is ordered and range-partitioned. Basically, your row keys are sorted
>> and region boundaries are determined at points within that range. So if you
>> have rows 'a' - 'z', HBase will define regions as contiguous segments of
>> this range, 'a' - 'f', and 'g' - 'k' for example. The range of a region is
>> dictated primarily by the amount of data contained therein. When a region
>> becomes too big, it will be split in half and two child regions are created
>> (i.e., 'a' - 'f' becomes 'a' - 'c' and 'd' - 'f'). Once a region splits,
>> the children are independent and can be moved to other region servers.
>>
>> I explain a bit of this and more in my talk "HBase for Architects". I link
>> to a video from my blog [0]. As Michael mentioned, there's more detail
>> published in our book [1], as well as in our other books [2], [3].
>>
>> Welcome to HBase ;)
>> -n
>>
>> [0]: http://www.n10k.com/blog/hbase-for-architects-redux/
>> [1]: https://hbase.apache.org/book.html#regions.arch
>> [2]: http://www.manning.com/dimidukkhurana/
>> [3]: http://shop.oreilly.com/product/0636920033943.do
>>
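
The range partitioning Nick describes above can be sketched as a lookup over
sorted region start keys. This is only an illustration with made-up start
keys and helper names; it is not how HBase stores or resolves regions
internally.

```python
from bisect import bisect_right, insort


def region_index(rowkey, start_keys):
    """Index of the last region whose start key is <= rowkey."""
    return bisect_right(start_keys, rowkey) - 1


def split_region(start_keys, split_key):
    """Model a split by adding a new start key, keeping the list sorted."""
    insort(start_keys, split_key)


# Hypothetical table: the first region starts at the empty key and covers
# 'a' - 'f', followed by 'g' - 'k' and then everything from 'l' onwards.
start_keys = [b"", b"g", b"l"]
before = region_index(b"dog", start_keys)

# The 'a' - 'f' region grows too big and splits into 'a' - 'c' and 'd' - 'f'.
split_region(start_keys, b"d")
after = region_index(b"dog", start_keys)

print(before, after)
```

Because a row is located purely by comparing its key against the sorted
boundaries, a split only adds one boundary and leaves every other region
untouched.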

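The salting approach from the quoted message (a fixed byte prefix with one
pre-split per salt value) could look roughly like the sketch below. The
bucket count of 16 follows the 0 to 15 example; deriving the salt from an
MD5 of the original key is my assumption for illustration, and Phoenix's
actual hash function may be different.

```python
import hashlib
from bisect import bisect_right

NUM_BUCKETS = 16  # salt values 0..15, as in the quoted example


def salted_key(rowkey: bytes) -> bytes:
    """Prefix the rowkey with one salt byte derived from the key itself."""
    # Assumption: any stable hash works; MD5 is used here only as an example.
    salt = hashlib.md5(rowkey).digest()[0] % NUM_BUCKETS
    return bytes([salt]) + rowkey


# One pre-split per salt value: 15 split points give 16 regions.
split_points = [bytes([i]) for i in range(1, NUM_BUCKETS)]

key = salted_key(b"user-42")
region = bisect_right(split_points, key)
print(key, "-> region", region)
```

Since the salt is computed from the key, a reader can recompute it for a
point lookup, but a range scan has to fan out over all 16 prefixes.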


-- 
Talat UYARER
Website: http://talat.uyarer.com
Twitter: http://twitter.com/talatuyarer
Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
