Hi Arun,

Please read this document:
http://hbase.apache.org/book.html#rowkey.design
I think it will help you figure out the rowkey design.

Talat

On May 12, 2015 6:00 PM, "Arun Patel" <arunp.bigd...@gmail.com> wrote:

> Thank you all. This info is really helpful. I have a follow-up question
> related to my use case.
>
> I need to create a table called LOGS to log event info generated from
> multiple services that I am calling.
>
> For each rowkey (a random UUID generated in this case), multiple services
> are called and the success or failure status is logged into the LOGS table.
> So, my data is something like this...
>
> Rowkey         Filename  Service   Message
> 312sdasd31244  file1     service1  success
> 312sdasd31244  file1     service2  success
> 312sdasd31244  file2     service1  failure: Reason for failure
> ....
> ..
> .
> 789sdfsf34234  file1     service1  success
> 789sdfsf34234  file1     service2  success
> 789sdfsf34234  file2     service3  failure: Reason for failure
> ...
> ..
> .
>
>
> This log info will be accessed by a polling service to track the progress
> of a rowkey using the REST API.
>
> So, basically the polling service will do a GET on the rowkey with a filter
> on the filename, something like this...
>
> get 'LOGS', '312sdasd31244', {FILTER => "ColumnPrefixFilter('file1')"}
>
> So, my question is: how should the rowkey be designed? Salting may not help
> because the access pattern is not random. I will be scanning a range of
> rows.
>
> What other factors do I need to consider to make this really effective
> for this use case?
>
> Regards,
> Arun
>
>
> On Tue, May 12, 2015 at 9:58 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > Arun:
> > See the following for details:
> >
> > http://hbase.apache.org/book.html#_determining_split_points
> >
> > http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/RegionSplitter.HexStringSplit.html
> >
> > Cheers
> >
> > On Tue, May 12, 2015 at 6:11 AM, Talat Uyarer <ta...@uyarer.com> wrote:
> >
> > > Hi Arun,
> > >
> > > rowKeys. HBase decides which region stores which data by rowkey.
> > > The RegionSplitter uses the MD5 algorithm to generate region starting keys
> > > of MD5 checksum.
> > >
> > > Talat
> > >
> > >
> > > 2015-05-12 15:48 GMT+03:00 Arun Patel <arunp.bigd...@gmail.com>:
> > > > Thank you. This helps.
> > > >
> > > > So, when I pre-split regions with the command below, is SPLITALGO
> > > > creating the rowkey boundaries for each region?
> > > >
> > > > create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}
> > > >
> > > > I am failing to understand HexStringSplit. As per the documentation,
> > > > the format of a HexStringSplit region boundary is the ASCII
> > > > representation of an MD5 checksum, or any other uniformly distributed
> > > > hexadecimal value.
> > > >
> > > > My question is: MD5 checksum of what?
> > > >
> > > > Regards,
> > > > Arun
> > > >
> > > >
> > > > On Mon, May 11, 2015 at 8:57 PM, Nick Dimiduk <ndimi...@gmail.com> wrote:
> > > >
> > > >> On Mon, May 11, 2015 at 3:38 PM, Arun Patel <arunp.bigd...@gmail.com>
> > > >> wrote:
> > > >>
> > > >> > 1) I have a 10 node HBase cluster. When I create a table in HBase,
> > > >> > how many regions will be allocated by default?
> > > >>
> > > >>
> > > >> In HBase, the number of region servers is orthogonal to table partitions.
> > > >> These two operational details are related but managed independently.
> > > >>
> > > >> > I looked at the HBase Master UI and it seems regions are not allocated
> > > >> > to all the region servers by default. How can I allocate the regions
> > > >> > in all region servers?
> > > >>
> > > >>
> > > >> HBase will evenly balance the regions of all tables it's hosting across all
> > > >> region servers in the cluster. If you have fewer regions than region
> > > >> servers, some servers will have no regions to host.
> > > >>
> > > >> > Basically, this distributes the data in a better way if I am using a
> > > >> > salted key. My requirement is to distribute the data across the cluster
> > > >> > using salted keys. But, is having few regions a constraint?
> > > >> >
> > > >>
> > > >> You're moving in the right direction. The next step would be to split your
> > > >> table according to some prefix value, presumably related to your "salting"
> > > >> choice. This will depend on what value you're prepending to the row keys
> > > >> and the cardinality of those values. Apache Phoenix does this, for example,
> > > >> with a fixed byte prefix and one pre-split per salt-byte value (i.e., 0,
> > > >> 1, 2, 3, ... 15).
> > > >>
> > > >> > 2) How does the rowkey to region mapping work? In Cassandra, we have a
> > > >> > concept of assigning a token range to each node. A rowkey will be
> > > >> > assigned to a node based on the token range. How does this work in HBase?
> > > >>
> > > >>
> > > >> HBase is ordered and range-partitioned. Basically, your row keys are sorted
> > > >> and region boundaries are determined at points within that range. So if you
> > > >> have rows 'a' - 'z', HBase will define regions as contiguous segments of
> > > >> this range, 'a' - 'f', and 'g' - 'k' for example. The range of a region is
> > > >> dictated primarily by the amount of data contained therein. When a region
> > > >> becomes too big, it will be split in half and two child regions are created
> > > >> (i.e., 'a' - 'f' becomes 'a' - 'c' and 'd' - 'f'). Once a region splits,
> > > >> the children are independent and can be moved to other region servers.
> > > >>
> > > >> I explain a bit of this and more in my talk "HBase for Architects". I link
> > > >> to a video from my blog [0]. As Michael mentioned, there's more detail
> > > >> published in our book [1], as well as in our other books [2], [3].
> > > >>
> > > >> Welcome to HBase ;)
> > > >> -n
> > > >>
> > > >> [0]: http://www.n10k.com/blog/hbase-for-architects-redux/
> > > >> [1]: https://hbase.apache.org/book.html#regions.arch
> > > >> [2]: http://www.manning.com/dimidukkhurana/
> > > >> [3]: http://shop.oreilly.com/product/0636920033943.do
> > > >>
> > >
> > >
> > > --
> > > Talat UYARER
> > > Website: http://talat.uyarer.com
> > > Twitter: http://twitter.com/talatuyarer
> > > Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
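
For anyone reading this thread later and still puzzling over "MD5 checksum of what?": here is a rough Python sketch of how I understand the pre-split in create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'} to behave. As far as I can tell, the split algorithm does not hash anything itself; it only cuts the hex keyspace "00000000" to "FFFFFFFF" into evenly sized ranges, and it is up to the client to write row keys that are uniformly distributed hex strings (for example the MD5 hex digest of some natural id). The function names below are mine, not HBase APIs, and the lowercase hex formatting of the boundaries is an assumption. The lookup mirrors the range partitioning described earlier in the thread: a key belongs to the region whose start key is the largest one that is less than or equal to it.

```python
import bisect
import hashlib


def hex_string_splits(num_regions, first=0x00000000, last=0xFFFFFFFF):
    """Return num_regions - 1 evenly spaced, zero-padded 8-digit hex split points.

    Hypothetical helper that imitates an even split of the hex keyspace;
    it is not taken from the HBase source.
    """
    if num_regions < 1:
        raise ValueError("num_regions must be positive")
    step = (last - first + 1) // num_regions
    return [format(first + i * step, "08x") for i in range(1, num_regions)]


def region_index(row_key, splits):
    """Index of the region whose [start, end) key range contains row_key.

    Region 0 starts at the empty key; region i starts at splits[i - 1].
    Keys are compared lexicographically, the way sorted row keys are.
    """
    return bisect.bisect_right(splits, row_key)


def hashed_key(raw_id):
    """Assumed client-side convention: use the MD5 hex digest as the row key."""
    return hashlib.md5(raw_id.encode("utf-8")).hexdigest()


splits = hex_string_splits(15)
print(len(splits), "split points:", splits[0], "...", splits[-1])

key = hashed_key("312sdasd31244")
print(key, "-> region", region_index(key, splits))
```

With 15 regions this gives 14 boundaries, from 11111111 up to eeeeeeee. Note that hashing the key this way spreads writes evenly but gives up meaningful range scans over the original ids, which is the trade-off discussed above for salting.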