Thanks, Ted, for your input. I have written an algorithm that converts part of my input string to a single character such as #, $ or !, and that character is my salt. My input data is random and I need to know the row key in advance, so this lets me work out the salt for any given input. (A hash such as MD5 generates a long string, which had some performance impact because my row key was getting long.)
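A minimal sketch of this kind of deterministic salting, not the actual algorithm used here. It assumes a hypothetical salt alphabet `!#$%@`, uses MD5 only to pick one salt character (so the row key stays short), and models region lookup with a plain sorted list of split points. The names `salt_for`, `row_key` and `region_index` are illustrative only.

```python
import bisect
import hashlib

# Hypothetical salt alphabet, kept in ASCII order because HBase sorts
# row keys bytewise: '!' < '#' < '$' < '%' < '@'.
SALTS = sorted("!#$%@")

def salt_for(mobile: str) -> str:
    """Derive the salt from the key itself, so a reader can recompute it."""
    digest = hashlib.md5(mobile.encode("utf-8")).digest()
    return SALTS[int.from_bytes(digest, "big") % len(SALTS)]

def row_key(mobile: str) -> str:
    """Build a short salted key such as '#_9999999999'."""
    return f"{salt_for(mobile)}_{mobile}"

def region_index(key: str, split_points: list) -> int:
    """Region 0 covers keys below the first split point; region i
    (i >= 1) starts at split_points[i - 1], inclusive."""
    return bisect.bisect_right(split_points, key)

if __name__ == "__main__":
    key = row_key("9999999999")
    print(key, "-> region", region_index(key, SALTS))
```

With N split points a table has N + 1 regions. If every salt character is also a split point, the region before the first split point receives no rows, because no key sorts below the smallest salt. Using `SALTS[1:]` as the split points gives exactly one region per salt.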
In my lab testing I found that a number of regions were created, but one region, the one with start row !, was empty. I created the table pre-split on these characters, and no data arrived with a row key starting with !. Is there any way to distribute data equally across all regions? I know in advance what my salt is; it is a fixed set such as !@#$%.

Thanks
Manjeet

On Sat, Sep 10, 2016 at 7:04 AM, Ted Yu <[email protected]> wrote:

> Given ingestion rate of 7GB / hour, you would have ~ 5000GB data per month.
> That's about 500 regions.
> You would run out of ASCII characters in position of #.
>
> Since mobile number is personal identification information, it is not
> prudent to directly use it in row key.
> You can search for commonly accepted practice on the internet.
> If you use hash of mobile number, you would avoid hot spotting.
>
> More detail of your use would be helpful in providing better answer.
>
> Cheers
>
> On Fri, Sep 9, 2016 at 6:09 PM, Manjeet Singh <[email protected]> wrote:
>
> > Thanks, Ted, for the links. They will help to determine how regions split,
> > what the size should be, etc., which will be really helpful.
> > But can you tell me whether my understanding was correct,
> > as I asked in the trailing mail?
> > I know what the salt will be based on the mobile number coming in my data.
> > So assume the salt for mobile number 9999999999 is #,
> > so my rowkey is #_9999999999.
> > As I know in advance what my exact rowkey is, I can distribute my data on
> > the cluster to avoid hotspotting, and I want to distribute my data equally
> > on the cluster.
> > So is it a mandatory condition to create the table according to my splits?
> >
> > Thanks
> > Manjeet
> >
> > On Sat, Sep 10, 2016 at 6:26 AM, Ted Yu <[email protected]> wrote:
> >
> > > Please take a look at:
> > >
> > > http://hbase.apache.org/book.html#table_schema_rules_of_thumb
> > > http://hbase.apache.org/book.html#arch.regions.size
> > > http://hbase.apache.org/book.html#ops.capacity.regions
> > > http://hbase.apache.org/book.html#ops.capacity.regions.total
> > >
> > > On Fri, Sep 9, 2016 at 5:35 PM, Manjeet Singh <[email protected]> wrote:
> > >
> > > > Yes, it is for weekdays.
> > > > Yes, the default is 10 GB, so what is the way/formula to know what the
> > > > size of the RS should be?
> > > > On 9 Sep 2016 19:03, "Ted Yu" <[email protected]> wrote:
> > > >
> > > > > Can you clarify whether the incoming data rate is for weekdays ?
> > > > >
> > > > > At 6-7 GB/hour, you need to set larger region size.
> > > > > Default is 10GB.
> > > > >
> > > > > If you know roughly how the key space would be filled, presplit your
> > > > > table accordingly.
> > > > >
> > > > > On Thu, Sep 8, 2016 at 11:24 PM, Manjeet Singh <[email protected]> wrote:
> > > > >
> > > > > > Hi All
> > > > > >
> > > > > > I have some basic questions. Can anyone help me out?
> > > > > >
> > > > > > Q1. This is my understanding: to perform splitting I need to create
> > > > > > the table like below
> > > > > > create 'test_table', 'c1', SPLITS => ['#', '!', '$']
> > > > > >
> > > > > > and I have to design the row key in this way
> > > > > > #_123456789
> > > > > > !_123456789
> > > > > > $_123456789
> > > > > >
> > > > > > so that my data is distributed on the cluster.
> > > > > >
> > > > > > My requirement is very simple: I want to distribute data equally
> > > > > > across regions based on my rowkey only.
> > > > > >
> > > > > > So please correct me if I am missing anything.
> > > > > >
> > > > > > Q2. If I have 5 regions on each region server and I give each
> > > > > > 100 MB of space by using the hbase.hregion.max.filesize property,
> > > > > >
> > > > > > what will happen when all my regions fill up with 100 MB of data?
> > > > > > Please note I have a cron job scheduled every weekend and my
> > > > > > incoming data rate is 6-7 GB/hour, so my regions get filled very fast.
> > > > > >
> > > > > > Thanks
> > > > > > Manjeet
> > > > > >
> > > > > > --
> > > > > > luv all
> >
> > --
> > luv all
>

--
luv all
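A rough back-of-the-envelope sketch of the sizing arithmetic quoted in this thread. It assumes ingestion runs 24 hours a day for a 30-day month (which matches the ~5000 GB estimate above), ignores compression and replication, and uses the hypothetical helper names `monthly_volume_mb` and `regions_needed`.

```python
MB_PER_GB = 1024

def monthly_volume_mb(gb_per_hour: int, hours_per_day: int = 24, days: int = 30) -> int:
    """Raw data written per month, in MB (assumes a constant ingestion rate)."""
    return gb_per_hour * MB_PER_GB * hours_per_day * days

def regions_needed(total_mb: int, region_size_mb: int) -> int:
    """Number of full-size regions required to hold total_mb (ceiling division)."""
    return -(-total_mb // region_size_mb)

if __name__ == "__main__":
    total = monthly_volume_mb(7)
    print(total // MB_PER_GB)                      # 5040 GB per month
    print(regions_needed(total, 10 * MB_PER_GB))   # 504 regions at 10 GB each
    print(regions_needed(total, 100))              # 51610 regions at 100 MB each
```

At 100 MB per region the same month of data would need roughly a hundred times as many regions as at the 10 GB default, which is why a larger region size was suggested for this ingestion rate.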
