Thanks, Ted, for your input.
I have written an algorithm that converts part of my string to a single
character such as #, $ or !, and that character is my salt, so from the
input I always know what the salt is. My input data is very random and I
need to know my rowkey in advance. (A hash like MD5 generates a long
string, which caused some performance impact because my rowkey was
getting long.)

In my lab testing I found that a number of regions were created, but one
region, with start row !, was empty.
As far as I can observe, I created the table pre-split on these
characters, and no data arrived whose rowkey starts with !.
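
A minimal sketch of how a rowkey maps to a pre-split region, assuming the
table was split on !, # and $ and that rowkeys are compared byte by byte.
The names region_for and region_start are hypothetical helpers for
illustration, not HBase APIs.

```python
from bisect import bisect_right

# Assumed split points, sorted in byte order (the order row keys are compared in).
SPLITS = sorted([b"#", b"!", b"$"])


def region_for(rowkey: bytes, splits=SPLITS) -> int:
    """Return the index of the region that would hold rowkey.

    Region 0 covers everything below splits[0]; region i covers
    [splits[i-1], splits[i]); the last region has no upper bound.
    """
    return bisect_right(splits, rowkey)


def region_start(index: int, splits=SPLITS) -> bytes:
    """Start row of a region; empty bytes for the first region."""
    return b"" if index == 0 else splits[index - 1]


if __name__ == "__main__":
    for key in (b"!_123456789", b"#_123456789", b"$_123456789"):
        idx = region_for(key)
        print(key, "-> region", idx, "start row", region_start(idx))
```

Under these assumptions the region with start row ! only receives keys
that begin with ! (or with ", which sorts between ! and #), so it stays
empty if the salt algorithm never emits ! for the data loaded. The first
region, with an empty start row, receives nothing at all when every key
begins with one of the salt characters.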

Is there any way to distribute data equally across all regions, given
that I know what my salt is and that it is fixed, like !@#$%?
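
One possible approach, sketched under assumptions: keep the fixed salt
characters, but pick the character from a hash of the mobile number, so
the salt is still computable in advance and the keys spread roughly
evenly. MD5 is used here only to choose a bucket and is not stored in the
rowkey. SALTS, salt_for and rowkey_for are hypothetical names.

```python
import hashlib

# Fixed salt characters from this thread, listed in ASCII byte order.
SALTS = "!#$%@"


def salt_for(mobile: str, salts: str = SALTS) -> str:
    """Pick a salt character deterministically from a hash of the number."""
    digest = hashlib.md5(mobile.encode("ascii")).digest()
    bucket = int.from_bytes(digest[:4], "big") % len(salts)
    return salts[bucket]


def rowkey_for(mobile: str) -> str:
    """Build a short salted rowkey such as '#_9999999999'."""
    return f"{salt_for(mobile)}_{mobile}"


if __name__ == "__main__":
    print(rowkey_for("9999999999"))
```

With these five salts, pre-splitting on #, $, % and @ would give five
regions, one per salt character, with no empty leading region, since keys
starting with ! fall into the first region.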

Thanks
Manjeet

On Sat, Sep 10, 2016 at 7:04 AM, Ted Yu <[email protected]> wrote:

> Given ingestion rate of 7GB / hour, you would have ~ 5000GB data per month.
> That's about 500 regions.
> You would run out of ASCII character in position of #.
>
> Since mobile number is personal identification information, it is not
> prudent to directly use it in row key.
> You can search for commonly accepted practice on the internet.
> If you use hash of mobile number, you would avoid hot spotting.
>
> More detail of your use would be helpful in providing better answer.
>
> Cheers
>
> On Fri, Sep 9, 2016 at 6:09 PM, Manjeet Singh <[email protected]>
> wrote:
>
> > Thanks, Ted, for the links. They will help me determine how regions
> > split, what the region size should be, and so on, which is really
> > helpful.
> > But can you confirm whether my understanding, as asked in the
> > trailing mail, was correct?
> > I know what the salt will be based on the mobile number in my data.
> > So assume the salt for mobile number 9999999999 is #,
> > so my rowkey is #_9999999999.
> > As I know my exact rowkey in advance, I can distribute my data across
> > the cluster to avoid hotspotting, and I want to distribute it equally.
> > So is it mandatory to create the table according to my splits?
> >
> > Thanks
> > Manjeet
> >
> > On Sat, Sep 10, 2016 at 6:26 AM, Ted Yu <[email protected]> wrote:
> >
> > > Please take a look at:
> > >
> > > http://hbase.apache.org/book.html#table_schema_rules_of_thumb
> > > http://hbase.apache.org/book.html#arch.regions.size
> > > http://hbase.apache.org/book.html#ops.capacity.regions
> > > http://hbase.apache.org/book.html#ops.capacity.regions.total
> > >
> > > On Fri, Sep 9, 2016 at 5:35 PM, Manjeet Singh <
> > [email protected]>
> > > wrote:
> > >
> > > > Yes, it is for weekdays.
> > > > Yes, the default is 10 GB, so what is the way/formula to know what
> > > > the size of the RS should be?
> > > > On 9 Sep 2016 19:03, "Ted Yu" <[email protected]> wrote:
> > > >
> > > > > Can you clarify whether the incoming data rate is for weekdays ?
> > > > >
> > > > > At 6-7 Gb /Hour, you need to set larger region size.
> > > > > Default is 10GB.
> > > > >
> > > > > > If you know roughly how the key space would be filled, presplit
> > > > > > your table accordingly.
> > > > >
> > > > > On Thu, Sep 8, 2016 at 11:24 PM, Manjeet Singh <
> > > > [email protected]
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > Hi All
> > > > > >
> > > > > > I have some basic question can anyone help me out
> > > > > >
> > > > > > > Q1. My understanding is that, to perform splitting, I need to
> > > > > > > create the table like below:
> > > > > > > create 'test_table', 'c1', SPLITS => ['#', '!', '$']
> > > > > >
> > > > > > and I have to design row key in this way
> > > > > > #_123456789
> > > > > > !_123456789
> > > > > > $_123456789
> > > > > >
> > > > > > > so that my data is distributed across the cluster.
> > > > > > >
> > > > > > > My requirement is very simple: I want to distribute data
> > > > > > > equally across regions based on my rowkey only.
> > > > > > >
> > > > > > > So please correct me if I am missing anything.
> > > > > >
> > > > > >
> > > > > > > Q2. If I have 5 regions on each region server and I give each
> > > > > > > 100 MB of space using the hbase.hregion.max.filesize property,
> > > > > > > what will happen when all my regions fill with 100 MB of data?
> > > > > > > Please note I have a cron job scheduled every weekend and my
> > > > > > > incoming data rate is 6-7 GB/hour, so my regions get filled
> > > > > > > very fast.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > Thanks
> > > > > > Manjeet
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > luv all
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > luv all
> >
>



-- 
luv all
